Why Canonicalization Is Important for Your Online Content

Austin Mullins  |  July 1, 2019

If you’ve ever met someone who’s a true superfan of something like Star Wars or the Grateful Dead, you know they’re deeply concerned with who created each song, movie, or book.

When it comes to content published on the web, search engines are similarly concerned with what’s original, and much like your college professor, they don’t take kindly to plagiarism.

Now, obviously you shouldn’t be stealing anyone else’s content - that’s no way to build a lasting and impactful brand. The trouble comes when you need to display the same content on multiple pages to provide the best experience for users. It turns out that there are a lot of reasons this could happen (which we will cover).

But regardless of your motives, duplicate content is a sure-fire way to hurt your ranking on search engines.

Lucky for us, Google, Yahoo, and Microsoft introduced a simple solution to this problem way back in 2009: Canonicalization.

What is canonicalization?

Basically, canonicalization is a simple method for letting search engines know which URL is the original or “master” copy of a particular piece of content.

Adding a canonical tag that links to the original content marks the “duplicate” page as republished or derivative, and lets search engines know it’s not worth indexing.

In other words, it keeps Googlebot and other search engines on track when it comes to directing search queries towards the original source of information.

Why use a canonical tag?

Mostly to avoid issues related to duplicate content.

When search engine crawlers see your post about cool stuff on http://example.com/post-about-cool-stuff, and then the same article republished on Medium.com (or some other use case where the same content is displayed in multiple places), they try to determine which is the original and deindex the rest. The reason for this is actually in the best interest of users. You wouldn’t want to click multiple results only to read the same content. When performing a search, it’s natural to expect each result to be unique.

Unfortunately, instead of sending you a nice email asking you to clarify, crawlers that uncover duplicate content try to determine which is the original on their own, and often get it wrong. Better to have a preference in place.

Luckily, like humans, search engines speak a certain language. A canonical tag tells them that http://example.com/post-about-cool-stuff is “canon” (the original version), and that the republished version is both there on purpose and of secondary importance.

What does it look like?

Typically, if you were to view the page source of a derivative post that had been properly canonicalized, you would notice a link element with the attribute rel='canonical'.

Here’s an example of what that looks like:

[Image: canonical tag]
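In markup terms, the tag is a link element in the head of the duplicate page whose href points at the original. As a rough sketch of how you might check a page for one, the Python snippet below pulls the canonical URL out of some HTML using only the standard library. The sample page and the names CanonicalFinder and find_canonical are made up for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html):
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical republished page that points back to the original post.
SAMPLE_PAGE = """<html><head>
<title>Post About Cool Stuff (republished)</title>
<link rel="canonical" href="http://example.com/post-about-cool-stuff">
</head><body>...</body></html>"""

print(find_canonical(SAMPLE_PAGE))
```

A page with no canonical link simply returns None, which is a quick way to spot republished posts that still need one.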

Here’s another example from the Conversion Creatives site, where we republished a guest post we wrote on Copyblogger. While republishing a post that originally appeared on another publication is a common use for canonical links, there are quite a few reasons you might need this tool.

Uses for canonical tags

According to Matt Cutts, 25-30% of all content on the internet is duplicate content.

Some of this duplication is intentional. Some of it isn’t. Some is malicious. Most isn’t.

You could have content which repeats across multiple parts of a website for a variety of reasons, such as a URL modified by filtering, location specific pages, or product options.

For the vast majority of these situations, a link with the rel='canonical' tag is a perfectly acceptable solution.

Republishing original content

Let’s say we did an outreach campaign pitching editors on republishing our great new content marketing article. Even though we own the rights to this content, we still have to be careful about directing search engines as to which version we want to rank.

In order to avoid our new publishing partner potentially being viewed as the original source, leading to our original post being viewed as duplicate and thereby becoming deindexed, we need to give Google some direction.

To avoid this potential conflict, it’s important to make sure the website republishing your content uses a rel='canonical' tag linking back to the original URL. As long as you do that, you’re free to use the content any way you want, without fear of duplicate content leading to your original being deindexed.

This is just one example; there are many other reasons for duplicate content that aren’t as straightforward as the humble guest post.

Product variations

Another common reason for duplicate content is unique to ecommerce sites. Many products offer a variety of different options (such as different colors of a shoe) which will likely have the same name and description.

For instance, let’s say you start searching for a blue men’s shoe on Shoe Carnival’s website. When you go to the URL that displays men’s shoes filtered into only the color blue (https://www.shoecarnival.com/mens/blue/), you will see a rel='canonical' tag in the page source which links the page specific to blue men’s shoes back to their men’s category page.

[Image: canonical example]

Remember, blue men’s shoes are simply a subset of men’s shoes - in there amongst the blacks and highlighter yellows and other options. There’s no need for the average shopper to worry about the fact that they may have the same description, but Googlebot sees the descriptions repeating and will only rank one variation.

In order to make sure Google chooses our preferred version for any applicable keywords, the “duplicate” page(s) will need to be canonicalized.

Mobile-specific pages

The same goes for pages that are optimized for mobile devices, such as Google AMP.

Again, this is duplicate content for the user’s benefit: they get faster-loading pages on smaller devices with limited internet bandwidth. However, since the AMP version displays the same information at a different URL, often on a different domain, it is a true duplicate.

So, while the same content may appear on the AMP and the regular HTML page, it’s a variant of the original URL. No worries, though. Add a canonical tag to the AMP page linking back to the original, and search engines get the hint.

Location-specific pages

If you have a business with a physical presence (such as a storefront) in multiple locations, you may have a good reason to create multiple location-specific pages.

Usually, these exist to provide hours of operation, address information, and anything else relevant to local customers in that area who might visit the location. However, because this information is likely to be similar, crawlers may determine this is duplicate content.

For example, if you have a restaurant chain that directs people to a unique page for each of its five locations, you probably have the same information about the menu, hours, pricing, and reservations on all of them. The only difference is the location.

Now, you could try to work your way around the duplicate content issue by posting a paragraph of fun facts about that location, or you could just use a canonical tag that points back to a single page, such as one that lists all your locations.

It’s worth noting that this does not apply to pages which display the same content but in a different language. For example, you might be a business with customers in the United States and Mexico. Obviously you’ll want to have a specific landing page for both sets of customers, but this isn’t considered duplicate content. Instead of using a canonical link for pages appearing in multiple languages, you’d want to use an hreflang attribute to communicate that this is simply a localized version.
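To make the hreflang idea concrete, here is a small Python sketch that renders the link elements for a set of localized pages. The URLs and the en-us and es-mx codes are assumptions chosen to match the United States and Mexico example, and hreflang_links is a made-up helper name.

```python
from html import escape


def hreflang_links(alternates):
    """Render one <link rel="alternate"> element per language/region variant."""
    return [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}">'
        for code, url in alternates.items()
    ]


# Hypothetical landing pages for the two audiences.
ALTERNATES = {
    "en-us": "https://www.example.com/us/",
    "es-mx": "https://www.example.com/mx/",
}

for tag in hreflang_links(ALTERNATES):
    print(tag)
```

The same full set of tags would go in the head of every localized version, so each page lists itself along with its siblings.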

Tracking add-ons to URLs

Sometimes you may have a variation of a URL used for tracking purposes. These URL parameters are often used to gauge the effectiveness of ad campaigns, but because they present the same information on another URL, the resulting pages are considered duplicate content. Often, the normal version of the content will look something like this: www.example.com/article_about_stuff

While the modified URL used for tracking will look something like this: www.example.com/?ref=medium&utm_campaign=article_about_stuff

There’s no reason to stop using URL parameters to track your campaigns, but you need to be sure to add a rel='canonical' tag to the tracking page linking back to the shorter, simpler URL.
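If your pages are generated by code, the canonical URL can often be derived by stripping the tracking parameters from whatever URL was requested. The Python sketch below shows one way to do that with the standard library. Which parameters count as tracking is an assumption here (ref plus anything starting with utm_), and the https scheme, the campaign value, and the function names are invented for the example.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: which parameters are "tracking" is site-specific.
TRACKING_KEYS = {"ref"}
TRACKING_PREFIXES = ("utm_",)


def strip_tracking(url):
    """Return url without tracking parameters; all other parameters are kept."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_KEYS and not key.startswith(TRACKING_PREFIXES)
    ]
    # The fragment is dropped because it never reaches the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url):
    """Render the <link> element to place in the head of the tracking page."""
    return f'<link rel="canonical" href="{escape(strip_tracking(url))}">'


print(canonical_tag(
    "https://www.example.com/article_about_stuff?ref=medium&utm_campaign=spring"
))
```

Parameters that actually change the content, such as a page number, are left alone, so only the tracking noise is removed.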

Sorting options for content directories or lists

Similar to the blue shoes example, filtering search options on a website (by price, relevance, size, etc.) often results in a modified URL.

Revisiting our Shoe Carnival example, if we modify our search for men’s shoes to focus on blue shoes in a size 9.5 between $40 and $50, we end up with the following URL: https://www.shoecarnival.com/mens/9.5/blue/?pmax=50.00&pmin=40.00

Again, because all the same content for this query also appears under the broader “Men’s Shoes” category, all filtered variations should be canonicalized back to the original. Otherwise, you may find that search engines choose the filtered version, limiting the scope of keywords you can rank for with your product listings.
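As a sketch of how a filtered URL could be mapped back to its category page, the Python snippet below keeps only the first path segment and drops the filters and query string. That rule is an assumption based on the URLs in this example, not a description of how Shoe Carnival actually builds its canonical tags, and category_canonical is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def category_canonical(filtered_url):
    """Map a filtered listing URL back to its top-level category URL.

    Assumes the first path segment names the category and everything after
    it (size, color, price range) is a filter.
    """
    parts = urlsplit(filtered_url)
    segments = [segment for segment in parts.path.split("/") if segment]
    category_path = f"/{segments[0]}/" if segments else "/"
    return urlunsplit((parts.scheme, parts.netloc, category_path, "", ""))


print(category_canonical(
    "https://www.shoecarnival.com/mens/9.5/blue/?pmax=50.00&pmin=40.00"
))
```

A real store would more likely look up the category from its own catalog data than parse it out of the path, but the principle is the same: every filtered variation resolves to one preferred URL.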

Should you use self-referencing canonicals?

While it isn’t necessary to use canonicalization across every page on your site, there are some cases where you might want to as a preventative measure.

For instance, if you make extensive use of URL parameters, it’s probably a good idea to also have a self-referencing canonical on the core page.

In other words, it can’t hurt. If you have a rel='canonical' on a URL pointing to the same URL, you’ll be doubly sure that Google knows the original.
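If you want to audit which of your pages are self-referencing, a comparison like the following Python sketch may help. It takes a page URL and the href from its canonical tag (which may be relative) and reports how the two relate. The function name and the three labels are made up for this example.

```python
from urllib.parse import urldefrag, urljoin


def canonical_status(page_url, canonical_href):
    """Classify a page's canonical as missing, self-referencing, or pointing elsewhere."""
    if not canonical_href:
        return "missing"
    # Resolve relative hrefs against the page URL and ignore #fragments.
    resolved, _ = urldefrag(urljoin(page_url, canonical_href))
    page, _ = urldefrag(page_url)
    return "self-referencing" if resolved == page else "points elsewhere"


core = "https://www.example.com/article_about_stuff"
print(canonical_status(core, "/article_about_stuff"))
print(canonical_status(core + "?ref=medium", core))
print(canonical_status(core, None))
```

The comparison is an exact string match, so a trailing slash or a different scheme counts as a different URL, which is usually what you want to catch in an audit.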

When to use a redirect versus a canonical tag?

In some cases, it’s better for visitors if you simply forward them to the original content, rather than republishing with a canonical tag.

A 301 redirect allows you to redirect users from the duplicate URL to the preferred URL, rather than posting the same content on that page.

For instance, if you’ve migrated from one domain to another, a 301 (permanent) redirect makes far more sense. You can do this at the domain level, then submit a change of address within Google Search Console to make sure the new pages are properly recognized.

HTTP versus HTTPS

Google’s preference for sites which utilize HTTPS by default is well documented. Having both a secure and non-secure version of the website available will of course mean that the same content is available on both.

In theory, you could use a canonical tag to tackle this, but that would be improper use. Instead, you’re far better off making everyone go through the secure version of the site with redirects, especially given Google’s strong preference for this to be the default. To accomplish this, it’s best to set a hard rule at the server level that automatically 301-redirects all nonsecure pages to their secure counterparts, keeping visitors safe and search engines happy.

WWW. or not at the beginning of URLs

Although this doesn’t pose the same security threats, it’s still a good idea to do a 301 redirect (at the server level) to the preferred version of the website.

Adding a canonical link to the “original” would still solve the duplicate content problem, but why split traffic between two identical versions of your website? It’s best to keep things simple.
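Both this case and the HTTPS case come down to the same rule: pick one scheme and one host, and 301 everything else to it. In practice that rule usually lives in your web server or CDN configuration, but the logic can be sketched in Python as below. The hostnames and the choice of the www version as preferred are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions: the preferred site is https://www.example.com and the bare
# domain is the only alternate host that should be redirected.
PREFERRED_HOST = "www.example.com"
ALTERNATE_HOSTS = {"example.com"}


def redirect_target(url):
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host != PREFERRED_HOST and host not in ALTERNATE_HOSTS:
        return None  # Not one of our hosts; leave it alone.
    # Fix scheme and host in one step so visitors never chain two redirects.
    target = urlunsplit(
        ("https", PREFERRED_HOST, parts.path, parts.query, parts.fragment)
    )
    return None if target == url else target


print(redirect_target("http://example.com/post?x=1"))
print(redirect_target("https://www.example.com/post"))
```

Handling the scheme and the host together means a request for http://example.com lands on the preferred URL after a single redirect rather than two.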

Other methods to tackle duplicate content issues

In addition to canonical links and 301 redirects, you can also add a noindex robots meta tag to the duplicate page. This tells crawlers that they shouldn’t index the page, preventing any possibility of duplicate content issues. Note that noindex belongs on the page itself, not in the robots.txt file; robots.txt only controls crawling, and a crawler has to be able to fetch the page in order to see the noindex tag.

This can be a more scalable solution if you have many pages that would otherwise need canonical links. That being said, you need to be careful with this tool. If you mistakenly mark too many pages as noindex, search engines won’t be able to serve those pages in their search results.

For more information on when and how to use it, check out Google’s support page on using noindex.
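A noindex directive is typically expressed either as a robots meta tag in the head of the page or as an X-Robots-Tag HTTP response header. Since the main risk is applying it to more pages than you meant to, a small audit helper can be handy. The Python sketch below checks both places; the function and class names are made up, and the headers argument is assumed to be a plain dict keyed exactly as "X-Robots-Tag".

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> element."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "meta" and (attr_map.get("name") or "").lower() == "robots":
            content = attr_map.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def is_noindexed(html, headers=None):
    """Return True if the page's meta tag or response header says noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    # Assumption: headers is a dict with the exact key "X-Robots-Tag".
    header_value = (headers or {}).get("X-Robots-Tag", "")
    header_directives = {part.strip().lower() for part in header_value.split(",")}
    return "noindex" in finder.directives or "noindex" in header_directives


print(is_noindexed('<head><meta name="robots" content="noindex, follow"></head>'))
print(is_noindexed("<head><title>Indexable page</title></head>"))
```

Running a check like this across a list of your important URLs is one way to catch a page that was marked noindex by mistake.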

Final thoughts

Without proper attention, duplicate content can be a real problem for your content marketing efforts.

But, as the old adage goes, there’s more than one way to keep search engine robots directing traffic to the preferred form of content.

In many cases, canonicalization is the ideal tool to direct search engines towards original content, while still being able to display it on your site however you want.

Still, every situation is different. For each scenario, you’ll have to decide which tool best balances search engine optimization with your user experience.

Want to learn how search engine optimization (SEO) and search engine marketing (SEM) can impact your business? Check out this guide on SEO vs SEM.

Austin Mullins is the founder of Conversion Creatives, a content marketing & SEO agency focused on helping innovative companies grow by creating, distributing, and ranking world-class content.