When you start to get more involved in SEO, sooner or later, you will encounter the problem of duplicate content. This is not a new issue; it has been around for quite some time, which means that several methods have been developed to address it. Here are a few of them.
Blocking Using robots.txt
- This is the oldest and most widely used method. A rule in the robots.txt file tells crawlers such as Googlebot not to fetch a specific page or directory, so the duplicated content is never crawled. The disadvantage of this solution is that if someone links to your blocked page, Google cannot crawl it, so the value of that link is lost and the links on the page are not followed. Because robots.txt controls crawling rather than indexing, a blocked URL can also still show up in results if other sites link to it. This makes it a less effective method for SEO, though it can still be useful if you simply want to keep crawlers away from certain content, regardless of whether it is duplicated or not.
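As a rough illustration of how such a rule behaves, the sketch below feeds a small robots.txt to Python's standard-library `urllib.robotparser` and asks whether a crawler may fetch two URLs. The `/print/` directory, the `may_crawl` helper, and the example URLs are assumptions made for this example, not paths from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of the duplicate
# printer-friendly copies under /print/ (an assumed directory).
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def may_crawl(url, user_agent="Googlebot"):
    """Return True if ROBOTS_TXT lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_crawl("http://www.example.uk/print/subpage/"))  # blocked copy
print(may_crawl("http://www.example.uk/subpage/"))        # original page
```

The first call reports that the duplicate under `/print/` is off limits, while the original page stays crawlable.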
Meta Robots Noindex/Follow tag
- <meta name="robots" content="noindex,follow" />
- With this tag (noindex), we can prevent search engines from indexing a page, effectively avoiding duplication. The follow value tells the search engine to follow the links on the page. This method is particularly useful for blogs, where the primary content to be indexed is the blog itself, while secondary content, such as archived posts, can still be accessed by search engines. This ensures that older content remains available without causing duplication issues.
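To check which pages actually carry the tag, something like the following could be used. It is a minimal sketch built on Python's standard `html.parser`; the `RobotsMetaParser` class, the `robots_directives` helper, and the sample archive page are all hypothetical names introduced for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma separated, e.g. "noindex,follow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical archive page that should stay out of the index.
ARCHIVE_PAGE = (
    '<html><head><meta name="robots" content="noindex,follow" /></head>'
    "<body>Archived posts</body></html>"
)

directives = robots_directives(ARCHIVE_PAGE)
print("indexable:", "noindex" not in directives)
print("links followed:", "nofollow" not in directives)
```

A page without any robots meta tag yields an empty set, which corresponds to the default behaviour of indexing the page and following its links.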
301 Redirect
- In most cases, the most effective solution for duplicate content is a 301 redirect, which permanently sends both visitors and search engines from the duplicate page to the original one. Once pages with potentially strong rankings are merged into a single page, they no longer compete with each other. Instead, their combined relevance improves, which has a positive impact on SEO. A 301 redirect should be used when the redirection does not negatively impact user experience. Typical use cases include redirecting IP-address URLs, index-file URLs (such as /index.html), and the www or non-www variant of a domain to a single preferred URL.
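The typical use cases listed above can be sketched as a small function that decides where a 301 should point. This is only an illustration under stated assumptions: the preferred host `www.example.uk`, the documentation IP address `192.0.2.10`, the index file names, and the `redirect_target` helper are all hypothetical. In practice the redirect is usually configured in the web server or application framework.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.example.uk"  # assumed preferred (www) variant
# Assumed host names that all serve the same site, including a bare IP.
SAME_SITE_HOSTS = {"www.example.uk", "example.uk", "192.0.2.10"}
INDEX_FILES = ("index.html", "index.php")  # assumed index file names


def redirect_target(url):
    """Return the URL a 301 should point to, or None if url is already preferred."""
    parts = urlsplit(url)
    if parts.hostname not in SAME_SITE_HOSTS:
        return None  # not one of our hosts, nothing to redirect
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment in INDEX_FILES:
        # /folder/index.html and /folder/ are the same page.
        path = path[: -len(last_segment)]
    target = urlunsplit(
        (parts.scheme, PREFERRED_HOST, path, parts.query, parts.fragment)
    )
    return None if target == url else target


for candidate in (
    "http://example.uk/subpage/",
    "http://192.0.2.10/subpage/index.html",
    "http://www.example.uk/subpage/",
):
    print(candidate, "->", redirect_target(candidate))
```

All duplicate variants resolve to the same address, and the preferred URL itself returns `None`, which avoids a redirect loop.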
Canonical tag
- Another way to handle duplicate content is the rel=canonical tag. It consolidates ranking signals in much the same way as a 301 redirect, but it is easier to implement. Keep in mind that search engines treat it as a strong hint rather than a binding instruction.
The rel=canonical tag is placed within the HTML head section of the webpage and looks like this:
<link href="http://www.example.uk/subpage/" rel="canonical" />
This tag tells search engines to treat the current page as a copy of http://www.example.uk/subpage/ and to credit all links and content on the duplicate page to that original page. The canonical tag is particularly useful for websites that use multiple categories and subcategories, where different URLs may lead to the same content.
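For a site where several category URLs lead to the same content, the tag could be generated from a lookup table, roughly as follows. The category paths, the `CANONICAL_FOR` mapping, the `BASE_URL` constant, and the `canonical_tag` helper are hypothetical and exist only for this sketch.

```python
from html import escape

BASE_URL = "http://www.example.uk"  # assumed site root

# Hypothetical category paths that all show the same content as /subpage/.
CANONICAL_FOR = {
    "/category-a/subpage/": "/subpage/",
    "/category-b/sale/subpage/": "/subpage/",
}


def canonical_tag(request_path):
    """Build the canonical <link> tag for the page served at request_path."""
    # Pages without an entry are their own canonical version.
    canonical_path = CANONICAL_FOR.get(request_path, request_path)
    href = escape(BASE_URL + canonical_path, quote=True)
    return f'<link rel="canonical" href="{href}" />'


for path in ("/category-a/subpage/", "/category-b/sale/subpage/", "/subpage/"):
    print(path, "->", canonical_tag(path))
```

Every variant, including the original page, ends up declaring the same canonical URL.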
Alternate link tag
- The alternate link tag functions similarly to the canonical tag, but it is primarily used for international or multilingual SEO.
<link rel="alternate" hreflang="en" href="http://www.example.com/something" />
<link rel="alternate" hreflang="en-gb" href="http://www.example.uk/something" />
<link rel="alternate" hreflang="en-de" href="http://www.example.de/something" />
The alternate tag helps Google identify the version of a page that is relevant to the visitor's language and country. Each hreflang value is a language code, optionally followed by a region code, which is what distinguishes the English pages above from one another. The downside of this tag is that Google may still treat some of these near-identical pages as duplicate content. It can be used for pages that target more than one country.
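Because every language or country version should list the full set of alternates, it can help to generate the tags from one table. The sketch below does that in Python; the region codes assigned to each domain, the `ALTERNATES` mapping, and the `alternate_tags` helper are assumptions made for illustration.

```python
from html import escape

# Assumed targeting: a generic English page plus UK and Germany variants.
ALTERNATES = {
    "en": "http://www.example.com/something",
    "en-gb": "http://www.example.uk/something",
    "en-de": "http://www.example.de/something",
}


def alternate_tags(alternates):
    """Render one <link rel="alternate"> tag per hreflang value."""
    tags = []
    for code, url in alternates.items():
        tags.append(
            '<link rel="alternate" hreflang="{}" href="{}" />'.format(
                escape(code, quote=True), escape(url, quote=True)
            )
        )
    return tags


# The same block of tags goes into the head section of every variant.
for tag in alternate_tags(ALTERNATES):
    print(tag)
```

Keeping the mapping in one place makes it harder for the variants to drift out of sync with each other.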
Google Search Console
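- Google Search Console has allowed website owners to set their preferred domain and configure how different URL parameters are handled. The main drawback of this method is that it only affects Google; search engines like Bing or Yahoo are not influenced by these settings. Google has also since retired both of these settings, so check what the current version of Search Console offers before relying on this method.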
The methods described above are not just for handling duplicate content but can also be used more generally. If there is a subpage of your website that you do not want to be indexed by search engines, you should take the appropriate steps. Otherwise, the search engine may mistakenly treat the page as a 404 error.