4 techniques to reduce duplicate site content
Although we still hear plenty of recommendations on how to optimise pages and on the importance of links from other sites to improve rankings, we hear far fewer about reducing duplicate content. It remains a neglected area of SEO advice, so this refresher highlights some of the techniques you can use to avoid it.
What is duplicate content?
Duplicate content is where different pages on a site are interpreted as similar by Google. As a result, pages can be discarded or down-weighted in Google's index so that they won't rank. In its webmaster tools guidance, Google says:
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin. Examples of non-malicious duplicate content could include: discussion forums that can generate both regular and stripped-down pages targeted at mobile devices; store items shown or linked via multiple distinct URLs and printer-only versions of web pages.
We would add that, more importantly for site owners, category, product or service pages can be treated as identical because they are too similar to each other. You may be able to spot this in your analytics if some product pages aren't attracting any natural search traffic.
To reduce this problem, here are my recommendations for dealing with duplicate content, which you could check with your agency or in-house team. Numbers 1, 3 and 4 are typically the most relevant.
1. Show Google pages have distinct content
This is straightforward, but you have to make sure the page editors creating the content know how to do it; it's standard on-page optimisation good practice. Set unique titles, meta descriptions, headings and body copy for each page you want to rank. Each page should have a focus subject or main keyword you want to target, and briefing your copywriter to stay focused on that subject is key to keeping your pages unique.
To check that your pages don't have problems with duplicate titles or descriptions, look at the HTML Improvements section in Google Webmaster Tools, which will flag pages with identical titles or descriptions for you.
Here's an example of where ASOS are doing a good job of keeping page titles and descriptions unique:
While the products are very similar, ASOS have kept the page titles and descriptions different. To arrive at these results I searched specifically on Asos.com for "Ralph Lauren Rugby Top".
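As a simple illustration of what this looks like in the page source, two similar product pages might each carry their own title and meta description (the product names, copy and store name below are hypothetical, not taken from ASOS):

<title>Ralph Lauren Long Sleeve Rugby Top - Navy | Example Store</title>
<meta name="description" content="Long sleeve rugby top in navy with a contrast collar and embroidered logo.">

<title>Ralph Lauren Short Sleeve Rugby Top - White | Example Store</title>
<meta name="description" content="Short sleeve rugby top in white with a twin-stripe chest and embroidered logo.">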
2. Robots.txt – exclude crawlers from duplicate content
An established tool in the SEO toolkit, robots.txt is a simple but powerful text file located in the root of your website. By using the robots.txt file, site owners can instruct specific search engines, or all of them, to ignore particular pieces of content. The example below would stop Googlebot from crawling any files within the print-versions directory:
User-agent: Googlebot
Disallow: /print-versions/
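If you wanted the same rule to apply to all crawlers rather than just Googlebot, you could use the wildcard user agent (the directory is the same example as above):

User-agent: *
Disallow: /print-versions/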
Alternatively, you can use the robots meta tag to specify whether a particular page should be indexed by search engines. An example of what you would include in the <head> of the page is:
<meta name="robots" content="noindex,nofollow">
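A common variant for duplicate pages is to keep the page out of the index while still letting crawlers follow its links; assuming that is the behaviour you want, the tag would be:

<meta name="robots" content="noindex,follow">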
You can read more about using the robots meta tag in Google's documentation if you need to.
3. 301 permanent redirects
301 redirects are a common way of making sure visitors are directed to the right page. For example, if a page is switched off because a campaign has finished, you can redirect people to your homepage so they don't see an error page and bounce off. A 301 also tells search engines that the page has permanently moved to a new location, which is very useful when you need to retire older or duplicate versions of pages. For example, sites sometimes have similar, duplicate versions of the homepage. A 301 ensures Google finds the new page and also passes on as much of the old page's ranking equity as it deems relevant.
Another common use of a 301 redirect is to redirect non-"www" traffic to the "www" version, as in the example below:
http://example.com > http://www.example.com
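As a sketch of how this is often implemented, assuming an Apache server with mod_rewrite enabled, the non-www to www redirect could be added to your .htaccess file like this:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]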
4. Use rel="canonical" or exclude parameters in Google Webmaster Tools
The canonical tag has become a widely used method for telling search engines which pages are duplicates and, more specifically, which URL is the primary version. I wrote specifically about implementing canonical tags on ecommerce sites previously.
The tag sits in the <head> section of your HTML page and specifies the preferred version of a set of duplicates. It should look like this:
<link rel="canonical" href="http://www.example.com/product-1.html" />
The tag tells search engines that the page is a duplicate and also informs them where the primary page is located.
A more recent feature that Google has launched within Webmaster Tools is the ability to tell Google which URL parameters on your website produce similar page content. This lets you determine whether Google should index that content or not.
This is often particularly useful on ecommerce sites, where duplicate content can occur when the same content is made available via different URLs, for example through faceted search for selecting products, or through session IDs or affiliate parameters, like this:
http://www.example.com/products/women/dresses/green.htm
http://www.example.com/products/women?category=dresses&color=green
http://example.com/shop/index.php?product_id=32&highlight=green+dress&cat_id=1&sessionid=123&affid=431
In these cases you would specify "category", "product_id", "highlight", "sessionid" and "affid" as parameters.
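To tie this back to the canonical tag, each of those parameterised variants could also point search engines at the preferred URL. A minimal sketch, assuming the clean category page is the version you want indexed:

<link rel="canonical" href="http://www.example.com/products/women/dresses/green.htm" />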
From this post you can see there are quite a few ways duplicate content can occur and a range of tools to help manage it. So next time you're speaking to whoever is responsible for your SEO, ask about the steps they take to minimise duplicate content.