How to Efficiently Solve for Duplicate Content

08.17.20 // Joyce Collarde

Most websites deal with duplicate content issues, and too often I see recommendations such as “add unique content”. But what do you do when you have hundreds or thousands of duplicate or thin content issues? Let’s talk about efficiently addressing duplicate content errors, from bulk fixes to nitty-gritty solutions.

I like to think of fixing duplicate content as a funnel, starting from high-level, broad solutions and moving on to more hands-on fixes.

I. De-Indexing Low-Value Pages

Low-value pages are pages users have no interest in visiting, such as category archives and internal search result pages.

These pages very often cause duplicate content issues because they don’t have enough content to be differentiated from one another.

If you’re using WordPress, de-indexing these low-value pages is simple. Install the Yoast SEO plugin if you haven’t already, then go to Search Appearance > Taxonomies > Categories and switch the “Show in search results” toggle to “No”.
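To confirm the setting took effect, you can check that a category page’s HTML now carries a robots meta tag containing noindex. The sketch below uses Python’s standard `html.parser` on an HTML string; fetching the live page is left out, and the sample markup in the usage lines is hypothetical rather than Yoast’s exact output.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value can be None
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            content = attr_map.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def is_noindexed(html: str) -> bool:
    """True if the page asks search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


# Hypothetical category page markup
print(is_noindexed("<head><meta name='robots' content='noindex, follow' /></head>"))
```

Run it against the saved HTML of a few category pages after changing the toggle; any page that still returns False is still indexable.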

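You can also block these paths in your robots.txt to stop search engines from crawling them, and internal search result pages should be disallowed there as well. Keep in mind that robots.txt controls crawling, not indexing: a blocked URL can still be indexed if other pages link to it, and search engines can’t see a noindex tag on a page they aren’t allowed to crawl.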
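A minimal robots.txt along these lines is sketched below and checked with Python’s standard `urllib.robotparser`. The paths are assumptions based on WordPress defaults (`/category/` and `/tag/` for taxonomy archives, `/?s=` for internal search), and `www.example.com` is a placeholder. `urllib.robotparser` only does simple prefix matching and doesn’t implement Google’s `*` and `$` wildcards, so treat it as a sanity check rather than a replica of Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; /category/, /tag/ and /?s= are WordPress defaults.
ROBOTS_TXT = """\
User-agent: *
Disallow: /category/
Disallow: /tag/
Disallow: /?s=
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

for url in (
    "https://www.example.com/category/news/",
    "https://www.example.com/tag/seo/",
    "https://www.example.com/blog/duplicate-content/",
):
    # can_fetch() answers "may this crawler request the URL?", not "will it be indexed?"
    print(url, rules.can_fetch("Googlebot", url))
```

Blog posts should stay fetchable while the archive paths are blocked; if a post URL comes back False, one of the Disallow prefixes is too broad.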

In my experience, these low-value pages make up the vast majority of duplicate content issues, so your list of duplicates should be much more manageable once you’ve taken this step.

II. Using Website-Level Redirects

A lot of duplicate content issues are caused by the same page being available at several URLs: www vs. non-www, HTTP vs. HTTPS, with and without a trailing slash, or with and without /index.

First, decide on one source of truth for your URL format. For example, all URLs should follow these rules:

  • HTTPS
  • Ends with trailing slash
  • WWW
  • No /index or .php in the URL

If your site runs on Apache, use the .htaccess file to add website-level 301 redirects from every other variant to your preferred format.
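The exact .htaccess rules depend on your server setup, but the target format itself can be written down as a small function. The Python sketch below maps a URL variant onto the example rules above (HTTPS, www, trailing slash, no /index or .php) and groups crawled URLs by that canonical form, which shows which variants your redirects need to cover. The bare domain `example.com` is a placeholder, the sketch assumes `/about.php` is served at `/about/` after the rewrite, and files with an extension such as `.pdf` are left alone.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

INDEX_FILES = {"index", "index.php", "index.html"}


def canonicalize(url: str, bare_domain: str = "example.com") -> str:
    """Rewrite a URL to the preferred format: HTTPS, www, no /index or .php, trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == bare_domain:  # non-www -> www; other subdomains are left untouched
        host = "www." + bare_domain
    path = parts.path or "/"
    last = path.rsplit("/", 1)[-1]
    if last in INDEX_FILES:  # /services/index.php -> /services/
        path = path[: -len(last)]
    elif last.endswith(".php"):  # /about.php -> /about (assumes a rewrite serves it)
        path = path[: -len(".php")]
    last = path.rsplit("/", 1)[-1]
    if "." not in last and not path.endswith("/"):
        path += "/"  # trailing slash for page paths only, not files like .pdf
    return urlunsplit(("https", host, path, parts.query, ""))


def group_variants(urls):
    """Map each canonical URL to the crawled variants that should 301 to it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonicalize(url)].append(url)
    # Keep only groups where at least one crawled URL differs from the canonical form
    return {
        canon: variants
        for canon, variants in groups.items()
        if any(v != canon for v in variants)
    }
```

Feeding a full crawl export into `group_variants()` before and after the redirects go live is a quick way to check that no variant is still being served.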

III. Unpublishing Outdated, Thin Content Blog Posts

If your website was created a long time ago, chances are you have many old, thin blog posts that don’t necessarily align with your current strategy and are clogging up your crawl budget. I often see thin content flagged as duplicate.

I recommend auditing your blog posts so you can prune old content that isn’t performing to your standards.

Here’s the method I follow:

  • Create a spreadsheet with the list of every blog post. Use Screaming Frog to get the list of blog URLs. 
  • Add columns for:
    • Word count
    • Sessions
    • % of new sessions
    • Bounce rate
    • Number of conversions
    • Conversion rate
    • Pages/session
    • Number of backlinks
  • Pull blog URL data from Google Analytics. Use the VLOOKUP function to add sessions, % of new sessions, bounce rate, number of conversions, conversion rate and pages/session for the past 12 months.
  • Export the backlinks to your domain from Semrush’s Backlink Analytics or Moz’s Link Explorer. Using the VLOOKUP function again, add the number of backlinks to each blog post URL.
  • In Screaming Frog, select list mode, add your list of URLs and crawl. In the Content tab, you will find the word count for every page.
  • Decide on a baseline: how many organic sessions/year means that your blog post is performing well? How many conversions? What is a good bounce rate? What’s the acceptable word count?
  • Analyze your data and mark all underperforming blog posts as okay to unpublish.
  • After 90 days, if your SEO performance is steady, 301 redirect the unpublished blog posts to the most relevant live pages on your site.
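The join-and-flag part of this audit can also be scripted. The Python sketch below mimics the VLOOKUP steps with dictionary lookups keyed by URL (a URL missing from a lookup table counts as 0, like wrapping VLOOKUP in IFERROR) and flags a post only when it misses every baseline. That all-or-nothing rule, the subset of columns and the sample URLs and thresholds are assumptions for illustration; set your own baselines and add bounce rate or pages/session the same way.

```python
from dataclasses import dataclass


@dataclass
class PostRow:
    url: str
    word_count: int
    sessions: int  # organic sessions, past 12 months
    conversions: int
    backlinks: int


def build_rows(crawl, analytics, backlinks):
    """Join the three exports on URL, like a series of VLOOKUPs.

    crawl:     {url: word_count}               (Screaming Frog)
    analytics: {url: (sessions, conversions)}  (Google Analytics)
    backlinks: {url: backlink_count}           (Semrush or Moz)
    A URL missing from a lookup table gets 0, like IFERROR(VLOOKUP(...), 0).
    """
    rows = []
    for url, words in crawl.items():
        sessions, conversions = analytics.get(url, (0, 0))
        rows.append(PostRow(url, words, sessions, conversions, backlinks.get(url, 0)))
    return rows


def flag_underperformers(rows, min_sessions, min_conversions, min_words, min_backlinks):
    """Return URLs of posts that miss every baseline (a deliberately conservative rule)."""
    return [
        r.url
        for r in rows
        if r.sessions < min_sessions
        and r.conversions < min_conversions
        and r.word_count < min_words
        and r.backlinks < min_backlinks
    ]


# Hypothetical sample data
crawl = {"/blog/old-news/": 250, "/blog/pillar-guide/": 1800, "/blog/how-to/": 400}
analytics = {"/blog/pillar-guide/": (5200, 14), "/blog/how-to/": (40, 0)}
backlinks = {"/blog/pillar-guide/": 12}
rows = build_rows(crawl, analytics, backlinks)
print(flag_underperformers(rows, min_sessions=100, min_conversions=1, min_words=300, min_backlinks=1))
# ['/blog/old-news/']
```

With this rule, the low-traffic how-to post survives because it clears the word-count baseline; a stricter rule (for example, missing any two baselines) would flag it too.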

IV. Adding Video Transcripts

Video pages are a common culprit behind duplicate content. An easy way to differentiate these pages for search engines, and to help users who are deaf or hard of hearing, is to add video transcripts.

Transcription is usually affordable, and you can add the transcript directly below the video, à la Whiteboard Friday.

V. Adding New Content

Once you have eliminated low-value pages, removed irrelevant thin content, added website-level redirects and addressed video pages, you should be left with a much smaller list of duplicate content.

Going through your blog posts in step III should have helped you understand where your most relevant content lives. Solutions and product pages aren’t often duplicated, but if they are, they should be on your “adding new content” list.

Use the metrics gathered in step III to prioritize the pages where you can manually add content.
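If you keep those metrics per URL, ordering the remaining duplicate pages is a one-line sort. In the Python sketch below, the ordering (conversions first, then sessions, then backlinks) is an assumption, and the URLs and numbers are hypothetical; weight the metrics to match your own goals.

```python
from typing import NamedTuple


class DuplicatePage(NamedTuple):
    url: str
    sessions: int
    conversions: int
    backlinks: int


def prioritize(pages):
    """Highest-value duplicate pages first: conversions, then sessions, then backlinks."""
    return sorted(pages, key=lambda p: (p.conversions, p.sessions, p.backlinks), reverse=True)


pages = [
    DuplicatePage("/solutions/a/", 900, 2, 3),
    DuplicatePage("/products/b/", 300, 5, 0),
    DuplicatePage("/solutions/c/", 900, 2, 7),
]
for page in prioritize(pages):
    print(page.url)
```

Because Python compares tuples element by element, ties on conversions fall through to sessions and then to backlinks.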

Addressing duplicate content can often feel like a Herculean task, but there are steps you can take to simplify it.

If you only have limited time and resources, start by de-indexing low-value pages and implementing website-level redirects, such as non-www to www, or URLs without a trailing slash to their trailing-slash versions.

If you have more time to dedicate to your site’s technical health, audit and redirect thin blog posts, add transcripts to all your video pages, and add unique content to the remaining duplicate pages (with the added bonus that this should help improve rankings and traffic for those pages).