Checking for and Dealing with Duplicate Content

In this tip, I’m going to explain duplicate content, how to spot it and what to do about it. We’ll be using Google Webmaster Tools, and I’ll quickly explain a useful function in the web crawling tool Screaming Frog. Duplicate content is where two or more URLs contain the same content. This can cause problems for your SEO because it confuses search engines.

First, they don’t know which version of the page to show in results. They also don’t know whether link value should be assigned to one version or divided between several. Finally, it can be unclear which version to rank for specific queries. The overall result is that websites can lose rankings and traffic.

Duplicate content is most often caused by content management systems that create multiple URLs for the same content, or by print and PDF versions of a page. There are a couple of ways to spot it. First, you can look for duplicate titles and descriptions in Webmaster Tools. Just open your account and go to Search Appearance, then HTML Improvements. Here you can see where there are duplicate descriptions and titles, which gives you a strong indication of where there might be problems. If we look at an example in the duplicate titles, we can see a page appearing under a large number of URLs with Facebook tracking tags appended. This indicates that these pages aren’t using a proper canonical tag. I’ll explain this more in a moment.
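To give a feel for what that report is doing behind the scenes, here’s a minimal Python sketch that groups URLs sharing the same title. The URLs and titles are hypothetical; in practice the data would come from your own crawl or export.

```python
from collections import defaultdict

# Hypothetical crawl data: URL -> page title
pages = {
    "https://example.com/widgets": "Blue Widgets | Example",
    "https://example.com/widgets?fbclid=abc123": "Blue Widgets | Example",
    "https://example.com/about": "About Us | Example",
}

# Group URLs that share a title; any group with more than one URL
# is a duplicate-content candidate worth investigating.
by_title = defaultdict(list)
for url, title in pages.items():
    by_title[title].append(url)

duplicates = {t: urls for t, urls in by_title.items() if len(urls) > 1}
for title, urls in duplicates.items():
    print(title, "->", urls)
```

The same grouping works for meta descriptions, which is exactly why the HTML Improvements report surfaces both.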

If you’d like to try a more advanced method, you can use the web crawling tool Screaming Frog. If your site has fewer than 500 pages, the free version should be enough to get you started. Once you’ve run a crawl of your site, look at the internal pages and scroll to the far right. Here you’ll see a hash for each of the pages. This is a way of summarizing all the content on a page into a single value. If you sort by the hash, you’ll be able to spot the values that match. You could also export to Excel and use conditional formatting to highlight any duplicate values. If you’re not comfortable using this more advanced method, don’t worry: Webmaster Tools will be all that’s needed in most cases.
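The hash idea is easy to reproduce yourself. This is a rough sketch, not Screaming Frog’s actual implementation: it fingerprints each page body with MD5, so two URLs serving identical markup get identical hashes. The page bodies here are hypothetical stand-ins for a real crawl.

```python
import hashlib
from collections import defaultdict

# Hypothetical page bodies keyed by URL; in practice these would
# come from crawling your site.
pages = {
    "https://example.com/page": "<html><body>Same content</body></html>",
    "https://example.com/page?utm_source=fb": "<html><body>Same content</body></html>",
    "https://example.com/other": "<html><body>Different content</body></html>",
}

def content_hash(html: str) -> str:
    """Summarize a page's content as a single value."""
    return hashlib.md5(html.encode("utf-8")).hexdigest()

# Group URLs by hash; groups with more than one URL share identical content.
groups = defaultdict(list)
for url, html in pages.items():
    groups[content_hash(html)].append(url)

for urls in groups.values():
    if len(urls) > 1:
        print("Duplicate content:", urls)
```

Sorting a spreadsheet by this hash column, as described above, is just another way of forming the same groups.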

So, what do you do once you’ve found duplicate content? There are several ways of solving the problem. The first, and generally the most effective, is to make sure your site uses canonical tags to point to the preferred version of each URL. This removes a lot of the duplicate content caused by link tracking or campaign tags. It’s the best approach when you still want a particular version of a page to be accessible, like a landing page for your paid advertising.
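As a sketch of the idea, here’s a small Python function that strips tracking parameters to recover the preferred URL, the value you’d put in the page’s canonical tag. The parameter list is an assumption; adjust it to whatever tags your campaigns actually use.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set of tracking parameters to ignore when deciding
# the canonical URL.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid"}

def canonical_url(url: str) -> str:
    """Map every tagged variant of a URL back to one preferred version
    by dropping tracking parameters (the fragment is dropped too)."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))

print(canonical_url("https://example.com/landing?utm_source=fb&fbclid=abc"))
# -> https://example.com/landing
```

The tag itself then sits in the head of every variant, e.g. `<link rel="canonical" href="https://example.com/landing">`, telling search engines which version should receive the link value.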

Another method of removing duplicate content is a 301 redirect. This automatically sends anyone, or anything, trying to access the page to another page. It’s best used when you have no need for a particular URL and you want to make sure another version is indexed. While you’re using a 301, it’s important to make sure all the links on your site point to the new preferred version.
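How a 301 is set up depends on your server, but the logic is simple. Here’s a minimal WSGI sketch in Python, with a hypothetical mapping of retired URLs to their preferred versions; a real site would usually do this in the web server or CMS configuration instead.

```python
# Hypothetical mapping from retired URLs to their preferred versions.
REDIRECTS = {
    "/old-page": "/new-page",
    "/print/article": "/article",
}

def app(environ, start_response):
    """Minimal WSGI app: answer retired paths with a 301 so both visitors
    and search engines are sent to the preferred URL."""
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        start_response("301 Moved Permanently", [("Location", REDIRECTS[path])])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html>...</html>"]
```

The 301 status is what tells search engines the move is permanent, so the old URL drops out of the index and its link value passes to the new one.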

Finally, if you have some duplicate content that doesn’t have another version on your site and you still need people to access it, you can use the robots noindex tag. This tells search engines not to index the page. You might use this approach if you have content on your site which has been duplicated on other domains but which you still want users to be able to access, such as privacy pages or terms and conditions.
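The tag itself is just `<meta name="robots" content="noindex">` in the page head. As a sketch of how you might audit for it, here’s a small Python parser that checks whether a page carries the tag; the sample page is hypothetical.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Detect a <meta name="robots" content="noindex"> tag in a page."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "meta"
                and attrs.get("name", "").lower() == "robots"
                and "noindex" in attrs.get("content", "").lower()):
            self.noindex = True

# Hypothetical terms-and-conditions page duplicated on another domain.
html = ('<html><head><meta name="robots" content="noindex"></head>'
        '<body>Terms and conditions</body></html>')
parser = RobotsMetaParser()
parser.feed(html)
print(parser.noindex)  # -> True
```

Users can still reach the page as normal; only search engines are told to leave it out of their index.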

Generally speaking, duplicate content is one of those things that pops up when your site is first being audited. Once you have best practices in place, like canonical links, it tends to disappear and requires little active management.

Chris is Director of Organic Search for Reprise Melbourne.