In this tip, I’m going to show you the different ways your site can be blocked from being indexed by search engines. It’s important to be able to spot anything that stops search engines from indexing your content, because it can prevent your site from ranking. It’s also useful to know how to deliberately keep pages out of search engines, such as sensitive, private, or duplicate content. This is something you should review at the start of your audit, although in practice it tends to be something you happen to spot as you look around the site. A couple of tools will make the job a lot easier. First, you might want to install the Google Chrome browser. It’s a common preference among SEOs and web designers for many reasons, but for this audit what matters is that you can install browser extensions. The same can be done in Firefox, but all my examples will be in Chrome.
You’ll also want an extension that checks for and highlights noindex and nofollow tags, and an extension that shows canonical links. Ideally, you’ll also have the Ayima Redirect Path plugin, or a similar extension, which shows redirects and server response codes. And you should already have Google Webmaster Tools (now called Google Search Console) set up, which we covered in the first video. While some of these might sound a bit complicated, once they’re in place they can make long, difficult jobs quick and easy, and they help you understand more about how your website is put together.
There are several ways your content can be kept out of the index, including the robots noindex tag, the robots.txt file, canonical links, and server response codes. The noindex tag tells search engines not to add the page to their index while still allowing them to crawl the page and follow the links on it. It’s a meta tag placed between the head tags in the HTML. You can find it by looking at the source code or, more easily, by installing a browser extension that alerts you whenever the tag is present. The robots.txt file is a way of telling search engines not to crawl parts of your site. You can find it by going to your domain followed by /robots.txt. URLs whose paths match a Disallow line will not be crawled by search engines that respect the file. This is useful for keeping Google out of parts of the site you don’t want crawled, but if it’s set up incorrectly, it can prevent Google from accessing areas of the site you do want crawled and indexed.
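If you want to check a robots.txt file against a list of URLs rather than reading it by eye, Python’s standard library includes a robots.txt parser. The sketch below is only an illustration: the robots.txt content and the paths are made-up examples, and the function name blocked_paths is hypothetical. As far as I know, urllib.robotparser does simple prefix matching and does not interpret the * and $ wildcards that Google supports, so treat its answers as a first pass rather than a definitive verdict on how Googlebot will behave.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt: str, user_agent: str, paths: list[str]) -> list[str]:
    """Return the paths that these robots.txt rules stop user_agent from crawling.

    blocked_paths is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


# Hypothetical robots.txt content, for illustration only.
example_robots = """\
User-agent: *
Disallow: /private/
"""

print(blocked_paths(example_robots, "Googlebot", ["/", "/blog/post", "/private/report"]))
# ['/private/report']
```

Swapping in "User-agent: *" followed by "Disallow: /" makes every path come back as blocked, which is a quick way to catch a site-wide block.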
Something to watch out for is a Disallow followed by a single forward slash (Disallow: /). This blocks your whole site from web crawlers like Google. Canonical links sound complicated but are actually very simple. Some websites create multiple URLs for the same content; one of the most common examples is the index.html version of the homepage. Canonical links tell search engines which version of the page you want indexed. While they don’t strictly stop content from being indexed, where two pages are identical they give a strong indication of which version should be indexed. Canonical links are placed between the head tags in the HTML. Again, the easiest way to spot whether a page uses a canonical tag is with a browser plugin or extension.
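Because both the robots noindex tag and the canonical link live in the page’s head, one small parser can report both, which is roughly what the browser extensions are doing for you. This is a minimal sketch using Python’s built-in HTML parser; the class and function names are hypothetical, the sample page is made up, and it only looks at meta tags named "robots" (not crawler-specific ones like "googlebot") and does not fetch pages itself.

```python
from html.parser import HTMLParser


class IndexingTagFinder(HTMLParser):
    """Collects robots meta directives and canonical link targets (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.robots_directives = []
        self.canonical_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; valueless attributes give None.
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            # content="noindex, follow" -> ["noindex", "follow"]
            directives = attributes.get("content", "").split(",")
            self.robots_directives += [d.strip().lower() for d in directives if d.strip()]
        elif tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonical_urls.append(attributes.get("href", ""))


def inspect_page(html: str) -> dict:
    """Report whether the page is noindexed/nofollowed and which canonical URLs it declares."""
    finder = IndexingTagFinder()
    finder.feed(html)
    finder.close()
    return {
        "noindex": "noindex" in finder.robots_directives,
        "nofollow": "nofollow" in finder.robots_directives,
        "canonical": finder.canonical_urls,
    }


# Made-up page for illustration.
sample_page = """<html><head>
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="https://www.example.com/">
</head><body></body></html>"""

print(inspect_page(sample_page))
```

In practice you’d feed it the HTML of each page you’re auditing, for example the homepage and its index.html version, and compare the canonical URLs they report.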
Server response codes are how a website tells browsers and search engines what kind of result they’re getting back from the server. You might be familiar with “404” or “page not found” errors. There are also 500 errors, which mean something has gone wrong on the server’s end. If Google sees these error codes, it will either not index new content or gradually remove existing pages from the index. Although rare, it’s possible for a page to return an error code while appearing perfectly normal to users. Again, the easiest way to spot server response codes is with a browser plugin like Redirect Path. These plugins also show you what types of redirects are in place on your site, which can be useful for other technical tasks.
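Since a page can look fine in the browser while returning an error code, it can help to ask the server directly. Here’s a rough sketch in Python using only the standard library; fetch_status and describe_status are hypothetical names, and the labels are simplified groupings rather than an official classification. It sends one HEAD request and does not follow redirects, so you see the first response and its Location header. Some servers answer HEAD differently from GET, so if a result looks odd, try changing the method to "GET".

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def fetch_status(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Request a URL once without following redirects.

    Returns the HTTP status code and the Location header (present on redirects).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        # HEAD asks for headers only; http.client never follows redirects itself.
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe_status(status: int) -> str:
    """Group a status code into a rough, simplified category."""
    if 200 <= status < 300:
        return "success"
    if status in (301, 302, 303, 307, 308):
        return "redirect"
    if status in (404, 410):
        return "not found"
    if 400 <= status < 500:
        return "client error"
    if 500 <= status < 600:
        return "server error"
    return "other"
```

For example, calling fetch_status on your homepage and on its index.html version, then passing each status to describe_status, shows whether one of them redirects or errors even though both look normal on screen.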
With each of these, there are lots of things to test and tools to try out. If you want to try something more advanced, you can download the free version of a web crawler called Screaming Frog, which lets you see lots of information about your whole site at once.