====== Supplemental Results ======

Supplemental results were introduced to Google searches during 2003. Google says that the results are part of an auxiliary index with fewer constraints placed on pages. For example, pages may be orphans, doorway pages with no inbound links, empty pages, or pages whose content Google cannot index (the results relying on metadata). Results from the supplemental index are shown only where there are very few matches from the main index. It is a final throw of the search dice to turn up some useful information.

Supplemental cache results are frozen at the time they were indexed, so they will often be stale and may show information you no longer want to be public. Supplemental updates are infrequent and results can stick around for up to a year.

There is nothing particularly bad about supplemental results. Because of their position in the SERPs they generate very little traffic and, assuming that the supplemental URL no longer exists, the result will eventually disappear from the SERPs.

However, supplemental results can also indicate that something is wrong with your website. If you do a site search on Google:
and a large proportion of the results are supplemental (sometimes the whole site can “go supplemental” except for the home page), then your optimization efforts and traffic will be badly affected. This can happen as a result of a Google glitch, usually when there is an update such as during the Big Daddy rollout. In this case you may simply have to weather the storm and wait for things to stabilize.
Supplemental results are also the result of Google finding near-duplicate pages in terms of metadata, content or URL. If you read the section on PageRank you will realize that internal links are a very powerful tool for distributing PageRank and anchor text optimizations around your website. Duplicate content means that you are not wholly in control of this process; it will dilute any optimizations that you make and may result in parts of your site not being properly indexed. The usual culprits for duplicate content are multiple hostnames and database-driven dynamic websites where different URLs can refer to the same page.
The first problem to address is multiple hosts. For branding reasons many businesses will register the same domain name under different top-level domains, for example mysite.com, mysite.co.uk etc. All of the additional domain names should send a 301 Moved Permanently redirect to the principal domain. Search engine optimizers frequently register multiple domains targeted at different keywords. Once again there should be one principal domain and the rest should redirect to it.
If the site can be accessed using both a www. and a non-www. name then frequently one version will appear in supplemental results. The same applies to http and https (secure http) URLs. If you don’t need to support encrypted pages for passwords or eCommerce then you should disable https on the server and possibly block it on your firewall (https uses TCP port 443) so that robots cannot access pages using this method.
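The redirect rule for multiple hosts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the principal host `www.mysite.com`, the alias list and the function name `host_redirect` are hypothetical, and plain http is assumed to be the canonical scheme. On a real site the same decision would normally be made by the web server or the CMS.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical values: the principal host and the aliases that should redirect to it.
CANONICAL_SCHEME = "http"
CANONICAL_HOST = "www.mysite.com"
ALIAS_HOSTS = {"mysite.com", "mysite.co.uk", "www.mysite.co.uk"}


def host_redirect(url):
    """Return (301, location) for a non-canonical scheme or host, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != CANONICAL_HOST and host not in ALIAS_HOSTS:
        return None  # not one of our hosts, leave it alone
    if parts.scheme == CANONICAL_SCHEME and host == CANONICAL_HOST:
        return None  # already canonical, serve the page normally
    # Keep path and query so deep links land on the matching canonical page.
    location = urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path or "/", parts.query, parts.fragment)
    )
    return 301, location
```

Under these assumptions, a request for `https://mysite.co.uk/page?id=1` would be answered with a 301 pointing at the same path and query on the principal host, while a request that is already canonical passes through untouched.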
Content Management Systems (CMS) such as wikis, blogs and forums also cause many problems. They frequently use numeric identifiers to pull the content out of a database so a URL may be:
That in itself is not so bad, but you may find that the same content can be accessed using different URLs depending on where you are on the site. Here we access the content at a particular point on the page:
Same content but starting with the 40th comment:
and even an “SEO friendly” URL based on the topic title:
Many CMS will also ignore random parameters added to the end of the query string:
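As a hypothetical illustration of how such variants collapse to one canonical form, the Python sketch below keeps only the query parameter that identifies the content and drops everything else, including the fragment. The parameter name `t`, the script name `viewtopic.php` and the function `canonical_url` are assumptions made for the example, not taken from any particular CMS.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: the only query parameter that identifies content in this imaginary CMS.
CONTENT_PARAMS = ("t",)


def canonical_url(url):
    """Drop every query parameter and fragment that does not identify the content."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    kept = [(name, params[name]) for name in CONTENT_PARAMS if name in params]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def needs_redirect(url):
    """True when the requested URL differs from its canonical form."""
    return canonical_url(url) != url
```

With these assumptions, `viewtopic.php?t=123&start=40&foo=bar#p40` and `viewtopic.php?t=123` both map to the second form, so only one URL is left for a robot to index. Whether a variant should be redirected or merely marked noindex depends on whether visitors still need it.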
From a search engine’s point of view this is all very confusing. Search engines don’t want the SERPs clogged with lots of links to the same content, so they will attempt to sort this out by deleting duplicates or relegating them to supplemental results, as is the case with Google. Even if all the internal URLs are consistent you have no control over how a third party links to your site. They may use any one of www, https or some random parameter, and a robot may end up spidering your entire site using this form of URL. Welcome to supplemental hell.

The only way to correct this mess is to make sure that all non-canonical versions of the URL either redirect to the canonical version or are kept out of the index. You may be able to achieve this by using a mixture of the robots.txt file, the noindex tag and the Redirect and RedirectMatch (which allows regular expressions) directives in the .htaccess file (on the Apache webserver). In more extreme cases you may need to rewrite the CMS to do a 301 redirect to the canonical URL.

The following example is typical of a wiki. We have done a site search for Rossignol B3. Google has returned the canonical URL and two supplementals. These are the same page with the diff action and the edit action. These are cases where the content is different but Google knows they are effectively the same page due to the base URL, title and meta information. The wiki software should mark the links to these pages with the nofollow relationship and also use the noindex meta directive in the page to tell robots not to index the supplemental results. Googlebot also understands wildcards in the robots.txt file:
  User-Agent: Googlebot
  Disallow: /*?action=edit$
  Disallow: /*?action=print$
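To see which URLs such wildcard rules would catch, here is a rough Python sketch of the matching. It assumes the behaviour Google documents for its own crawler: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and a rule is otherwise a prefix match against the path and query string. The function names and the sample wiki paths are hypothetical.

```python
import re


def rule_to_regex(rule):
    """Translate a Disallow pattern with * and a trailing $ into a compiled regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(pattern + (r"\Z" if anchored else ""))


def is_disallowed(path, rules):
    """True if the path plus query string matches any of the Disallow rules."""
    return any(rule_to_regex(rule).match(path) for rule in rules)


rules = ["/*?action=edit$", "/*?action=print$"]
print(is_disallowed("/wiki/Rossignol_B3?action=edit", rules))
print(is_disallowed("/wiki/Rossignol_B3", rules))
print(is_disallowed("/wiki/Rossignol_B3?action=edit&x=1", rules))
```

Because of the `$` anchor, the third URL in this sketch is not blocked: a random parameter appended after `action=edit` slips past the rule, which is one reason robots.txt alone may not be enough.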
Relative URLs are a special case. These are URLs within the web page that are resolved relative to the URL, and therefore the domain, that was used to access the page. To ensure that the robot uses the correct URL when spidering the page, use the base element in the document header:
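Why this helps can be sketched with Python's standard `urljoin`, which resolves a relative link in the same way a browser or robot would. The function `resolve`, the hostnames and the base value used here are assumptions for illustration only.

```python
from urllib.parse import urljoin


def resolve(link, page_url, base_href=None):
    """Resolve a link against the page URL, or against a declared base if one is given."""
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, link)


# Without a declared base, the link inherits whatever host was used to reach the page.
print(resolve("/wiki/page", "https://mysite.co.uk/index"))

# With a declared base (hypothetical value), the link always resolves to the canonical host.
print(resolve("/wiki/page", "https://mysite.co.uk/index", "http://www.mysite.com/"))
```

In the first call the robot would continue spidering under the non-canonical host; in the second, every relative link leads it back to the principal domain.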