It’s true! Your older blog posts may have been thrown into Google’s Supplemental Index (a.k.a. Google Hell), which means they have almost zero chance of appearing in search results. Here’s how to check if you’re affected and what you can do about it.

The Supplemental Results problem is mostly attributed to Google seeing similar posts in Archives, Categories, Feeds, Comments, etc. and considering them duplicate content even though they’re on the same website. Yup, Google is not as smart as you think. Even trusted PR6 websites like Shoemoney.com are not spared: Jeremy had thousands of his pages in Supplemental until he fixed his robots.txt file, which gave him a “1,400% increase in Google traffic within 1 month”!

Here’s how to check if your blog is in Google’s Supplemental Index:
1) go to Google.com (duh!).
2) enter site:larrylim.net *** -view (replace larrylim.net with your own domain).
3) see if you have pages with “Supplemental Result” next to them.

I checked mine this morning and found that I have 92 pages in Supplemental.

Now, let’s create a robots.txt file to help reduce the problem by blocking Google’s spiders from crawling the less important areas:
1) open your Notepad application.
2) enter the following:
User-Agent: Googlebot
Disallow: /category/
Disallow: /page/
Disallow: /pages/
Disallow: /feed/
Disallow: /feed
3) save the file as “robots.txt”.
4) upload the file into your server’s root directory.
5) go to Google Sitemaps and resubmit.

The final step is to wait and be patient, because it could take a while for your pages to get back into the main index.

Need a second opinion? See how some of the bigger players have implemented this fix, e.g.
http://www.shoemoney.com/robots.txt
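
Before uploading, the robots.txt rules listed above can be sanity-checked on your own machine. The following is only a rough sketch using Python's standard urllib.robotparser module. The sample paths are made up for illustration, and Python's parser is a stand-in for Googlebot, not Googlebot itself, so treat the results as a hint rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post, exactly as listed.
ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow: /category/
Disallow: /page/
Disallow: /pages/
Disallow: /feed/
Disallow: /feed
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text locally, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical paths, chosen only to exercise each rule.
SAMPLE_PATHS = [
    "/2007/06/some-post/",
    "/category/seo/",
    "/page/2/",
    "/feed/",
    "/2007/06/some-post/feed/",
]

for path in SAMPLE_PATHS:
    verdict = "allowed" if parser.can_fetch("Googlebot", path) else "blocked"
    print(f"{path:30} {verdict}")
```

Two things are worth noticing. Rules match from the start of the path, so a per-post feed such as /2007/06/some-post/feed/ is not covered by "Disallow: /feed/". And because the rules sit under "User-Agent: Googlebot", other spiders are not affected by them at all.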



Great stuff, Larry =)
I have also come across the following plug-in, which helps cure most duplicate-page issues. It adds META tags to problem pages such as category pages.
http://www.seologs.com/wordpress-duplicate-content-cure/
However, this plug-in can never beat a good robots.txt as the method you suggest is highly customizable.
Thanks for the information!
Comment by kc tan — June 19, 2007 @ 8:21 pm
You’re welcome KC.
I was really, really surprised that many of my blog posts had gone supplemental and decided to find out the problem/solution. I’ll post an update later on to report if the robots.txt method worked.
Comment by Larry — June 20, 2007 @ 7:22 am
Looks like it worked for me. Monitoring it for a while.
Comment by Mark — June 20, 2007 @ 12:09 pm
Larry,
A friend referred me to this post, and I’d like to give it a try, too. But, I’m not really tech savvy.
I’ve already got a robots.txt that contains:
User-agent: *
Disallow: /cgi-bin/
Disallow: /forms/
Disallow: /images/
to keep the bots out of those directories.
Can I simply add the code you posted after those lines? Or, will the two user-agent lines do something stupid like cancel each other out?
Thanks for any insight.
Comment by Kathleen — June 22, 2007 @ 11:58 am
Hi Kathleen,
Yes, you can. Simply add a space (a blank line) after your existing lines, followed by what I posted. Here’s an example of an implementation for multiple spiders/bots:
http://www.seobook.com/robots.txt
Hope that helped.
Comment by Larry — June 22, 2007 @ 12:40 pm
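
A note on the question above about two user-agent sections: they do not cancel each other out, but they also do not add up. The sketch below combines Kathleen's lines with the ones from the post and checks them with Python's standard urllib.robotparser. In that parser a bot that finds a section with its own name uses only that section and skips the "*" one, and as far as I understand Google's documentation, Googlebot picks the most specific section in the same way. The test paths and the second variant (COMBINED_REPEATED) are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Kathleen's existing lines, a blank line, then the lines from the post.
COMBINED = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /forms/
Disallow: /images/

User-Agent: Googlebot
Disallow: /category/
Disallow: /page/
Disallow: /pages/
Disallow: /feed/
Disallow: /feed
"""

# Hypothetical variant: repeat the general rules inside the Googlebot section
# so that Googlebot stays out of those directories as well.
COMBINED_REPEATED = COMBINED + (
    "Disallow: /cgi-bin/\n"
    "Disallow: /forms/\n"
    "Disallow: /images/\n"
)


def can_fetch(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# "Slurp" stands in for any spider other than Googlebot.
for label, text in (("combined", COMBINED), ("repeated", COMBINED_REPEATED)):
    for agent, path in (
        ("Googlebot", "/category/news/"),
        ("Googlebot", "/cgi-bin/form.cgi"),
        ("Slurp", "/cgi-bin/form.cgi"),
    ):
        verdict = "allowed" if can_fetch(text, agent, path) else "blocked"
        print(f"{label:9} {agent:10} {path:20} {verdict}")
```

With the plain combined file, this parser lets Googlebot into /cgi-bin/ because the Googlebot section says nothing about it. Repeating those three Disallow lines under the Googlebot section closes that gap.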
Larry,
The revised robots.txt is in place. I host two sites. I’ve got 193 blog pages in Supplemental and the other site has 162.
I didn’t add the space in the robots.txt originally, but we should be good to go now. We appreciate your assistance!
Thank you again.
Comment by Kathleen — June 22, 2007 @ 1:00 pm
Gr8 info, does this apply to blogspot.com sites also?
Rajj
Comment by Rajj — December 19, 2007 @ 1:53 am