
Robots and Spiders

Most search engine robots are configured to report a specific user agent when they visit a website. The Yahoo! robot reports:

Yahoo! Slurp.

Log analysis software will translate these entries into a report that gives:

  • How many robots visited the website
  • How many pages they have indexed
  • How much bandwidth robots have consumed
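If no log analysis package is to hand, the three figures above can be approximated with a short script. The following is a minimal sketch in Python, assuming a combined-style log layout like the msnbot entries quoted on this page. The marker list in ROBOT_MARKERS and the function name summarize_robots are illustrative assumptions, not part of any particular log analysis product.

```python
import re
from collections import defaultdict

# Assumed layout: host, two identity fields, [timestamp], "request",
# status, size, "referrer", "user agent" (combined-style log line).
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative substrings used to recognise robots; extend as needed.
ROBOT_MARKERS = ("Slurp", "msnbot", "Googlebot")

# The two log entries quoted on this page, used as sample input.
SAMPLE_LOG = [
    '65.54.188.134 - 65.54.188.134.325781095131986605 '
    '[14/Sep/2004:04:19:46 +0100] '
    '"GET /products/widgets/description.htm HTTP/1.0" 200 9303 "-" '
    '"msnbot/0.11 (+http://search.msn.com/msnbot.htm)"',
    '65.54.188.134 - 65.54.188.134.325781095131986605 '
    '[14/Sep/2004:04:19:53 +0100] '
    '"GET /products/widgets/order.htm HTTP/1.0" 200 9303 "-" '
    '"msnbot/0.11 (+http://search.msn.com/msnbot.htm)"',
]


def summarize_robots(lines):
    """Return hits, distinct pages and bytes served, keyed by robot marker."""
    stats = defaultdict(lambda: {"hits": 0, "pages": set(), "bytes": 0})
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed layout
        agent = match.group("agent").lower()
        for marker in ROBOT_MARKERS:
            if marker.lower() in agent:
                entry = stats[marker]
                entry["hits"] += 1
                entry["pages"].add(match.group("path"))
                size = match.group("size")
                entry["bytes"] += 0 if size == "-" else int(size)
                break  # count each request against one robot only
    return dict(stats)


if __name__ == "__main__":
    for robot, entry in summarize_robots(SAMPLE_LOG).items():
        print(robot, entry["hits"], len(entry["pages"]), entry["bytes"])
```

To run it against a real log, pass an open file instead of SAMPLE_LOG, for example summarize_robots(open("access.log")), where the file name is a placeholder. Note that this counts pages requested, which is not the same as pages that ended up in the search engine's index.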

Remember that bandwidth used by spiders is overhead, although it should pay for itself in later visits to the website. I've sometimes considered banning spiders that eat lots of site bandwidth but send few visitors, but I have hesitated on the grounds that today's bandwidth hog may be tomorrow's gold mine.

If you want to know whether a search engine robot has fetched a particular page, you will have to look at the log file:

65.54.188.134 - 65.54.188.134.325781095131986605 [14/Sep/2004:04:19:46 +0100] "GET /products/widgets/description.htm HTTP/1.0" 200 9303 "-" "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"

65.54.188.134 - 65.54.188.134.325781095131986605 [14/Sep/2004:04:19:53 +0100] "GET /products/widgets/order.htm HTTP/1.0" 200 9303 "-" "msnbot/0.11 (+http://search.msn.com/msnbot.htm)"

This shows that version 0.11 of the MSN Search robot requested the widget description page on the 14th of September and then requested the order page 7 seconds later. The file order.htm is linked from the description page, so the robot is able to follow links between the pages without any problems. It doesn't guarantee that the pages will make it into the index or that people will find them.

This information was obtained using the command line 'grep' tool. Perl is also a great language for writing ad-hoc scripts to analyse log files. If programming isn't for you, you may be able to find someone to help with these tasks.
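The exact grep command used is not shown here. As a rough equivalent, here is a minimal sketch in Python that pulls out the visits a given robot made to a given page. The function name robot_visits and the regular expression are assumptions made for illustration, based on the layout of the two log entries above.

```python
import re
from datetime import datetime

# Matches the timestamp, request, status and user agent of a
# combined-style log line such as the msnbot entries above.
REQUEST = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# The two log entries quoted above, used as sample input.
SAMPLE_LOG = [
    '65.54.188.134 - 65.54.188.134.325781095131986605 '
    '[14/Sep/2004:04:19:46 +0100] '
    '"GET /products/widgets/description.htm HTTP/1.0" 200 9303 "-" '
    '"msnbot/0.11 (+http://search.msn.com/msnbot.htm)"',
    '65.54.188.134 - 65.54.188.134.325781095131986605 '
    '[14/Sep/2004:04:19:53 +0100] '
    '"GET /products/widgets/order.htm HTTP/1.0" 200 9303 "-" '
    '"msnbot/0.11 (+http://search.msn.com/msnbot.htm)"',
]


def robot_visits(lines, page, robot):
    """Return (timestamp, status) pairs for requests to page by robot."""
    visits = []
    for line in lines:
        match = REQUEST.search(line)
        if not match:
            continue  # not a GET/HEAD request in the assumed layout
        same_page = match.group("path") == page
        same_robot = robot.lower() in match.group("agent").lower()
        if same_page and same_robot:
            when = datetime.strptime(
                match.group("time"), "%d/%b/%Y:%H:%M:%S %z"
            )
            visits.append((when, int(match.group("status"))))
    return visits


if __name__ == "__main__":
    for when, status in robot_visits(
        SAMPLE_LOG, "/products/widgets/order.htm", "msnbot"
    ):
        print(when.isoformat(), status)
```

An empty result means only that the robot has not requested the page within the log period searched. It says nothing about whether an earlier visit was indexed.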
