Bad Robots and Spiders

Not all robots are good. Some roam the web looking for security holes, probing for known weaknesses in web servers and content management systems. Gateways for sending email are particularly sought after, as spammers can use them to hide the origins of their junk email. Other robots harvest email addresses from web pages to add to spam lists; examples are EmailSiphon and Cherry Picker, and they are normally referred to as spambots.

Some robots may appear more benign. A site I manage was recently visited by a robot identifying itself as NPBot. It belongs to Name Protect, a company that searches the web on behalf of its clients, looking for intellectual property infringements on your server. There is nothing much wrong with that, except that the robot consumes resources and the results will not increase visitors, so there is no advantage to letting their robot visit.

Some bad robots don't obey the robots.txt file. In this case the robot can be banned by its IP address or range of addresses. This can be done through the web server's administration utility or, in the case of the venerable Apache web server, directly in the .htaccess file.
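As a rough illustration of the .htaccess approach, the fragment below sketches an IP ban using Apache's classic Order/Allow/Deny directives. The addresses are placeholders taken from the reserved documentation ranges, not the addresses of any real robot, and the fragment assumes the server's AllowOverride setting permits access-control directives in .htaccess files.

```apache
# Sketch only: ban a robot by IP address (Apache 2.2-style directives).
# With "Order Allow,Deny", Allow rules are evaluated first, then Deny rules;
# a client matching both is denied.
Order Allow,Deny
Allow from all

# Placeholder single address (192.0.2.0/24 is reserved for documentation)
Deny from 192.0.2.10

# Placeholder address range in CIDR notation
Deny from 198.51.100.0/24

# On Apache 2.4 and later, the same idea would be expressed roughly as:
# <RequireAll>
#     Require all granted
#     Require not ip 192.0.2.10
#     Require not ip 198.51.100.0/24
# </RequireAll>
```

Substitute the addresses that actually appear in your server logs for the offending robot. Banning a whole range is the blunter tool, since it is more likely to catch ordinary visitors who share that range.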
This should be done judiciously: you may block some real users, and it puts extra load on the web server, which now has to check the client's IP address on every request.

Robots Exclusion Standard

The robot exclusion standard gives more information about the robots.txt syntax: http://www.robotstxt.org/wc/norobots.html
©1994-2006 All text and images copyright: www.abcseo.com