Blocking Unwanted Robots

2006 Sep 13

Spam is a problem, so posting your email address in blog entries should be avoided. There are plenty of automated programs that crawl websites just to collect email addresses. Besides spam, bandwidth can also become an issue, since these programs read your entire website. If your site only has a small bandwidth allocation, you will be seeing that Bandwidth Limit Exceeded error in due time.

What did I do? I blocked all unwanted robots from my site using Apache's mod_rewrite. First, examine your access log file and Google the robots that have visited your site to find out whether they are safe or just scrapers. Just be careful not to block the major search engine spiders like Googlebot, Inktomi Slurp, MSNbot, or Ask Jeeves, unless you don't want them crawling your website.
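
To tell one robot from another, the user agent has to show up in your log. If your access log does not record it yet, here is a minimal sketch using Apache's standard combined log format. It assumes you can edit the main server configuration (LogFormat and CustomLog cannot go in .htaccess) and that the log lives at /var/log/apache2/access.log:

# Record the user agent with every request using the standard combined format
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/log/apache2/access.log combined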

To block unwanted robots from scraping your website, add something like this to your .htaccess file:


RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link[Ww]alker
RewriteRule ^.* - [F]

The code above tells the Siphon and LinkWalker spiders that they are not allowed on your website by returning a 403 Forbidden error.
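
Some scrapers vary the capitalization of their name between visits. A case-insensitive variant of the rules above, sketched here with the same two example bots, should still catch them. The [NC] flag ignores case and [L] stops processing further rules once the request has been rejected:

RewriteEngine on
# Match the user agent regardless of case; [OR] means either condition is enough
RewriteCond %{HTTP_USER_AGENT} ^Siphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [NC]
# Deny the request with a 403 Forbidden and stop processing further rules
RewriteRule ^.* - [F,L]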

There are also good robots. Most of them are used for link checking, so redirecting them to the proper area is a better solution.


RewriteCond %{HTTP_USER_AGENT} reciprocalman [OR]
RewriteCond %{HTTP_USER_AGENT} LinksManager.com_bot
RewriteRule ^$ /resources/

The code above tells reciprocalman and LinksManager.com_bot to go directly to the /resources/ directory; the ^$ pattern means only requests for the site's front page are rewritten.
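
Note that this is an internal rewrite. If you would rather send those link checkers an explicit HTTP redirect, a sketch along these lines, with the same two bot names and a temporary 302 redirect, should do it:

RewriteEngine on
# Send these link-checker bots from the front page to /resources/
RewriteCond %{HTTP_USER_AGENT} reciprocalman [OR]
RewriteCond %{HTTP_USER_AGENT} LinksManager.com_bot
# [R=302] makes it an external (visible) redirect instead of an internal rewrite
RewriteRule ^$ /resources/ [R=302,L]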

