Yahoo! Slurp and Me

Yahoo! Slurp helped cause a traffic spike on this site that led my host to suspend the domain.

After the domain had been suspended for at least 30 hours, I was allowed to bring it back online. Since then, I’ve been monitoring its bandwidth consumption, watching two bots in particular: Googlebot and Yahoo! Slurp.

I also set up robots.txt to severely restrict bot access to this domain.

Googlebot is behaving; Slurp appears not to be.


Robots/Spiders at 6:53 am on 10/18/08
  Yahoo Slurp  7785+312  156.18 MB  09:46
  Googlebot    4411+33   111.47 MB  02:42

Robots/Spiders at 7:20 am on 10/19/08
  Yahoo Slurp  8357+334  164.00 MB  10:05
  Googlebot    4413+35   111.53 MB  03:59
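
To put numbers on the difference, here is a quick sketch of the arithmetic between the two snapshots. The figures are copied from the reports above; reading "7785+312" as hits plus robots.txt hits is an assumption about the stats notation, and the names `SNAPSHOTS` and `growth` are made up for illustration.

```python
# Snapshots copied from the two reports above: (hits, robots.txt hits, MB).
# Reading "7785+312" as "hits + robots.txt hits" is an assumption.
SNAPSHOTS = {
    "Yahoo Slurp": ((7785, 312, 156.18), (8357, 334, 164.00)),
    "Googlebot": ((4411, 33, 111.47), (4413, 35, 111.53)),
}


def growth(before, after):
    """Return the per-field increase between two snapshots."""
    hits = after[0] - before[0]
    robots_hits = after[1] - before[1]
    megabytes = round(after[2] - before[2], 2)
    return hits, robots_hits, megabytes


for bot, (before, after) in SNAPSHOTS.items():
    hits, robots_hits, megabytes = growth(before, after)
    print(f"{bot}: +{hits} hits, +{robots_hits} robots.txt hits, +{megabytes:.2f} MB")
```

In roughly a day, that is 572 more hits and 7.82 MB for Slurp against 2 more hits and 0.06 MB for Googlebot.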

I checked my recent accesses…and found a whole bunch by Yahoo. 🙁

At this point, I’m trying to give them the benefit of the doubt. My robots.txt file had a long disallow section for all bots, plus a special, very short section for Slurp with no disallows, because I figured they would honor both. Apparently they don’t: under the robots.txt convention, a bot that finds a section naming it follows only that section and ignores the one for all bots. So I modified my robots file to remove the special Slurp section.
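
The effect of a bot-specific section can be checked with Python's standard `urllib.robotparser`. This is only a sketch: the robots.txt text below is hypothetical (the real file is not shown here) and merely has the same shape, and the paths and the `allowed` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with the same shape as the one described above:
# a disallow section for all bots plus a short Slurp section with no disallows.
WITH_SLURP_SECTION = """\
User-agent: *
Disallow: /archives/
Disallow: /tag/

User-agent: Slurp
Disallow:
"""

# The same hypothetical file after removing the special Slurp section.
WITHOUT_SLURP_SECTION = """\
User-agent: *
Disallow: /archives/
Disallow: /tag/
"""


def allowed(agent, path, robots_txt):
    """Return True if the parser lets `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


path = "/archives/2008/10/"
print("Slurp, with its own section:   ", allowed("Slurp", path, WITH_SLURP_SECTION))
print("Googlebot, same file:          ", allowed("Googlebot", path, WITH_SLURP_SECTION))
print("Slurp, after removing section: ", allowed("Slurp", path, WITHOUT_SLURP_SECTION))
```

With its own empty section, Slurp is allowed everywhere while Googlebot stays restricted; once the section is gone, Slurp falls back to the rules for all bots.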

Now we’ll see what happens.

If I knew what IPs to ban in .htaccess, I would do that as well!
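
One way to find candidate IPs is to tally requests per client address in the server's access log. The sketch below assumes Apache's combined log format; the sample lines, their addresses (taken from documentation ranges), and the names `LINE` and `hits_by_ip` are all invented for illustration. A user-agent string can be faked, so the result is a starting point, not proof.

```python
import re
from collections import Counter

# Combined log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def hits_by_ip(lines, agent_substring):
    """Count requests per client IP whose user-agent contains the substring."""
    wanted = agent_substring.lower()
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if match and wanted in match.group("agent").lower():
            counts[match.group("ip")] += 1
    return counts


# Made-up sample lines, not taken from any real log.
SAMPLE = [
    '192.0.2.10 - - [19/Oct/2008:07:01:02 +0000] "GET /a HTTP/1.0" 200 5120 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '192.0.2.10 - - [19/Oct/2008:07:01:09 +0000] "GET /b HTTP/1.0" 200 4096 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '192.0.2.11 - - [19/Oct/2008:07:02:00 +0000] "GET /c HTTP/1.0" 200 2048 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '198.51.100.7 - - [19/Oct/2008:07:03:00 +0000] "GET /a HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]

for ip, count in hits_by_ip(SAMPLE, "slurp").most_common():
    print(ip, count)
```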

Oh…and I’m posting about this here and on this day for the record. Now I can point my virtual server provider to this blog post. The surge in traffic ain’t my fault, guys!

:mrgreen:

October 20 update:

OK, here’s what I’m banning so far in my .htaccess file:

RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
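RewriteCond %{HTTP_USER_AGENT} ^Zeus
# Assumed closing rule (not shown in the listing above): conditions do nothing
# without a RewriteRule, so answer any match with 403 Forbidden.
RewriteRule ^.* - [F,L]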
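
How these conditions match can be approximated in Python. This is a rough sketch, not Apache itself: it uses a handful of patterns copied from the list above, treats `^` as "the user-agent must start with this" and `NC` as case-insensitive matching, and the helper name `is_blocked` and the sample user-agent strings are made up.

```python
import re

# A few patterns copied from the list above. The second element says whether
# the condition carries the NC (case-insensitive) flag.
PATTERNS = [
    (r"^Wget", False),
    (r"^Teleport\ Pro", False),
    (r"HTTrack", True),
    (r"Indy\ Library", True),
]


def is_blocked(user_agent):
    """Return True if any pattern matches, mirroring the [OR] chain."""
    for pattern, no_case in PATTERNS:
        flags = re.IGNORECASE if no_case else 0
        if re.search(pattern, user_agent, flags):
            return True
    return False


# Illustrative user-agent strings, not taken from real traffic.
for agent in [
    "Wget/1.11.4",
    "wget/1.11.4",
    "Mozilla/5.0 (compatible; Wget)",
    "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)",
    "Mozilla/3.0 (compatible; indy library)",
]:
    print(f"{agent!r}: {'blocked' if is_blocked(agent) else 'allowed'}")
```

The anchored patterns only catch agents whose string begins with the name, while the two unanchored `NC` patterns catch the name anywhere and in any case.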

deny from 8.11.2.98
deny from 41.202.78.138
deny from 62.12.137.20
deny from 62.37.113.210
deny from 62.37.112.104
deny from 62.37.117.18
deny from 62.4.64.119
deny from 64.34.161.151
deny from 66.198.41.11
deny from 66.7.210.15
deny from 67.227.134.4
deny from 69.42.90.166
deny from 72.37.152.51
deny from 74.208.14.215
deny from 76.79.154.182
deny from 77.88.26.26
deny from 77.91.224.
deny from 78.86.153.121
deny from 83.133.114.2
deny from 83.133.125.166
deny from 85.124.67.254
deny from 85.92.70.252
deny from 85.92.71.147
deny from 87.224.173.
deny from 89.208.32.109
deny from 91.205.124.
deny from 189.30.142.213
deny from 196.35.158.180
deny from 200.63.42.136
deny from 201.66.127.38
deny from 202.75.62.162
deny from 205.234.140.219
deny from 207.182.151.154
deny from 208.109.181.121
deny from 208.148.196.71
deny from 209.200.18.
deny from 209.200.48.
deny from 209.216.249.184
deny from 212.116.219.170
deny from 212.58.4.129
deny from 213.4.106.85
deny from 213.134.144.134
deny from 216.130.187.100
deny from 220.225.241.4
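
Some entries above end in a dot, such as 77.91.224. with no last number. Assuming Apache treats those as partial addresses that cover every IP beginning with them, the matching can be sketched as below. The `DENY` sample is a small subset copied from the list, and `is_denied` is a hypothetical helper, not anything Apache provides.

```python
# A few entries copied from the deny list above. An entry ending in "."
# is treated as a partial address covering every IP that starts with it.
DENY = ["8.11.2.98", "77.91.224.", "209.200.18."]


def is_denied(ip, deny=DENY):
    """Return True if the IP equals a full entry or starts with a partial one."""
    for entry in deny:
        if entry.endswith("."):
            if ip.startswith(entry):
                return True
        elif ip == entry:
            return True
    return False


for ip in ["8.11.2.98", "8.11.2.9", "77.91.224.5", "77.91.22.4"]:
    print(ip, "denied" if is_denied(ip) else "allowed")
```

Keeping the trailing dot in the prefix check matters: without it, 77.91.224 would also match an address like 77.91.2240.1 in a plain string comparison.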
