{"id":1232,"date":"2008-10-19T08:13:40","date_gmt":"2008-10-19T15:13:40","guid":{"rendered":"http:\/\/eaf.net\/mvp\/?p=1232"},"modified":"2011-11-07T07:25:34","modified_gmt":"2011-11-07T15:25:34","slug":"yahoo-slurp-and-me","status":"publish","type":"post","link":"https:\/\/www.eaf.net\/mvp\/2008\/yahoo-slurp-and-me\/","title":{"rendered":"Yahoo! Slurp and Me"},"content":{"rendered":"<p>They helped cause a spike on this site that led to my host suspending the domain.<\/p>\n<p>After at least 30 hours of being suspended, I was allowed to bring the domain back online. Since then, I&#8217;ve been monitoring its bandwidth consumption, particularly watching two bots: <i>Googlebot<\/i> and <i>Yahoo! Slurp<\/i>.<\/p>\n<p>I also set up robots.txt to severely restrict bot access to this domain.<\/p>\n<p>Googlebot is behaving; Slurp appears not to be.<\/p>\n<blockquote><p><code><br \/>Robots\/Spiders at 6:53 am on 10\/18\/08<br \/>\n&nbsp; Yahoo Slurp &nbsp;7785+312 &nbsp;156.18 MB &nbsp;09:46<br \/>\n&nbsp; Googlebot &nbsp; &nbsp;4411+33 &nbsp; 111.47 MB &nbsp;02:42<\/code><\/p>\n<p><code>Robots\/Spiders at 7:20 am on 10\/19\/08<br \/>\n&nbsp; Yahoo Slurp &nbsp;8357+334 &nbsp;164.00 MB &nbsp;10:05<br \/>\n&nbsp; Googlebot &nbsp; &nbsp;4413+35 &nbsp; 111.53 MB &nbsp;03:59<\/code><\/p><\/blockquote>\n<p>I checked my recent accesses&#8230;and found a whole bunch by Yahoo. \ud83d\ude41<\/p>\n<p><!--more--><\/p>\n<p>At this point, I&#8217;m trying to give them the benefit of the doubt. My robots.txt file had a long disallow section for all bots and a separate, very short section with no disallows for Slurp, because I figured they would honor both. Apparently they don&#8217;t. So I modified my robots file to remove the special Slurp section.<\/p>\n<p>Now we&#8217;ll see what happens.<\/p>\n<p>If I knew what IPs to ban in .htaccess, I would do that as well!<\/p>\n<p>Oh&#8230;and I&#8217;m posting about this here and on this day <b>for the record<\/b>. 
Now I can point my virtual server provider to this blog post. The surge in traffic ain&#8217;t my fault, guys!<\/p>\n<p><img src=\"https:\/\/www.eaf.net\/mvp\/wp-includes\/images\/smilies\/mrgreen.png\" alt=\":mrgreen:\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\" \/><\/p>\n<p><b>October 20 update:<\/b><\/p>\n<p>OK, here&#8217;s what I&#8217;m banning so far in my .htaccess file:<\/p>\n<p>RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Bot\\ mailto:craftbot@yahoo.com [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Custo [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Download\\ Demon [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Express\\ WebPictures [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^GetWeb! 
[OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^HMView [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Image\\ Stripper [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Image\\ Sucker [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} Indy\\ Library [NC,OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Internet\\ Ninja [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^JOC\\ Web\\ Spider [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^larbin [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Mass\\ Downloader [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^MIDown\\ tool [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Mister\\ PiX [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Net\\ Vampire [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Offline\\ Explorer [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Offline\\ Navigator [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Papa\\ Foto [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]<br 
\/>\nRewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Teleport\\ Pro [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Web\\ Image\\ Collector [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Web\\ Sucker [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^WebGo\\ IS [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Website\\ eXtractor [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Website\\ Quester [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Wget [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Widow [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Xaldon\\ WebSpider [OR]<br \/>\nRewriteCond %{HTTP_USER_AGENT} ^Zeus<\/p>\n<p>deny from 8.11.2.98<br \/>\ndeny from 41.202.78.138<br \/>\ndeny from 62.12.137.20<br \/>\ndeny from 62.37.113.210<br \/>\ndeny from 62.37.112.104<br \/>\ndeny from 62.37.117.18<br \/>\ndeny from 62.4.64.119<br \/>\ndeny from 64.34.161.151<br \/>\ndeny from 66.198.41.11<br \/>\ndeny from 66.7.210.15<br \/>\ndeny from 67.227.134.4<br \/>\ndeny from 69.42.90.166<br \/>\ndeny from 72.37.152.51<br \/>\ndeny from 74.208.14.215<br \/>\ndeny from 76.79.154.182<br \/>\ndeny from 77.88.26.26<br \/>\ndeny from 77.91.224.<br \/>\ndeny from 78.86.153.121<br \/>\ndeny from 83.133.114.2<br \/>\ndeny from 
83.133.125.166<br \/>\ndeny from 85.124.67.254<br \/>\ndeny from 85.92.70.252<br \/>\ndeny from 85.92.71.147<br \/>\ndeny from 87.224.173.<br \/>\ndeny from 89.208.32.109<br \/>\ndeny from 91.205.124.<br \/>\ndeny from 189.30.142.213<br \/>\ndeny from 196.35.158.180<br \/>\ndeny from 200.63.42.136<br \/>\ndeny from 201.66.127.38<br \/>\ndeny from 202.75.62.162<br \/>\ndeny from 205.234.140.219<br \/>\ndeny from 207.182.151.154<br \/>\ndeny from 208.109.181.121<br \/>\ndeny from 208.148.196.71<br \/>\ndeny from 209.200.18.<br \/>\ndeny from 209.200.48.<br \/>\ndeny from 209.216.249.184<br \/>\ndeny from 212.116.219.170<br \/>\ndeny from 212.58.4.129<br \/>\ndeny from 213.4.106.85<br \/>\ndeny from 213.134.144.134<br \/>\ndeny from 216.130.187.100<br \/>\ndeny from 220.225.241.4<\/p>\n","protected":false},"excerpt":{"rendered":"<p>They helped cause a spike on this site that led to my host suspending the domain. After at least 30 hours of being suspended, I was allowed to bring the domain back online. Since then, I&#8217;ve been monitoring its bandwidth consumption, particularly watching two bots: Googlebot and Yahoo! Slurp. I also set up robots.txt &#8230; <a title=\"Yahoo! Slurp and Me\" class=\"read-more\" href=\"https:\/\/www.eaf.net\/mvp\/2008\/yahoo-slurp-and-me\/\" aria-label=\"More on Yahoo! 
Slurp and Me\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[13],"tags":[424,716],"class_list":["post-1232","post","type-post","status-publish","format-standard","hentry","category-tech-stuff","tag-security","tag-wordpress"],"jetpack_publicize_connections":[],"aioseo_notices":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/prJUJ-jS","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/www.eaf.net\/mvp\/wp-json\/wp\/v2\/posts\/1232","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.eaf.net\/mvp\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.eaf.net\/mvp\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.eaf.net\/mvp\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.eaf.net\/mvp\/wp-json\/wp\/v2\/comments?post=1232"}],"version-history":[{"count":0,"href":"https:\/\/www.eaf.net\/mvp\/wp-json\/wp\/v2\/posts\/1232\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.eaf.net\/mvp\/wp-json\/wp\/v2\/me
dia?parent=1232"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.eaf.net\/mvp\/wp-json\/wp\/v2\/categories?post=1232"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.eaf.net\/mvp\/wp-json\/wp\/v2\/tags?post=1232"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}