Blocking spam crawlers with .htaccess (code included)
Once a site goes live, many spiders will crawl and scrape its content. Among them are plenty of junk bots that scrape pages relentlessly without ever sending any traffic back, so it is worth blocking them outright.
The following .htaccess rules block these spam crawlers by matching their User-Agent strings:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} "^$|^-$|MSNbot|Webdup|AcoonBot|SemrushBot|CrawlDaddy|DotBot|Applebot|AhrefsBot|Ezooms|EdisterBot|EC2LinkFinder|jikespider|Purebot|MJ12bot|DingTalkBot|DuckDuckBot|WangIDSpider|WBSearchBot|Wotbox|xbfMozilla|Yottaa|YandexBot|Barkrowler|SeznamBot|Jorgee|CCBot|SWEBot|PetalBot|spbot|TurnitinBot-Agent|mail.RU|curl|perl|Python|Wget|Xenu|ZmEu|EasouSpider|YYSpider|python-requests|oBot|MauiBot" [NC]
RewriteRule !(^robots\.txt$) - [F]
The condition lists the common junk-bot names (add any others you encounter to the same alternation); drop these lines into your .htaccess and matching crawlers will receive a 403 Forbidden instead of your pages. The rule deliberately excludes robots.txt, so blocked bots can still read your crawl policy. Note also that the first two alternatives, ^$ and ^-$, block requests that send an empty or "-" User-Agent, and the [NC] flag makes the whole match case-insensitive.
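Before deploying, it can help to sanity-check which User-Agent strings the condition will actually catch. The sketch below re-creates the matching logic in Python with a shortened version of the alternation (the full list from the RewriteCond behaves the same way); the sample User-Agent strings are illustrative, not taken from real logs.

```python
import re

# A few representative entries from the RewriteCond alternation above;
# ^$ and ^-$ catch empty and "-" User-Agents, the rest are bot names.
BAD_BOT_PATTERN = re.compile(
    r"^$|^-$|SemrushBot|AhrefsBot|MJ12bot|DotBot|PetalBot|curl|Wget|python-requests",
    re.IGNORECASE,  # mirrors the [NC] (no-case) flag
)

def is_blocked(user_agent: str) -> bool:
    """Return True if this User-Agent would match the RewriteCond."""
    return BAD_BOT_PATTERN.search(user_agent) is not None

print(is_blocked("Mozilla/5.0 (compatible; SemrushBot/7~bl)"))  # True
print(is_blocked(""))                                           # True (empty UA)
print(is_blocked("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # False
```

Because the Apache match is a plain substring/regex test, any browser whose User-Agent happens to contain one of the listed names would also be blocked, so keep the bot names reasonably specific.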