How to correctly identify Baiduspider and verify whether it is genuine
<div style="color: black; text-align: left; margin-bottom: 10px;">
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/153580559508016a851a822~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725104837&x-signature=J5JyD8hsxKTSTsHXehn5UrNU0PY%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">We often see records of various crawlers in our website logs, most commonly search engine spiders such as Baiduspider. Taking Baidu as an example, we usually decide whether a request comes from Baiduspider by looking at the user agent string, that is, the User-Agent header. However, the User-Agent can be spoofed, so requests often arrive from fake crawlers disguised as Baiduspider, and we need to learn how to tell genuine from fake.</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/153580541099244d235ae33~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725104837&x-signature=r1ChPFT6Ntqls3HGSoxUAutLv1U%3D" style="width: 50%; margin-bottom: 20px;"></div>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">First, the User-Agent strings. Baidu has officially published the following User-Agents:</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Mobile UA:</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Mozilla/5.0 (Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 (KHTML,like Gecko) Version/5.1 Mobile Safari/10600.6.3 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">PC UA:</strong> Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Newly added rendering UAs:</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Mobile UA:</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">PC UA:</strong> Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Note that the newly added rendering User-Agents consist of one mobile User-Agent and one PC User-Agent. Now that we know Baiduspider's User-Agents, how do we correctly identify a crawl and decide whether it really comes from Baiduspider?</p>
<h1 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;">How to identify Baiduspider</strong></h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">1. First, search the User-Agent for the keyword Baiduspider.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">2. To distinguish mobile spiders from PC spiders, filter those results by keyword again: a mobile User-Agent contains at least one of Android, iPhone, or Mobile.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">3. These steps tell us which requests claim to be Baiduspider, but they cannot tell genuine from fake.</p>
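<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The keyword filtering in the steps above could be sketched in Python roughly as follows. This is only an illustration: the function name classify_user_agent and the labels "mobile" and "pc" are hypothetical and do not come from Baidu, and the match is a plain case-sensitive substring test.</p>

```python
# Keywords that mark a mobile Baiduspider User-Agent, as listed in step 2.
MOBILE_MARKERS = ("Android", "iPhone", "Mobile")


def classify_user_agent(user_agent):
    """Return "mobile", "pc", or None if the UA does not claim to be Baiduspider.

    This only inspects the User-Agent string, so a spoofed request
    is classified exactly like a genuine one.
    """
    # Step 1: the UA must contain the keyword Baiduspider
    # (this also matches Baiduspider-render).
    if "Baiduspider" not in user_agent:
        return None
    # Step 2: at least one mobile marker means a mobile spider.
    if any(marker in user_agent for marker in MOBILE_MARKERS):
        return "mobile"
    return "pc"


if __name__ == "__main__":
    pc_ua = ("Mozilla/5.0 (compatible; Baiduspider/2.0; "
             "+http://www.baidu.com/search/spider.html)")
    print(classify_user_agent(pc_ua))
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In practice this would be applied to the User-Agent field of each log line. It says nothing about authenticity, which needs the IP check described in the next section.</p>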
<h1 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;">Verifying whether a Baiduspider is genuine</strong></h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">1. To verify whether a Baiduspider is genuine, we usually run a reverse DNS lookup on its IP address. First, find the spider's IP address in the crawl records identified above.</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/1535805307987cd69371280~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725104837&x-signature=rVqLajc%2FY7MDWYR%2Bt4nbZ8ds%2Bhs%3D" style="width: 50%; margin-bottom: 20px;"></div>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/15358053081303b9ddc523b~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725104837&x-signature=%2BBPAi3PSGVSm3LRW33TJeoNOY3w%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">2. Taking Windows as an example: open Start, choose Run, type cmd, and in the window that opens run nslookup followed by the IP address found above. For a genuine Baiduspider, the returned hostname has the form *.baidu.com or *.baidu.jp; if it does not, the spider is fake.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">3. Online tools for checking whether a Baiduspider is genuine are also available and can be used directly.</p>
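<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The nslookup check above can also be scripted. The following is a minimal Python sketch, assuming the standard library call socket.gethostbyaddr returns the same hostname that nslookup would show. The function names and the reverse_lookup parameter are hypothetical helpers introduced here so the logic can be tried without network access.</p>

```python
import socket

# Hostname suffixes that the article gives for genuine Baiduspider hosts.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def hostname_is_baidu(hostname):
    """Return True if the hostname ends with .baidu.com or .baidu.jp."""
    # Drop a trailing dot and normalize case before comparing.
    host = hostname.rstrip(".").lower()
    return host.endswith(BAIDU_SUFFIXES)


def is_genuine_baiduspider(ip, reverse_lookup=socket.gethostbyaddr):
    """Reverse-resolve the IP and check the hostname suffix.

    reverse_lookup is injectable (an assumption of this sketch) so the
    check can be exercised with a stub instead of a live DNS query.
    """
    try:
        hostname, _aliases, _addresses = reverse_lookup(ip)
    except OSError:
        # No reverse record, or the lookup failed: treat as not verified.
        return False
    return hostname_is_baidu(hostname)
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Calling is_genuine_baiduspider with an IP address taken from the log performs a live lookup. The leading dot in each suffix matters: it stops a hostname such as fakebaidu.com from passing the check.</p>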
</div>