l14107cb posted on 2024-8-25 19:50:41

Grab Data Like Spider-Man: A Tour of the Fun World of Web Crawlers

<img src="https://mmbiz.qpic.cn/sz_mmbiz_gif/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibVBUtJec183LCII9OQkR4rSYk2J5icfyrhuwicbP8z79t1psBtxicvp7lQ/640?wx_fmt=gif&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;"><img src="https://mmbiz.qpic.cn/sz_mmbiz_gif/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibbZGsJr44kPZoaY4T0tiaI1qeNLick9Kdj7okicPHJGjAicgUH1Imp1IFqw/640?wx_fmt=gif&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Whether you're a college student racing a group-project due date,</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">or an office worker grinding through the workday,</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">you have probably been tormented by data collection at some point.</p><img src="https://mmbiz.qpic.cn/sz_mmbiz_jpg/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibZc8HJwLEPjzOCg8BeEHzmGBViabQDYn6vvBREic07wVRbh8qv8kn7Mow/640?wx_fmt=jpeg&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;"><img src="https://mmbiz.qpic.cn/sz_mmbiz_jpg/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibDx3JrGtic0fpeLg8UcS7nPGXlvLjPZdiboc3VAWDjoIe0TX5GUqs5SicQ/640?wx_fmt=jpeg&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;"><img src="https://mmbiz.qpic.cn/sz_mmbiz_gif/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibbZGsJr44kPZoaY4T0tiaI1qeNLick9Kdj7okicPHJGjAicgUH1Imp1IFqw/640?wx_fmt=gif&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;"><img src="https://mmbiz.qpic.cn/sz_mmbiz_gif/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibGCblUOSykyO02icibIwkstW28SNWxdvcXGgxK2uRnRPpJpSMlribse22w/640?wx_fmt=gif&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">So what exactly is a crawler?</strong></p><img src="https://mmbiz.qpic.cn/sz_mmbiz_gif/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibvVSRlSma9PFqevbjIxSndOTceFficHWI5ibbVr4LjVicibODWwBRziceicibQ/640?wx_fmt=gif&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A web crawler (Web Crawler) is a program that starts from any web page, uses a graph traversal algorithm to automatically visit every page it can reach through links, and stores those pages.</p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibJRq2KrrACx3blces5U9CqYeGTbTYlnfv4oMrL4d081yRFhd5m0mfpA/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">1. What is a crawler</strong></p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibcB1PPMba6af0PvpqdsR5QozqsZ8EKvk84x2XeuLFwodXgLOibRKIljA/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Basic crawler concepts</p>

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The “internet” we surf every day is really just computers connected to other computers. If we treat each web page a computer visits as a node, then the hyperlinks that connect pages to one another are the arcs linking those nodes.</p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibdUge4OljAXXPZAQQqPicN9ibnZGFhIIu3wmhT4Ric8Ou38kQSVcb9ibbDQ/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
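    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">To make the “pages are nodes, hyperlinks are arcs” picture concrete, here is a minimal sketch of a breadth-first traversal. The link graph <code>LINKS</code> and the page paths in it are made up for illustration; a real crawler would discover links by downloading and parsing each page.</p>

```python
from collections import deque

# Hypothetical link graph: each page (node) maps to the pages it links to (arcs).
LINKS = {
    "/home": ["/news", "/shop"],
    "/news": ["/home", "/news/1"],
    "/shop": ["/shop/jeans"],
    "/news/1": [],
    "/shop/jeans": ["/home"],
}


def crawl(start, links):
    """Visit every page reachable from `start` breadth-first, each exactly once."""
    visited = [start]      # pages in the order they were discovered
    seen = {start}         # fast membership test to avoid revisiting pages
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                visited.append(target)
                queue.append(target)
    return visited


order = crawl("/home", LINKS)
print(order)
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The <code>seen</code> set matters: pages often link back to each other, and without it the traversal would loop forever.</p>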
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">At its core, a web crawler imitates the browser on a computer: it sends network requests and receives the responses. The more closely it mimics a browser's behavior, the less likely it is to be detected.</p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibJRq2KrrACx3blces5U9CqYeGTbTYlnfv4oMrL4d081yRFhd5m0mfpA/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">2. Why use a crawler</strong></p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibcB1PPMba6af0PvpqdsR5QozqsZ8EKvk84x2XeuLFwodXgLOibRKIljA/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Getting information efficiently</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For example, Xiaolan wants to use the reviews on a certain shopping site to decide whether a brand of jeans is worth buying. She would first open the page, then find the reviews, and then read through them one by one.</p>
    <img src="https://mmbiz.qpic.cn/sz_mmbiz_jpg/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibBlowbME5H6OwPTTreCibbKoOV8UwH7oo5jKlRokjvQLmia87npEYWPEw/640?wx_fmt=jpeg&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A web crawler, by contrast, can fetch the full content of the page automatically, extract the reviews you care about, and save them to a file so the data is easy to view and analyze.</p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibJRq2KrrACx3blces5U9CqYeGTbTYlnfv4oMrL4d081yRFhd5m0mfpA/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
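    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As a rough sketch of the “extract the reviews, then save them” step, the snippet below pulls review text out of a small HTML fragment and writes it as CSV, using only the Python standard library. The markup (<code>class="review"</code>) and the sample reviews are invented for illustration; a real shopping site will use its own structure.</p>

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment; real review pages use their own markup.
SAMPLE_HTML = """
<ul>
  <li class="review">Fits well, fabric feels sturdy.</li>
  <li class="ad">Buy two, get one free!</li>
  <li class="review">Color faded after one wash.</li>
</ul>
"""


class ReviewParser(HTMLParser):
    """Collect the text of every <li class="review"> element."""

    def __init__(self):
        super().__init__()
        self.in_review = False
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and dict(attrs).get("class") == "review":
            self.in_review = True
            self.reviews.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_review = False

    def handle_data(self, data):
        if self.in_review:
            self.reviews[-1] += data


parser = ReviewParser()
parser.feed(SAMPLE_HTML)
reviews = [text.strip() for text in parser.reviews]

# Write to an in-memory CSV; use open("reviews.csv", "w", newline="") to save a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["review"])
writer.writerows([review] for review in reviews)
print(buffer.getvalue())
```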
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Concrete applications of crawlers</strong></p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibcB1PPMba6af0PvpqdsR5QozqsZ8EKvk84x2XeuLFwodXgLOibRKIljA/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Shopping assistants: Taobao, JD.com, etc.</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Public opinion analysis: blogs, Zhihu, etc.</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Automation: ticket-grabbing software, auto-follow tools, etc.</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Search engines: Baidu, Google, etc.</strong></p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibFBhlKGtZcEvTF2icxlX5H6mriaM712V2HWLKicRWvnUOvr0j8RPmjTy8Q/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;"><img src="https://mmbiz.qpic.cn/sz_mmbiz_gif/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibbZGsJr44kPZoaY4T0tiaI1qeNLick9Kdj7okicPHJGjAicgUH1Imp1IFqw/640?wx_fmt=gif&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Now that we know the basic idea of a crawler, let's look more closely at how crawlers work and the networking basics they rely on.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Where to crawl from</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">1. Web page links: URL</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">URL</strong> is the address of a web page, much like your home address. It tells the browser where to find the page you want to visit. You can think of it as the web address that locates every page, image, or video on the network.</p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibcwkjicLWLkz0qyMbAnFkF1EXH88IqeJTzKPiark5cBIBfh5p4Dhmad2w/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
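    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Just as a postal address has a city, street, and house number, a URL has parts. The short example below splits one with Python's standard <code>urllib.parse</code> module; the URL itself is a made-up example on the reserved <code>example.com</code> domain.</p>

```python
from urllib.parse import urlsplit, parse_qs

# Made-up example URL; example.com is reserved for documentation.
url = "https://www.example.com:8080/shop/jeans?page=2&sort=new#reviews"
parts = urlsplit(url)

print(parts.scheme)           # protocol used to fetch the page
print(parts.hostname)         # which server to contact
print(parts.port)             # which port on that server
print(parts.path)             # which resource on the server
print(parse_qs(parts.query))  # extra parameters sent with the request
print(parts.fragment)         # position inside the page (handled by the browser)
```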
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">2. Network transfer protocols: HTTP and HTTPS</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">HTTP is short for HyperText Transfer Protocol. At the start of a URL you will usually see the four letters “http”.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">HTTP is a set of rules for transferring data over the internet; it specifies how the data is transmitted. It is like the courier company we choose when sending a parcel: the company sets the rules for how the parcel is shipped.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">HTTPS (HyperText Transfer Protocol Secure) can loosely be understood as the encrypted version of HTTP.</p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8Tducibt5rLYjCM1D1Tp40SNZaaloIiaIWGQ8Z9rxbpZliaj9JVOQa4YZVbbq8g/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">What the crawled content looks like</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">1.&nbsp;HTTP request</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The two most important parts of an HTTP request are “to whom?” and “to do what?”</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">“To whom?” is the URL, that is, the target being visited.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">“To do what?” is the task the browser wants the server to carry out.</p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibNxatibpTNNUyvTqmaTRASz61E8fvHCfD98biaEUkIib4ZIMFSuA3d4kPw/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">✦</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">✧</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">&nbsp;Sample HTTP request&nbsp;</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">✦</span></p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8Tducibd5fK0Nsiadgx6cSvaxmUtNS98LVPgKLNRFY1p5mmAcRjJp0KdCqxLgw/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
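    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In text form, “to whom?” and “to do what?” show up in the first lines of a request. The sketch below assembles a simplified HTTP/1.1 request by hand, purely to show the layout; <code>build_request</code>, the URL, and the header values are illustrative assumptions, and real code would normally let an HTTP library build the request.</p>

```python
from urllib.parse import urlsplit


def build_request(method, url, headers):
    """Return the text of a simple HTTP/1.1 request (no body)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"{method} {target} HTTP/1.1",  # "to do what?": the method and the resource
        f"Host: {parts.netloc}",        # "to whom?": the server named in the URL
    ]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


raw = build_request(
    "GET",
    "http://www.example.com/shop/jeans?page=2",
    {"User-Agent": "demo-crawler/0.1", "Accept": "text/html"},  # hypothetical values
)
print(raw)
```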
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">2. HTTP response</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">After the server receives the HTTP request described above, it processes the task and returns a response to the browser, including a status that says how the request went.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">✦</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">✧</span></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">&nbsp;Sample HTTP response&nbsp;</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">✦</span></p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8Tducib60ibk03ZszcNcxOa6kU1m1J2vjjgz0KoVicq5OgpKibYl0LYmmBcicyBicQ/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
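    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A response has the same overall shape as a request: a status line, some headers, a blank line, then the body. The following sketch takes a hand-written response apart; <code>RAW_RESPONSE</code> and <code>parse_response</code> are invented for this example, and an HTTP library would normally do this parsing for you.</p>

```python
# Hand-written sample response, for illustration only.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Length: 24\r\n"
    "\r\n"
    "<h1>Hello, crawler!</h1>"
)


def parse_response(raw):
    """Split a simple HTTP response into status code, reason, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")    # blank line separates headers from body
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return int(code), reason, headers, body


status, reason, headers, body = parse_response(RAW_RESPONSE)
print(status, reason)
print(headers["content-type"])
print(body)
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A crawler typically checks the status first (200 means success) and only then looks at the body, which is where the page's HTML lives.</p>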
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">3. What a web page is made of: HTML, CSS, JS</strong></p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibbNpUUaBicvHWTiaRCZ9ibVQ3Icibfenp2Vwy7gnhBtkQKQN2DnZXFiajvibw/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">HTML (HyperText Markup Language)</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">HTML is the structure and content of a page, like the frame and rooms of a building. It uses tags to define headings, paragraphs, links, images, and other content.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For example:</p><img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8Tducib1FrHTUgvkf36swv6YMp3uTHXnOXIkgHhicxatX6TEdWaomKWdPb24Ig/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">&nbsp;The HTML tag &lt;h1&gt; is like a room's door number: it tells the browser that this is a level-one heading. The &lt;p&gt; tag defines a paragraph, like the text inside the room.</p>
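    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Because tags label what each piece of content is, a crawler can ask for content by tag. Here is a small sketch that collects the text of &lt;h1&gt; and &lt;p&gt; elements with the standard-library <code>html.parser</code>; the page string and the <code>TextByTag</code> class are made up for this example.</p>

```python
from html.parser import HTMLParser

# Made-up page used only for this example.
PAGE = (
    "<html><body>"
    "<h1>Web Crawlers</h1>"
    "<p>A crawler fetches pages.</p>"
    "<p>Then it extracts data.</p>"
    "</body></html>"
)


class TextByTag(HTMLParser):
    """Record (tag, text) pairs for the tags we are interested in."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None   # the wanted tag we are currently inside, if any
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self.current = tag
            self.found.append((tag, ""))

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current:
            tag, text = self.found[-1]
            self.found[-1] = (tag, text + data)


parser = TextByTag(["h1", "p"])
parser.feed(PAGE)
found = parser.found
for tag, text in found:
    print(tag, "->", text)
```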
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">CSS (Cascading Style Sheets)</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">CSS makes the page look good, just as a decorator paints the walls, arranges the furniture, and hangs ornaments in a room. It determines the page's colors, fonts, layout, and other visual effects.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For example:</p>
    <img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8Tducib1FrHTUgvkf36swv6YMp3uTHXnOXIkgHhicxatX6TEdWaomKWdPb24Ig/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">You can use CSS to make the text of every paragraph red and its background yellow.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">JavaScript (JS)</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">JavaScript</strong> adds interactivity to a page, like the appliances in a home that make the house smarter and more fun. JS&nbsp;lets the page respond to user actions such as clicking a button or entering data, and lets it display information dynamically.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For example:</p>
    <img src="https://mmbiz.qpic.cn/sz_mmbiz_png/3JJQYPZSKDnwJlyzy9VH0QLULia8Tducib1FrHTUgvkf36swv6YMp3uTHXnOXIkgHhicxatX6TEdWaomKWdPb24Ig/640?wx_fmt=png&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">When you click a button and an alert pops up saying “Hello”, that is JS working behind the scenes.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Together, HTML, CSS, and JavaScript can create a complete, interactive web page.</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">4.&nbsp;Static and dynamic web pages</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Static pages</strong>: once the HTML code has been generated, the page's content and appearance basically do not change.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Dynamic pages</strong> are different: the page code stays the same, but the content shown can change with time, environment, or the results of database operations.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">What a basic crawler fetches is only a static page, that is, the page's source code, the same thing you see with “View page source” in a browser. Dynamic content, such as information that only appears after a JavaScript script runs, is not captured unless the crawler also executes the scripts, for example by driving a real browser.</p><img src="https://mmbiz.qpic.cn/sz_mmbiz_gif/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibbZGsJr44kPZoaY4T0tiaI1qeNLick9Kdj7okicPHJGjAicgUH1Imp1IFqw/640?wx_fmt=gif&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;"><img src="https://mmbiz.qpic.cn/sz_mmbiz_jpg/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibRbNIkhq3Q2WZ6dDtZrtyQZXtak9AFWGxr9aYiaApsn93eEmD1TjnWqw/640?wx_fmt=jpeg&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;"><span style="color: black;">
      <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Oho~ I can feel my brain growing</p>

    </span>
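    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The static-versus-dynamic limitation is easy to see in a toy case. In the made-up page below, the price is written into the page by a script. A parser that only reads the source, as sketched here with the standard-library <code>html.parser</code>, finds the script's code but never the rendered price, because nothing runs the script. <code>PAGE_SOURCE</code> and <code>VisibleText</code> are assumptions for illustration.</p>

```python
from html.parser import HTMLParser

# Hypothetical page: the price is filled in by JavaScript after the page loads.
PAGE_SOURCE = """
<html><body>
  <h1>Jeans</h1>
  <div id="price"></div>
  <script>
    document.getElementById("price").textContent = "199 CNY";
  </script>
</body></html>
"""


class VisibleText(HTMLParser):
    """Collect text outside <script> tags; the script itself is never executed."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if not self.in_script and data.strip():
            self.chunks.append(data.strip())


parser = VisibleText()
parser.feed(PAGE_SOURCE)
text = parser.chunks
print(text)  # the heading is there, the price is not
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Browser-automation tools such as Selenium or Playwright are one common way around this, since they load the page in a real browser and let the scripts run first.</p>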
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">So far we have briefly covered the basics of crawlers and of networking, so you should now have a rough picture of what a crawler is. In this last part of the post, here are a few crawler tips and resources for further study~</p>

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Recommended crawler forums and books</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">1. V2EX crawler board: a good place to discuss technical details and share hands-on experience.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">2. Zhihu crawler topic: [Zhihu crawler topic] hosts a large number of discussions and Q&amp;As, suitable for both beginners and more advanced learners.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">3. 《Python 爬虫开发与项目实战》 (Python Crawler Development and Project Practice): suitable for beginners, covering everything from the basics to hands-on projects.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Crawler tools</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">1. Scrapy:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">An open-source, full-featured Python crawling framework that supports asynchronous requests and extensions. (https://docs.scrapy.org/en/latest/)</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">2. Beautiful Soup:</p>

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A simple, easy-to-use Python library for parsing HTML and XML documents. (https://www.crummy.com/software/BeautifulSoup/bs4/doc/)</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Crawlers and anti-crawler techniques</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In web data scraping, crawler techniques and anti-crawler techniques play a constant game of cat and mouse. Understanding this contest is key to scraping data effectively.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Anti-crawler techniques include:</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">1. User-Agent detection: check whether the User-Agent field of a request looks legitimate, in order to identify automated access.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">2. IP limits: restrict how often the same IP address may make requests, to prevent bulk scraping.</p>
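    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Seen from the website's side, the two checks above might look roughly like the sketch below. Everything here is a simplifying assumption: the blocklist of User-Agent hints, the limit of 3 requests per 10 seconds, and the <code>AntiCrawlerGate</code> class itself. Real sites use far more elaborate rules.</p>

```python
from collections import defaultdict, deque

# Hypothetical settings, chosen only to keep the example small.
BLOCKED_AGENT_HINTS = ("python-urllib", "python-requests", "scrapy")
MAX_REQUESTS = 3
WINDOW_SECONDS = 10.0


class AntiCrawlerGate:
    """Toy server-side filter: User-Agent check plus a per-IP sliding-window limit."""

    def __init__(self):
        self.history = defaultdict(deque)  # ip -> timestamps of recently allowed requests

    def allow(self, ip, user_agent, now):
        agent = (user_agent or "").lower()
        # 1. User-Agent detection: reject missing or obviously automated agents.
        if not agent or any(hint in agent for hint in BLOCKED_AGENT_HINTS):
            return False
        # 2. IP limit: forget requests that have left the time window...
        recent = self.history[ip]
        while recent and now - recent[0] >= WINDOW_SECONDS:
            recent.popleft()
        # ...then reject if this IP already used up its quota.
        if len(recent) >= MAX_REQUESTS:
            return False
        recent.append(now)
        return True


gate = AntiCrawlerGate()
browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
# Five requests from one IP at t = 0, 1, 2, 3 and 12 seconds.
results = [gate.allow("203.0.113.7", browser, t) for t in (0.0, 1.0, 2.0, 3.0, 12.0)]
scripted = gate.allow("203.0.113.8", "python-requests/2.31", 0.0)
print(results, scripted)
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The fourth request is refused because it is the fourth within ten seconds; by the fifth, the earlier requests have aged out of the window, so it is allowed again.</p>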
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">Crawler trivia</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A web spider has to fetch pages, and it does so differently from an ordinary visitor. If it is not kept under control, it can overload a website's servers.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For example, Taobao's servers once became unstable because the Yahoo search engine's spider was crawling its data.</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">&nbsp;●A well-behaved web crawler follows the Robots protocol, whose full name is the “Robots Exclusion Protocol” (网络爬虫排除标准).&nbsp;</p><img src="https://mmbiz.qpic.cn/sz_mmbiz_gif/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibbZGsJr44kPZoaY4T0tiaI1qeNLick9Kdj7okicPHJGjAicgUH1Imp1IFqw/640?wx_fmt=gif&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;"><img src="https://mmbiz.qpic.cn/sz_mmbiz_jpg/3JJQYPZSKDnwJlyzy9VH0QLULia8TducibRbNIkhq3Q2WZ6dDtZrtyQZXtak9AFWGxr9aYiaApsn93eEmD1TjnWqw/640?wx_fmt=jpeg&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
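    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In practice, a site publishes its Robots rules in a <code>robots.txt</code> file, and Python's standard library can read them. The sketch below parses a made-up rule set from a string (no network access needed); the rules, the crawler name <code>demo-crawler</code>, and the URLs are all assumptions for illustration.</p>

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; a real crawler would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Ask before fetching: is this crawler allowed to visit this URL?
allowed_public = rp.can_fetch("demo-crawler", "https://www.example.com/shop/jeans")
allowed_private = rp.can_fetch("demo-crawler", "https://www.example.com/private/orders")
# How many seconds the site asks crawlers to wait between requests.
delay = rp.crawl_delay("demo-crawler")

print(allowed_public, allowed_private, delay)
```

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Honoring the crawl delay, for instance by sleeping between requests, is one simple way to avoid the server overload described above.</p>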
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">So it seems that whether you're a college student racing a group-project due date or an office worker “tormented” by data collection, a web crawler can be your “magic tool” for getting information!</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Exactly! Picture the vast ocean of the internet: a web crawler is like a nimble spider, effortlessly weaving a web of information and catching valuable data for us.</p><img src="https://mmbiz.qpic.cn/sz_mmbiz_jpg/3JJQYPZSKDnwJlyzy9VH0QLULia8Tducib6m8ib3tmySMEZMegUKB2U5Q5xPKNXcbrsOk2xDaKS18lECxBIuy8gFA/640?wx_fmt=jpeg&amp;from=appmsg&amp;tp=webp&amp;wxfrom=5&amp;wx_lazy=1&amp;wx_co=1" style="width: 50%; margin-bottom: 20px;">
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">So the next time data collection has you stuck, don't forget this clever “spider”!</p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">END</p>

    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">If you enjoyed this, like and follow</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">科盐系男神 will keep the posts coming</strong></p>
    <p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">See you next time!</strong></p>




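4lqedz posted on 2024-10-23 08:58:31

Thanks to the OP for sharing! I learned a lot.

nqkk58 posted on 2024-10-23 20:08:11

Thank you, thanks, much appreciated, you've worked hard, so glad you're here, and so on.

nqkk58 posted on 2024-11-10 16:49:44

Your words are like a spring breeze that warms my heart. Thank you so much.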

wrjc1hod posted yesterday at 20:39

Thanks to the OP for sharing, and best wishes to 外链论坛 for continued success!