Scraping music from a website with a Python crawler
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Recently I wanted to download a few songs from the web to put on my USB drive, but when I went looking, every major music site required a VIP membership to download songs (especially the good ones).</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For someone as broke as me, spending a few dozen yuan to download a few songs is out of the question, and as a programmer, paying to download music was never going to happen either. So I spent a day going through all kinds of material online to learn how to get the music on these sites without paying.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">There are of course many ways to do it. In the end I picked the simplest and most convenient one: <strong style="color: blue;"><span style="color: black;">a Python crawler</span></strong>. Below I share the pitfalls I ran into while using one.</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/7ab2a5037f1341e786812ad942c4e318~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1729477183&x-signature=QFiWiVu1x4zy2%2BveQ35D9CFyuOc%3D" style="width: 50%; margin-bottom: 20px;"></div>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">Below, I use scraping music.163.com (某易云音乐) as an example to describe how I learned to write a Python crawler:</h1>
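<h1 style="color: black; text-align: left; margin-bottom: 10px;">Approach:</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Where does the music come from? From the website's servers.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">How do we get the music from the URL? By sending a network request to the site.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Filter out the music files.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Download the music files.</p>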
<h1 style="color: black; text-align: left; margin-bottom: 10px;">Implementation details</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Import the third-party library used to send network requests</p>
<pre>import requests  # third-party library for sending network requests</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Installation:</p>
<pre>pip install requests</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Import the third-party library used to parse the data</p>
<pre>from lxml import etree  # third-party library for parsing data</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Installation:</p>
<pre>pip install lxml</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The chart list URL on music.163.com (某易云音乐) is https://music.163.com/#/discover/toplist?id=3778678</p>
<pre>url = 'https://music.163.com/#/discover/toplist?id=3778678'</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Send the request to fetch the page data</p>
<pre>response = requests.get(url=url)  # request the page data</pre>
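<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">One detail of this URL is worth checking before blaming the site: everything after the # character is a URL fragment, and HTTP clients do not send fragments to the server. The sketch below uses only the standard library to show which part of the URL actually reaches the server. The helper name server_side_target is made up for this illustration.</p>

```python
from urllib.parse import urlsplit


def server_side_target(url: str) -> str:
    """Return the path (plus query, if any) that an HTTP client sends to the server.

    The fragment (everything after '#') stays on the client and is dropped here.
    """
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


url = "https://music.163.com/#/discover/toplist?id=3778678"
print(server_side_target(url))   # the path the server is asked for
print(urlsplit(url).fragment)    # the part that never leaves the client
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For this URL the request target is just /, so the request asks for the site's front page, not the chart page. Whether that alone explains the problems described later is an assumption that has not been verified against the live site.</p>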
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Parse the data</p>
<pre>html = etree.HTML(response.text)  # parse the page data</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Get the set of all song tags (<strong style="color: blue;"><span style="color: black;">a tags</span></strong>)</p>
<pre>id_list = html.xpath('//a')  # all a tags; the song ids are read from their href</pre>
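<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Because //a matches every link on the page, most of the hrefs will not be song links, and splitting on '=' fails for any href that has no '=' in it. Here is a minimal sketch of a stricter filter, assuming song links have the form /song?id= followed by digits. That form is an assumption about the site's markup, consistent with how the download loop splits on '=' but not confirmed here. song_id_from_href is a hypothetical helper, and the sample hrefs are made up.</p>

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def song_id_from_href(href: str) -> Optional[str]:
    """Return the numeric song id from an href like '/song?id=123', else None.

    Assumption: song links use the path '/song' and an 'id' query parameter.
    """
    parts = urlsplit(href)
    if parts.path != "/song":
        return None
    ids = parse_qs(parts.query).get("id", [])
    if ids and ids[0].isdigit():
        return ids[0]
    return None


# Made-up hrefs of the kinds an '//a' query might return.
hrefs = ["/song?id=1804320463", "/discover/toplist", "javascript:;", "/song?id=${x.id}"]
print([song_id_from_href(h) for h in hrefs])
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Inside the loop, an href for which this returns None can simply be skipped with continue, so unrelated links and unrendered template placeholders never reach the download step.</p>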
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Download the songs</p>
<pre>base_url = 'http://music.163.com/song/media/outer/url?id='  # URL prefix for downloading music
# music download URL = URL prefix + music id
for data in id_list:
    href = data.xpath('./@href')[0]
    music_id = href.split('=')[1]  # music id
    music_url = base_url + music_id  # music download URL
    music_name = data.xpath('./text()')[0]  # name of the music to download
    music = requests.get(url=music_url)
    # save the downloaded music as a file
    with open('./music/%s.mp3' % music_name, 'wb') as file:
        file.write(music.content)
    print('&lt;%s&gt; downloaded successfully' % music_name)
</pre>
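<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The loop writes into a ./music directory, which fails if that directory does not exist or if a song title contains characters such as '/'. Below is a small sketch of a safer save step. safe_filename and save_bytes are hypothetical helpers, and the set of replaced characters is a conservative choice of mine, not something the original method specifies.</p>

```python
import re
from pathlib import Path


def safe_filename(name: str) -> str:
    """Replace characters that are invalid in file names on common systems."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", name).strip()
    return cleaned or "untitled"


def save_bytes(directory: str, name: str, data: bytes) -> Path:
    """Create the directory if needed and write data to <directory>/<name>.mp3."""
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / (safe_filename(name) + ".mp3")
    path.write_bytes(data)
    return path
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In the loop, the with open(...) block could then become save_bytes('./music', music_name, music.content).</p>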
<h1 style="color: black; text-align: left; margin-bottom: 10px;">Pitfalls I ran into</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">I learned the method above from a video that came out half a year ago. It may well have worked back then, but when I used it to download music files today, it raised errors.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">First, the editor complained that it could not find <strong style="color: blue;"><span style="color: black;">music_name</span></strong> and <strong style="color: blue;"><span style="color: black;">music_id</span></strong>. Looking closely, the ids in the id_list collection (that is, the tag collection) were not ids at all but code, so I suspect the music site has put some anti-scraping mechanism in place here.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Second, I found a song on the site myself, took its id, and assigned it to music_id. Downloading the music through the outer link then returned error code -460, reported as "network too congested", so I suspect the download URL no longer works either.</p>
<pre>base_url = 'http://music.163.com/song/media/outer/url?id='
music_id = '1804320463.mp3'
music_url = base_url + music_id
music = requests.get(url=music_url)
print(music.text)
</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">{"msg":"网络太拥挤,请稍候再试!","code":-460,"message":"网络太拥挤,请稍候再试!"}</p>
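<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">This response is a JSON error, yet the download loop shown earlier would write such a body straight into an .mp3 file and report success. Here is a minimal sketch of a check that could run before saving. It relies only on the fields visible in the message above (code and msg), and api_error is a hypothetical helper.</p>

```python
import json
from typing import Optional, Tuple


def api_error(body: bytes) -> Optional[Tuple[int, str]]:
    """Return (code, msg) if body is a JSON object with an integer 'code', else None.

    Real audio data is not valid JSON, so it falls through to None.
    """
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return None
    if isinstance(payload, dict) and isinstance(payload.get("code"), int):
        return payload["code"], str(payload.get("msg", ""))
    return None


sample = '{"msg":"网络太拥挤,请稍候再试!","code":-460,"message":"网络太拥挤,请稍候再试!"}'
print(api_error(sample.encode("utf-8")))
print(api_error(b"ID3\x03\x00\x00\x00"))  # bytes resembling the start of an MP3 file
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Calling api_error(music.content) before writing the file lets the script skip or retry a song when the server answers with an error.</p>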
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Finally, I printed music_url and opened it in the browser, where I could still listen to the song and download it. I do not know why.</p>
<pre>base_url = 'http://music.163.com/song/media/outer/url?id='
music_id = '1804320463.mp3'
music_url = base_url + music_id
# music = requests.get(url=music_url)
print(music_url)
</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">music.163.com/song/media/…</p>
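<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A browser and a bare requests.get differ in the headers they send: by default requests identifies itself with a python-requests User-Agent and sends no Referer. One hypothesis, not verified against this site, is that the server answers such requests differently. The sketch below only builds the keyword arguments; the header values and the helper name browser_like_kwargs are illustrative assumptions.</p>

```python
from typing import Any, Dict


def browser_like_kwargs(referer: str = "https://music.163.com/") -> Dict[str, Any]:
    """Build keyword arguments for requests.get that mimic a browser request.

    The User-Agent string is an example value, not one the source prescribes.
    """
    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        "Referer": referer,
    }
    # timeout (in seconds) keeps a stalled download from hanging forever
    return {"headers": headers, "timeout": 10}


kwargs = browser_like_kwargs()
print(sorted(kwargs["headers"]))
# Usage with the code above (performs a real network request):
# music = requests.get(music_url, **kwargs)
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">If the response still carries the JSON error after this change, the headers were not the cause and the difference between the browser and the script lies elsewhere.</p>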
<h1 style="color: black; text-align: left; margin-bottom: 10px;">Summary</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Web technology moves fast these days, and many sites now have advanced anti-scraping mechanisms. After all, some things are not simply handed out to anyone who asks. I wrote this post mainly to share some of what I learned while studying Python crawlers. I would also like to ask the experts here: having run into this problem, what should I do to get this site's music files onto my local machine? Any pointers would be much appreciated.</p>