8 Common Python Libraries: Installation Commands and Usage
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Python's built-in data analysis capabilities are limited, so third-party extension libraries are needed to strengthen them. This article briefly introduces how to install and use NumPy, SciPy, Matplotlib, pandas, StatsModels, scikit-learn, Keras, and Gensim.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">If you have installed the Anaconda distribution, it already ships with the following libraries: NumPy, SciPy, Matplotlib, pandas, and scikit-learn.</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/3d5f6ec13de84463a9d0a5d3efa30e07~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723897148&x-signature=olmYT12lkUlJCPV1yUtxyV6KJUw%3D" style="width: 50%; margin-bottom: 20px;"></div>
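<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">This article gives only a brief introduction to these libraries; readers can find more detailed tutorials on each library's official website.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">NumPy: provides array support and efficient functions for processing arrays</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">SciPy: provides matrix support and numerical computation modules for matrices</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Matplotlib: a powerful data visualization and plotting library</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">pandas: a powerful, flexible tool for data analysis and exploration</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">StatsModels: statistical modeling and econometrics, including descriptive statistics and the estimation and inference of statistical models</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">scikit-learn: a powerful machine learning library supporting regression, classification, clustering, and more</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Keras: a deep learning library for building neural networks and deep learning models</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Gensim: a library for topic modeling of text, useful for text mining</p>
<h3 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;">01 NumPy</strong></h3>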
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Python's core does not provide a numerical multidimensional array type. Lists can cover basic array functionality, but they are not real arrays, and they become slow when the data volume is large. NumPy fills this gap with a genuine array type and functions for processing data quickly.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">NumPy is also a dependency of many higher-level extension libraries; SciPy, Matplotlib, and pandas, all introduced below, depend on it. It is worth emphasizing that NumPy's built-in functions process data at C-level speed, so when writing programs you should use them wherever possible to avoid efficiency bottlenecks, especially those caused by explicit loops.</p>
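<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">To make the advice about avoiding loops concrete, here is a minimal sketch that computes a sum of squares twice: once with an explicit Python loop and once with a built-in NumPy routine. The function names and the array size are illustrative choices, not taken from the original text.</p>

```python
import numpy as np

values = np.arange(100_000, dtype=np.float64)  # illustrative test data


def sum_of_squares_loop(arr):
    """Sum of squares with an explicit Python loop (slow for large arrays)."""
    total = 0.0
    for v in arr:
        total += v * v
    return total


def sum_of_squares_vectorized(arr):
    """Same computation delegated to NumPy's compiled dot product."""
    return float(np.dot(arr, arr))


loop_result = sum_of_squares_loop(values)
vectorized_result = sum_of_squares_vectorized(values)
print(np.isclose(loop_result, vectorized_result))
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Timing both functions with the standard timeit module is an easy way to see how large the gap is on your own machine.</p>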
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">On Windows, NumPy is installed like any other third-party library, using pip:</p>
<pre>pip install numpy</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">You can also download the source code yourself and install it with the following command:</p>
<pre>python setup.py install</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Both methods also work on Linux. In addition, the package repositories of many Linux distributions include the common Python libraries, so they can be installed with the system package manager. On Ubuntu, for example, use the command below (on recent releases the Python 3 package is named python3-numpy):</p>
<pre>sudo apt-get install python-numpy</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Once installation is complete, NumPy can be used to work with data, as shown in Listing 2-27.</p>
<strong style="color: blue;">Listing 2-27 Working with arrays in NumPy</strong>
<pre>
# -*- coding: utf-8 -*-
import numpy as np  # np is the conventional alias for NumPy
a = np.array([2, 0, 1, 5])  # create an array
print(a)  # print the array
print(a[:3])  # first three elements (slicing)
print(a.min())  # smallest element of a
a.sort()  # sort in ascending order; this modifies a in place, so a is now [0, 1, 2, 5]
b = np.array([[1, 2, 3], [4, 5, 6]])  # create a 2-D array (illustrative values)
print(b*b)  # element-wise square of the array, here [[1, 4, 9], [16, 25, 36]]
</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">NumPy is a mature and widely used Python library, so tutorials abound. The most valuable resource is the help documentation on its official site, followed by the many Chinese and English tutorials; consult them when you run into problems.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">References:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">http://www.numpy.org</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">http://reverland.org/python/2012/08/22/numpy</p>
<h3 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;">02 SciPy</strong></h3>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">If NumPy gives Python a taste of MATLAB, SciPy makes Python truly half a MATLAB. NumPy provides multidimensional arrays, but they are general-purpose arrays rather than matrices: multiplying two arrays with * multiplies corresponding elements and is not matrix multiplication. SciPy provides matrix support and a large number of objects and functions built on matrix operations.</p>
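<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The difference between element-wise and matrix multiplication can be sketched as follows. The 2x2 values are illustrative, and the @ operator is used here as one way to request a true matrix product on plain NumPy arrays.</p>

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])  # illustrative 2x2 array

elementwise = m * m   # multiplies corresponding elements
matrix_product = m @ m  # true matrix multiplication

print(elementwise)
print(matrix_product)
```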
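<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">SciPy covers optimization, linear algebra, integration, interpolation, fitting, special functions, fast Fourier transforms, signal and image processing, ordinary differential equation solvers, and other computations common in science and engineering. These are all capabilities that data mining and modeling rely on.</p>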
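<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As a small taste of two of these areas, the sketch below solves a linear system with scipy.linalg.solve and minimizes a one-variable function with scipy.optimize.minimize_scalar. The system and the objective function are made up for illustration.</p>

```python
import numpy as np
from scipy import linalg, optimize

# Linear algebra: solve 3x + y = 9 and x + 2y = 8 (illustrative system).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)
print(solution)


# Optimization: minimize an illustrative quadratic whose minimum is at t = 2.
def objective(t):
    return (t - 2.0) ** 2 + 1.0


res = optimize.minimize_scalar(objective)
print(res.x, res.fun)
```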
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">SciPy depends on NumPy, so NumPy must be installed first. SciPy is installed in much the same way as NumPy; on Ubuntu, a similar command also works:</p>
<pre>sudo apt-get install python-scipy</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">With SciPy installed, Listing 2-28 uses it to solve a system of nonlinear equations and to compute a numerical integral.</p>
<strong style="color: blue;">Listing 2-28 Solving a nonlinear system and computing a numerical integral with SciPy</strong>
<pre>
# -*- coding: utf-8 -*-
# Solve the nonlinear system 2*x1 - x2^2 = 1, x1^2 - x2 = 2
from scipy.optimize import fsolve  # solver for systems of equations

def f(x):  # define the system to solve
    x1 = x[0]
    x2 = x[1]
    return [2*x1 - x2**2 - 1, x1**2 - x2 - 2]

result = fsolve(f, [1, 1])  # supply an initial guess and solve
print(result)  # the result is array([ 1.91963957, 1.68501606])

# Numerical integration
from scipy import integrate  # integration routines

def g(x):  # define the integrand
    return (1 - x**2)**0.5

pi_2, err = integrate.quad(g, -1, 1)  # integral value and error estimate
print(pi_2 * 2)  # by calculus, the integral equals half of pi
</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">References:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">http://www.scipy.org</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">http://reverland.org/python/2012/08/24/scipy</p>
<h3 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;">03 Matplotlib</strong></h3>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Both data mining and mathematical modeling must deal with data visualization. Matplotlib is the best-known plotting library for Python. It is mainly used for 2-D plots but can also produce simple 3-D plots. It offers a set of commands similar to MATLAB's but richer, which lets us visualize data in Python very quickly, and it can export publication-quality figures in many image formats.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Installing Matplotlib is unremarkable: use the "pip install matplotlib" command, or download the source and install it yourself. On Ubuntu a similar command also works:</p>
<pre>sudo apt-get install python-matplotlib</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Note that Matplotlib has a relatively large number of dependencies, and a manual installation requires installing each of them first. Once installation is finished you can try it out. Listing 2-29 is a simple plotting example that covers the key elements of a Matplotlib figure; Figure 2-5 shows the result.</p>
<strong style="color: blue;">Listing 2-29 Matplotlib plotting example</strong>
<pre>
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt  # import Matplotlib
x = np.linspace(0, 10, 1000)  # independent variable
y = np.sin(x) + 1  # dependent variable y
z = np.cos(x**2) + 1  # dependent variable z
plt.figure(figsize=(8, 4))  # set the figure size
# plot y; set the label, line color, and line width
plt.plot(x, y, label=r'$\sin x+1$', color='red', linewidth=2)
plt.plot(x, z, 'b--', label=r'$\cos x^2+1$')  # plot z; set the label and line style
plt.xlabel('Time(s)')  # x-axis label
plt.ylabel('Volt')  # y-axis label
plt.title('A Simple Example')  # title
plt.ylim(0, 2.2)  # displayed range of the y axis
plt.legend()  # show the legend
plt.show()  # show the figure
</pre>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/pgc-image/f702d6641e6e422e888ca5fa000dca18~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1723897148&x-signature=dTfTbu127q8lwPDdwRWmiKsgHp8%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">▲ Figure 2-5 Matplotlib plotting result</p>
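<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Matplotlib's ability to export several image formats can be sketched as below. The Agg backend, the temporary output directory, the file name "example", and the 300 dpi setting are all assumptions made for this illustration.</p>

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, np.sin(x) + 1, label=r"$\sin x+1$")
ax.legend()

out_dir = tempfile.mkdtemp()  # hypothetical output location
saved = []
for ext in ("png", "pdf", "svg"):  # raster and vector formats
    path = os.path.join(out_dir, "example." + ext)
    fig.savefig(path, dpi=300)  # the format is inferred from the extension
    saved.append(path)
plt.close(fig)
print(saved)
```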
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">If you use Chinese labels, you will find that they do not display properly, because Matplotlib's default font is a Latin font. The fix is to set a Chinese font, such as SimHei, as the default before plotting:</p>
<pre>plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels correctly</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In addition, the minus sign may fail to display when a figure is saved, which the following setting fixes:</p>
<pre>plt.rcParams['axes.unicode_minus'] = False  # keep the minus sign from rendering as a box in saved figures</pre>
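<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The two settings can be combined into a short setup step. The sketch below also checks whether SimHei is actually installed; the 'DejaVu Sans' fallback entry and the use of the Agg backend are assumptions added for illustration.</p>

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Assumption: 'DejaVu Sans' is listed as a fallback for machines without SimHei.
plt.rcParams['font.sans-serif'] = ['SimHei', 'DejaVu Sans']
plt.rcParams['axes.unicode_minus'] = False

# Check whether the SimHei font is known to Matplotlib on this machine.
installed = {font.name for font in font_manager.fontManager.ttflist}
has_simhei = 'SimHei' in installed
print('SimHei available:', has_simhei)
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">If the check prints False, Chinese labels will still not render until a Chinese font is installed on the system.</p>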
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A small suggestion: when you have time, browse the gallery on the Matplotlib site and enjoy the beautiful figures made with it; you may gradually come to love plotting with Matplotlib.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Gallery:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">http://matplotlib.org/gallery.html</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">References:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">http://matplotlib.org</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">http://reverland.org/python/2012/09/07/matplotlib-tutorial</p>
<h3 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;">04 pandas</strong></h3>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">pandas is the most powerful data analysis and exploration tool for Python. It includes high-level data structures and well-designed tools that make working with data in Python fast and simple.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">pandas is built on top of NumPy and makes NumPy-centered applications easier to use. The name pandas comes from panel data and Python data analysis. It was originally developed as a tool for financial data analysis: work began at AQR Capital Management in April 2008, and it was open-sourced at the end of 2009.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">pandas is very powerful: it supports SQL-like insert, delete, query, and update operations and comes with a rich set of data processing functions; it supports time series analysis; it handles missing data flexibly; and more. pandas alone is enough to fill a book. For a more detailed treatment, read Python for Data Analysis by Wes McKinney, one of the main authors of pandas.</p>
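<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The SQL-like operations and the missing-data handling mentioned above might look like the following minimal sketch. The column names and values are invented for the example.</p>

```python
import numpy as np
import pandas as pd

# Illustrative table with one missing score.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [90.0, np.nan, 75.0]})

# Insert: append a new row.
new_row = pd.DataFrame({"name": ["d"], "score": [60.0]})
df = pd.concat([df, new_row], ignore_index=True)

# Delete: drop the row whose name is "c".
df = df[df["name"] != "c"].copy()

# Query: select rows with a score above 80.
high = df[df["score"] > 80]

# Update: change the score of row "d".
df.loc[df["name"] == "d", "score"] = 65.0

# Missing data: fill the gap with the mean of the known scores.
filled = df["score"].fillna(df["score"].mean())
print(filled.tolist())
```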
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">1. Installation</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Installing pandas is relatively easy: once NumPy is installed, pandas can be installed directly, either with pip install pandas or by downloading the source and running python setup.py install.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">We frequently read and write Excel files, but a default pandas installation cannot do so; the xlrd (read) and xlwt (write) libraries must be installed first. These two libraries handle the legacy .xls format; current pandas versions rely on openpyxl for .xlsx files. The commands that add Excel read/write support to Python are:</p>
<pre>
pip install xlrd  # add Excel reading support to Python
pip install xlwt  # add Excel writing support to Python
</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">2. Usage</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Later chapters will gradually demonstrate the power of pandas; in this section we start with a quick look through simple examples.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">First, the basic data structures in pandas are Series and DataFrame. A Series, as the name suggests, is a sequence, similar to a one-dimensional array. A DataFrame is like a two-dimensional table, similar to a two-dimensional array, and each of its columns is a Series.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">To locate elements in a Series, pandas provides the Index object. Every Series carries a corresponding Index that labels its elements. The contents of an Index need not be numbers; they can also be letters, Chinese characters, and so on. It is similar to a primary key in SQL.</p>
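<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A short sketch of how the Index is used in practice: elements are looked up by label, and arithmetic between two Series is aligned on labels rather than on position. The values and labels are illustrative.</p>

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])
t = pd.Series([10, 20, 30], index=["c", "b", "a"])  # same labels, different order

by_label = s["b"]           # look up an element by its label
labels = s.index.tolist()   # the labels carried by the Series
total = s + t               # addition is aligned on labels, not on position

print(by_label)
print(labels)
print(total)
```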
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Similarly, a DataFrame is equivalent to a combination of several Series that share the same Index (in essence, a container of Series). Each Series has a unique column header that identifies it. Listing 2-30 shows some common pandas operations.</p>
<strong style="color: blue;">Listing 2-30 Common operations in pandas</strong>
<pre>
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd  # pd is the conventional alias for pandas

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])  # create a Series s
# create a table (DataFrame)
d = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])
d2 = pd.DataFrame(s)  # a DataFrame can also be built from an existing Series

d.head()  # preview the first 5 rows
d.describe()  # basic summary statistics

# Read files. Note that the file path should not contain Chinese characters, or reading may fail.
pd.read_excel('data.xls')  # read an Excel file into a DataFrame
pd.read_csv('data.csv', encoding='utf-8')  # read text data; the encoding is usually given with encoding
</pre>
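<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The idea of a DataFrame as a container of Series sharing one Index can be sketched as follows, together with a CSV round trip. An in-memory buffer stands in for a real file, and the column names and values are invented for the example.</p>

```python
import io

import pandas as pd

# Two Series that share the same Index.
height = pd.Series([1.6, 1.7], index=["x", "y"])
weight = pd.Series([55, 68], index=["x", "y"])

# Each Series becomes one named column of the DataFrame.
d = pd.DataFrame({"height": height, "weight": weight})

# Write to CSV and read it back; io.StringIO stands in for a file on disk.
buf = io.StringIO()
d.to_csv(buf)
buf.seek(0)
d_back = pd.read_csv(buf, index_col=0)
print(d_back)
```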
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Because pandas is the main tool of this book and will be used frequently later, it is not covered in more detail here; later sections explain its usage more thoroughly.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">References:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">http://pandas.pydata.org/pandas-docs/stable/</p>
<h3 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;">05 StatsModels</strong></h3>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">pandas focuses on reading, processing, and exploring data, whereas StatsModels focuses on statistical modeling and analysis, giving Python a taste of R. StatsModels supports data interchange with pandas, so together they form a powerful data mining combination for Python.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Installing StatsModels is quite simple, either with pip or from source. For Windows users, the official site even offers precompiled exe installers. If you install manually, you have to resolve the dependencies yourself: StatsModels depends on pandas (and on the libraries pandas depends on) as well as on Patsy, a library for describing statistical models.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Listing 2-31 uses StatsModels to run an ADF stationarity test.</p>
<strong style="color: blue;">Listing 2-31 ADF stationarity test with StatsModels</strong>
<pre>
# -*- coding: utf-8 -*-
from statsmodels.tsa.stattools import adfuller as ADF  # import the ADF test
import numpy as np
ADF(np.random.rand(100))  # the returned result includes the ADF statistic, the p-value, and more
</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">References:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">http://statsmodels.sourceforge.net/stable/index.html</p>
<h3 style="color: black; text-align: left; margin-bottom: 10px;"><strong style="color: blue;">06 scikit-learn</strong></h3>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As its name suggests, this is a machine learning library. Indeed, scikit-learn is a powerful machine learning toolkit for Python that provides a complete toolbox covering data preprocessing, classification, regression, clustering, prediction, and model analysis.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">scikit-learn depends on NumPy, SciPy, and Matplotlib, so once these libraries are installed, installing scikit-learn is generally trouble-free. The procedure is the same as for the previous libraries: use pip install scikit-learn, or download the source and install it yourself.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Creating a machine learning model with scikit-learn is simple, as Listing 2-32 shows.</p>
<strong style="color: blue;">Listing 2-32 Creating a machine learning model with scikit-learn</strong>
<pre>
# -*- coding: utf-8 -*-
from sklearn.linear_model import LinearRegression  # import the linear regression model
model = LinearRegression()  # create a linear regression model
print(model)
</pre>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">1. Interfaces provided by all models</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">model.fit() trains the model: call fit(X, y) for supervised models and fit(X) for unsupervised models.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">2. Interfaces provided by supervised models</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">model.predict(X_new): predicts new samples.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">model.predict_proba(X_new): predicts probabilities; available only for some models (such as LR, logistic regression).</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">model.score(): the higher the score, the better the fit.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">3. Interfaces provided by unsupervised models</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">model.transform(): transforms data into the new basis learned from the data.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">model.fit_transform(): learns a new basis from the data and transforms that same data according to it.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">scikit-learn ships with some sample datasets for getting started; common ones include Anderson's iris flower dataset and the handwritten digits dataset.</p>
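<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The interfaces listed above can be exercised in a few lines. This sketch uses LinearRegression as the supervised model and, as an assumed example of an unsupervised model, PCA; the data is synthetic.</p>

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Supervised: fit(X, y), predict(X_new), score(X, y).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X.ravel() + 1.0              # exact line y = 2x + 1
reg = LinearRegression().fit(X, y)
pred = reg.predict(np.array([[4.0]]))  # predict a new sample
r2 = reg.score(X, y)                   # R^2; higher means a better fit

# Unsupervised: fit_transform(X) learns a basis and transforms X in one step;
# transform(X) maps data into the basis that was already learned.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))
pca = PCA(n_components=2)
reduced = pca.fit_transform(data)
again = pca.transform(data)

print(pred, r2, reduced.shape)
```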
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Anderson's iris dataset contains measurements of 150 iris flowers, such as sepal length and width and petal length and width, together with their species: Iris setosa, Iris versicolor, and Iris virginica. Listing 2-33 loads the iris dataset and uses it to train an SVM model.</p>
<strong style="color: blue;">Listing 2-33 Loading the iris dataset and training an SVM model</strong>
<pre>
# -*- coding: utf-8 -*-
from sklearn import datasets  # import the datasets module
iris = datasets.load_iris()  # load the dataset
print(iris.data.shape)  # check the size of the dataset
from sklearn import svm  # import the SVM module
clf = svm.LinearSVC()  # create a linear SVM classifier
clf.fit(iris.data, iris.target)  # train the model on the data
clf.predict([[5.0, 3.6, 1.3, 0.25]])  # after training, predict on new data
clf.coef_  # inspect the parameters of the trained model
</pre>
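<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A common next step, not part of the listing above, is to hold out part of the data and measure accuracy on it. The sketch below assumes a 70/30 split, feature scaling, and a raised iteration limit, and adds a logistic regression model to illustrate predict_proba; all of these are illustrative choices.</p>

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# Linear SVM on standardized features; score() reports accuracy on held-out data.
svm_clf = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
svm_clf.fit(X_train, y_train)
accuracy = svm_clf.score(X_test, y_test)

# Logistic regression supports predict_proba: one probability per class.
lr_clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
lr_clf.fit(X_train, y_train)
proba = lr_clf.predict_proba(X_test[:1])

print(accuracy)
print(proba)
```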
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">References:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">http://scikit-learn.org/stable/</p>
<h3 style="color: black; text-align: left; margin-bottom: 10px;">07 Keras</h3>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">scikit-learn is already very capable, but it offers only limited support for one powerful family of models: artificial neural networks. An artificial neural network is a model that is very powerful yet quite simple in principle, and it plays an important role in fields such as language processing and image recognition. The "deep learning" algorithms that have become popular in recent years are, in essence, also neural networks, so being able to build neural networks in Python is clearly necessary.</p>
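<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">To illustrate how simple the underlying principle is, here is a minimal NumPy sketch of the forward pass of a small feed-forward network with one hidden layer. The layer sizes, the sigmoid activation, and the random weights are assumptions chosen for illustration; no training is performed.</p>

```python
import numpy as np


def sigmoid(z):
    """Logistic activation, squashing values into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Forward pass: input -> hidden layer -> output layer."""
    hidden = sigmoid(x @ W1 + b1)     # weighted sum plus bias, then activation
    return sigmoid(hidden @ W2 + b2)  # same pattern for the output layer


rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)  # 3 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # 4 hidden units -> 1 output
x = rng.normal(size=(5, 3))                    # batch of 5 illustrative samples

out = forward(x, W1, b1, W2, b2)
print(out.shape)
```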
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">本书用Keras库来搭建神经网络。事实上,Keras并非简单的神经网络库,而是一个基于Theano的强大的深度学习库,利用它不仅<span style="color: black;">能够</span>搭建普通的神经网络,还<span style="color: black;">能够</span>搭建<span style="color: black;">各样</span>深度学习模型,如自编码器、循环神经网络、递归神经网络、卷积神经网络等。<span style="color: black;">因为</span>它是基于Theano的,<span style="color: black;">因此呢</span>速度<span style="color: black;">亦</span>相当快。</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Theano<span style="color: black;">亦</span>是Python的一个库,它<span style="color: black;">是由于</span>深度学习专家Yoshua Bengio带领的实验室<span style="color: black;">研发</span>出来的,用来定义、优化和<span style="color: black;">有效</span>地<span style="color: black;">处理</span>多维数组数据对应数学表达式的模拟估计问题。它<span style="color: black;">拥有</span><span style="color: black;">有效</span>实现符号分解、高度优化的速度和稳定性等特点,最重要的是它还实现了GPU加速,使得密集型数据的处理速度是CPU的数十倍。</p>
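<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">To give a feel for what "defining an expression and differentiating it symbolically" means, here is a minimal pure-Python sketch. It is not Theano's API: the classes Var, Const, Add, and Mul are hypothetical names invented for this illustration, and a real library adds optimization, array support, and GPU code generation on top of the same idea.</p>

```python
# Toy expression graph; NOT Theano's API, only an illustration of the idea.
class Var:
    def __init__(self, name):
        self.name = name
    def eval(self, env):
        return env[self.name]
    def diff(self, wrt):
        # d(x)/dx = 1, d(y)/dx = 0
        return Const(1.0 if self.name == wrt else 0.0)

class Const:
    def __init__(self, value):
        self.value = value
    def eval(self, env):
        return self.value
    def diff(self, wrt):
        return Const(0.0)

class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self, env):
        return self.left.eval(env) + self.right.eval(env)
    def diff(self, wrt):
        # Sum rule
        return Add(self.left.diff(wrt), self.right.diff(wrt))

class Mul:
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self, env):
        return self.left.eval(env) * self.right.eval(env)
    def diff(self, wrt):
        # Product rule
        return Add(Mul(self.left.diff(wrt), self.right),
                   Mul(self.left, self.right.diff(wrt)))

x = Var('x')
expr = Add(Mul(x, x), Mul(Const(3.0), x))  # the expression x*x + 3*x
grad = expr.diff('x')                      # a new expression graph for the derivative
value_at_2 = expr.eval({'x': 2.0})
grad_at_2 = grad.eval({'x': 2.0})
print(value_at_2, grad_at_2)
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The derivative is itself an expression graph, built once and then evaluated for any input; this is the property that lets a library compute gradients for training automatically.</p>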
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Efficient neural network models can be built directly in Theano, but the barrier to entry is rather high for ordinary readers. Keras was created for exactly this reason: it greatly simplifies the steps of building all kinds of neural network models, lets ordinary users easily build and train deep neural networks with hundreds of input nodes, and leaves a great deal of room for customization. Readers may well find themselves exclaiming: building a neural network can be this simple!</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">1. Installation</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Before installing a Theano-based version of Keras, you first need to install NumPy, SciPy, and Theano. Theano in turn requires a C++ compiler, which is usually available on Linux systems. Installing Theano and Keras on Linux is therefore straightforward: download the source code and install it with python setup.py install; see the official documentation for details. Recent Keras releases are published on PyPI and can be installed with pip install keras together with a backend such as TensorFlow.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">On Windows things are less simple, because there is no ready-made build environment. The usual procedure is to install MinGW (GCC and G++ for Windows) first, then Theano (with NumPy and the other dependencies installed beforehand), and finally Keras. GPU acceleration additionally requires installing and configuring CUDA.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">It is worth noting that Keras tends to run noticeably slower on Windows, so readers who want to do in-depth work on neural networks and deep learning are advised to set up the environment on Linux.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Reference link:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">http://deeplearning.net/software/theano/install.html#install</p>
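<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">After installation, it can be useful to confirm that Python can actually find the packages. The following is a minimal sketch using only the standard library; the helper name check_installed and the list of package names are choices made for this example, not part of any of the libraries.</p>

```python
import importlib.util

def check_installed(package_names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None
            for name in package_names}

# Package names are examples; adjust the list to the libraries you installed.
status = check_installed(['numpy', 'scipy', 'keras'])
for name, found in status.items():
    print(name, 'OK' if found else 'missing')
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">This only checks that a package can be located, not that it imports without errors or that GPU support is configured.</p>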
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">2. Usage</strong></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Building a neural network model with Keras is simple and intuitive, much like stacking building blocks: a few dozen lines of code are enough to build a very powerful neural network model, or even a deep learning model. Listing 2-34 builds a simple MLP (multilayer perceptron).</p><strong style="color: blue;">Listing 2-34 Building an MLP (multilayer perceptron)</strong>
# -*- coding: utf-8 -*-
# X_train, y_train, X_test, y_test are assumed to be defined beforehand (20 features per sample)
from keras.models import Sequential
from keras.layers import Input, Dense, Dropout, Activation
from keras.optimizers import SGD
model = Sequential()  # Initialize the model
model.add(Input(shape=(20,)))  # Input layer (20 nodes)
model.add(Dense(64))  # Connect the input layer to the first hidden layer (64 nodes)
model.add(Activation('tanh'))  # The first hidden layer uses tanh as its activation function
model.add(Dropout(0.5))  # Use Dropout to reduce overfitting
model.add(Dense(64))  # Connect the first hidden layer to the second hidden layer (64 nodes)
model.add(Activation('tanh'))  # The second hidden layer uses tanh as its activation function
model.add(Dropout(0.5))  # Use Dropout to reduce overfitting
model.add(Dense(1))  # Connect the second hidden layer to the output layer (1 node)
model.add(Activation('sigmoid'))  # The output layer uses sigmoid as its activation function
sgd = SGD(learning_rate=0.1, momentum=0.9, nesterov=True)  # Define the optimizer
model.compile(loss='mean_squared_error', optimizer=sgd)  # Compile the model with mean squared error as the loss
model.fit(X_train, y_train, epochs=20, batch_size=16)  # Train the model
score = model.evaluate(X_test, y_test, batch_size=16)  # Evaluate the model on the test set
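<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">To make concrete what the Dense and Activation layers in the listing compute, the following pure-Python sketch runs one forward pass through a tiny network (2 inputs, 2 hidden nodes, 1 output instead of 20, 64, 64, 1). The weights are hand-picked, hypothetical values, each row of a weight list holds the incoming weights of one output node (a layout chosen for readability, not the one Keras uses internally), and Dropout is left out because it is only active during training.</p>

```python
import math

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output node."""
    return [sum(x * w for x, w in zip(inputs, node_weights)) + b
            for node_weights, b in zip(weights, biases)]

def tanh(values):
    return [math.tanh(v) for v in values]

def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]

# Hypothetical weights: weights[j] holds the incoming weights of output node j.
hidden_w = [[0.5, -0.5], [1.0, 1.0]]
hidden_b = [0.0, 0.0]
output_w = [[1.0, -1.0]]
output_b = [0.0]

x = [1.0, 1.0]                                       # one input sample
hidden = tanh(dense(x, hidden_w, hidden_b))          # hidden layer with tanh
output = sigmoid(dense(hidden, output_w, output_b))  # output layer with sigmoid
print(hidden, output)
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Training adjusts the weights and biases so that the output moves closer to the target values; Keras handles that part through compile() and fit().</p>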
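<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Note that prediction in Keras differs from scikit-learn: model.predict() returns probabilities rather than class labels. To obtain class labels, threshold the probabilities (for a single sigmoid output) or take the index of the largest probability (for a softmax output). Older Keras versions provided model.predict_classes() for this, but it has been removed from recent releases.</p>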
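<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Because model.predict() returns probabilities, class labels can also be derived from them by hand. The sketch below uses plain Python lists with made-up values standing in for real model.predict() output (which would be NumPy arrays); the function names and the 0.5 threshold are assumptions made for this example.</p>

```python
def to_binary_labels(probabilities, threshold=0.5):
    """Turn single-output rows such as [0.83] into 0/1 labels."""
    return [1 if row[0] > threshold else 0 for row in probabilities]

def to_multiclass_labels(probabilities):
    """Pick the index of the largest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]

# Made-up values standing in for the output of model.predict(X).
sigmoid_output = [[0.12], [0.83], [0.50]]
softmax_output = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]

binary_labels = to_binary_labels(sigmoid_output)
class_labels = to_multiclass_labels(softmax_output)
print(binary_labels, class_labels)
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">With NumPy arrays the same two steps are usually written as a comparison against the threshold and a call to argmax along the last axis.</p>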
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Reference link:</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">https://keras.io/</p>
<h3 style="color: black; text-align: left; margin-bottom: 10px;">08 Gensim</h3>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">On the Gensim website, the library describes itself in a single phrase: topic modelling for humans!</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Gensim handles language-related tasks such as text similarity computation, LDA, and Word2Vec. Tasks in these areas often require a fair amount of background knowledge.</p>
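<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As a small taste of what "text similarity computation" involves, the sketch below compares tokenized documents by the cosine similarity of their word counts. It uses only the standard library and is not Gensim's API; the function name and the sample documents are made up for this illustration, and Gensim provides its own, far more capable similarity tools.</p>

```python
import math
from collections import Counter

def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the bag-of-words counts of two token lists."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Made-up, already tokenized documents.
doc1 = ['python', 'data', 'analysis']
doc2 = ['python', 'data', 'mining']
doc3 = ['deep', 'learning']

sim_12 = cosine_similarity(doc1, doc2)  # two of three words shared
sim_13 = cosine_similarity(doc1, doc3)  # no words shared
print(sim_12, sim_13)
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Word counts ignore meaning, so "mining" and "analysis" count as unrelated here; word-vector models such as Word2Vec address exactly this limitation.</p>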
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">This section only aims to let readers know that this powerful library exists. Readers who want to learn more can consult the official documentation or the reference links.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">It is worth mentioning that Gensim ships its own implementation of Word2Vec, the well-known word-vector tool that Google open-sourced in 2013, so readers who need Word2Vec can use it directly through Gensim without compiling anything themselves.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Gensim's author optimized the Word2Vec code, so the Gensim version can run even faster than the original Word2Vec. (The speed-up relies on a C/C++ compiler environment, so readers who use Gensim's Word2Vec are advised to run it on Linux.)</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Listing 2-35 shows a simple example of using Word2Vec through Gensim.</p><strong style="color: blue;">Listing 2-35 A simple example of using Word2Vec with Gensim</strong>
# -*- coding: utf-8 -*-
import gensim, logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
# The logging setup above prints the training log
# Tokenized sentences; each sentence is given as a list of words
sentences = [['first', 'sentence'], ['second', 'sentence']]
# Train a word-vector model on the sentences above
model = gensim.models.Word2Vec(sentences, min_count=1)
print(model.wv['sentence'])  # Print the word vector for the word "sentence"