优化 | 基于值迭代网络的多智能体路由问题
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/c43e2c994e8b46bf81e7162ba4d70db7~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725640122&x-signature=EjzA%2FGzk6kZS5rkgJZxLtgW9FT8%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;"><span style="color: black;">『运筹OR帷幄』原创</span></span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">作者:高建琦</span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;"><span style="color: black;">编者按</span></span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">本文节选自作者参加“OR 青年计划”的结题报告,<span style="color: black;">针对</span>天舒老师在活动<span style="color: black;">时期</span><span style="color: black;">举荐</span>的一篇论文《Multi -Agent Routing Value Iteration Network》进行重点详述。</span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">1 论文<span style="color: black;">源自</span></span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">论文<span style="color: black;">源自</span>自于天舒老师发布的 paperlist。<span style="color: black;">日前</span>我的<span style="color: black;">科研</span>课题中<span style="color: black;">触及</span>到多<span style="color: black;">设备</span>人任务分配和路径规划,这篇<span style="color: black;">文案</span>里面部分知识点<span style="color: black;">能够</span>用于我的课题,<span style="color: black;">因此</span>对该文做进一步的详述。该论文<span style="color: black;">发布</span>于 ICML 2020,代码和数据已开源:</p>https://github.com/uber/MARVIN。
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/bb5ab59c83bd47b9983488453924672e~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725640122&x-signature=gDH5MrCbgl%2FA1XNqXlGV0dZilo4%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">2 论文概述</span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">这篇论文提出了一种基于图神经网络的模型(Multi-Agent Routing Value Iteration Network, MARVIN),该模型能够在交通<span style="color: black;">要求</span>动态变化的稀疏连接图中完成基于值迭代网络的多智能体路由问题。<span style="color: black;">另外</span>,该<span style="color: black;">文案</span>还<span style="color: black;">运用</span>通信模块使多个 agent 能够在线协调,更有效地适应变化。</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">实验方面,该论文中创建了一个模拟环境,以模拟自动驾驶汽车在未知的最小边缘覆盖和交通<span style="color: black;">要求</span>下执行的真实地图。结果<span style="color: black;">表示</span>:1)该论文模型在总成本和运行时间方面<span style="color: black;">明显</span>优于传统的求解器;2)模型<span style="color: black;">拥有</span>较好的扩展性。</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">3 背景</span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">针对</span>一群<span style="color: black;">设备</span>人制定执行任务的路线问题,<span style="color: black;">一般</span><span style="color: black;">能够</span><span style="color: black;">运用</span>TSP(Toth Vigo, 2002) <span style="color: black;">或</span>VRP(TothVigo, 2002) 模型去<span style="color: black;">处理</span>。TSP 或者 VRP 问题是计算机科学中的经典的 NP-Hard 组合优化问题。尽管有<span style="color: black;">非常多</span>经典的求解器 (Applegate et al, 2006;Helsgaun,2017) 来<span style="color: black;">处理</span>这些问题,但<span style="color: black;">她们</span><span style="color: black;">一般</span>为离线计算,<span style="color: black;">没法</span>在线<span style="color: black;">调节</span>其<span style="color: black;">处理</span><span style="color: black;">方法</span>。<span style="color: black;">另一</span>这些求解器中不<span style="color: black;">包括</span>代理之间的在线通信。</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">这篇<span style="color: black;">文案</span>需要<span style="color: black;">处理</span>的是多车辆自主建图问题,<span style="color: black;">详细</span>描述为:给定一组车辆,<span style="color: black;">按照</span>交通<span style="color: black;">要求</span>找到找到最小的路径代价去完成城市街区的地图构建,其中<span style="color: black;">每一个</span>城市道路<span style="color: black;">最少</span>遍历<span style="color: black;">必定</span>的次数,并且遍历数量事先是未知的,这是<span style="color: black;">由于</span>一个区域可能<span style="color: black;">因为</span>闭塞、交通拥挤、定位错误、传感器故障、<span style="color: black;">卑劣</span>天气<span style="color: black;">要求</span>等<span style="color: black;">原由</span>需要重新收集。</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">4 <span style="color: black;">关联</span>工作</span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">4.1 车辆路由问题(VRP)</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">现有 VRP 求解器可分为两类:传统迭代求解器和深度学习<span style="color: black;">办法</span>。传统迭代求解器<span style="color: black;">一般</span><span style="color: black;">没法</span>在线<span style="color: black;">运用</span>,并且在这些求解器中 agent 之间<span style="color: black;">不可</span>进行任<span style="color: black;">安在</span>线通信,以纳入 agent 当地的观察结果。与传统求解器相比,<span style="color: black;">因为</span>广泛采用<span style="color: black;">重视</span>力机制 (Vinyals et al., 2015; Vaswani et al., 2017) 和图神经网络 (Kipf Welling, 2017; Khalil et al., 2017) 技术,深度学习<span style="color: black;">办法</span><span style="color: black;">已然</span><span style="color: black;">作为</span>求解组合问题的有效近似<span style="color: black;">办法</span>。<span style="color: black;">一样</span>,深度学习<span style="color: black;">办法</span>都不是能处理动态环境下的 VRP 问题。</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">4.2 值迭代网络(VIN)</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">基于深度学习的路径规划<span style="color: black;">办法</span><span style="color: black;">亦</span><span style="color: black;">表示</span>出良好的性能,<span style="color: black;">重点</span>工作<span style="color: black;">包含</span>:</p>值迭代网络 (Tamar et al.,2016),它在神经网络中嵌入了来自值迭代 (Bellman, 1954) 的结构偏差;门通路径规划网络 (Lee et al.,2018),<span style="color: black;">运用</span>通用长短期记忆 (LSTM) 改变了最大池化层(Hochreiter Schmidhuber, 1997),<span style="color: black;">明显</span><span style="color: black;">加强</span>了训练稳定性,有助于延长迭代次数;广义值迭代网络 (GVIN)(Niu et al.,2018),<span style="color: black;">经过</span>用图(Graph)中的边(edge)替换过渡(transition)。<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">然而,<span style="color: black;">以上</span>算法仅能<span style="color: black;">处理</span>简单的路径规划环境,如二维迷宫,而 Dijkstra 的最短路径算法在<span style="color: black;">处理</span>这些问题上<span style="color: black;">已然</span>非常有效。</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">4.3 图神经网络(GNN)</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">图神经网络 (Scarselli et al.,2009;Wu et al.,2019) <span style="color: black;">供给</span>了一种学习图表征的<span style="color: black;">办法</span>,这些表征既与图中的节点数量无关,又与局部邻域中的置换不变式无关。来自节点邻域的信息<span style="color: black;">能够</span><span style="color: black;">经过</span>图卷积 (Kipf Welling, 2017)、递归神经网络 (Li et al.,2016) 和<span style="color: black;">重视</span>机制 (Yun et al.,2019;Velickovic,2018 年) 等技术聚合得到。图<span style="color: black;">重视</span>力模块<span style="color: black;">亦</span>出<span style="color: black;">此刻</span>基于深度学习的 VRP/TSP 求解器中,如 AM (Kool et al.,2019) 和 EAN (Deudon et al.,2018)。</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">4.4 多智能体通信</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Sukhbaatar 等人 (2016) 对 agent 之间的信息进行了简单求和。Jiang 和 Lu(2018) 利用<span style="color: black;">重视</span>机制识别 agent 之间对自己有用的信息。</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">5 问题定义</span></strong></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/7edbfa67203b4b68b9770968f0c3a5b2~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725640122&x-signature=p4Gp9NyUBkPNOzfJNQHon3q2cgo%3D" style="width: 50%; margin-bottom: 20px;"></div>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/32baf7bc72764c749e3015aeea617a98~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725640122&x-signature=9k858gJM7Edim6RAdrJF9FegLuI%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">6 MARVIN 框架</span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">在整体框架中<span style="color: black;">重点</span><span style="color: black;">包含</span>两个模块:值迭代模块(图 1.B)和通信模块(图 1.C)。</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p26-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/c712ed451e1049fba9baaa66bd7b9157~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725640122&x-signature=mbmtIJ35ISm5dh8oWwGarR9uGNI%3D" style="width: 50%; margin-bottom: 20px;"></div>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">6.1 值迭代模块</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">该模块工作流程为:<span style="color: black;">针对</span>给定的初始节点特征,<span style="color: black;">咱们</span>将其输入图神经网络,<span style="color: black;">而后</span>经过<span style="color: black;">必定</span>次数的迭代将特征解码输出给<span style="color: black;">以上</span>节点的标量值函数,最后<span style="color: black;">选取</span><span style="color: black;">拥有</span>最大的特征值的节点<span style="color: black;">做为</span><span style="color: black;">咱们</span>下一个<span style="color: black;">目的</span>点。</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/ccd5c01cb7d24b78b333fdf29cc05339~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725640122&x-signature=t128oCPTU9onjtLf68VJ%2BXV2vXE%3D" style="width: 50%; margin-bottom: 20px;"></div>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/d2e91dc1dd5a4027bbf89dd32111df21~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725640122&x-signature=fYbfwYdFuYUsD7kV60za7p4kjwo%3D" style="width: 50%; margin-bottom: 20px;"></div>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/a29dbfc54bb143de936bdcbc1a6242d3~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725640122&x-signature=76de4EvjqWyZ2rSNOa5XtyJHvPI%3D" style="width: 50%; margin-bottom: 20px;"></div>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">6.2 通信模块</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;">因为</span>本文现实问题的部分观察性质,让 agent 交流<span style="color: black;">她们</span>的预期轨迹是有益的,从而<span style="color: black;">能够</span>有<span style="color: black;">更加多</span>的协作<span style="color: black;">行径</span>。为了达到这个<span style="color: black;">目的</span>,<span style="color: black;">咱们</span>提出的模型还<span style="color: black;">拥有</span>一个基于<span style="color: black;">重视</span>力的通信模块。</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/0cddf808e47744af868643b50ad200a3~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725640122&x-signature=Y%2Fxh6tNc3gdQq4t1Z%2Bo0sDVCZt4%3D" style="width: 50%; margin-bottom: 20px;"></div>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/587e3415aaca4a4d8340452d3291d30e~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725640122&x-signature=zrRw%2F3lZA9i03F2XlsKiYQXb1dM%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">7 模型训练<span style="color: black;">办法</span></span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">本文提出的网络框架 MARVIN <span style="color: black;">能够</span><span style="color: black;">经过</span>模仿学习或强化学习两种<span style="color: black;">区别</span>的<span style="color: black;">办法</span>进行端到端训练。<span style="color: black;">针对</span>模仿学习,假设有一个依赖于完全可观察的环境的神谕“oracle",他<span style="color: black;">能够</span><span style="color: black;">处理</span>本文<span style="color: black;">说到</span>的规划问题。<span style="color: black;">然则</span>,<span style="color: black;">一般</span> oracle 会减慢训练过程,<span style="color: black;">由于</span><span style="color: black;">咱们</span>会为每次滚动迭代重新生成一个训练图,oracle 都会基于新图重新进行规划计算。<span style="color: black;">另一</span>,本文还<span style="color: black;">运用</span>强化学习对网络框架进行训练,较其他<span style="color: black;">办法</span>它更难于收敛,但<span style="color: black;">能够</span>直接优化<span style="color: black;">最后</span><span style="color: black;">目的</span>。</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">7.1 模仿学习</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/fcc7a5a8ef83489ca8ef62055179b208~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725640122&x-signature=Gg2jhccDRnM8UxbTULMsjccJRes%3D" style="width: 50%; margin-bottom: 20px;"></div>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">7.2 强化学习</h1>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/fc51219a4f75440ca6148a3937622125~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725640122&x-signature=0kIcDntB3XqmDwXoJcK7XlwWSQU%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">8 实验结果</span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">实验结果<span style="color: black;">显示</span>:1)利用强化学习<span style="color: black;">办法</span>进行训练的 MARVIN 框架在总成本 cost 和运行时间runtime 方面<span style="color: black;">明显</span>优于传统的求解器;2)MARVIN 框架<span style="color: black;">拥有</span>较好的扩展性,小样本训练的模型拓展到<span style="color: black;">很强</span>样本<span style="color: black;">亦</span><span style="color: black;">拥有</span>比较好的性能表现。</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/4495bf73133542c2afcfda6985d09af7~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725640122&x-signature=Y%2FdlCDB0YQHPoHX1CbLZtn93dwQ%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">9 可拓展工作</span></strong></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-qvj2lq49k0/474a933aa0f043948e5cab2d6377b037~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1725640122&x-signature=9Ja1j1XBlxTGdW6uJKBwt380iFA%3D" style="width: 50%; margin-bottom: 20px;"></div>
软文发布论坛开幕式圆满成功。 http://www.fok120.com 谢谢、感谢、感恩、辛苦了、有你真好等。
页:
[1]