UAV Intelligent Evasion Decision-Making Based on a Deep Reinforcement Learning Algorithm
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Source: Systems Engineering and Electronics (系统工程与电子技术)</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Authors: 吴冯国, 陶伟, 李辉, 张建伟, 郑成辰</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Published by "人工智能技术与咨询"</p>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">Abstract</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">To improve the survival rate of unmanned aerial vehicles (UAVs) in complex air-combat scenarios, maneuver strategies are generated with reinforcement learning on a public UAV air-combat game simulation platform. Building on the double deep Q-network (DDQN) and deep deterministic policy gradient (DDPG) algorithms, a unit state sequence (USS) is proposed, and a gated recurrent unit (GRU) is used to fuse the situational features in the USS, improving state-feature recognition and algorithm convergence in complex air-combat scenarios. Experimental results show that the trained agent achieves a 98% survival rate when evading a missile guided by the standard proportional-navigation algorithm, and that the UAV still achieves an 88% survival rate in the complex scenario of simultaneous attack by multiple missiles; compared with traditional simple maneuver patterns, the UAV's survival rate is greatly improved.</p>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">Keywords</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">deep reinforcement learning, UAV, unit state sequence, gated recurrent unit</p>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">Introduction</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The modern air-combat environment is highly complex. With the performance of air-to-air missiles and airborne radar continually improving, beyond-visual-range (BVR) combat now dominates modern air combat, and the air-to-air missile has long been the primary weapon for striking aerial units. As one of the principal combat targets on the aerial battlefield, UAVs are widely employed in the military domain. Exploiting a UAV's ability to sustain high-g maneuvers and adopting effective maneuver strategies to raise its evasion and escape success rate against missiles is crucial to improving its air-combat survivability.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">UAV evasion of air-to-air missiles has long been a research hotspot in air combat. Wang Huaiwei et al. used Monte Carlo methods to verify the effectiveness of conventional sustained-turn maneuvers for missile evasion. Imado et al. applied differential game theory to study the pursuit-evasion game between a missile and a UAV. In addition, there has been considerable research on evasion maneuvers, evasion-effectiveness evaluation, and analytical solutions for optimal or suboptimal UAV evasion strategies. These methods rely on a complete air-combat engagement model to solve for the optimal maneuver strategy under a single-missile attack; when the number of missiles changes, the model becomes difficult to handle. Moreover, building the engagement model is itself a very complex process, requiring a large number of differential and integral equations to characterize the state-transition laws of the UAV and the missiles.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Deep reinforcement learning (DRL) algorithms, built on the Markov decision process (MDP), adopt end-to-end learning: taking situational information as input, they obtain outputs directly from a neural network to control the agent's decisions, and have been widely applied in automatic control. Fan Xinlei et al. applied the deep deterministic policy gradient (DDPG) algorithm to train UAVs to evade missiles, validating it by simulation against air-to-air missiles attacking from a fixed situation under a simplified model. Song Hongchuan et al. designed shaped rewards based on the missile guidance law and used DDPG to train a UAV to evade head-on missiles; compared with typical evasion strategies, the learned escape strategy was second only to the tail-on descending maneuver.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">These studies show that UAVs can evade air-to-air missiles through specific maneuvers, and that DRL algorithms can train agents to evade such missiles automatically. Overall, however, most prior work considers a single-missile attack scenario. In BVR combat, it is common for multiple missiles to lock onto a UAV from different directions and attack cooperatively. In such cases, DRL faces a large state space, a state dimension that keeps changing, difficulty fixing the neural network's input dimension, and poor convergence.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">To address these problems, this paper proposes a reinforcement learning method based on a unit state sequence (USS), called SSRL. In this algorithm, first, each missile is paired with the UAV and encoded into a feature unit; second, all encoded feature units are sorted by distance priority and combined into a USS; then, a gated recurrent unit (GRU) fuses the feature units in the USS and extracts the hidden feature information; finally, the hidden feature is treated as the state information for that time step and fed into the reinforcement learning algorithm's neural network. The method is applied to both the double deep Q-network (DDQN) and DDPG algorithms and trained on a public UAV air-combat game simulation platform. Simulation results show that agents trained with SSRL learn continuous evasive maneuver strategies, control the UAV to evade missiles, increase missile miss distance, and improve the UAV's success rate in evading successive missiles.</p>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">1 Related Theory</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">1.1 </span></strong><span style="color: black;">MDP</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The reinforcement-learning training process resembles human learning: the agent learns to maximize the benefit it can obtain through continual exploration and external feedback, and is usually modeled as an MDP. An MDP consists of a state space S, an action space A, a state-transition function P, and a reward function R: the state space is the set of all possible states; the action space is the set of all possible actions; the state-transition function describes the probability of reaching the next state after taking an action in the current state; and the reward function describes the reward obtained for taking that action in the current state.</p>
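<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">As a minimal illustration of the (S, A, P, R) tuple described above, the sketch below encodes a toy two-state MDP in plain Python; the states, actions, probabilities, and rewards are invented for illustration and do not come from the paper.</p>

```python
import random

# Hypothetical two-state MDP: S = {"safe", "threat"}, A = {"cruise", "evade"}.
# P maps (state, action) to a list of (next_state, probability) pairs;
# R maps (state, action) to an immediate reward. All numbers are made up.
P = {
    ("safe", "cruise"): [("safe", 0.9), ("threat", 0.1)],
    ("safe", "evade"): [("safe", 1.0)],
    ("threat", "cruise"): [("threat", 1.0)],
    ("threat", "evade"): [("safe", 0.7), ("threat", 0.3)],
}
R = {
    ("safe", "cruise"): 1.0,
    ("safe", "evade"): 0.5,
    ("threat", "cruise"): -1.0,
    ("threat", "evade"): 0.0,
}

def step(state, action, rng=random):
    """Sample the next state from P and return (next_state, reward)."""
    next_states, probs = zip(*P[(state, action)])
    next_state = rng.choices(next_states, weights=probs, k=1)[0]
    return next_state, R[(state, action)]
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">An agent interacting with this environment repeatedly calls step() with its chosen action, which is exactly the interaction loop that the algorithms below learn from.</p>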
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">1.2 </strong>DDQN Algorithm</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Reinforcement-learning tasks are usually sequential decision problems and are highly correlated with the training data. The literature introduced an experience replay mechanism to reduce the correlation between samples, make samples reusable, and improve learning efficiency. DDQN uses two neural networks to decouple action selection from value estimation. The evaluation network Q interacts with the environment, and action selection follows:</p>
<div style="color: black; text-align: left; margin-bottom: 10px;">a<sub>t</sub> = argmax<sub>a</sub> Q(s<sub>t</sub>, a; θ)</div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(1)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The target network Q′ estimates the value function of the next state. The evaluation network's parameters are updated by minimizing the loss function, which makes training more stable:</p>
<div style="color: black; text-align: left; margin-bottom: 10px;">L(θ) = (1/N) Σ<sub>i</sub> (y<sub>i</sub> − Q(s<sub>i</sub>, a<sub>i</sub>; θ))<sup>2</sup></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(2)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">where y<sub>i</sub> denotes the target value, i.e.:</p>
<div style="color: black; text-align: left; margin-bottom: 10px;">y<sub>i</sub> = r<sub>i</sub> + γ Q′(s<sub>i+1</sub>, argmax<sub>a</sub> Q(s<sub>i+1</sub>, a; θ); θ′)</div>
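<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The decoupling can be sketched with tabular stand-ins for the two networks: the evaluation table chooses the action in the next state, while the target table evaluates it, exactly as in the target-value formula above. The state names and values here are hypothetical.</p>

```python
GAMMA = 0.9  # discount factor (illustrative value, not from the paper)

# Tabular stand-ins for the evaluation network Q and the target network Q'.
Q  = {("s1", "a0"): 0.2, ("s1", "a1"): 0.8}   # evaluation "network"
Qp = {("s1", "a0"): 0.5, ("s1", "a1"): 0.3}   # target "network"

def ddqn_target(reward, next_state, actions=("a0", "a1")):
    """y = r + gamma * Q'(s', argmax_a Q(s', a)): the evaluation table
    selects the greedy action, the target table evaluates it."""
    best = max(actions, key=lambda a: Q[(next_state, a)])
    return reward + GAMMA * Qp[(next_state, best)]
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Using separate tables for selection and evaluation is what distinguishes DDQN from DQN and reduces the overestimation of action values.</p>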
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">1.3 </strong>DDPG Algorithm</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">DDPG builds on the deterministic policy gradient (DPG) algorithm and applies DDQN's dual-network mechanism to the Actor-Critic framework, using deep neural networks with parameters θ<sup>μ</sup>, θ<sup>μ′</sup>, θ<sup>Q</sup>, and θ<sup>Q′</sup> to fit the policy evaluation function μ, the policy target function μ′, the action-value evaluation function Q, and the action-value target function Q′, respectively.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The policy evaluation function interacts with the environment, obtaining the state s, reward r, and termination flag d from it, and selects actions as follows:</p>
<div style="color: black; text-align: left; margin-bottom: 10px;">a<sub>i</sub> = μ(s<sub>i</sub> | θ<sup>μ</sup>) + N<sub>i</sub></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(3)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">where N<sub>i</sub> is action noise; simulated annealing is applied to the noise to avoid getting trapped in local optima while increasing the algorithm's exploration capability.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The agent updates the value evaluation network's parameters by minimizing the loss:</p>
<div style="color: black; text-align: left; margin-bottom: 10px;">L(θ<sup>Q</sup>) = (1/N) Σ<sub>i</sub> (y<sub>i</sub> − Q(s<sub>i</sub>, a<sub>i</sub> | θ<sup>Q</sup>))<sup>2</sup></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(4)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">where y<sub>i</sub> is the target action value, i.e.:</p>
<div style="color: black; text-align: left; margin-bottom: 10px;">y<sub>i</sub> = r<sub>i</sub> + γ Q′(s<sub>i+1</sub>, μ′(s<sub>i+1</sub> | θ<sup>μ′</sup>) | θ<sup>Q′</sup>)</div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(5)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">According to the theoretical proof of the DPG algorithm, the gradient of the policy function with respect to θ<sup>μ</sup> is equivalent to the expected gradient of the action-value function Q(s, a | θ<sup>Q</sup>) with respect to the action, so the policy evaluation network can be updated by gradient:</p>
<div style="color: black; text-align: left; margin-bottom: 10px;">∇<sub>θ<sup>μ</sup></sub>J ≈ (1/N) Σ<sub>i</sub> ∇<sub>a</sub>Q(s, a | θ<sup>Q</sup>)|<sub>s=s<sub>i</sub>, a=μ(s<sub>i</sub>)</sub> ∇<sub>θ<sup>μ</sup></sub>μ(s | θ<sup>μ</sup>)|<sub>s=s<sub>i</sub></sub></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(6)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Accordingly, the policy evaluation network updates its parameters along this direction:</p>
<div style="color: black; text-align: left; margin-bottom: 10px;">θ<sup>μ</sup> ← θ<sup>μ</sup> + α ∇<sub>θ<sup>μ</sup></sub>J</div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(7)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The action-value network's parameters are in turn used to compute the policy network's gradient during updates; the soft-update rule is</p>
<div style="color: black; text-align: left; margin-bottom: 10px;">θ<sup>Q′</sup> ← τθ<sup>Q</sup> + (1 − τ)θ<sup>Q′</sup>,&nbsp;&nbsp;θ<sup>μ′</sup> ← τθ<sup>μ</sup> + (1 − τ)θ<sup>μ′</sup></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(8)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">which reduces the instability that the action-value network may exhibit during learning.</p>
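<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The soft update can be sketched on plain parameter lists; τ = 0.01 is a commonly used illustrative value, not one reported by the paper.</p>

```python
TAU = 0.01  # soft-update rate (illustrative value)

def soft_update(target_params, online_params, tau=TAU):
    """theta' <- tau * theta + (1 - tau) * theta', element-wise.
    Returns the new target parameters."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, online_params)]
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Because τ is small, the target networks track the online networks slowly, which is what stabilizes the bootstrapped targets.</p>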
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">1.4 </strong>GRU Network</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">A recurrent neural network (RNN) can process sequential input: at time t it takes the sequence element x<sub>t</sub> and the network output h<sub>t−1</sub> from the previous step as input and produces y<sub>t</sub> and h<sub>t</sub>, with h<sub>t</sub> fed back as input at the next step. This structure incorporates historical information and gives the network memory over the sequence. The GRU is an improved RNN whose structure is shown in Fig. 1: the update gate z<sub>t</sub> and reset gate r<sub>t</sub> combine new data with historical information and selectively forget parts of the history. The GRU alleviates the vanishing-gradient problem of traditional RNNs in long-term memory and backpropagation, converges about as well as the long short-term memory (LSTM) network, and trains considerably faster.</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/4da3dd3c447843bd9f49fb9c0cfab222~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=hJE7g2Z5Jd4OleoMe770yndGefQ%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Fig. 1 GRU network</p>
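<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">For scalar inputs, the gate arithmetic of the GRU in Fig. 1 reduces to a few lines; the weights below are hypothetical scalars standing in for trained weight matrices.</p>

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_cell(h_prev, x, wz=(0.5, 0.5), wr=(0.5, 0.5), wh=(0.5, 0.5)):
    """One scalar GRU step: update gate z, reset gate r, candidate state,
    then h = (1 - z) * h_prev + z * h_cand. Weights are illustrative."""
    z = sigmoid(wz[0] * h_prev + wz[1] * x)               # update gate
    r = sigmoid(wr[0] * h_prev + wr[1] * x)               # reset gate
    h_cand = math.tanh(wh[0] * (r * h_prev) + wh[1] * x)  # candidate state
    return (1.0 - z) * h_prev + z * h_cand                # blended output
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The update gate decides how much of the old hidden state survives, and the reset gate decides how much of it influences the candidate; this is the selective-forgetting mechanism described above.</p>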
<h1 style="color: black; text-align: left; margin-bottom: 10px;">2 SSRL Algorithm</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">In BVR air combat, targets are usually detected with airborne radar, ground radar, and infrared search-and-track devices. The battlefield is vast, the number of combat entities is hard to determine, and engagements often take the form of multi-aircraft, multi-target cooperative operations. When a UAV faces a coordinated attack by multiple missiles, the number of entities in the environment changes dynamically. During training, the neural network must take environment information sized for the maximum number of entities as input, with absent entities zero-filled and present entities filled with their actual state values. This approach suits scenarios with a fixed number of entities, and not all environment information affects the agent's current decision. How to extract situational features from the air-combat environment effectively and filter out the more important state information is therefore critical to training performance.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The USS encoding scheme effectively solves this problem. In natural language processing, different words are converted into dense vectors of equal length to distinguish them, and an RNN can extract feature information from the word-vector model. Borrowing from word-vector encoding, we treat the agent and a single target as one feature mapping, encode the state, extract the important situational features, and encode them as a state unit. The USS at a given time step consists of multiple state units; exploiting the GRU's ability to train on and fuse sequential information, hidden features are extracted from the USS for reinforcement learning. In principle, this feature-extraction method can be applied to any scenario of a single entity versus a variable number of targets, making it a general solution. This paper integrates the USS into the DDQN and DDPG algorithms and runs experiments on successive missile evasion in BVR air combat.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">2.1 </span></strong><span style="color: black;">USS</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The unit state sequence is encoded as follows.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Step 1 Obtain the UAV observation O<sub>f</sub> and the missile observations O = (o<sub>1</sub>, o<sub>2</sub>, …, o<sub>n</sub>) from the simulation environment.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Step 2 Decompose the observations of the UAV and each incoming air-to-air missile into the horizontal plane and the vertical plane, and describe their heading and position differences with the antenna train angle (ATA) and heading crossing angle (HCA). ATA<sub>h</sub> and HCA<sub>h</sub> denote the antenna train angle and heading crossing angle in the horizontal plane, as shown in Fig. 2.</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/c10139997f984aebb55a07a4ccb007e6~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=rYEcUXoKo1tUzCM4d46FbVq7m%2BI%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Fig. 2 Horizontal-plane situation decomposition</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">ATA<sub>v</sub> and HCA<sub>v</sub> denote the antenna train angle and heading crossing angle in the vertical plane of the UAV's motion; β denotes the aircraft pitch angle and β<sub>m</sub> the missile pitch angle, as shown in Fig. 3.</p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/44a40da75c9c471faf52d09859ab2cfa~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=wbH9YjR2dn7%2B0E1eU00kYj99x28%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Fig. 3 Vertical-plane situation decomposition</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Step 3 From the observations of the UAV and each missile, extract the relative motion information, including the missile's velocity relative to the aircraft Δv, the missile's turn rate Δω, the UAV-missile range D, and the UAV's flight altitude H.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Step 4 Concatenate the state information obtained in Steps 2 and 3 for the UAV and each missile, normalize it, and store it as the state unit S<sub>i</sub>, as follows:</p>
<div style="color: black; text-align: left; margin-bottom: 10px;">S<sub>i</sub> = [ATA<sub>h</sub>, HCA<sub>h</sub>, ATA<sub>v</sub>, HCA<sub>v</sub>, β, β<sub>m</sub>, Δv, Δω, D, H]</div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(9)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Step 5 When the UAV faces multiple missiles, the earlier a missile will be encountered, the more urgent the threat, and the higher its priority. As in natural language processing, where the order in which words appear carries correlations, state units exhibit this kind of relationship as well. The UAV-missile collision time is estimated with:</p>
<div style="color: black; text-align: left; margin-bottom: 10px;">t̂ = |D| / (|v<sub>m</sub> + v<sub>p</sub>| cos ξ)</div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(10)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">where D is the range vector from the UAV to the missile, and |v<sub>m</sub> + v<sub>p</sub>| cos ξ is the projection of the combined velocity of the UAV and the missile onto the direction of D. Equation (10) yields the estimated time remaining t̂ before encountering the missile.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Step 6 Sort the state units in ascending order of the estimated encounter time t̂ and concatenate them to form the unit state sequence USS = (S<sub>1</sub>, S<sub>2</sub>, …, S<sub>n</sub>). Because the number of missiles is not fixed, the number of state units in the USS is not fixed either.</p>
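<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Steps 5 and 6 can be sketched as follows: each missile's remaining encounter time is estimated from its range and closing speed, and the state units are sorted in ascending order of that estimate. The field layout and numbers are invented for illustration.</p>

```python
import math

def time_to_go(range_m, closing_speed):
    """Estimated remaining time, range / closing speed along the range
    vector; a non-closing missile is pushed to the end with +inf."""
    return range_m / closing_speed if closing_speed > 0 else math.inf

def build_uss(units):
    """units: list of (state_unit, range_m, closing_speed) triples.
    Returns the state units sorted by estimated encounter time,
    most urgent threat first."""
    return [u for u, r, v in
            sorted(units, key=lambda t: time_to_go(t[1], t[2]))]
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The resulting list can be any length, which is precisely why a sequence model rather than a fixed-width input vector is used downstream.</p>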
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><strong style="color: blue;">2.2 </strong>Hidden Feature Extraction</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">The USS uniquely identifies the environment the UAV is currently in. The detailed procedure for fusing the state units in the USS with a GRU network and extracting the hidden feature is as follows.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Step 1 Initialize the hidden feature h<sub>zero</sub> to an all-zero matrix.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Step 2 Feed the previous hidden feature h<sub>t−1</sub> and the current state unit S<sub>t</sub> into Eq. (11) to obtain the update-gate vector z<sub>t</sub>.</p>
<div style="color: black; text-align: left; margin-bottom: 10px;">z<sub>t</sub> = σ(W<sub>z</sub> · [h<sub>t−1</sub>, S<sub>t</sub>])</div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(11)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Step 3 Feed the previous hidden feature h<sub>t−1</sub> and the current state unit S<sub>t</sub> into Eq. (12) to obtain the reset-gate vector r<sub>t</sub>.</p>
<div style="color: black; text-align: left; margin-bottom: 10px;">r<sub>t</sub> = σ(W<sub>r</sub> · [h<sub>t−1</sub>, S<sub>t</sub>])</div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(12)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Step 4 Feed the previous hidden feature h<sub>t−1</sub>, the current state unit S<sub>t</sub>, and the reset-gate vector r<sub>t</sub> into Eq. (13) to obtain the candidate hidden feature h̃<sub>t</sub>:</p>
<div style="color: black; text-align: left; margin-bottom: 10px;">h̃<sub>t</sub> = tanh(W<sub>h̃</sub> · [r<sub>t</sub> ⊙ h<sub>t−1</sub>, S<sub>t</sub>])</div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(13)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Step 5 Combine the previous hidden feature h<sub>t−1</sub>, the update-gate vector z<sub>t</sub>, and the candidate hidden feature h̃<sub>t</sub> via Eq. (14); this balances the historical features while adding the information carried by the new state, yielding the new hidden feature h<sub>t</sub>:</p>
<div style="color: black; text-align: left; margin-bottom: 10px;">h<sub>t</sub> = (1 − z<sub>t</sub>) ⊙ h<sub>t−1</sub> + z<sub>t</sub> ⊙ h̃<sub>t</sub></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(14)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Step 6 If the USS contains another state unit S<sub>t+1</sub>, return to Step 2 with the new hidden feature h<sub>t</sub> and that state unit S<sub>t+1</sub>.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Step 7 Obtain the hidden feature h<sub>N</sub> of the USS and output it as the feature Feature for this time step.</p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">其中, Wr、Wz和</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/6b6d15560abb4e3e8d8b772dd447d3a2~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=35y349p0js6OsETDFDiti0ykc58%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">are learnable parameters, and σ is the sigmoid function, whose output in the range 0-1 acts as the fraction of information retained. In the action-selection phase of reinforcement learning, the feature output Feature is fed directly into the neural network to select an action. In the training phase, the GRU likewise extracts the feature Feature from the unit state sequence as the network input, and during backpropagation the gradient Δe(S) of the network error with respect to the state unit S is used to update W_r, W_z, and</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/7a4fcf7b02844990bcfc6b8fa750083d~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=ZFXqJ65DeoUT9nQngtH5pUe6P2g%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">in the GRU network.</span></span></p>
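<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Steps 2-7 can be sketched as follows. This is a minimal NumPy rendering of the standard GRU cell applied to a unit state sequence; the function and variable names are ours, and for brevity the gates share one concatenated-input weight layout rather than the paper's exact parameterization.</span></span></p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, s_t, Wr, Wz, Wh):
    """One GRU update: fuse the previous hidden feature h_prev with a new
    unit state s_t. r is the reset gate, z the update gate (both in (0, 1))."""
    x = np.concatenate([h_prev, s_t])
    r = sigmoid(Wr @ x)                                        # reset gate
    z = sigmoid(Wz @ x)                                        # update gate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, s_t]))   # candidate hidden feature
    return (1.0 - z) * h_prev + z * h_cand                     # blend history with new info

def encode_uss(states, h_dim, Wr, Wz, Wh):
    """Steps 2-7: fold every unit state S_1..S_N into one fixed-length Feature."""
    h = np.zeros(h_dim)
    for s in states:
        h = gru_step(h, s, Wr, Wz, Wh)
    return h  # Feature = h_N
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Because the same weights are reused at every step, the output Feature has a fixed length regardless of how many entities the USS contains.</span></span></p>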
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">2.3 </span></strong><span style="color: black;">Algorithms built on DDQN and DDPG</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">After the environment observations are encoded into a USS, the GRU fuses and extracts their features, producing a fixed-length feature vector Feature that can be fed directly into the reinforcement-learning neural network.</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Combining this with the DDQN algorithm yields the USS-based DDQN algorithm (DDQN algorithm based on USS, SSDDQN), whose flow is shown in Fig. 4. SSDDQN maintains one GRU module for extracting the hidden feature of the USS. The optimizer computes the evaluation-network gradient via Eq. (2) and then updates the evaluation-network parameters; the gradient also propagates into the GRU module, which is updated synchronously.</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/2c39087a21944fcbbb5e0e2491450cf6~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=NYD6Cy5%2FgLOldBJDUvShBknigaY%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Fig. 4 SSDDQN algorithm flow</span></span></p>
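<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">The evaluation-network gradient mentioned above is computed against the standard double-Q target. A minimal sketch of that target for one transition, where <code>q_eval</code> and <code>q_target</code> are stand-in callables returning a vector of Q-values per discrete action (they are not platform APIs):</span></span></p>

```python
import numpy as np

def ddqn_target(reward, next_feat, q_eval, q_target, gamma=0.99, done=False):
    """Double-DQN target: the evaluation network selects the next action,
    the target network evaluates it, which limits over-estimation."""
    if done:
        return reward
    a_star = int(np.argmax(q_eval(next_feat)))            # selection: evaluation net
    return reward + gamma * q_target(next_feat)[a_star]   # evaluation: target net
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">In SSDDQN, <code>next_feat</code> would be the GRU's Feature output for the next-step USS, so the error gradient naturally flows back through the GRU.</span></span></p>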
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Combining with the DDPG algorithm yields the USS-based DDPG algorithm (DDPG algorithm based on USS, SSDDPG), whose flow is shown in Fig. 5. SSDDPG maintains two GRU modules, whose Feature outputs feed the policy network and the value network respectively. The value-network optimizer computes the value-network gradient via Eq. (4), and the policy-network optimizer computes the policy gradient via Eq. (7); during parameter updates the gradients propagate into the corresponding GRU module.</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/067733e6190c4e6fbbbcc2167e5f58c6~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=650GM9Lcuz%2B0W7M1ElPy%2BzwR1cE%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Fig. 5 SSDDPG algorithm flow</span></span></p>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">3 Air-Combat Decision-Making Based on Reinforcement Learning</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">3.1 </span></strong><span style="color: black;">UAV maneuver model</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">The UAV dynamics model studied in this paper is shown in Fig. 6. The UAV is treated as an ideal, symmetric rigid body, and its motion is represented by a three-degree-of-freedom flight-control simulation. The resultant force on the UAV is decomposed along and perpendicular to the direction of motion: nv denotes the effect on speed of the combined lift, drag, and thrust, providing the tangential acceleration; the normal load nh, perpendicular to the velocity, controls the UAV's pitch angle; μ denotes the roll angle; and g denotes gravitational acceleration.</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/89ba5d2b1ba644c0817f8e8938505977~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=Qfvzek%2FkUp0%2FP7c1q%2Fot%2FKHSfsM%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Fig. 6 UAV dynamics model</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">The three-degree-of-freedom equations of motion of the UAV are as follows:</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/8fffd300e0b941499e124ee420fbaff7~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=NxD%2B7lsUOPqCJOI%2F0uSkj4Q2RSw%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(15)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Here, β is the pitch angle, i.e., the angle between the velocity and the horizontal plane, with range [-π/2, π/2], positive above the horizontal plane and negative below it; α is the azimuth angle, i.e., the angle between the projection of the velocity onto the horizontal plane and due north.</span></span></p>
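<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Eq. (15) is rendered above as an image; the sketch below integrates one common form of the 3-DOF point-mass equations that is consistent with the variables defined in the text (nv, nh, μ, g, β, α), using a simple Euler step. The exact sign conventions in the paper may differ, so treat this as illustrative.</span></span></p>

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def step_3dof(state, nv, nh, mu, dt=0.05):
    """One Euler step of a 3-DOF point-mass flight model.
    state = (x, y, z, v, beta, alpha): position (north, east, up),
    speed v, pitch angle beta, azimuth alpha measured from north."""
    x, y, z, v, beta, alpha = state
    dv = G * (nv - math.sin(beta))                          # tangential load nv
    dbeta = G / v * (nh * math.cos(mu) - math.cos(beta))    # normal load nh, roll mu
    dalpha = G * nh * math.sin(mu) / (v * math.cos(beta))   # heading change
    x += v * math.cos(beta) * math.cos(alpha) * dt          # north
    y += v * math.cos(beta) * math.sin(alpha) * dt          # east
    z += v * math.sin(beta) * dt                            # up
    return (x, y, z, v + dv * dt, beta + dbeta * dt, alpha + dalpha * dt)
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Note that with nv = sin β (as in Section 3.2) the speed derivative vanishes, matching the constant-speed assumption used later.</span></span></p>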
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">3.2 </span></strong><span style="color: black;">UAV action space</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">The UAV's actions in this paper take its direction of motion and flight attitude as the origin of a relative coordinate frame, and the UAV always flies at maximum speed, i.e., the tangential load nv = sin β. The agent maneuvers the UAV by changing the roll angle μ and the normal load nh, where μ ∈ [-π/2, π/2] and nh ∈ [-Gmax, Gmax].</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">The DDPG algorithm directly outputs two actions in [-1, 1], which are denormalized to the roll angle μ and the normal load nh respectively to control the UAV's maneuvers.</span></span></p>
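<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">That denormalization can be sketched directly from the ranges given above (the function name is ours; Gmax = 6g comes from the scenario in Section 4.1):</span></span></p>

```python
import math

def denormalize(a, g_max=6.0):
    """Map DDPG's two raw outputs in [-1, 1] to the control ranges in the text:
    roll angle mu in [-pi/2, pi/2], normal load nh in [-g_max, g_max]."""
    a_mu, a_nh = (max(-1.0, min(1.0, x)) for x in a)  # clip for safety
    mu = a_mu * (math.pi / 2)
    nh = a_nh * g_max
    return mu, nh
```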
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">The DDQN algorithm is suited to discrete action decisions. For a continuous action space such as intelligent UAV maneuvering, the common practice is to discretize the continuous actions into discrete maneuver controls. Following the basic air-combat maneuver library proposed by NASA, the roll angle and load are discretized by maneuver direction and load magnitude into the nine basic actions listed in Table 1, where the roll angle μh relates to the current pitch angle β as follows:</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/d691510fd82140b7b12a9b2d182166a8~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=H%2BHXkeYvPSWdQ4a1V9IdtuV8qGQ%3D" style="width: 50%; margin-bottom: 20px;">
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Table 1 Action space of the DDQN algorithm</p>
</div>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/c21de07601f64941998d31329a68fe5d~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=Nbg3V3QM8YTBto1Ezpf%2B6RVmAX4%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(16)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">In this case,</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p26-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/8086a352e3fa47efbec4f7eb99bc9833~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=Qp8rCG8GdWPSrvgHlnx8C1hNEU0%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">and the UAV performs its turning maneuver within the current plane of motion.</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">3.3 </span></strong><span style="color: black;">Reward design</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">During training, too few task completions or too many steps per task slow learning down or even prevent convergence. This paper uses reward shaping to give the agent stage goals that guide it toward completing the task.</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(1) Altitude reward</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">The UAV's maximum flight altitude is 15 km and its minimum is 2 km; during evasive maneuvers it may cross these boundaries, and exceeding the altitude limits counts as destruction. Since approaching a boundary is itself a dangerous situation, an altitude reward function is established to reduce losses from boundary violations during evasion:</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/8fe1839891444edba76b9c9350b8f38e~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=SRdbqoxZJLKIoUrpN8M2p4429OM%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(17)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(2) Evasion reward</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">When multiple missiles attack the UAV, the task is decomposed into stage tasks of evading individual missiles, encouraging the agent to evade more of them. The reward is issued when a missile detonates; the successful-evasion reward is designed as:</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/89ef31d8632f47daad9b0f1d4e756f0a~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=c4o3lp5aFfZvpJfQ3wsBJg0w7Sc%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(18)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(3) Episode-end reward</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">In the problem of consecutively evading air-to-air missiles, the final criterion for the decision policy is whether the UAV survives. At the end of a simulation episode, a survival reward is established according to the UAV's survival state:</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/3be078b8c93f45dda379098df8ccd24b~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=ANlSTQr4uQWadUGT2m47rptOhqA%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(19)</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">For the missile-evasion problem, the environment reward function combines the three rewards above:</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/709f2523ca1d48158b236be1939728ef~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=KlpQe5w0U%2FrFtOntT82JRPbLzBE%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">(20)</span></span></p>
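<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Eqs. (17)-(20) are rendered above as images, so the sketch below is purely illustrative: plausible functional forms and weights for the three-part reward, not the paper's exact coefficients.</span></span></p>

```python
def altitude_reward(h, h_min=2000.0, h_max=15000.0, margin=1000.0):
    """Illustrative stand-in for Eq. (17): zero well inside the 2-15 km
    envelope, increasingly negative within `margin` of either boundary."""
    if h < h_min + margin:
        return -(h_min + margin - h) / margin
    if h > h_max - margin:
        return -(h - (h_max - margin)) / margin
    return 0.0

def total_reward(h, n_evaded_now, survived, done):
    """Eq. (20) in spirit: altitude shaping, plus a per-missile evasion bonus
    (Eq. (18), paid when a missile detonates harmlessly), plus a terminal
    survival reward (Eq. (19)). All weights here are assumptions."""
    r = altitude_reward(h)
    r += 1.0 * n_evaded_now
    if done:
        r += 5.0 if survived else -5.0
    return r
```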
<h1 style="color: black; text-align: left; margin-bottom: 10px;">4 Beyond-Visual-Range Air-Combat Simulation Experiments</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">The experimental platform is the final-round environment of the 2021 First National Aerial Intelligence Competition ("2021首届全国空中智能博弈大赛"), jointly organized by the Chinese Institute of Command and Control and the Youth League Committee of the Aviation Industry Corporation of China. Taking combat plans and tactics as its research object, the platform builds typical air-combat scenarios, expands combat styles and mechanisms, and supports command decision-making.</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">4.1 </span></strong><span style="color: black;">Missile-evasion scenarios</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Two scenarios are designed for the UAV missile-evasion problem: ① a blue aircraft carrying a missile appears at a random position around the red UAV and attacks it, and the red UAV evades the missile through continuous maneuver decisions; ② four blue aircraft, each carrying a missile, attack the UAV simultaneously from different directions, and the red UAV attempts to evade all missiles through continuous maneuver decisions and ultimately survive.</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Many factors affect whether an air-to-air missile can effectively hit the UAV, including the speed, altitude, pitch angle, heading angle, and target aspect angle of both missile and UAV, as well as the missile guidance law and the UAV's maneuver style. To measure how well the agent-controlled UAV evades missiles under different situations, the initial scenario shown in Fig. 7 is designed. The red UAV flies at 400 m/s with maximum load Gmax = 6g and initial altitude 9 km. The blue aircraft carrying the air-to-air missile appears at random at a distance of 20-60 km from the red UAV and at an altitude of 3-15 km. After launching its missile, the blue aircraft follows the red UAV to provide radar illumination for missile guidance.</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/98c63d7cfb7c40ec8d3ac11f22096e6e~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=fB6Tav9A1GUX1a8pU60p%2B3MZwH4%3D" style="width: 50%; margin-bottom: 20px;">
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;">Fig. 7 Initial scenario</p>
</div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">The air-to-air missile's initial speed is 300 m/s, its maximum speed is 1 000 m/s, and its motor burns for 30 s; its speed profile is shown in Fig. 8. After launch the missile accelerates sharply, cruises at maximum speed once it reaches it, and decelerates after the motor burns out. Guidance follows the proportional navigation law, the kill radius is 100 m, and the missile detonates immediately after missing.</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/abb4529f012c4468b7958160368ce547~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=is7jOtWqeRzagwKIfUWgf%2FZbdNg%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Fig. 8 Missile speed versus time</span></span></p>
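<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">The proportional navigation law named above can be sketched in the planar case: the commanded lateral acceleration is proportional to closing speed times line-of-sight rate. The navigation constant N = 4 is an assumed value, not stated in the paper.</span></span></p>

```python
import math

def pn_lateral_accel(r_m, v_m, r_t, v_t, N=4.0):
    """Planar proportional navigation: a = N * Vc * (LOS rate).
    r_m, v_m: missile position/velocity; r_t, v_t: target position/velocity."""
    rx, ry = r_t[0] - r_m[0], r_t[1] - r_m[1]   # line-of-sight (LOS) vector
    vx, vy = v_t[0] - v_m[0], v_t[1] - v_m[1]   # relative velocity
    r2 = rx * rx + ry * ry
    los_rate = (rx * vy - ry * vx) / r2         # LOS angular rate, rad/s
    vc = -(rx * vx + ry * vy) / math.sqrt(r2)   # closing speed, m/s
    return N * vc * los_rate                    # lateral acceleration, m/s^2
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">A purely head-on target produces a zero LOS rate and hence a zero command, which is why the last-moment hard turns described in Section 4.3.2 are effective: they drive the LOS rate up faster than the missile's limited acceleration can null it.</span></span></p>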
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">4.2 </span></strong><span style="color: black;">Neural network parameters</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">The DDQN, DDPG, SSDDQN, and SSDDPG algorithms all use three-layer backpropagation neural networks with 256 neurons per layer and a replay buffer of size 50 000. For DDQN and DDPG, the input concatenates the state units Si directly up to the maximum number of entities, with the initial state zero-filled by default and existing entities filled with their state values, giving an input dimension of 10 when training against a single missile and 40 against four missiles. SSDDQN and SSDDPG use a GRU unit for feature extraction: it takes the USS as input and outputs a 10-dimensional hidden feature Feature, which is used for training and action selection.</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">In DDQN and SSDDQN, the discount factor is γ = 0.99, the learning rate lr = 1e-4, and the network update interval is 100, i.e., every 100 training steps the evaluation-network parameters are copied into the target network. In DDPG and SSDDPG, γ = 0.99, the value-network learning rate lrc = 5e-4, the policy-network learning rate lra = 1e-4, and the soft-update rate τ = 0.01. After every 2 000 training episodes, the trained agent plays 1 000 test episodes to record the missile-evasion survival rate.</span></span></p>
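<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">The soft target update with rate τ = 0.01 used by DDPG/SSDDPG amounts to an exponential moving average over parameters; a minimal sketch with parameters held in plain dictionaries:</span></span></p>

```python
def soft_update(target, source, tau=0.01):
    """DDPG-style soft target update: target <- tau*source + (1-tau)*target,
    applied per parameter. tau = 0.01 follows the text."""
    return {k: tau * source[k] + (1.0 - tau) * target[k] for k in target}
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">This contrasts with DDQN's hard update, which copies the evaluation network into the target network wholesale every 100 training steps.</span></span></p>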
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">4.3 </span></strong><span style="color: black;">Simulation analysis</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">In the experiments, DDQN, DDPG, SSDDQN, and SSDDPG are each trained in scenarios ① and ②, and the four reinforcement-learning algorithms are compared with three simple fixed maneuver policies: high-speed straight flight, high-speed dive, and maximum-load circling. We analyze the missile-evasion survival rate, the flight trajectories of the agent-controlled UAV while consecutively evading missiles, and the missile miss distance under different maneuver styles.</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">4.3.1 </span><span style="color: black;">Missile-evasion success rate</span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">The training curves for scenario ① are shown in Fig. 9(a). Against missile attack, high-speed straight flight, high-speed dive, and maximum-load circling achieve survival rates of 40%, 44%, and 67%, while the best survival rates of DDQN, DDPG, SSDDQN, and SSDDPG are 99%, 99.5%, 98.5%, and 99.5% respectively. Among the three simple maneuvers, only maximum-load circling exceeds 60% survival, whereas agents trained by all four reinforcement-learning algorithms exceed 98%, demonstrating that reinforcement learning can train agents with autonomous maneuver decision-making in the missile-evasion setting. We also observe that the USS-augmented algorithms converge more slowly than the conventional ones, but the final survival rates differ little. The training curves for scenario ② are shown in Fig. 9(b): there, high-speed straight flight, high-speed dive, and maximum-load circling survive 2.5%, 3.7%, and 20.1% of episodes, while the best survival rates of DDQN, DDPG, SSDDQN, and SSDDPG are 25.5%, 28.5%, 88.5%, and 88%. With more attacking missiles, straight flight and diving rarely survive, while maximum-load circling retains some evasion capability. DDQN and DDPG do raise the survival rate against consecutive multi-missile attacks, but converge poorly. The USS-based SSDDQN and SSDDPG can still train agents that evade all four missiles with high probability. The comparison shows that USS-based reinforcement learning converges faster and more accurately under simultaneous multi-missile attack, markedly improving the UAV's survival rate against consecutive air-to-air missile attacks.</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/46f16ec2257b4fe7867fc7594510fc90~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=Nx29vV%2B7%2BBUc7uq%2F2G2vB%2BgTFPA%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Fig. 9 Training curves for UAV missile evasion</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">As Fig. 9 shows, agents trained by the SSRL algorithms match DDQN and DDPG in survival rate against a single air-to-air missile, and far exceed them when consecutively evading multiple missiles.</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">4.3.2 </span><span style="color: black;">UAV evasion trajectories</span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">In scenario ①, agents trained by DDQN, DDPG, SSDDQN, and SSDDPG adopt different evasive maneuvers against missiles arriving under different situations. Typical learned maneuvers are shown in Fig. 10, including flying perpendicular to the missile's direction of motion, rapid dives, and vertical tail-placement followed by a sharp turn. The agent selects different maneuvers for incoming missiles with different situations and flight speeds; although the maneuvers differ, all of them effectively evade the missile attacks.</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/7bd69b72b1374c739c31d5e458fd210f~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=bQB4pFe0B7Jw%2FphHkRBZh2f8oDQ%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Fig. 10 UAV flight trajectories evading a single missile</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">In scenario ②, agents trained by SSDDQN and SSDDPG control the UAV along similar trajectories, as shown in Fig. 11. Fig. 11(a) shows an SSRL-trained agent consecutively evading four missiles: throughout the evasion the agent performs irregular climbs and dives and, at the last moment, executes an extreme turn at maximum load. Fig. 11(b) shows the trajectory during the missile-encounter phase: as a missile closes, the agent flies an irregular weave in the vertical plane, continually adjusting attitude so that a last-moment hard reversal evades the air-to-air missile.</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/40af7902983c41ea9f085a66b2efe963~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=ag0sBhC5Z986MUruKYItZ4AUqOQ%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Fig. 11 UAV flight trajectories evading four missiles</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">4.3.3 </span><span style="color: black;">Missile miss distance</span></strong></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Miss distance is the minimum relative distance between missile and target over the course of their motion. It is a key indicator of a missile system's accuracy and directly affects the kill probability: the larger the miss distance, the lower the threat to the UAV, and when the miss distance exceeds the missile's lethal range the UAV evades successfully. During the encounter phase, missile and UAV can both be treated as moving in straight lines at constant speed, from which the agent's average miss distance against missiles arriving from different positions is computed.</span></span></p>
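<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Under that constant-velocity assumption the miss distance has a closed form: minimize the distance between two linearly moving points over non-negative time. A minimal sketch (function name is ours):</span></span></p>

```python
import numpy as np

def miss_distance(p_m, v_m, p_t, v_t):
    """Minimum missile-target distance over t >= 0, assuming both move in
    straight lines at constant velocity during the encounter phase."""
    dp = np.asarray(p_t, float) - np.asarray(p_m, float)   # relative position
    dv = np.asarray(v_t, float) - np.asarray(v_m, float)   # relative velocity
    dv2 = float(dv @ dv)
    # Time of closest approach; clamp to 0 if the geometry is already opening.
    t_star = 0.0 if dv2 == 0.0 else max(0.0, -float(dp @ dv) / dv2)
    return float(np.linalg.norm(dp + dv * t_star))
```

<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">With the 100 m kill radius from Section 4.1, a computed miss distance above 100 m corresponds to a successful evasion.</span></span></p>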
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Fig. 12 shows box plots of the missile miss distance under different maneuver styles, and Table 2 lists the averages. Fig. 12(a) gives the distribution for scenario ①: for high-speed straight flight and high-speed dive the miss distances concentrate within 100 m, averaging 97 m and 99 m; for maximum-load circling they concentrate around 100 m, averaging 106 m; for agents trained by DDQN, SSDDQN, DDPG, and SSDDPG they concentrate around 132 m. Fig. 12(b) gives the distribution for scenario ②: straight flight, dive, and circling concentrate within 100 m, averaging 43 m, 74 m, and 83 m; agents trained by DDQN and DDPG average 92 m and 98 m; agents trained by SSDDQN and SSDDPG average 128 m and 127 m. Thus, in consecutive-evasion scenarios, SSRL-trained agents effectively increase the missile miss distance and better avoid missile attacks.</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p3-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/78430f6f16b145ccb500992a1cb534d8~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=4OH2ct3vkqUGwlOXDb05fMnm8EA%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Fig. 12 Missile miss distance under different maneuver styles</span></span></p>
<div style="color: black; text-align: left; margin-bottom: 10px;"><img src="https://p26-sign.toutiaoimg.com/tos-cn-i-twdt4qpehh/e61fbe41872e4433901f06991fc08356~noop.image?_iz=58558&from=article.pc_detail&lk3s=953192f4&x-expires=1724919529&x-signature=auqxUiuSujkyfPgEM768gFOqbnE%3D" style="width: 50%; margin-bottom: 20px;"></div>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">Table 2 Average missile miss distance</span></span></p>
<h1 style="color: black; text-align: left; margin-bottom: 10px;">5 Conclusion</h1>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><span style="color: black;">For beyond-visual-range air combat, where a UAV facing simultaneous attack by multiple missiles poses a large state-space dimension, continually changing state dimensionality, and poor reinforcement-learning training performance, this paper proposes using a USS to represent the state features of a single entity facing multiple targets, with a GRU network fusing and extracting the USS features for reinforcement learning, forming the SSRL algorithms. The algorithms were trained on the simulation platform of the 2021 First National Aerial Intelligence Competition ("2021首届全国空中智能博弈大赛"). Simulation results show that, compared with the DDQN algorithm, the DDPG algorithm, and simple maneuver policies, SSRL-trained agents increase the missile miss distance and improve the UAV's survival rate when consecutively evading missiles.</span></span></p>
<p style="font-size: 16px; color: black; line-height: 40px; text-align: left; margin-bottom: 15px;"><span style="color: black;"><strong style="color: blue;"><span style="color: black;">Statement:</span></strong> Articles and images reposted by this account are provided for non-commercial educational and research purposes only; reposting does not imply endorsement of their views or verification of their content. Copyright belongs to the original authors; if a repost involves copyright issues, please contact us and it will be removed immediately.</span></p>