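Decision Model of Ship Intelligent Collision Avoidance Based on Automatic Information System Data and Generic Adversary Imitation Learning-Deep Deterministic Policy Gradient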

Jiao Liu, Guoyou Shi, Kaige Zhu, Jiahui Shi, Yuchuang Wang
Published in: Proceedings of the 2023 7th International Conference on Machine Learning and Soft Computing
Publication date: 2023-01-05
DOI: 10.1145/3583788.3583790 (https://doi.org/10.1145/3583788.3583790)
Citation count: 0

Abstract

Current decision-making models for ship collision avoidance do not consider the International Regulations for Preventing Collisions at Sea (COLREGS) or ship maneuverability, and they require long training times. To address these problems, this paper combines the advantages of reinforcement learning and imitation learning and proposes an intelligent ship collision avoidance decision-making model based on Generative Adversarial Imitation Learning (GAIL). First, collision avoidance data are extracted from Automatic Identification System (AIS) data and used as expert data. Second, in the generator part, an environment model is built from the Mathematical Model Group (MMG) ship maneuvering model and S-57 chart rendering, and the state space, action space, and reward function for reinforcement learning are constructed. The deep deterministic policy gradient (DDPG) algorithm interacts with the environment model to generate ship trajectory data, while the generator continually learns from the expert data. Finally, a discriminator that distinguishes expert data from generator output is constructed and trained. Training is complete when the discriminator can no longer tell the two apart. To verify the model's performance, AIS data from waters near the South China Sea are processed to extract collision avoidance decision data, and a GAIL-based intelligent collision avoidance decision model is built on them. After the model converges, the final generated data are compared with the expert data. The experimental results show that the proposed model can reproduce the expert collision avoidance trajectories and is a practical decision model for ship collision avoidance.
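The adversarial part of this pipeline can be illustrated with a minimal sketch. It is not the authors' implementation: the logistic-regression discriminator, the two toy features (normalized range to the target ship and normalized rudder command), the synthetic "expert" and "generated" samples, and the names `Discriminator` and `imitation_reward` are all assumptions made for illustration. The DDPG generator, the MMG dynamics, and the S-57 chart environment are omitted. The sketch only shows how a discriminator score D(s, a) can be turned into a surrogate reward for the generator, and why D settling at 0.5 corresponds to the stopping condition described in the abstract.

```python
import math
import random
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class Discriminator:
    """Toy discriminator: D(x) = estimated probability that x came from the expert."""

    def __init__(self, n_features: int) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0

    def prob_expert(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, expert, generated, lr: float = 0.5) -> None:
        """One gradient-ascent step on the binary log-likelihood
        (expert samples labeled 1, generated samples labeled 0)."""
        samples = [(x, 1.0) for x in expert] + [(x, 0.0) for x in generated]
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, label in samples:
            err = label - self.prob_expert(x)  # derivative w.r.t. the logit
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        n = len(samples)
        self.w = [wi + lr * g / n for wi, g in zip(self.w, grad_w)]
        self.b += lr * grad_b / n


def imitation_reward(disc: Discriminator, x: Sequence[float]) -> float:
    """Surrogate reward -log(1 - D(x)): larger when x looks expert-like."""
    return -math.log(max(1.0 - disc.prob_expert(x), 1e-8))


random.seed(0)
# Hypothetical features: x = [normalized range to target, normalized rudder command].
expert, generated = [], []
for _ in range(200):
    d = random.random()
    # Assumed expert behavior: starboard rudder, stronger at close range.
    expert.append([d, (1.0 - d) + random.uniform(-0.05, 0.05)])
    d = random.random()
    # Assumed untrained generator: turns to port instead.
    generated.append([d, -(1.0 - d) + random.uniform(-0.05, 0.05)])

disc = Discriminator(n_features=2)
for _ in range(300):
    disc.update(expert, generated)

# The generator would be rewarded more for the expert-like starboard turn.
r_starboard = imitation_reward(disc, [0.2, 0.8])
r_port = imitation_reward(disc, [0.2, -0.8])
print("starboard turn rewarded more:", r_starboard > r_port)

# If the generated samples are indistinguishable from the expert samples
# (here, literally the same list), the discriminator stays at D = 0.5.
same = Discriminator(n_features=2)
for _ in range(300):
    same.update(expert, expert)
p_same = same.prob_expert([0.2, 0.8])
print("D on indistinguishable data:", round(p_same, 6))
```

In a full GAIL loop, the surrogate reward would be passed to the policy optimizer (DDPG in the paper) in place of, or alongside, a hand-designed reward, and the discriminator and generator updates would alternate until the discriminator output approaches 0.5 on both data sources.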