Multiple UAS Traffic Planning Based on Deep Q-Network with Hindsight Experience Replay and Economic Considerations

IF 2.2 | Region 3 (Engineering Technology) | JCR Q2 (ENGINEERING, AEROSPACE) | Aerospace | Pub Date: 2023-11-22 | DOI: 10.3390/aerospace10120980

Shao Xuan Seah, S. Srigrarom


Citations: 0

Abstract


This paper explores the use of deep reinforcement learning to solve the multi-agent aircraft traffic planning (individual paths) and collision avoidance problem for multiple unmanned aircraft systems (UAS), such as a cargo drone network. Specifically, the Deep Q-Network (DQN) with Hindsight Experience Replay framework is adopted and trained on a three-dimensional state space that represents a congested urban environment with dynamic obstacles. After the problem is formalised as a Markov decision process (MDP), various flight and control parameters are varied between training simulations to study their effects on agent performance. Both fully observable MDPs (FOMDPs) and partially observable MDPs (POMDPs) are formulated to understand the role of reward signal shaping in training performance. While conventional traffic planning and optimisation techniques are evaluated on path length or time, this paper incorporates economic analysis by considering tangible and intangible sources of cost, such as the cost of energy, the value of time (VOT) and the value of reliability (VOR). Comparing outcomes that integrate multiple cost sources makes it possible to better gauge the impact of various parameters on efficiency. To further explore the feasibility of multiple UAS traffic planning, such as cargo drone networks, the trained agents are also subjected to multi-agent point-to-point and hub-and-spoke network environments. In these simulations, delivery orders are generated by a discrete event simulator with an arrival rate, which is varied to investigate the effect of travel demand on economic costs. Simulation results point to the importance of signal engineering, as reward signals play a crucial role in shaping reinforcements. The results also reflect an increase in costs in environments where congestion and arrival time uncertainty arise because of the presence of other agents in the network.
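As an illustration of the Hindsight Experience Replay idea mentioned in the abstract, the following is a minimal sketch of goal relabelling using the common "final" strategy, in which the last position an agent actually reached is treated as if it had been the goal. It is not the authors' implementation: the `Transition` layout, the 0/-1 sparse reward, the grid-cell positions, and the function names are all assumptions made for this example.

```python
from collections import deque
from typing import List, NamedTuple, Tuple

Position = Tuple[int, int, int]  # assumed (x, y, z) cell in a 3D grid


class Transition(NamedTuple):
    state: Position
    action: int
    reward: float
    next_state: Position
    goal: Position
    done: bool


def sparse_reward(next_state: Position, goal: Position) -> float:
    """Hypothetical sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if next_state == goal else -1.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """HER 'final' strategy: replay the episode as if the last state
    reached had been the intended goal, recomputing reward and done."""
    if not episode:
        return []
    achieved = episode[-1].next_state
    return [
        t._replace(
            goal=achieved,
            reward=sparse_reward(t.next_state, achieved),
            done=(t.next_state == achieved),
        )
        for t in episode
    ]


def store_episode(buffer: deque, episode: List[Transition]) -> None:
    """Store the original transitions followed by their hindsight copies."""
    buffer.extend(episode)
    buffer.extend(relabel_with_final_goal(episode))
```

With this scheme, an episode that never reaches its assigned destination still contributes at least one transition with a non-negative reward, which is the usual motivation for pairing HER with sparse goal-reaching rewards in DQN training.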


Source journal

Aerospace (ENGINEERING, AEROSPACE)

CiteScore

3.40

Self-citation rate

23.10%

Articles published

661

Review time

6 weeks

About the journal: Aerospace is a multidisciplinary science inviting submissions on, but not limited to, the following subject areas: aerodynamics; computational fluid dynamics; fluid-structure interaction; flight mechanics; plasmas; research instrumentation; test facilities; environment; material science; structural analysis; thermophysics and heat transfer; thermal-structure interaction; aeroacoustics; optics; electromagnetism and radar; propulsion; power generation and conversion; fuels and propellants; combustion; multidisciplinary design optimization; software engineering; data analysis; signal and image processing; artificial intelligence; aerospace vehicles' operation, control and maintenance; risk and reliability; human factors; human-automation interaction; airline operations and management; air traffic management; airport design; meteorology; space exploration; multi-physics interaction.