Constrained Reinforcement Learning Using Distributional Representation for Trustworthy Quadrotor UAV Tracking Control

IF 6.4 | Zone 2 (Computer Science) | Q1, Automation & Control Systems | IEEE Transactions on Automation Science and Engineering | Pub Date: 2024-07-29 | DOI: 10.1109/TASE.2024.3432405

Yanran Wang; David Boyle

IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 5877-5894. Full text: https://ieeexplore.ieee.org/document/10614102/

Citation count: 0

Abstract

Simultaneously accurate and reliable tracking control for quadrotors in complex dynamic environments is challenging. The chaotic nature of aerodynamics, which arises from drag forces and moment variations, makes precise identification difficult. Consequently, many existing quadrotor tracking systems treat these aerodynamic effects as simple ‘disturbances’ within conventional control approaches. We propose a novel and interpretable trajectory tracker that integrates a distributional Reinforcement Learning (RL) disturbance estimator for unknown aerodynamic effects with a Stochastic Model Predictive Controller (SMPC). Specifically, the proposed estimator, the ‘Constrained Distributional REinforced-Disturbance-estimator’ (ConsDRED), identifies uncertainties between the true and estimated values of aerodynamic effects. Control parameterization employs simplified affine disturbance feedback to ensure convexity and is integrated seamlessly with the SMPC. We theoretically guarantee that ConsDRED achieves an optimal global convergence rate, and a sublinear rate when constraints are violated, with an error that decreases as the neural network dimensions increase. To demonstrate practicality, we show convergent training in simulation and real-world experiments, and empirically verify that ConsDRED is less sensitive to hyperparameter settings than canonical constrained RL. Our system reduces accumulated tracking errors by at least 70% compared with recent state-of-the-art methods. Importantly, the proposed ConsDRED-SMPC framework balances the trade-off between pursuing high performance and obeying conservative constraints in practical implementations.

Note to Practitioners: This work is motivated by challenges in training Reinforcement Learning (RL) for autonomous navigation in unmanned aerial vehicles, but its implications extend to other high-criticality applications in, for example, healthcare and financial services. The implementation of RL algorithm policies may exhibit various deficiencies, including (i) opaque or unstable training due to the black-box nature of deep neural networks, (ii) difficulty in reproducing RL outcomes, and (iii) heightened costs associated with robustness investigation, e.g., hyperparameter tuning and generalization. We present a novel ‘Constrained Distributional REinforced-Disturbance-estimator’ (ConsDRED) to identify uncertainties arising from aerodynamic disturbances in agile flight. The proposed algorithm demonstrates at least an optimal convergence rate for both global optimization and constraint violations. The theoretical guarantees offered by this approach can make training convergence more transparent and provide a degree of confidence in the expected convergence outcomes in real-world applications, thereby addressing, to a considerable extent, deficiencies (i) and (ii). Our robustness investigation, related to (iii), empirically demonstrates that ConsDRED is significantly less sensitive to hyperparameter settings than traditional constrained RL approaches. The experiments further showcase the generalization capability of ConsDRED by introducing new and previously unseen external forces in real-world scenarios. While ConsDRED strikes a reasonable balance between performance and interpretability, there remains a trade-off with conservative decision-making. Hence, we must strike a balance between the desired degree of interpretability and the acceptable level of scalability in RL algorithms. Our proposed flight control framework evidences such a balance between high performance and safety constraints.

A fundamental limitation is computational complexity: we sample at 16 Hz for disturbance estimation because onboard computational resources are limited. This may be improved in the future by implementing dedicated hardware.
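The abstract states that control parameterization uses affine disturbance feedback to ensure convexity. The following is a minimal sketch of why that holds, assuming the standard affine disturbance feedback form u[k] = v[k] + sum over j < k of M[k][j] * w[j] on a hypothetical scalar linear system. The system, the numbers, and the helper names `rollout` and `blend` are illustrative assumptions; the paper's simplified variant and quadrotor model are not reproduced here. For a fixed disturbance sequence, the state trajectory is affine in the decision variables (v, M), so blending two parameter sets blends the trajectories in the same proportion.

```python
# Illustrative scalar system x[k+1] = a*x[k] + b*u[k] + w[k]; all names and
# numbers here are assumptions for the sketch, not values from the paper.

def rollout(a, b, x0, v, M, w):
    """Simulate the system under u[k] = v[k] + sum_{j<k} M[k][j] * w[j]."""
    x = [x0]
    for k in range(len(v)):
        # The input depends only on past disturbances (causal feedback).
        u = v[k] + sum(M[k][j] * w[j] for j in range(k))
        x.append(a * x[-1] + b * u + w[k])
    return x


def blend(p, q, t):
    """Convex combination t*p + (1-t)*q of two nested lists or numbers."""
    if isinstance(p, list):
        return [blend(pi, qi, t) for pi, qi in zip(p, q)]
    return t * p + (1 - t) * q


a, b, x0 = 0.9, 0.5, 1.0
w = [0.2, -0.1, 0.05]                           # one disturbance realization
v1, v2 = [0.0, 0.1, -0.2], [0.3, -0.4, 0.1]     # two nominal input sequences
M1 = [[0, 0, 0], [0.5, 0, 0], [-0.2, 0.1, 0]]   # strictly lower triangular gains
M2 = [[0, 0, 0], [-0.3, 0, 0], [0.4, 0.6, 0]]

t = 0.3
# Trajectory of the blended parameters ...
x_blend = rollout(a, b, x0, blend(v1, v2, t), blend(M1, M2, t), w)
# ... versus the blend of the two individual trajectories.
x_mix = blend(rollout(a, b, x0, v1, M1, w), rollout(a, b, x0, v2, M2, w), t)
max_gap = max(abs(p - q) for p, q in zip(x_blend, x_mix))
print(max_gap)  # agrees up to floating-point rounding
```

Because the states are affine in (v, M), any convex cost or constraint on the states remains convex in those variables, which is the property a predictive controller can exploit. A state-feedback law u[k] = K[k] * x[k] would instead make the states nonlinear in the gains.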

Source journal

IEEE Transactions on Automation Science and Engineering (Engineering & Technology: Automation and Control Systems)

CiteScore: 12.50
Self-citation rate: 14.30%
Articles published: 404
Review time: 3.0 months

Journal description: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.