Constrained Reinforcement Learning Using Distributional Representation for Trustworthy Quadrotor UAV Tracking Control
Yanran Wang; David Boyle
IEEE Transactions on Automation Science and Engineering, vol. 22, pp. 5877-5894. Journal article, published 2024-07-29.
DOI: 10.1109/TASE.2024.3432405
URL: https://ieeexplore.ieee.org/document/10614102/
Journal metrics: impact factor 6.4; JCR Q1 (Automation & Control Systems); CAS Region 2 (Computer Science).
Citations: 0
Abstract
Achieving quadrotor tracking control that is simultaneously accurate and reliable in complex dynamic environments is challenging. The chaotic nature of aerodynamics, arising from drag forces and moment variations, makes precise identification difficult. Consequently, many existing quadrotor tracking systems treat these aerodynamic effects as simple ‘disturbances’ within conventional control approaches. We propose a novel and interpretable trajectory tracker that integrates a distributional Reinforcement Learning (RL) disturbance estimator for unknown aerodynamic effects with a Stochastic Model Predictive Controller (SMPC). Specifically, the proposed estimator, ‘Constrained Distributional REinforced-Disturbance-estimator’ (ConsDRED), effectively identifies the uncertainties between the true and estimated values of aerodynamic effects. The control parameterization employs simplified affine disturbance feedback to ensure convexity and is integrated seamlessly with the SMPC. We theoretically guarantee that ConsDRED achieves an optimal global convergence rate and a sublinear rate for constraint violation, with an error that decreases as the neural network dimensions increase. To demonstrate practicality, we show convergent training in both simulation and real-world experiments, and empirically verify that ConsDRED is less sensitive to hyperparameter settings than canonical constrained RL. Our system reduces accumulated tracking errors by at least 70% compared with recent state-of-the-art methods. Importantly, the proposed ConsDRED-SMPC framework balances the trade-off between pursuing high performance and obeying conservative constraints in practical implementations.

Note to Practitioners: This work is motivated by the challenges of training Reinforcement Learning (RL) for autonomous navigation of unmanned aerial vehicles, but its implications extend to other high-criticality applications, for example in healthcare and financial services. RL policies in practice may exhibit several deficiencies, including (i) opaque or unstable training due to the black-box nature of deep neural networks, (ii) difficulty in reproducing RL outcomes, and (iii) high costs of robustness investigation, e.g., hyperparameter tuning and generalization. We present a novel ‘Constrained Distributional REinforced-Disturbance-estimator’ (ConsDRED) to identify uncertainties arising from aerodynamic disturbances in agile flight. The proposed algorithm is shown to achieve at least an optimal convergence rate for both global optimization and constraint violations. These theoretical guarantees can make the training convergence process more transparent and provide a degree of confidence in the expected convergence outcomes in real-world applications, thereby largely addressing deficiencies (i) and (ii). Our robustness investigation, addressing (iii), empirically demonstrates that ConsDRED is significantly less sensitive to hyperparameter settings than traditional constrained RL approaches. The experiments further showcase the generalization capability of ConsDRED by introducing previously unseen external forces in real-world scenarios. While ConsDRED strikes a reasonable balance between performance and interpretability, a trade-off with conservative decision-making remains; practitioners must therefore weigh the desired degree of interpretability against the acceptable level of scalability in RL algorithms. Our proposed flight control framework demonstrates such a balance between high performance and safety constraints. A fundamental limitation is computational complexity: owing to limited onboard computational resources, disturbance estimation is sampled at only 16 Hz. This may be improved in future work by using dedicated hardware.
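The abstract states that the controller uses simplified affine disturbance feedback to keep the SMPC problem convex. The exact simplified parameterization is not given here, so the following is a minimal sketch of the standard (non-simplified) affine disturbance feedback form on an assumed scalar model x_{k+1} = a x_k + b u_k + w_k, where each input is u_k = v_k + Σ_{j<k} M_{k,j} w_j and the nominal inputs v and causal gains M are the decision variables. The model, the numbers, and every function name below are illustrative assumptions, not the authors' code. The script checks numerically that the predicted states are affine in (v, M): the trajectory at the midpoint of two parameter choices equals the midpoint of their trajectories, whereas optimizing a state-feedback gain K breaks this property.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def disturbance_feedback_controls(v: Vector, M: Matrix, w: Vector) -> Vector:
    """Affine disturbance feedback (textbook form, assumed): u_k = v_k + sum_{j<k} M[k][j] * w[j].

    Only past disturbances enter u_k (strictly lower-triangular M), so the policy
    is causal: w_j can be recovered from x_j, u_j and the measured x_{j+1}.
    """
    return [v[k] + sum(M[k][j] * w[j] for j in range(k)) for k in range(len(v))]


def rollout_dfb(x0: float, a: float, b: float, v: Vector, M: Matrix, w: Vector) -> Vector:
    """Predict x_0..x_N for the assumed scalar model x_{k+1} = a*x_k + b*u_k + w_k."""
    u = disturbance_feedback_controls(v, M, w)
    xs = [x0]
    for k in range(len(v)):
        xs.append(a * xs[-1] + b * u[k] + w[k])
    return xs


def rollout_state_fb(x0: float, a: float, b: float, K: float, v: Vector, w: Vector) -> Vector:
    """Same model under state feedback u_k = K*x_k + v_k, with the gain K as decision variable."""
    xs = [x0]
    for k in range(len(v)):
        xs.append(a * xs[-1] + b * (K * xs[-1] + v[k]) + w[k])
    return xs


def midpoint(p: Vector, q: Vector) -> Vector:
    """Element-wise midpoint of two parameter vectors."""
    return [0.5 * (pi + qi) for pi, qi in zip(p, q)]


def midpoint_matrix(P: Matrix, Q: Matrix) -> Matrix:
    """Element-wise midpoint of two gain matrices."""
    return [midpoint(pr, qr) for pr, qr in zip(P, Q)]


def preserves_midpoint(x_mid: Vector, x_p: Vector, x_q: Vector) -> bool:
    """True if the trajectory at the parameter midpoint equals the midpoint of the two
    trajectories, a property every affine map has."""
    return all(math.isclose(m, 0.5 * (p + q), abs_tol=1e-12) for m, p, q in zip(x_mid, x_p, x_q))


if __name__ == "__main__":
    a, b, x0 = 1.0, 1.0, 1.0
    w = [0.2, -0.1, 0.3]  # one sampled disturbance sequence (illustrative values)
    v1, M1 = [0.0, 0.5, -0.2], [[0.0, 0.0, 0.0], [0.4, 0.0, 0.0], [0.1, -0.3, 0.0]]
    v2, M2 = [1.0, -0.5, 0.3], [[0.0, 0.0, 0.0], [-0.6, 0.0, 0.0], [0.7, 0.2, 0.0]]

    dfb_ok = preserves_midpoint(
        rollout_dfb(x0, a, b, midpoint(v1, v2), midpoint_matrix(M1, M2), w),
        rollout_dfb(x0, a, b, v1, M1, w),
        rollout_dfb(x0, a, b, v2, M2, w),
    )
    zeros = [0.0, 0.0, 0.0]
    sfb_ok = preserves_midpoint(
        rollout_state_fb(x0, a, b, 0.5, zeros, zeros),
        rollout_state_fb(x0, a, b, 0.0, zeros, zeros),
        rollout_state_fb(x0, a, b, 1.0, zeros, zeros),
    )
    print("disturbance feedback affine in (v, M):", dfb_ok)  # True
    print("state feedback affine in K:", sfb_ok)  # False
```

Because the predictions are affine in (v, M) for every disturbance realization, linear state and input constraints and a quadratic tracking cost built on them remain convex in the decision variables; with state feedback, powers of (a + bK) appear in the predictions and convexity is generally lost.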
About the journal:
The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.