学习可解释的、高性能的自动驾驶政策

Rohan R. Paleja, Yaru Niu, Andrew Silva, Chace Ritchie, Sugju Choi, M. Gombolay
{"title":"学习可解释的、高性能的自动驾驶政策","authors":"Rohan R. Paleja, Yaru Niu, Andrew Silva, Chace Ritchie, Sugju Choi, M. Gombolay","doi":"10.15607/rss.2022.xviii.068","DOIUrl":null,"url":null,"abstract":"Gradient-based approaches in reinforcement learning (RL) have achieved tremendous success in learning policies for autonomous vehicles. While the performance of these approaches warrants real-world adoption, these policies lack interpretability, limiting deployability in the safety-critical and legally-regulated domain of autonomous driving (AD). AD requires interpretable and verifiable control policies that maintain high performance. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based, RL approaches to produce high-performing, interpretable policies. The key to our approach is a procedure for allowing direct optimization in a sparse decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs are capable of learning interpretable policy representations that parity or outperform baselines by up to 33% in AD scenarios while achieving a 300x-600x reduction in the number of policy parameters against deep learning baselines. Furthermore, we demonstrate the interpretability and utility of our ICCTs through a 14-car physical robot demonstration.","PeriodicalId":340265,"journal":{"name":"Robotics: Science and Systems XVIII","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Learning Interpretable, High-Performing Policies for Autonomous Driving\",\"authors\":\"Rohan R. Paleja, Yaru Niu, Andrew Silva, Chace Ritchie, Sugju Choi, M. Gombolay\",\"doi\":\"10.15607/rss.2022.xviii.068\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Gradient-based approaches in reinforcement learning (RL) have achieved tremendous success in learning policies for autonomous vehicles. While the performance of these approaches warrants real-world adoption, these policies lack interpretability, limiting deployability in the safety-critical and legally-regulated domain of autonomous driving (AD). AD requires interpretable and verifiable control policies that maintain high performance. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based, RL approaches to produce high-performing, interpretable policies. The key to our approach is a procedure for allowing direct optimization in a sparse decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs are capable of learning interpretable policy representations that parity or outperform baselines by up to 33% in AD scenarios while achieving a 300x-600x reduction in the number of policy parameters against deep learning baselines. Furthermore, we demonstrate the interpretability and utility of our ICCTs through a 14-car physical robot demonstration.\",\"PeriodicalId\":340265,\"journal\":{\"name\":\"Robotics: Science and Systems XVIII\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-02-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Robotics: Science and Systems XVIII\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.15607/rss.2022.xviii.068\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Robotics: Science and Systems XVIII","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.15607/rss.2022.xviii.068","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

摘要

基于梯度的强化学习方法在自动驾驶汽车的策略学习中取得了巨大的成功。虽然这些方法的性能保证了实际应用,但这些策略缺乏可解释性,限制了在安全关键和法律监管的自动驾驶(AD)领域的部署。AD需要可解释和可验证的控制策略,以保持高性能。我们提出了可解释的连续控制树(icct),这是一种基于树的模型,可以通过现代的、基于梯度的RL方法进行优化,以产生高性能的、可解释的策略。我们方法的关键是允许在稀疏决策树表示中直接优化的过程。我们在六个领域的基线上验证了icct,表明icct能够学习可解释的策略表示,在AD场景中,这些策略表示与基线相同或优于基线高达33%,同时在深度学习基线上实现了策略参数数量减少300 -600倍。此外,我们通过14辆物理机器人演示展示了icct的可解释性和实用性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Learning Interpretable, High-Performing Policies for Autonomous Driving
Gradient-based approaches in reinforcement learning (RL) have achieved tremendous success in learning policies for autonomous vehicles. While the performance of these approaches warrants real-world adoption, these policies lack interpretability, limiting deployability in the safety-critical and legally-regulated domain of autonomous driving (AD). AD requires interpretable and verifiable control policies that maintain high performance. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based, RL approaches to produce high-performing, interpretable policies. The key to our approach is a procedure for allowing direct optimization in a sparse decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs are capable of learning interpretable policy representations that parity or outperform baselines by up to 33% in AD scenarios while achieving a 300x-600x reduction in the number of policy parameters against deep learning baselines. Furthermore, we demonstrate the interpretability and utility of our ICCTs through a 14-car physical robot demonstration.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Underwater Robot-To-Human Communication Via Motion: Implementation and Full-Loop Human Interface Evaluation Meta Value Learning for Fast Policy-Centric Optimal Motion Planning A Learning-based Iterative Control Framework for Controlling a Robot Arm with Pneumatic Artificial Muscles Aerial Layouting: Design and Control of a Compliant and Actuated End-Effector for Precise In-flight Marking on Ceilings Occupancy-SLAM: Simultaneously Optimizing Robot Poses and Continuous Occupancy Map
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1