Learning Interpretable, High-Performing Policies for Autonomous Driving

Robotics: Science and Systems XVIII | Pub Date: 2022-02-04 | DOI: 10.15607/rss.2022.xviii.068

Rohan R. Paleja, Yaru Niu, Andrew Silva, Chace Ritchie, Sugju Choi, M. Gombolay


Citation count: 3

Abstract

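Gradient-based approaches in reinforcement learning (RL) have achieved tremendous success in learning policies for autonomous vehicles. While the performance of these approaches warrants real-world adoption, these policies lack interpretability, which limits their deployability in the safety-critical and legally regulated domain of autonomous driving (AD). AD requires interpretable and verifiable control policies that maintain high performance. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based RL approaches to produce high-performing, interpretable policies. The key to our approach is a procedure that allows direct optimization in a sparse, decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs can learn interpretable policy representations that match or outperform baselines by up to 33% in AD scenarios while achieving a 300x-600x reduction in the number of policy parameters relative to deep learning baselines. Furthermore, we demonstrate the interpretability and utility of ICCTs through a 14-car physical robot demonstration.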
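The abstract attributes two properties to an ICCT policy: a sparse, decision-tree-like structure and a small parameter count. The Python sketch below illustrates only the inference-time form such a policy could take. It rests on assumptions that are not taken from the paper: every internal node tests a single state feature against a threshold, and every leaf is a linear controller that reads at most k features. The names `Node`, `Leaf`, `SparseTreePolicy`, `sparsify_top_k` and `mlp_param_count`, the state features, and the parameter-counting convention are all hypothetical. The sketch does not reproduce the gradient-based training procedure that the abstract identifies as the key contribution.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Node:
    feature: int      # index of the single state feature this node tests
    threshold: float  # route to the left child when state[feature] > threshold


@dataclass
class Leaf:
    features: List[int]   # indices of the (at most k) features this controller reads
    weights: List[float]  # one gain per selected feature
    bias: float


def sparsify_top_k(dense: Sequence[float], k: int) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: keep the k largest-magnitude entries of a dense weight vector."""
    keep = sorted(range(len(dense)), key=lambda j: abs(dense[j]), reverse=True)[:k]
    keep.sort()  # report features in index order for readability
    return keep, [dense[j] for j in keep]


class SparseTreePolicy:
    """Full binary tree in heap order: internal node i has children 2i+1 and 2i+2."""

    def __init__(self, nodes: List[Node], leaves: List[Leaf]):
        if len(leaves) != len(nodes) + 1:
            raise ValueError("a full binary tree needs len(leaves) == len(nodes) + 1")
        self.nodes = nodes
        self.leaves = leaves

    def act(self, state: Sequence[float]) -> float:
        i = 0
        while i < len(self.nodes):  # descend until i indexes a leaf
            node = self.nodes[i]
            i = 2 * i + 1 if state[node.feature] > node.threshold else 2 * i + 2
        leaf = self.leaves[i - len(self.nodes)]
        # Sparse linear controller: only the selected features contribute.
        return leaf.bias + sum(w * state[j] for j, w in zip(leaf.features, leaf.weights))

    def num_parameters(self) -> int:
        # Assumed convention: one threshold per node; k gains plus one bias per leaf.
        # Discrete feature indices are treated as structure, not real-valued parameters.
        return len(self.nodes) + sum(len(leaf.weights) + 1 for leaf in self.leaves)


def mlp_param_count(layer_sizes: Sequence[int]) -> int:
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(a * b + b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))


# Hypothetical depth-1 tree over a 3-feature state [lane_offset, heading_error, speed].
policy = SparseTreePolicy(
    nodes=[Node(feature=0, threshold=0.0)],
    leaves=[
        Leaf(*sparsify_top_k([-0.8, -0.3, 0.05], k=1), bias=0.1),
        Leaf(*sparsify_top_k([-0.5, -0.9, 0.0], k=1), bias=-0.1),
    ],
)
steer = policy.act([0.5, 0.2, 10.0])  # offset > 0 selects leaf 0: 0.1 - 0.8 * 0.5
print(policy.num_parameters())  # 5
print(mlp_param_count([3, 256, 256, 1]))  # 67073
```

Because each node reads one feature and each leaf reads at most k features, a policy in this form can be printed as a short set of if/else rules with small linear formulas. That inspectability is what the abstract argues autonomous driving requires. The MLP layer widths are illustrative only, and the resulting ratio is not the 300x-600x reduction reported for the paper's baselines.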



