Human as AI mentor: Enhanced human-in-the-loop reinforcement learning for safe and efficient autonomous driving

Communications in Transportation Research (IF 12.5, JCR Q1, Transportation). Pub Date: 2024-05-08. DOI: 10.1016/j.commtr.2024.100127
Zilin Huang, Zihao Sheng, Chengyuan Ma, Sikai Chen
Citations: 0

Abstract


Despite significant progress in autonomous vehicles (AVs), the development of driving policies that ensure both the safety of AVs and traffic flow efficiency has not yet been fully explored. In this paper, we propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework, which facilitates safe and efficient autonomous driving in mixed traffic platoons. Drawing inspiration from the human learning process, we first introduce an innovative learning paradigm that effectively injects human intelligence into AI, termed Human as AI mentor (HAIM). In this paradigm, the human expert serves as a mentor to the AI agent. While allowing the agent to sufficiently explore uncertain environments, the human expert can take control in dangerous situations and demonstrate correct actions to avoid potential accidents. In addition, the agent can be guided to minimize traffic flow disturbance, thereby optimizing traffic flow efficiency. In detail, HAIM-DRL leverages data collected from free exploration and partial human demonstrations as its two training sources. Remarkably, we circumvent the intricate process of manually designing reward functions; instead, we directly derive proxy state-action values from partial human demonstrations to guide the agent's policy learning. Additionally, we employ a minimal intervention technique to reduce the human mentor's cognitive load. Comparative results show that HAIM-DRL outperforms traditional methods in driving safety, sampling efficiency, mitigation of traffic flow disturbance, and generalizability to unseen traffic scenarios.
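The data-collection idea in the abstract (free exploration, with the human mentor taking control in dangerous situations, feeding two separate training sources) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the takeover criterion, the simulator, or how proxy state-action values are derived, so every name below (`collect_episode`, `needs_takeover`, `SAFE_GAP`, the one-dimensional gap model, the scripted mentor) is a hypothetical stand-in chosen for illustration. The sketch only shows how transitions could be routed into an exploration buffer or a demonstration buffer without recording any hand-designed reward.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# (state, action, next_state); deliberately no reward term is stored.
Transition = Tuple[float, float, float]


@dataclass
class Buffers:
    """Two training sources: free exploration and partial human demonstrations."""
    exploration: List[Transition] = field(default_factory=list)
    demonstration: List[Transition] = field(default_factory=list)


def collect_episode(
    agent_policy: Callable[[float], float],
    mentor_policy: Callable[[float], float],
    needs_takeover: Callable[[float, float], bool],
    step: Callable[[float, float], float],
    state: float,
    horizon: int,
) -> Buffers:
    """Roll out one episode, letting the mentor override risky agent actions."""
    buffers = Buffers()
    for _ in range(horizon):
        proposed = agent_policy(state)
        if needs_takeover(state, proposed):
            # Mentor takes control and demonstrates a corrective action.
            action = mentor_policy(state)
            target = buffers.demonstration
        else:
            # Agent explores freely with its own action.
            action = proposed
            target = buffers.exploration
        next_state = step(state, action)
        target.append((state, action, next_state))
        state = next_state
    return buffers


# Toy setting (an assumption, not from the paper): the state is the gap in
# metres to a lead vehicle, and the action is how much the gap closes per step.
SAFE_GAP = 5.0

buffers = collect_episode(
    agent_policy=lambda gap: 3.0,    # naive agent always closes the gap
    mentor_policy=lambda gap: -2.0,  # scripted "mentor" backs off
    needs_takeover=lambda gap, a: gap - a < SAFE_GAP,
    step=lambda gap, a: gap - a,
    state=12.0,
    horizon=6,
)

total = len(buffers.exploration) + len(buffers.demonstration)
intervention_rate = len(buffers.demonstration) / total
```

In this toy run the mentor intervenes only when the proposed action would shrink the gap below `SAFE_GAP`, so `intervention_rate` gives a rough measure of how much work the mentor does. A learning algorithm in the spirit of the abstract would then treat the demonstration buffer as the source of proxy value signals, but that step is not sketched here because the abstract does not describe it.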

Source journal metrics: CiteScore 15.20; self-citation rate 0.00%; articles published 0