Smooth Actor-Critic Algorithm for End-to-End Autonomous Driving

Wenjie Song, Shixian Liu, Yujun Li, Yi Yang, C. Xiang

2020 American Control Conference (ACC), July 2020. DOI: 10.23919/ACC45564.2020.9147960

Citations: 7
Abstract
For intelligent sequential decision-making tasks such as autonomous driving, the actions an agent takes over a short period of time should be smooth rather than choppy. To help the agent learn smooth actions (steering, accelerating, braking) for autonomous driving, this paper proposes a smooth actor-critic algorithm for both deterministic-policy and stochastic-policy systems. Specifically, a regularization term is added to the objective function of actor-critic methods to constrain the difference between neighbouring actions to a small region without affecting the convergence of the overall system. Theoretical analysis and proofs for the modified methods are then given, so that iterative policy improvement is guaranteed. Moreover, experiments in different simulation systems show that the methods generate much smoother actions and achieve more robust performance for reinforcement learning-based end-to-end autonomous driving.
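To make the core idea concrete, below is a minimal PyTorch sketch of an actor loss augmented with a penalty on the difference between actions at neighbouring states, applied to a DDPG-style deterministic actor. This is an illustration only, not the paper's exact formulation: the names `Actor`, `Critic`, `smooth_actor_loss`, and `smooth_coef`, the squared-difference form of the penalty, and the network architectures are all assumptions.

```python
# Hypothetical sketch of an action-smoothness regularizer in the spirit of
# the abstract: the actor maximizes Q(s, pi(s)) while a penalty keeps
# pi(s_t) close to pi(s_{t-1}). All names and the penalty form are
# illustrative assumptions, not the paper's exact method.
import torch
import torch.nn as nn


class Actor(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),  # actions bounded in [-1, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class Critic(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),  # scalar Q-value
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))


def smooth_actor_loss(actor, critic, obs, prev_obs, smooth_coef=0.1):
    """Deterministic-policy-gradient actor loss plus a smoothness penalty
    on the difference between actions at consecutive states."""
    act = actor(obs)
    prev_act = actor(prev_obs)
    q_loss = -critic(obs, act).mean()  # maximize Q(s, pi(s))
    smooth_penalty = (act - prev_act).pow(2).sum(dim=-1).mean()
    return q_loss + smooth_coef * smooth_penalty


# Usage sketch with random stand-in data:
obs_dim, act_dim, batch = 8, 3, 32
actor, critic = Actor(obs_dim, act_dim), Critic(obs_dim, act_dim)
obs, prev_obs = torch.randn(batch, obs_dim), torch.randn(batch, obs_dim)
loss = smooth_actor_loss(actor, critic, obs, prev_obs)
loss.backward()
```

Setting `smooth_coef = 0` recovers the unregularized actor objective, which matches the abstract's claim that the term is a constraint layered on top of standard actor-critic training rather than a change to its convergence mechanism.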