{"title":"基于Actor-Critic强化学习的车辆系统近似最优滤波器设计","authors":"Yuming Yin, Shengbo Eben Li, Kaiming Tang, Wenhan Cao, Wei Wu, Hongbo Li","doi":"10.1007/s42154-022-00195-z","DOIUrl":null,"url":null,"abstract":"<div><p>Precise state and parameter estimations are essential for identification, analysis and control of vehicle engineering problems, especially under significant model and measurement uncertainties. The widely used filtering/estimation algorithms, such as Kalman series like Kalman filter, extended Kalman filter, unscented Kalman filter, and particle filter, generally aim to approach the true state/parameter distribution via iteratively updating the filter gain at each time step. However, the optimality of these filters would be deteriorated by unrealistic initial condition or significant model error. Alternatively, this paper proposes to approximate the optimal filter gain by considering the effect factors within infinite time horizon, on the basis of estimation-control duality. The proposed approximate optimal filter (AOF) problem is designed and subsequently solved by actor-critic reinforcement learning (RL) method. The AOF design transforms the traditional optimal filtering problem with the minimum expected mean square error into an optimal control problem with the minimum accumulated estimation error, in which the estimation error is used as the surrogate system state and the infinite-horizon filter gain is the control input. The estimation-control duality is proved to hold when certain conditions about initial vehicle state distributions and policy structure are maintained. In order to evaluate of the effectiveness of AOF, a vehicle state estimation problem is then demonstrated and compared with the steady-state Kalman filter. The results showed that the obtained filter policy via RL with different discount factors can converge to theoretical optimal gain with an error within 5%, and the average estimation errors of vehicle slip angle and yaw rate are less than 1.5 × 10<sup>–4</sup>.</p></div>","PeriodicalId":36310,"journal":{"name":"Automotive Innovation","volume":"5 4","pages":"415 - 426"},"PeriodicalIF":4.8000,"publicationDate":"2022-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Approximate Optimal Filter Design for Vehicle System through Actor-Critic Reinforcement Learning\",\"authors\":\"Yuming Yin, Shengbo Eben Li, Kaiming Tang, Wenhan Cao, Wei Wu, Hongbo Li\",\"doi\":\"10.1007/s42154-022-00195-z\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Precise state and parameter estimations are essential for identification, analysis and control of vehicle engineering problems, especially under significant model and measurement uncertainties. The widely used filtering/estimation algorithms, such as Kalman series like Kalman filter, extended Kalman filter, unscented Kalman filter, and particle filter, generally aim to approach the true state/parameter distribution via iteratively updating the filter gain at each time step. However, the optimality of these filters would be deteriorated by unrealistic initial condition or significant model error. Alternatively, this paper proposes to approximate the optimal filter gain by considering the effect factors within infinite time horizon, on the basis of estimation-control duality. The proposed approximate optimal filter (AOF) problem is designed and subsequently solved by actor-critic reinforcement learning (RL) method. 
The AOF design transforms the traditional optimal filtering problem with the minimum expected mean square error into an optimal control problem with the minimum accumulated estimation error, in which the estimation error is used as the surrogate system state and the infinite-horizon filter gain is the control input. The estimation-control duality is proved to hold when certain conditions about initial vehicle state distributions and policy structure are maintained. In order to evaluate of the effectiveness of AOF, a vehicle state estimation problem is then demonstrated and compared with the steady-state Kalman filter. The results showed that the obtained filter policy via RL with different discount factors can converge to theoretical optimal gain with an error within 5%, and the average estimation errors of vehicle slip angle and yaw rate are less than 1.5 × 10<sup>–4</sup>.</p></div>\",\"PeriodicalId\":36310,\"journal\":{\"name\":\"Automotive Innovation\",\"volume\":\"5 4\",\"pages\":\"415 - 426\"},\"PeriodicalIF\":4.8000,\"publicationDate\":\"2022-11-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Automotive Innovation\",\"FirstCategoryId\":\"1087\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s42154-022-00195-z\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Automotive Innovation","FirstCategoryId":"1087","ListUrlMain":"https://link.springer.com/article/10.1007/s42154-022-00195-z","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Approximate Optimal Filter Design for Vehicle System through Actor-Critic Reinforcement Learning
Precise state and parameter estimation is essential for the identification, analysis, and control of vehicle engineering problems, especially under significant model and measurement uncertainties. Widely used filtering/estimation algorithms, such as the Kalman family (the Kalman filter, extended Kalman filter, and unscented Kalman filter) and the particle filter, generally aim to approach the true state/parameter distribution by iteratively updating the filter gain at each time step. However, the optimality of these filters degrades under unrealistic initial conditions or significant model error. Alternatively, this paper proposes to approximate the optimal filter gain by considering the influencing factors over an infinite time horizon, on the basis of estimation-control duality. The proposed approximate optimal filter (AOF) problem is formulated and subsequently solved by an actor-critic reinforcement learning (RL) method. The AOF design transforms the traditional optimal filtering problem, which minimizes the expected mean square error, into an optimal control problem that minimizes the accumulated estimation error, in which the estimation error serves as the surrogate system state and the infinite-horizon filter gain is the control input. The estimation-control duality is proved to hold when certain conditions on the initial vehicle state distribution and the policy structure are satisfied. To evaluate the effectiveness of the AOF, a vehicle state estimation problem is then demonstrated and compared with the steady-state Kalman filter. The results show that the filter policies obtained via RL with different discount factors converge to the theoretical optimal gain with an error within 5%, and the average estimation errors of the vehicle slip angle and yaw rate are less than 1.5 × 10⁻⁴.
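To make the estimation-control duality concrete, the sketch below sets up a toy two-state linear system (the matrices A, C, Q, R are illustrative assumptions, not the paper's vehicle model), computes the steady-state Kalman gain via the discrete Riccati recursion as the baseline, and then searches for a constant filter gain that minimizes the discounted accumulated squared estimation error, i.e., the AOF cost with the gain as the infinite-horizon control input. For brevity, the paper's actor-critic solver is replaced here by a much simpler zeroth-order policy search; only the problem formulation mirrors the AOF idea.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-state linear system used only for illustration; the states are
# loosely analogous to vehicle slip angle and yaw rate, but the matrices
# below are assumed values, not taken from the paper.
A = np.array([[0.90, 0.10],
              [-0.20, 0.95]])          # state transition
C = np.array([[0.0, 1.0]])            # only the second state is measured
Q = 1e-4 * np.eye(2)                  # process noise covariance
R = np.array([[1e-3]])                # measurement noise covariance


def steady_state_kalman_gain(A, C, Q, R, iters=500):
    """Iterate the discrete Riccati recursion until the gain settles."""
    n = A.shape[0]
    P = np.eye(n)
    K = np.zeros((n, C.shape[0]))
    for _ in range(iters):
        P_pred = A @ P @ A.T + Q                 # predicted covariance
        S = C @ P_pred @ C.T + R                 # innovation covariance
        K = P_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
        P = (np.eye(n) - K @ C) @ P_pred         # updated covariance
    return K


def rollout_cost(K, gamma=0.99, steps=200, episodes=20):
    """Discounted accumulated squared estimation error for a fixed gain K.

    The estimation error e_k = x_k - x_hat_k plays the role of the
    surrogate system state in the AOF formulation.
    """
    total = 0.0
    for _ in range(episodes):
        x = rng.normal(size=2)        # true state, random initial condition
        x_hat = np.zeros(2)           # estimate starts deliberately mismatched
        for k in range(steps):
            x = A @ x + rng.multivariate_normal(np.zeros(2), Q)
            y = C @ x + rng.multivariate_normal(np.zeros(1), R)
            x_pred = A @ x_hat
            x_hat = x_pred + K @ (y - C @ x_pred)   # constant-gain filter
            e = x - x_hat
            total += gamma**k * float(e @ e)
    return total / episodes


def learn_gain(gamma=0.99, iters=300, lr=0.05, sigma=0.05):
    """Zeroth-order policy search for a constant filter gain.

    A deliberately simplified stand-in for the paper's actor-critic RL
    solver: the "actor" is the constant gain matrix, and the gradient is
    estimated by antithetic finite differences of the rollout cost.
    """
    K = np.zeros((2, 1))
    for _ in range(iters):
        eps = sigma * rng.normal(size=K.shape)
        grad = (rollout_cost(K + eps, gamma)
                - rollout_cost(K - eps, gamma)) / (2.0 * sigma**2) * eps
        K -= lr * grad / (np.linalg.norm(grad) + 1e-8)  # normalized step
    return K


if __name__ == "__main__":
    K_kf = steady_state_kalman_gain(A, C, Q, R)
    K_rl = learn_gain()
    print("steady-state Kalman gain:\n", K_kf)
    print("learned constant gain:\n", K_rl)
```

Under these assumptions, the learned constant gain should land in the vicinity of the steady-state Kalman gain, which is the qualitative behavior the paper quantifies (RL policies converging to the theoretical optimal gain with an error within 5%).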
Journal Introduction:
Automotive Innovation is dedicated to the publication of innovative findings in the automotive field and related disciplines, covering principles, methodologies, theoretical studies, experimental studies, product engineering, and engineering applications. The main topics include, but are not limited to: energy saving, electrification, intelligent and connected vehicles, new energy vehicles, safety, and lightweight technologies. The journal presents the latest trends and advances in automotive technology.