Toward a Theoretical Foundation of Policy Optimization for Learning Control Policies

Annual Review of Control, Robotics, and Autonomous Systems. Pub Date: 2023-05-03. DOI: 10.1146/annurev-control-042920-020021
Impact factor: 11.2. Zone 1 (Computer Science). JCR Q1 (Automation & Control Systems).
Bin Hu, Kaiqing Zhang, Na Li, Mehran Mesbahi, Maryam Fazel, Tamer Başar
Citations: 2

Abstract

Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis that has been popularized by successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently developed theoretical results on the optimization landscape, global convergence, and sample complexity of gradient-based methods for various continuous control problems, such as the linear quadratic regulator (LQR), [Formula: see text] control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness concerns in learning-based control, two main desiderata in control engineering. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control.
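As a rough illustration of what policy optimization means for the LQR case named in the abstract, the sketch below runs gradient descent directly on a scalar feedback gain and compares the result with the gain obtained from the Riccati recursion. It is not taken from the article: the scalar system x_{t+1} = a x_t + b u_t, the policy u_t = -k x_t, the initial state x_0 = 1, the backtracking rule, and all function names are assumptions chosen to keep the example dependency-free.

```python
# Minimal sketch (assumed setup, not the article's code): model-based policy
# gradient for a scalar discrete-time LQR problem with policy u_t = -k * x_t.

def lqr_cost(k, a, b, q, r):
    """Infinite-horizon cost from x_0 = 1; infinite if the closed loop is unstable."""
    c = a - b * k                      # closed-loop dynamics x_{t+1} = c * x_t
    if abs(c) >= 1.0:
        return float("inf")
    return (q + r * k * k) / (1.0 - c * c)

def lqr_grad(k, a, b, q, r):
    """Derivative of lqr_cost with respect to k (valid only for stabilizing k)."""
    c = a - b * k
    p = (q + r * k * k) / (1.0 - c * c)    # solves p = q + r k^2 + c^2 p
    sigma = 1.0 / (1.0 - c * c)            # sum of x_t^2 along the trajectory
    return 2.0 * (r * k - b * p * c) * sigma

def policy_gradient(k0, a, b, q, r, step=1.0, iters=200):
    """Gradient descent on k with backtracking, started from a stabilizing k0."""
    k = k0
    for _ in range(iters):
        j, g = lqr_cost(k, a, b, q, r), lqr_grad(k, a, b, q, r)
        eta = step
        for _ in range(60):
            # Accept only steps with sufficient cost decrease. Unstable
            # candidates have infinite cost, so they are always rejected
            # and every iterate remains stabilizing.
            if lqr_cost(k - eta * g, a, b, q, r) <= j - 0.5 * eta * g * g:
                k -= eta * g
                break
            eta *= 0.5
    return k

def riccati_gain(a, b, q, r, iters=1000):
    """Reference optimal gain from the scalar Riccati recursion."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

if __name__ == "__main__":
    a, b, q, r = 1.2, 1.0, 1.0, 1.0        # open-loop unstable since |a| > 1
    print("policy gradient gain:", policy_gradient(1.0, a, b, q, r))
    print("Riccati gain:        ", riccati_gain(a, b, q, r))
```

In this toy setting the cost is infinite outside the set of stabilizing gains, which is why the step-size rule doubles as a stability safeguard. The gradient here is computed from the known model; estimating it from sampled trajectories, which is where sample complexity enters, is not shown.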
Source journal
CiteScore: 28.30
Self-citation rate: 2.20%
Articles published: 25
About the journal: The Annual Review of Control, Robotics, and Autonomous Systems offers comprehensive reviews on theoretical and applied developments influencing autonomous and semiautonomous systems engineering. Major areas covered include control, robotics, mechanics, optimization, communication, information theory, machine learning, computing, and signal processing. The journal extends its reach beyond engineering to intersect with fields like biology, neuroscience, and human behavioral sciences. The current volume has transitioned to open access through the Subscribe to Open program, with all articles published under a CC BY license.