Value-gradient iteration with quadratic approximate value functions

IF 7.3 | CAS Zone 2 (Computer Science) | JCR Q1, Automation & Control Systems | Annual Reviews in Control | Pub Date: 2023-01-01 | DOI: 10.1016/j.arcontrol.2023.100917
Alan Yang, Stephen Boyd
Citations: 0

Abstract

We propose a method for designing policies for convex stochastic control problems characterized by random linear dynamics and convex stage cost. We consider policies that employ quadratic approximate value functions as a substitute for the true value function. Evaluating the associated control policy involves solving a convex problem, typically a quadratic program, which can be carried out reliably in real-time. Such policies often perform well even when the approximate value function is not a particularly good approximation of the true value function. We propose value-gradient iteration, which fits the gradient of the value function, with regularization that can include constraints reflecting known bounds on the true value function. Our value-gradient iteration method can yield a good approximate value function with few samples and little hyperparameter tuning. We find that the method can find a good policy with computational effort comparable to that required to just evaluate a control policy via simulation.
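To illustrate the kind of policy the abstract describes, the sketch below evaluates a one-step policy that minimizes a quadratic stage cost plus a quadratic value-function surrogate under linear dynamics. This is a minimal unconstrained sketch with made-up matrices, not the paper's implementation; with input constraints the same minimization would become a small QP solved by a generic solver.

```python
import numpy as np

def quadratic_policy(x, A, B, R, P):
    """u*(x) = argmin_u  u^T R u + Vhat(A x + B u),
    with quadratic value surrogate Vhat(z) = z^T P z (unconstrained case).

    Setting the gradient 2 R u + 2 B^T P (A x + B u) to zero gives
    the linear system (R + B^T P B) u = -B^T P A x.
    """
    H = R + B.T @ P @ B            # positive definite when R > 0, P >= 0
    return np.linalg.solve(H, -B.T @ P @ A @ x)

# Toy instance (all matrices are illustrative, not from the paper).
rng = np.random.default_rng(0)
n, m = 4, 2
A = rng.standard_normal((n, n)) / 2
B = rng.standard_normal((n, m))
R = np.eye(m)                      # quadratic input cost
M = rng.standard_normal((n, n))
P = M @ M.T                        # symmetric PSD value surrogate
x = rng.standard_normal(n)
u_star = quadratic_policy(x, A, B, R, P)
```

Because the objective is strictly convex in u, the returned input is optimal exactly when the gradient of the objective vanishes at it.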
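The fitting step described in the abstract, regressing the gradient of a quadratic value function on sampled value-gradient pairs, can be sketched as a linear least-squares problem. This toy example uses noiseless synthetic data and omits the regularization and bound constraints the paper adds; it only shows that the gradient map of Vhat(x) = x^T P x + q^T x, namely 2 P x + q, is linear in (P, q).

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 200
M = rng.standard_normal((n, n))
P_true = M @ M.T                    # ground-truth quadratic coefficient
q_true = rng.standard_normal(n)

X = rng.standard_normal((N, n))     # sampled states x_i (rows)
G = 2 * X @ P_true + q_true         # their value gradients g_i = 2 P x_i + q

# Stacked regression  G  ~  [2X, 1] [P; q^T], linear in (P, q).
F = np.hstack([2 * X, np.ones((N, 1))])
Theta, *_ = np.linalg.lstsq(F, G, rcond=None)
P_hat = (Theta[:n] + Theta[:n].T) / 2   # symmetrize the fitted P
q_hat = Theta[n]
```

With enough samples and no noise the fit recovers (P, q) exactly; the paper's regularized, constrained variant is what makes the fit well-behaved with few samples.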

Source journal: Annual Reviews in Control (Engineering & Technology, Automation & Control Systems)
CiteScore: 19.00
Self-citation rate: 2.10%
Articles per year: 53
Review time: 36 days
Journal description: The field of Control is changing very fast now with technology-driven "societal grand challenges" and with the deployment of new digital technologies. The aim of Annual Reviews in Control is to provide comprehensive and visionary views of the field of Control, by publishing the following types of review articles: Survey Article: review papers on main methodologies or technical advances adding considerable technical value to the state of the art (papers that rely purely on mechanistic searches and lack comprehensive analysis providing a clear contribution to the field will be rejected); Vision Article: cutting-edge and emerging topics with a visionary perspective on the future of the field or how it will bridge multiple disciplines; and Tutorial Research Article: fundamental guides for future studies.
Latest articles in this journal:
Editorial Board
Analysis and design of model predictive control frameworks for dynamic operation—An overview
Advances in controller design of pacemakers for pacing control: A comprehensive review
Recent advances in path integral control for trajectory optimization: An overview in theoretical and algorithmic perspectives
Analyzing stability in 2D systems via LMIs: From pioneering to recent contributions