Uniform Ergodicity and Ergodic-Risk Constrained Policy Optimization

Shahriar Talebi, Na Li
Journal: arXiv - EE - Systems and Control
DOI: arxiv-2409.10767 (https://doi.org/arxiv-2409.10767)
Published: 2024-09-16

Abstract

In stochastic systems, risk-sensitive control balances performance with resilience to less likely events. Although existing methods rely on finite-horizon risk criteria, this paper introduces limiting-risk criteria that capture long-term cumulative risks through probabilistic limiting theorems. Extending the Linear Quadratic Regulation (LQR) framework, we incorporate constraints on these limiting-risk criteria derived from the asymptotic behavior of cumulative costs, accounting for extreme deviations. Using tailored Functional Central Limit Theorems (FCLT), we demonstrate that the time-correlated terms in the limiting-risk criteria converge under strong ergodicity, and establish conditions for convergence in non-stationary settings while characterizing the distribution and providing explicit formulations for the limiting variance of the risk functional. The FCLT is developed by applying ergodic theory for Markov chains and obtaining uniform ergodicity of the controlled process. For quadratic risk functionals on linear dynamics, in addition to internal stability, the uniform ergodicity requires the (possibly heavy-tailed) dynamic noise to have a finite fourth moment. This offers a clear path to quantifying long-term uncertainty. We also propose a primal-dual constrained policy optimization method that optimizes the average performance while ensuring limiting-risk constraints are satisfied. Our framework offers a practical, theoretically guaranteed approach for long-term risk-sensitive control, backed by convergence guarantees and validations through simulations.
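As a rough illustration of the setting the abstract describes, the sketch below simulates a stable closed-loop linear system driven by heavy-tailed noise with a finite fourth moment, then estimates the long-run average quadratic cost and its limiting variance. This is not the paper's method: the closed-loop matrix `A_cl`, the weight `Q`, the Student-t noise with 5 degrees of freedom, and the batch-means estimator are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_costs(A_cl, Q, T, df=5.0, seed=0):
    """Simulate x_{t+1} = A_cl x_t + w_t and return the stage costs x_t^T Q x_t."""
    rng = np.random.default_rng(seed)
    n = A_cl.shape[0]
    # Student-t noise with df > 4 is heavy-tailed but has a finite fourth moment.
    noise = rng.standard_t(df, size=(T, n))
    x = np.zeros(n)
    costs = np.empty(T)
    for t in range(T):
        costs[t] = x @ Q @ x
        x = A_cl @ x + noise[t]
    return costs

def batch_means_variance(costs, n_batches=50):
    """Estimate the long-run variance of the costs with non-overlapping batch means."""
    batch_len = len(costs) // n_batches
    trimmed = costs[: batch_len * n_batches]
    means = trimmed.reshape(n_batches, batch_len).mean(axis=1)
    # For a sequence obeying a CLT, Var(batch mean) is roughly sigma^2 / batch_len.
    return batch_len * means.var(ddof=1)

# Hypothetical closed-loop matrix (standing in for A + BK) with spectral radius < 1.
A_cl = np.array([[0.5, 0.1], [0.0, 0.3]])
Q = np.eye(2)
assert max(abs(np.linalg.eigvals(A_cl))) < 1.0  # internal stability

costs = simulate_costs(A_cl, Q, T=100_000)
avg_cost = costs.mean()
sigma2 = batch_means_variance(costs)
print(f"average cost ~ {avg_cost:.3f}, long-run variance estimate ~ {sigma2:.3f}")
```

Lowering `df` to 4 or below removes the finite fourth moment, and the variance estimate would then be expected to become unstable across seeds, which is consistent with the moment condition stated in the abstract.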