Markov Chain Variance Estimation: A Stochastic Approximation Approach

Shubhada Agrawal, Prashanth L. A., Siva Theja Maguluri
{"title":"Markov Chain Variance Estimation: A Stochastic Approximation Approach","authors":"Shubhada Agrawal, Prashanth L. A., Siva Theja Maguluri","doi":"arxiv-2409.05733","DOIUrl":null,"url":null,"abstract":"We consider the problem of estimating the asymptotic variance of a function\ndefined on a Markov chain, an important step for statistical inference of the\nstationary mean. We design the first recursive estimator that requires $O(1)$\ncomputation at each step, does not require storing any historical samples or\nany prior knowledge of run-length, and has optimal $O(\\frac{1}{n})$ rate of\nconvergence for the mean-squared error (MSE) with provable finite sample\nguarantees. Here, $n$ refers to the total number of samples generated. The\npreviously best-known rate of convergence in MSE was $O(\\frac{\\log n}{n})$,\nachieved by jackknifed estimators, which also do not enjoy these other\ndesirable properties. Our estimator is based on linear stochastic approximation\nof an equivalent formulation of the asymptotic variance in terms of the\nsolution of the Poisson equation. We generalize our estimator in several directions, including estimating the\ncovariance matrix for vector-valued functions, estimating the stationary\nvariance of a Markov chain, and approximately estimating the asymptotic\nvariance in settings where the state space of the underlying Markov chain is\nlarge. We also show applications of our estimator in average reward\nreinforcement learning (RL), where we work with asymptotic variance as a risk\nmeasure to model safety-critical applications. We design a temporal-difference\ntype algorithm tailored for policy evaluation in this context. We consider both\nthe tabular and linear function approximation settings. Our work paves the way\nfor developing actor-critic style algorithms for variance-constrained RL.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"8 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - STAT - Statistics Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.05733","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We consider the problem of estimating the asymptotic variance of a function defined on a Markov chain, an important step for statistical inference of the stationary mean. We design the first recursive estimator that requires $O(1)$ computation at each step, does not require storing any historical samples or any prior knowledge of run-length, and has optimal $O(\frac{1}{n})$ rate of convergence for the mean-squared error (MSE) with provable finite sample guarantees. Here, $n$ refers to the total number of samples generated. The previously best-known rate of convergence in MSE was $O(\frac{\log n}{n})$, achieved by jackknifed estimators, which also do not enjoy these other desirable properties. Our estimator is based on linear stochastic approximation of an equivalent formulation of the asymptotic variance in terms of the solution of the Poisson equation. We generalize our estimator in several directions, including estimating the covariance matrix for vector-valued functions, estimating the stationary variance of a Markov chain, and approximately estimating the asymptotic variance in settings where the state space of the underlying Markov chain is large. We also show applications of our estimator in average reward reinforcement learning (RL), where we work with asymptotic variance as a risk measure to model safety-critical applications. We design a temporal-difference type algorithm tailored for policy evaluation in this context. We consider both the tabular and linear function approximation settings. Our work paves the way for developing actor-critic style algorithms for variance-constrained RL.
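For context, the following is the standard Poisson-equation characterization of the asymptotic variance from Markov chain central limit theory; the notation is ours and may differ from the paper's exact formulation. For a Markov chain $\{X_k\}$ with transition kernel $P$, stationary distribution $\pi$, and centered function $\tilde{f} = f - \pi(f)$,

$$\sqrt{n}\Big(\frac{1}{n}\sum_{k=1}^{n} f(X_k) - \pi(f)\Big) \xrightarrow{d} \mathcal{N}(0, \sigma^2), \qquad \sigma^2 = \pi(\tilde{f}^2) + 2\sum_{k \ge 1} \mathbb{E}_\pi\big[\tilde{f}(X_0)\,\tilde{f}(X_k)\big].$$

If $V$ solves the Poisson equation $V - PV = \tilde{f}$, then equivalently

$$\sigma^2 = \pi\big(V^2 - (PV)^2\big) = \pi\big(2\,\tilde{f}\,V - \tilde{f}^{\,2}\big),$$

which expresses $\sigma^2$ through expectations under $\pi$ of quantities observable along the trajectory, and is the kind of equivalent formulation that makes a recursive, $O(1)$-per-step stochastic-approximation estimator plausible.

The sketch below is a minimal, illustrative tabular recursion in that spirit: it keeps only a running mean, a Poisson-equation value table, and a running variance estimate, so each step costs $O(1)$ and no samples are stored. The toy chain, the function $f$, the step sizes, and the TD-style update are all our assumptions; this is not the estimator analyzed in the paper.

    import numpy as np

    # Illustrative sketch (not the paper's algorithm): a tabular recursion
    # that tracks a running mean mu_hat, an estimate V of the Poisson-equation
    # solution, and a running estimate of
    #   sigma^2 = pi(2 * (f - mu) * V - (f - mu)^2).
    # V is determined only up to an additive constant, to which this identity
    # is insensitive since pi(f - mu) = 0.

    rng = np.random.default_rng(0)

    # Toy 3-state Markov chain and function f (assumed for illustration).
    P = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3],
                  [0.1, 0.4, 0.5]])
    f = np.array([1.0, -0.5, 2.0])

    n_steps = 200_000
    V = np.zeros(3)          # estimate of the Poisson-equation solution
    mu_hat = 0.0             # running estimate of the stationary mean pi(f)
    sigma2_hat = 0.0         # running estimate of the asymptotic variance

    x = 0
    for t in range(1, n_steps + 1):
        x_next = rng.choice(3, p=P[x])
        alpha = 1.0 / t ** 0.7   # step size for the value recursion
        beta = 1.0 / t           # step size for the running averages

        mu_hat += beta * (f[x] - mu_hat)
        # Average-reward TD(0)-style update for V - PV = f - mu.
        V[x] += alpha * (f[x] - mu_hat + V[x_next] - V[x])
        # Plug-in recursion for sigma^2 = pi(2 * (f - mu) * V - (f - mu)^2).
        g = f[x] - mu_hat
        sigma2_hat += beta * (2.0 * g * V[x] - g * g - sigma2_hat)
        x = x_next

    print(f"estimated stationary mean pi(f): {mu_hat:.4f}")
    print(f"estimated asymptotic variance:   {sigma2_hat:.4f}")

Because every update touches only the current and next states, the per-step cost and memory are constant in the trajectory length, which is the property the abstract highlights.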