Uncertainty Quantification and Exploration for Reinforcement Learning

IF 0.7 · CAS Tier 4 (Management Science) · JCR Q3 (Engineering) · Military Operations Research · Pub Date: 2019-10-12 · DOI: 10.1287/opre.2023.2436
Yi Zhu, Jing Dong, H. Lam
Citations: 1

Abstract

Quantify the uncertainty to decide and explore better. In statistical inference, large-sample behavior and confidence interval construction are fundamental in assessing the error and reliability of estimated quantities with respect to data noise. In the paper “Uncertainty Quantification and Exploration for Reinforcement Learning”, Dong, Lam, and Zhu study large-sample behavior in the classic setting of reinforcement learning. They derive appropriate large-sample asymptotic distributions for the state–action value function (Q-value) and optimal value function estimates when data are collected from the underlying Markov chain. This allows one to evaluate how confidently performance can be compared across different decisions. The tight uncertainty quantification also facilitates the development of a pure exploration policy that maximizes the worst-case relative discrepancy among the estimated Q-values (the ratio of the mean squared difference to the variance). This exploration policy aims to collect informative training data so as to maximize the probability of learning the optimal reward-collecting policy, and it achieves good empirical performance.
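The large-sample results described above support interval estimation in the usual way: if a Q-value estimate is asymptotically normal, a confidence interval follows from its estimated asymptotic variance. A minimal sketch of that idea (the function name, interface, and plug-in variance are illustrative assumptions, not the paper's actual construction):

```python
import math

def q_value_ci(q_hat, asym_var, n, z=1.96):
    """Normal-approximation confidence interval for an estimated Q-value.

    Assumes q_hat is asymptotically normal with variance asym_var / n,
    where n is the number of observed transitions from the underlying
    Markov chain; z = 1.96 gives a nominal 95% interval.
    """
    half_width = z * math.sqrt(asym_var / n)
    return q_hat - half_width, q_hat + half_width

# With 400 observed transitions and an estimated asymptotic variance
# of 2.5, the interval around q_hat = 3.4 has half-width of about 0.155.
lo, hi = q_value_ci(q_hat=3.4, asym_var=2.5, n=400)
```

A wider interval signals that more data are needed before the corresponding action's value can be distinguished from its competitors, which is exactly the gap the exploration policy targets.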
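The abstract's exploration criterion, maximizing the worst-case relative discrepancy among estimated Q-values, can be illustrated with a small sketch. The pairwise formula used here, (q_i − q_j)² / (σ_i² + σ_j²), is an assumption standing in for the paper's exact definition, and the function and sampling plans are hypothetical:

```python
def worst_case_relative_discrepancy(q_means, q_vars):
    """Smallest pairwise relative discrepancy among estimated Q-values.

    The relative discrepancy between actions i and j is taken here as
    (q_i - q_j)^2 / (var_i + var_j): squared mean difference scaled by
    estimation variance. The exploration idea is to collect data so that
    the worst (smallest) such ratio is as large as possible, i.e. even
    the hardest-to-distinguish pair of actions is well separated
    relative to the noise in their estimates.
    """
    worst = float("inf")
    n = len(q_means)
    for i in range(n):
        for j in range(i + 1, n):
            disc = (q_means[i] - q_means[j]) ** 2 / (q_vars[i] + q_vars[j])
            worst = min(worst, disc)
    return worst

# Two hypothetical sampling plans for the same three actions: plan B
# spends its samples shrinking the variance of the two closest actions,
# which raises the worst-case discrepancy, so it is more informative.
means = [1.0, 1.2, 2.0]
plan_a = worst_case_relative_discrepancy(means, [0.10, 0.10, 0.10])
plan_b = worst_case_relative_discrepancy(means, [0.02, 0.02, 0.10])
```

Intuitively, the criterion directs samples toward the pair of actions that is currently hardest to rank, which is what drives the probability of identifying the optimal policy.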
Source journal
Military Operations Research
Management Science – Operations Research & Management Science
CiteScore: 1.00
Self-citation rate: 0.00%
Articles published: 0
Review time: >12 weeks
Journal introduction: Military Operations Research is a peer-reviewed journal of high academic quality. The journal publishes articles that describe operations research (OR) methodologies and theories used in key military and national security applications. Of particular interest are papers that:
- Present case studies showing innovative OR applications
- Apply OR to major policy issues
- Introduce interesting new problem areas
- Highlight education issues
- Document the history of military and national security OR
Latest articles from this journal
- Optimal Routing Under Demand Surges: The Value of Future Arrival Rates
- Demand Estimation Under Uncertain Consideration Sets
- Optimal Routing to Parallel Servers in Heavy Traffic
- The When and How of Delegated Search
- A Data-Driven Approach to Beating SAA Out of Sample