Experimenting on Markov Decision Processes with Local Treatments

Shuze Chen, David Simchi-Levi, Chonghuan Wang
arXiv - ECON - Econometrics, published 2024-07-29. DOI: arxiv-2407.19618 (https://doi.org/arxiv-2407.19618)
Citations: 0

Abstract

As service systems grow increasingly complex and dynamic, many interventions become localized: they are available and take effect only in specific states. This paper investigates experiments with local treatments on a widely used class of dynamic models, Markov Decision Processes (MDPs). In particular, we focus on using the local structure to improve the efficiency of inference for the average treatment effect. We begin by demonstrating the efficiency of classical inference methods, including model-based estimation and temporal difference learning under a fixed policy, as well as classical A/B testing with general treatments. We then introduce a variance reduction technique that exploits the local treatment structure by sharing information for states unaffected by the treatment policy. Our new estimator overcomes the variance lower bound for general treatments while matching the more stringent lower bound that incorporates the local treatment structure. Furthermore, for a major part of the variance, our estimator optimally achieves a reduction that is linear in the number of test arms. Finally, we explore scenarios with perfect knowledge of the control arm and design estimators that further improve inference efficiency.
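
The information-sharing idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator. It assumes a hypothetical 3-state ergodic chain under a fixed policy, an average-reward criterion, and a treatment that changes the transition row and reward of state 0 only. All function names (`simulate`, `plug_in_average`, `ate_naive`, `ate_pooled`) and all numbers are made up for illustration. The naive estimator fits a separate model per arm; the pooled one reuses data from both arms for the states the treatment does not touch.

```python
import numpy as np

def stationary_average(P, r):
    """Long-run average reward of an ergodic chain: solve pi P = pi, sum(pi) = 1."""
    S = P.shape[0]
    A = np.vstack([P.T - np.eye(S), np.ones(S)])
    b = np.zeros(S + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(pi @ r)

def simulate(P, r, T, rng, noise=0.1):
    """Run the chain for T steps from state 0; return states, noisy rewards, next states."""
    S = P.shape[0]
    states = np.zeros(T + 1, dtype=int)
    for t in range(T):
        states[t + 1] = rng.choice(S, p=P[states[t]])
    rewards = r[states[:-1]] + noise * rng.standard_normal(T)
    return states[:-1], rewards, states[1:]

def sufficient_stats(s, rewards, s_next, S):
    """Transition counts (S x S) and per-state reward sums from one trajectory."""
    counts = np.zeros((S, S))
    np.add.at(counts, (s, s_next), 1.0)
    reward_sums = np.bincount(s, weights=rewards, minlength=S)
    return counts, reward_sums

def plug_in_average(counts, reward_sums):
    """Model-based plug-in estimate; assumes every state was visited at least once."""
    visits = counts.sum(axis=1)
    return stationary_average(counts / visits[:, None], reward_sums / visits)

def ate_naive(stats_c, stats_t):
    """Fit one model per arm and difference the two average-reward estimates."""
    return plug_in_average(*stats_t) - plug_in_average(*stats_c)

def ate_pooled(stats_c, stats_t, treated_state):
    """Share data across arms for untreated states; keep the treated row arm-specific."""
    pooled_counts = stats_c[0] + stats_t[0]
    pooled_sums = stats_c[1] + stats_t[1]

    def arm_estimate(counts, sums):
        c, q = pooled_counts.copy(), pooled_sums.copy()
        c[treated_state] = counts[treated_state]
        q[treated_state] = sums[treated_state]
        return plug_in_average(c, q)

    return arm_estimate(*stats_t) - arm_estimate(*stats_c)

# Hypothetical control chain; the treatment alters state 0 only (a "local" treatment).
P_c = np.array([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]])
r_c = np.array([1.0, 0.0, 2.0])
P_t, r_t = P_c.copy(), r_c.copy()
P_t[0] = [0.2, 0.3, 0.5]
r_t[0] = 1.5

rng = np.random.default_rng(0)
T = 10_000  # steps per arm, one independent trajectory each
stats_c = sufficient_stats(*simulate(P_c, r_c, T, rng), 3)
stats_t = sufficient_stats(*simulate(P_t, r_t, T, rng), 3)

true_ate = stationary_average(P_t, r_t) - stationary_average(P_c, r_c)
naive = ate_naive(stats_c, stats_t)
pooled = ate_pooled(stats_c, stats_t, treated_state=0)
print(f"true {true_ate:.4f}  naive {naive:.4f}  pooled {pooled:.4f}")
```

Repeating the run over many seeds and comparing the empirical variances of `naive` and `pooled` is one way to check whether pooling helps for a given chain; the sketch itself makes no claim about the size of the gain.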