Off-Policy Evaluation with Irregularly-Spaced, Outcome-Dependent Observation Times

Xin Chen, Wenbin Lu, Shu Yang, Dipankar Bandyopadhyay
arXiv - STAT - Methodology. Published 2024-09-14. DOI: arxiv-2409.09236 (https://doi.org/arxiv-2409.09236)

Abstract

While the classic off-policy evaluation (OPE) literature commonly assumes decision time points to be evenly spaced for simplicity, in many real-world scenarios, such as those involving user-initiated visits, decisions are made at irregularly spaced and potentially outcome-dependent time points. For a more principled evaluation of dynamic policies, this paper constructs a novel OPE framework, which concerns not only the state-action process but also an observation process dictating the time points at which decisions are made. The framework is closely connected to the Markov decision process in computer science and to the renewal process in the statistical literature. Within the framework, two distinct value functions, derived from cumulative reward and integrated reward respectively, are considered, and statistical inference for each value function is developed under revised Markov and time-homogeneous assumptions. The validity of the proposed method is further supported by theoretical results, simulation studies, and a real-world application from electronic health records (EHR) evaluating periodontal disease treatments.
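
To make the distinction between the two value functions concrete, the following toy sketch contrasts a cumulative-reward summary with an integrated-reward summary of a single observed trajectory. It is only an illustration under stated assumptions and is not the paper's estimator or its exact definitions: the function names, the per-decision discount `gamma`, the continuous-time discount rate `beta`, and the choice to hold each reward constant as a rate until the next visit are all hypothetical.

```python
import math


def cumulative_value(rewards, gamma=0.9):
    """Assumed cumulative-reward summary: sum_k gamma**k * r_k.

    Discounting is by decision index, so the gaps between visits play no role.
    """
    return sum(gamma**k * r for k, r in enumerate(rewards))


def integrated_value(times, rewards, horizon, beta=0.1):
    """Assumed integrated-reward summary over [times[0], horizon].

    Each reward r_k is treated as a rate held constant on [t_k, t_{k+1}),
    and the integral of r(t) * exp(-beta * t) is computed in closed form.
    """
    ends = list(times[1:]) + [horizon]
    total = 0.0
    for start, end, r in zip(times, ends, rewards):
        # Integral of r * exp(-beta * t) from start to end.
        total += r * (math.exp(-beta * start) - math.exp(-beta * end)) / beta
    return total


# Same rewards observed at regular versus irregular visit times (made-up data).
rewards = [1.0, 0.0, 2.0, 1.0]
regular_times = [0.0, 1.0, 2.0, 3.0]
irregular_times = [0.0, 0.5, 2.5, 3.0]

cum = cumulative_value(rewards)
int_regular = integrated_value(regular_times, rewards, horizon=4.0)
int_irregular = integrated_value(irregular_times, rewards, horizon=4.0)
```

In this sketch the cumulative summary is identical for both trajectories because it ignores when the visits occur, whereas the integrated summary changes with the spacing of the visits. This is one informal way to see why the observation process matters when decision times are irregular.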