Off-Policy Evaluation with Irregularly-Spaced, Outcome-Dependent Observation Times

arXiv - STAT - Methodology, Pub Date: 2024-09-14, DOI: arxiv-2409.09236

Xin Chen, Wenbin Lu, Shu Yang, Dipankar Bandyopadhyay


Citations: 0

Abstract


While the classic off-policy evaluation (OPE) literature commonly assumes decision time points to be evenly spaced for simplicity, in many real-world scenarios, such as those involving user-initiated visits, decisions are made at irregularly-spaced and potentially outcome-dependent time points. For a more principled evaluation of the dynamic policies, this paper constructs a novel OPE framework, which concerns not only the state-action process but also an observation process dictating the time points at which decisions are made. The framework is closely connected to the Markov decision process in computer science and with the renewal process in the statistical literature. Within the framework, two distinct value functions, derived from cumulative reward and integrated reward respectively, are considered, and statistical inference for each value function is developed under revised Markov and time-homogeneous assumptions. The validity of the proposed method is further supported by theoretical results, simulation studies, and a real-world application from electronic health records (EHR) evaluating periodontal disease treatments.
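To make the distinction between the two value functions concrete, the following is a minimal sketch rather than the paper's estimator. The abstract does not give the definitions, so everything here is an assumption for illustration: the function names `cumulative_value` and `integrated_value`, the discount parameters `gamma` and `beta`, the piecewise-constant reward rate between visits, and the toy trajectory are all hypothetical.

```python
import math


def cumulative_value(rewards, gamma=0.9):
    """Discounted sum of per-decision rewards (assumed form).

    Discounting is by decision index k, so the gaps between
    observation times play no role.
    """
    return sum(gamma ** k * r for k, r in enumerate(rewards))


def integrated_value(times, rates, horizon, beta=0.1):
    """Integral of exp(-beta * t) * rate(t) from times[0] to horizon (assumed form).

    The reward rate is held constant between consecutive observation
    times, so longer gaps give a reward more weight. Requires beta > 0.
    """
    total = 0.0
    ends = list(times[1:]) + [horizon]
    for start, end, rate in zip(times, ends, rates):
        # Closed form of the integral of rate * exp(-beta * t) on [start, end].
        total += rate * (math.exp(-beta * start) - math.exp(-beta * end)) / beta
    return total


# Hypothetical trajectory: four visits at irregular times.
times = [0.0, 1.5, 2.0, 6.0]
rewards = [1.0, 0.5, 0.8, 0.2]

v_cum = cumulative_value(rewards)
v_int = integrated_value(times, rewards, horizon=8.0)
print(v_cum, v_int)
```

Under these assumptions, moving the visit times changes the integrated value but leaves the cumulative value untouched, which is one way to see why an observation process that depends on outcomes needs to be modeled explicitly.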

