Off-Policy Evaluation with Irregularly-Spaced, Outcome-Dependent Observation Times
Xin Chen, Wenbin Lu, Shu Yang, Dipankar Bandyopadhyay
arXiv:2409.09236 · arXiv - STAT - Methodology · 2024-09-14
While the classic off-policy evaluation (OPE) literature commonly assumes, for simplicity, that decision time points are evenly spaced, in many real-world scenarios, such as those involving user-initiated visits, decisions are made at irregularly spaced and potentially outcome-dependent time points. For a more principled evaluation of dynamic policies, this paper constructs a novel OPE framework that concerns not only the state-action process but also an observation process dictating the time points at which decisions are made. The framework is closely connected to the Markov decision process in computer science and to the renewal process in the statistical literature.
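To make the observation process concrete, here is a minimal toy simulation; the state transition, reward, and visit-gap models below are illustrative assumptions, not the paper's model. The key feature is that the time until the next visit depends on the most recent outcome, so subjects doing poorly return sooner.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_trajectory(policy, horizon=365.0):
    """Toy generative model: state, action, and reward are observed only at
    visit times, and the gap to the next visit depends on the outcome."""
    t, state = 0.0, rng.normal()
    history = []
    while t < horizon:
        action = policy(state)  # decision made at the visit
        reward = -abs(state) + 0.5 * action + rng.normal(scale=0.1)
        # Outcome-dependent renewal: lower rewards shrink the expected gap,
        # mimicking user-initiated visits by sicker patients.
        gap = rng.exponential(scale=30.0 * (1.0 / (1.0 + np.exp(-reward)) + 0.1))
        history.append((t, state, action, reward, gap))
        state = 0.8 * state + 0.3 * action + rng.normal(scale=0.2)
        t += gap
    return history

# Example behavior policy: treat (action 1) when the state is poor.
traj = simulate_trajectory(policy=lambda s: float(s < 0.0))
```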
Within this framework, two distinct value functions, derived from the cumulative reward and the integrated reward, respectively, are considered, and statistical inference for each is developed under revised Markov and time-homogeneity assumptions.
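For intuition only (these are illustrative forms under assumed notation, not necessarily the paper's exact definitions), the cumulative-reward value function discounts per decision epoch, whereas the integrated-reward value function discounts in continuous time over the irregular visit gaps:

```latex
% Illustrative forms only, for visit times T_0 < T_1 < \cdots and a
% continuous-time reward rate r(t); not the paper's exact definitions.
V^{\pi}_{\mathrm{cum}}(s) = \mathbb{E}^{\pi}\Big[\sum_{k=0}^{\infty} \gamma^{k} R_{k} \,\Big|\, S_{0}=s\Big],
\qquad
V^{\pi}_{\mathrm{int}}(s) = \mathbb{E}^{\pi}\Big[\int_{0}^{\infty} e^{-\beta t}\, r(t)\, \mathrm{d}t \,\Big|\, S_{0}=s\Big].
```

The distinction matters precisely because the gaps between visits are random and outcome-dependent: two trajectories with identical per-visit rewards can accumulate different integrated reward.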
The validity of the proposed method is further supported by theoretical results, simulation studies, and a real-world application using electronic health records (EHR) to evaluate periodontal disease treatments.