Non-asymptotic Properties of Individualized Treatment Rules from Sequentially Rule-Adaptive Trials.

Journal of Machine Learning Research (IF 5.2; Zone 3, Computer Science; JCR Q1, Automation & Control Systems). Pub Date: 2022-01-01

Daiqi Gao, Yufeng Liu, Donglin Zeng

Journal of Machine Learning Research, volume 23 250. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10419117/pdf/

Abstract

Learning optimal individualized treatment rules (ITRs) has become increasingly important in the modern era of precision medicine, and many statistical and machine learning methods for learning optimal ITRs have been developed. However, most existing methods are based on data collected from traditional randomized controlled trials and thus cannot take advantage of the evidence that accumulates as patients enter a trial sequentially. It is also ethically important that future patients have a high probability of being treated optimally given the knowledge accumulated so far. In this work, we propose a new design, called sequentially rule-adaptive trials, that learns optimal ITRs within the contextual bandit framework, in contrast to the response-adaptive design of traditional adaptive trials. In our design, each entering patient is allocated with high probability to the current best treatment for that patient, which is estimated from past data by a machine learning algorithm (for example, outcome weighted learning in our implementation). We explore the tradeoff between the training and test values of the estimated ITR in single-stage problems by proving that, for a higher probability of following the estimated ITR, the training value converges to the optimal value at a faster rate, while the test value converges at a slower rate. This problem differs from traditional decision problems in that the training data are generated sequentially and are dependent. We also develop a tool that combines martingale techniques with empirical process theory to tackle this problem, which cannot be solved by previous techniques for i.i.d. data. Numerical examples show that, without much loss in test value, the proposed algorithm significantly improves the training value compared with existing methods. Finally, we use a real data study to illustrate the performance of the proposed method.
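The allocation scheme described in the abstract can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the authors' implementation: the outcome model (`y = a * x + noise`), the burn-in length, and the function names `run_trial` and `estimate_test_value` are hypothetical, and a simple stratified inverse-probability-weighted comparison stands in for outcome weighted learning. Each patient after the burn-in receives the currently estimated best treatment with probability `follow_prob` and the other treatment otherwise.

```python
import random


def sign(x):
    return 1 if x >= 0 else -1


def run_trial(n, follow_prob, burn_in=20, seed=0):
    """Simulate one sequential trial; return (final rule, training value).

    Assumed toy model (not from the paper): covariate x ~ Uniform(-1, 1),
    treatment a in {-1, +1}, outcome y = a * x + Gaussian noise, so the
    optimal rule is a = sign(x).
    """
    rng = random.Random(seed)
    # Running inverse-probability-weighted sums per (covariate sign, arm):
    # [sum of w * y, sum of w], where w = 1 / allocation probability.
    sums = {(s, a): [0.0, 0.0] for s in (-1, 1) for a in (-1, 1)}

    def weighted_mean(s, a):
        total, weight = sums[(s, a)]
        return total / weight if weight > 0 else 0.0

    def rule(x):
        # Stand-in for the learned ITR: pick the arm with the larger
        # weighted mean outcome among past patients with the same sign of x.
        s = sign(x)
        return 1 if weighted_mean(s, 1) >= weighted_mean(s, -1) else -1

    outcome_sum = 0.0
    for t in range(n):
        x = rng.uniform(-1.0, 1.0)
        if t < burn_in:
            # Pure randomization until some data are available.
            a, p = rng.choice((-1, 1)), 0.5
        else:
            rec = rule(x)
            follow = rng.random() < follow_prob
            a = rec if follow else -rec
            p = follow_prob if follow else 1.0 - follow_prob
        y = a * x + rng.gauss(0.0, 0.5)
        sums[(sign(x), a)][0] += y / p
        sums[(sign(x), a)][1] += 1.0 / p
        outcome_sum += y
    # Training value: average outcome of the patients inside the trial.
    return rule, outcome_sum / n


def estimate_test_value(rule, m=1000, seed=1):
    """Average noise-free outcome when fresh patients follow the final rule."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    return sum(rule(x) * x for x in xs) / m


if __name__ == "__main__":
    for prob in (0.5, 0.9):
        rule, train_value = run_trial(2000, prob)
        print(prob, round(train_value, 3), round(estimate_test_value(rule), 3))
```

In this toy setting, a larger `follow_prob` should raise the average outcome of in-trial patients, which mirrors the training-value side of the tradeoff. The example is too simple to exhibit the slower convergence of the test value that the paper proves.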

Source journal

Journal of Machine Learning Research (Engineering & Technology - Computer Science: Artificial Intelligence)

CiteScore: 18.80

Self-citation rate: 0.00%

Review time: 3 months

Journal description: The Journal of Machine Learning Research (JMLR) provides an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning. All published papers are freely available online. JMLR has a commitment to rigorous yet rapid reviewing. JMLR seeks previously unpublished papers on machine learning that contain: new principled algorithms with sound empirical validation, and with justification of theoretical, psychological, or biological nature; experimental and/or theoretical studies yielding new insight into the design and behavior of learning in intelligent systems; accounts of applications of existing techniques that shed light on the strengths and weaknesses of the methods; formalization of new learning tasks (e.g., in the context of new applications) and of methods for assessing performance on those tasks; development of new analytical frameworks that advance theoretical studies of practical learning methods; computational models of data from natural learning systems at the behavioral or neural level; or extremely well-written surveys of existing work.