CURE: A deep learning framework pre-trained on large-scale patient data for treatment effect estimation

Patterns (IF 6.7, Q1, Computer Science, Artificial Intelligence) | Pub Date: 2024-05-01 | DOI: 10.1016/j.patter.2024.100973

Ruoqi Liu, Pin-Yu Chen, Ping Zhang


Citations: 0

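Abstract

Treatment effect estimation (TEE) aims to identify the causal effects of treatments on important outcomes. Current machine-learning-based methods are mainly trained on labeled data for specific treatments or outcomes and can be sub-optimal when labeled data are limited. In this article, we propose a new pre-training and fine-tuning framework, CURE (causal treatment effect estimation), for TEE from observational data. CURE is pre-trained on large-scale unlabeled patient data to learn representative contextual patient representations and is then fine-tuned on labeled patient data for TEE. We present a new sequence encoding approach for longitudinal patient data that embeds both structure and time. Evaluated on four downstream TEE tasks, CURE outperforms the state-of-the-art methods, marking a 7% increase in area under the precision-recall curve and an 8% rise in the influence-function-based precision of estimating heterogeneous effects. Validation with four randomized clinical trials confirms its efficacy in producing trial conclusions, highlighting CURE’s capacity to supplement traditional clinical trials.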
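To make the TEE problem setting concrete, the following is a minimal sketch of why effects estimated from observational data need adjustment. It is not CURE's method: it uses a classical stratification estimator on a made-up cohort with a single binary confounder, and every name and number in it (`make_cohort`, `naive_effect`, `adjusted_effect`, the patient counts) is a hypothetical chosen for illustration.

```python
from collections import defaultdict


def make_cohort():
    """Build a toy cohort of (confounder, treated, outcome) triples.

    All counts are invented for illustration. Sicker patients (x = 1)
    are more likely to be treated, which confounds a naive comparison.
    """
    spec = [
        # (x, treated, n_patients, n_adverse_events)
        (0, 1, 10, 2),
        (0, 0, 40, 12),
        (1, 1, 40, 24),
        (1, 0, 10, 7),
    ]
    rows = []
    for x, t, n, events in spec:
        rows += [(x, t, 1)] * events + [(x, t, 0)] * (n - events)
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_effect(rows):
    """Difference in outcome rates, ignoring the confounder."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def adjusted_effect(rows):
    """Stratify on the confounder, then average the per-stratum effects.

    Assumes every stratum contains both treated and control patients.
    """
    strata = defaultdict(lambda: {0: [], 1: []})
    for x, t, y in rows:
        strata[x][t].append(y)
    total = len(rows)
    ate = 0.0
    for groups in strata.values():
        weight = (len(groups[0]) + len(groups[1])) / total
        ate += weight * (mean(groups[1]) - mean(groups[0]))
    return ate


cohort = make_cohort()
print(f"naive difference:    {naive_effect(cohort):+.2f}")
print(f"adjusted difference: {adjusted_effect(cohort):+.2f}")
```

In this toy cohort the naive comparison suggests the treatment raises the adverse-event rate, while the stratified estimate shows a reduction within each stratum. Hand-built adjustment of this kind does not scale to high-dimensional longitudinal records, which is the setting the abstract targets with learned patient representations.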




Source journal