High-dimensional Iterative Causal Forest (hdiCF) for Subgroup Identification Using Health Care Claims Data

American Journal of Epidemiology (IF 5.0; Zone 2, Medicine; JCR Q1, Public, Environmental & Occupational Health). Pub Date: 2024-09-05. DOI: 10.1093/aje/kwae322
Tiansheng Wang, Virginia Pate, Richard Wyss, John B Buse, Michael R Kosorok, Til Stürmer
Citations: 0

High-dimensional Iterative Causal Forest (hdiCF) for Subgroup Identification Using Health Care Claims Data.

We recently developed a machine-learning subgrouping algorithm, iterative causal forest (iCF), to identify subgroups with heterogeneous treatment effects (HTEs) using predefined covariates. However, such predefined covariates may miss or poorly define important features leading to inaccurate subgrouping. To address such limitations, we developed a new semi-automatic subgrouping algorithm, hdiCF, which adapts methodology from high-dimensional propensity score for feature recognition in claims data. The hdiCF algorithm has 3 steps: 1) high-dimensional feature identification by International Classification of Diseases, Current Procedural Terminology, and Anatomical Therapeutic Chemical codes (in/outpatient diagnoses, procedures, prescriptions) and creation of ordinal variables by frequency of occurrence; 2) propensity score trimming and high-dimensional feature preparation; 3) iCF implementation to identify subgroups. We applied hdiCF in a 20% random sample of fee-for-service Medicare beneficiaries who initiated sodium-glucose cotransporter-2 inhibitors (SGLT2i) or glucagon-like peptide-1 receptor agonists to identify subgroups with HTEs for incidence of hospitalized heart failure. HdiCF findings were consistent with studies suggesting SGLT2i to be more beneficial for patients with pre-existing heart failure or chronic kidney disease. HdiCF is not dependent on prior hypotheses about HTEs and identifies subgroups with markers for potential HTEs in real-world evidence studies where active-comparator, new-user study designs limit the potential for unmeasured confounding.
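
Steps 1 and 2 of the algorithm described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how occurrence frequencies are binned or how propensity scores are trimmed, so the cut points (any occurrence, above the median, above the 75th percentile) and the fixed trimming bounds are assumptions, and the function names `ordinal_levels` and `trim_by_propensity` are hypothetical. Step 3 (running iCF itself) is not sketched here.

```python
from statistics import median, quantiles


def ordinal_levels(counts):
    """Map per-patient occurrence counts of one code to ordinal levels 0-3.

    Assumed cut points (not taken from the abstract), computed among
    patients with at least one occurrence:
      0 = never, 1 = at least once,
      2 = above the median, 3 = above the 75th percentile.
    """
    positive = sorted(c for c in counts.values() if c > 0)
    if not positive:
        return {pid: 0 for pid in counts}
    med = median(positive)
    # statistics.quantiles needs at least two data points on older Pythons.
    if len(positive) > 1:
        q75 = quantiles(positive, n=4, method="inclusive")[2]
    else:
        q75 = positive[0]
    return {
        pid: 0 if c == 0 else 1 + int(c > med) + int(c > q75)
        for pid, c in counts.items()
    }


def trim_by_propensity(scores, lower=0.025, upper=0.975):
    """Return ids whose propensity score lies within [lower, upper].

    The fixed bounds are illustrative defaults, not the paper's rule.
    """
    return [pid for pid, p in scores.items() if lower <= p <= upper]


if __name__ == "__main__":
    # Hypothetical counts of one diagnosis code for five patients.
    counts = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 8}
    print(ordinal_levels(counts))
    # Hypothetical estimated propensity scores.
    print(trim_by_propensity({"a": 0.01, "b": 0.50, "c": 0.99}))
```

In a real analysis, the counts would come from claims records grouped by code and patient, and the propensity scores from a fitted treatment model. Both inputs are plain dictionaries here to keep the sketch self-contained.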

Source journal
American Journal of Epidemiology (Medicine: Public, Environmental & Occupational Health)
CiteScore: 7.40
Self-citation rate: 4.00%
Articles published: 221
Review time: 3-6 weeks
About the journal: The American Journal of Epidemiology is the oldest and one of the premier epidemiologic journals devoted to the publication of empirical research findings, opinion pieces, and methodological developments in the field of epidemiologic research. It is a peer-reviewed journal aimed at both fellow epidemiologists and those who use epidemiologic data, including public health workers and clinicians.