Tiansheng Wang, Virginia Pate, Richard Wyss, John B Buse, Michael R Kosorok, Til Stürmer
American Journal of Epidemiology. Journal article, published 2024-09-05. DOI: 10.1093/aje/kwae322 (https://doi.org/10.1093/aje/kwae322)
High-dimensional Iterative Causal Forest (hdiCF) for Subgroup Identification Using Health Care Claims Data.
We recently developed a machine-learning subgrouping algorithm, iterative causal forest (iCF), to identify subgroups with heterogeneous treatment effects (HTEs) using predefined covariates. However, such predefined covariates may miss or poorly define important features, leading to inaccurate subgrouping. To address these limitations, we developed a new semi-automatic subgrouping algorithm, hdiCF, which adapts methodology from the high-dimensional propensity score for feature recognition in claims data. The hdiCF algorithm has 3 steps: 1) high-dimensional feature identification by International Classification of Diseases, Current Procedural Terminology, and Anatomical Therapeutic Chemical codes (inpatient and outpatient diagnoses, procedures, prescriptions) and creation of ordinal variables by frequency of occurrence; 2) propensity score trimming and high-dimensional feature preparation; 3) iCF implementation to identify subgroups. We applied hdiCF in a 20% random sample of fee-for-service Medicare beneficiaries who initiated sodium-glucose cotransporter-2 inhibitors (SGLT2i) or glucagon-like peptide-1 receptor agonists to identify subgroups with HTEs for incidence of hospitalized heart failure. HdiCF findings were consistent with studies suggesting that SGLT2i are more beneficial for patients with pre-existing heart failure or chronic kidney disease. HdiCF does not depend on prior hypotheses about HTEs and identifies subgroups with markers for potential HTEs in real-world evidence studies where active-comparator, new-user study designs limit the potential for unmeasured confounding.
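The first two steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the frequency cutoffs or the trimming rule, so the quantile-based levels (0 = never, 1 = at least once, 2 = above the median, 3 = above the 75th percentile among patients with the code) and the 0.05/0.95 trimming bounds are assumptions chosen for illustration. The function names `ordinal_features` and `trim_by_propensity` are hypothetical, and step 3 (iCF itself) is not shown.

```python
from collections import Counter
from statistics import quantiles


def ordinal_features(claims, patient_ids):
    """Turn baseline claims into ordinal per-code features.

    claims: iterable of (patient_id, code) pairs, one per recorded occurrence.
    Returns {code: {patient_id: level}} with level in 0..3 (assumed cutoffs).
    """
    counts = Counter(claims)  # (patient_id, code) -> number of occurrences
    per_code = {}
    for (pid, code), n in counts.items():
        per_code.setdefault(code, {})[pid] = n

    features = {}
    for code, by_patient in per_code.items():
        observed = sorted(by_patient.values())
        if len(observed) >= 2:
            # Quartiles of the counts among patients who have the code.
            _, median, q3 = quantiles(observed, n=4, method="inclusive")
        else:
            median = q3 = observed[0]
        levels = {}
        for pid in patient_ids:
            n = by_patient.get(pid, 0)
            # 0 if the code never occurs; otherwise 1, plus one level for
            # each cutoff (median, 75th percentile) the count exceeds.
            levels[pid] = 0 if n == 0 else 1 + int(n > median) + int(n > q3)
        features[code] = levels
    return features


def trim_by_propensity(ps, lower=0.05, upper=0.95):
    """Keep patients whose estimated propensity score lies in [lower, upper].

    ps: {patient_id: propensity score}. The bounds are illustrative only.
    """
    return {pid for pid, p in ps.items() if lower <= p <= upper}
```

In this sketch, a patient whose baseline claims never contain a code gets level 0 for it, and the trimmed cohort is simply the set of patient identifiers whose propensity scores fall inside the chosen bounds. The resulting ordinal features for the trimmed cohort would then be passed to a subgrouping method such as iCF.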
About the journal:
The American Journal of Epidemiology is the oldest and one of the premier epidemiologic journals devoted to the publication of empirical research findings, opinion pieces, and methodological developments in the field of epidemiologic research.
It is a peer-reviewed journal aimed at both fellow epidemiologists and those who use epidemiologic data, including public health workers and clinicians.