Xing He, Ruoqi Wei, Yu Huang, Zhaoyi Chen, Tianchen Lyu, Sarah Bost, Jiayi Tong, Lu Li, Yujia Zhou, Zhao Li, Jingchuan Guo, Huilin Tang, Fei Wang, Steven DeKosky, Hua Xu, Yong Chen, Rui Zhang, Jie Xu, Yi Guo, Yonghui Wu, Jiang Bian
Develop and validate a computable phenotype for the identification of Alzheimer's disease patients using electronic health record data.
Alzheimer's and Dementia: Diagnosis, Assessment and Disease Monitoring, volume 16, issue 3, e12613. Published 2024-07-03. DOI: 10.1002/dad2.12613 (https://doi.org/10.1002/dad2.12613)
Citations: 0
Abstract
Introduction: Alzheimer's disease (AD) is often misclassified in electronic health records (EHRs) when relying solely on diagnosis codes. This study aimed to develop a more accurate, computable phenotype (CP) for identifying AD patients using structured and unstructured EHR data.
Methods: We used EHRs from the University of Florida Health (UFHealth) system and created rule-based CPs iteratively through manual chart reviews. The CPs were then validated using data from the University of Texas Health Science Center at Houston (UTHealth) and the University of Minnesota (UMN).
Results: Our best-performing CP was "patient has at least 2 AD diagnoses and AD-related keywords in AD encounters," with F1-scores of 0.817 at UFHealth, 0.961 at UTHealth, and 0.623 at UMN.
Discussion: We developed and validated rule-based CPs for AD identification with good performance, which will be crucial for studies that aim to use real-world data like EHRs.
Highlights: Developed a computable phenotype (CP) to identify Alzheimer's disease (AD) patients using EHR data. Utilized both structured and unstructured EHR data to enhance CP accuracy. Achieved a high F1-score of 0.817 at UFHealth, and 0.961 and 0.623 at UTHealth and UMN. Validated the CP across different demographics, ensuring robustness and fairness.
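The best-performing rule quoted in the Results can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Encounter` class, the diagnosis-code prefixes, and the keyword list are assumptions, since the abstract does not state which codes or keywords were used, and each coded AD encounter is counted here as one diagnosis. The `f1_score` helper shows how the reported metric would be computed against chart-review labels.

```python
from dataclasses import dataclass
from typing import List

# Assumed for illustration only; the abstract does not list the actual
# diagnosis codes or keywords used by the authors.
AD_CODE_PREFIXES = ("G30", "331.0")
AD_KEYWORDS = ("alzheimer",)


@dataclass
class Encounter:
    """Hypothetical encounter record: structured codes plus free-text note."""
    diagnosis_codes: List[str]
    note_text: str = ""


def is_ad_encounter(enc: Encounter) -> bool:
    # An encounter counts as an AD encounter if any code has an AD prefix.
    return any(code.startswith(AD_CODE_PREFIXES) for code in enc.diagnosis_codes)


def meets_phenotype(encounters: List[Encounter], min_diagnoses: int = 2) -> bool:
    # Rule: at least `min_diagnoses` AD diagnoses AND an AD-related keyword
    # in the note of at least one AD encounter.
    ad_encounters = [e for e in encounters if is_ad_encounter(e)]
    if len(ad_encounters) < min_diagnoses:
        return False
    return any(
        kw in e.note_text.lower() for e in ad_encounters for kw in AD_KEYWORDS
    )


def f1_score(predicted: List[bool], actual: List[bool]) -> float:
    # F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero.
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

In use, `meets_phenotype` would be applied to each patient's encounter list, and the resulting predictions compared with manual chart-review labels through `f1_score`. Combining the structured criterion (diagnosis codes) with the unstructured one (note keywords) reflects the abstract's point that diagnosis codes alone often misclassify AD.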
About the journal:
Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring (DADM) is an open access, peer-reviewed journal from the Alzheimer's Association® that will publish new research that reports the discovery, development, and validation of instruments, technologies, algorithms, and innovative processes. Papers will cover a range of topics concerning the early and accurate detection of individuals with memory complaints and/or asymptomatic individuals at elevated risk for various forms of memory disorders. Published papers will be expected to translate fundamental knowledge about the neurobiology of the disease into practical reports that describe both the conceptual and methodological aspects of the submitted scientific inquiry. Published topics will explore the development of biomarkers, surrogate markers, and conceptual/methodological challenges. Publication priority will be given to papers that describe 1) putative surrogate markers that accurately track disease progression, 2) biomarkers that fulfill international regulatory requirements, 3) large, well-characterized population-based cohorts that comprise the heterogeneity and diversity of asymptomatic individuals, and 4) algorithmic development that considers multi-marker arrays (e.g., integrated-omics, genetics, biofluids, imaging, etc.) and advanced computational analytics and technologies.