Curating retrospective multimodal and longitudinal data for community cohorts at risk for lung cancer.
Thomas Z Li, Kaiwen Xu, Neil C Chada, Heidi Chen, Michael Knight, Sanja Antic, Kim L Sandler, Fabien Maldonado, Bennett A Landman, Thomas A Lasko
Cancer Biomarkers. Journal Article. Published 2024-03-07. DOI: 10.3233/CBM-230340 (https://doi.org/10.3233/CBM-230340). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11380038/pdf/
Abstract
Background: Large community cohorts are useful for lung cancer research, allowing for the analysis of risk factors and development of predictive models.
Objective: A robust methodology for (1) identifying lung cancer and pulmonary nodule diagnoses and (2) associating multimodal longitudinal data with these events from electronic health records (EHRs) is needed to curate cohorts at scale.
Methods: In this study, we leveraged (1) SNOMED concepts to develop ICD-based decision rules for building a cohort that captured lung cancer and pulmonary nodules and (2) clinical knowledge to define time windows for collecting longitudinal imaging and clinical concepts. We curated three cohorts with clinical data and repeated imaging for subjects with pulmonary nodules from Vanderbilt University Medical Center.
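The two curation steps described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' rule set: the ICD-10 prefixes (C34 for malignant neoplasm of bronchus and lung, R91.1 for solitary pulmonary nodule), the function names `label_subject` and `events_in_window`, and the window lengths are all assumptions chosen for illustration.

```python
from datetime import date, timedelta

# Illustrative ICD-10 prefixes only; the paper's actual decision rules are not given here.
LUNG_CANCER_PREFIXES = ("C34",)   # malignant neoplasm of bronchus and lung
NODULE_PREFIXES = ("R91.1",)      # solitary pulmonary nodule


def label_subject(codes):
    """Return 'lung_cancer', 'nodule', or None from a list of ICD-10 code strings.

    Hypothetical rule: any lung cancer code takes precedence over a nodule code.
    """
    if any(code.startswith(LUNG_CANCER_PREFIXES) for code in codes):
        return "lung_cancer"
    if any(code.startswith(NODULE_PREFIXES) for code in codes):
        return "nodule"
    return None


def events_in_window(events, index_date, days_before, days_after):
    """Keep (event_date, payload) pairs inside a window around index_date.

    The window bounds are inclusive; results are sorted by event date.
    """
    start = index_date - timedelta(days=days_before)
    end = index_date + timedelta(days=days_after)
    kept = [event for event in events if start <= event[0] <= end]
    return sorted(kept, key=lambda event: event[0])


if __name__ == "__main__":
    print(label_subject(["R91.1", "C34.11"]))
    scans = [(date(2020, 1, 10), "CT-1"), (date(2023, 6, 1), "CT-2")]
    # Assumed window: 3 years before to 30 days after the index diagnosis date.
    print(events_in_window(scans, date(2021, 1, 1), days_before=1095, days_after=30))
```

In practice the index date would be the first qualifying diagnosis, and separate windows could be defined per data modality.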
Results: Our approach achieved an estimated sensitivity of 0.930 (95% CI: [0.879, 0.969]), specificity of 0.996 (95% CI: [0.989, 1.000]), positive predictive value of 0.979 (95% CI: [0.959, 1.000]), and negative predictive value of 0.987 (95% CI: [0.976, 0.994]) for distinguishing subjects with lung cancer from subjects with solitary pulmonary nodules (SPNs).
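For readers unfamiliar with these four metrics, the sketch below shows how they follow from a 2x2 confusion matrix. The abstract does not state how its confidence intervals were estimated, so the Wilson score interval used here is an assumption, and the counts in the usage example are made up rather than taken from the study.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def diagnostic_metrics(tp, fp, tn, fn):
    """Return {metric: (point_estimate, (ci_low, ci_high))} from confusion-matrix counts."""
    pairs = {
        "sensitivity": (tp, tp + fn),  # true positives among all actual positives
        "specificity": (tn, tn + fp),  # true negatives among all actual negatives
        "ppv": (tp, tp + fp),          # true positives among all predicted positives
        "npv": (tn, tn + fn),          # true negatives among all predicted negatives
    }
    return {name: (k / n, wilson_ci(k, n)) for name, (k, n) in pairs.items()}


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, (point, (low, high)) in diagnostic_metrics(tp=90, fp=5, tn=95, fn=10).items():
        print(f"{name}: {point:.3f} (95% CI: [{low:.3f}, {high:.3f}])")
```

Predictive values depend on how common lung cancer is in the evaluated sample, so they should be interpreted against the cohort's case mix.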
Conclusion: This work represents a general strategy for high-throughput curation of multimodal longitudinal cohorts at risk for lung cancer from routinely collected EHRs.
About the journal:
Concentrating on molecular biomarkers in cancer research, Cancer Biomarkers publishes original research findings (and reviews solicited by the editor) on the subject of the identification of markers associated with the disease processes whether or not they are an integral part of the pathological lesion.
The disease markers may include, but are not limited to, genomic, epigenomic, proteomic, cellular and morphologic, and genetic factors predisposing to the disease or indicating the occurrence of the disease. Manuscripts on these factors or biomarkers, whether in altered forms, at abnormal concentrations, or with abnormal tissue distribution leading to disease causation, will be accepted.