Development and Portability of a Text Mining Algorithm for Capturing Disease Progression in Electronic Health Records of Patients With Stage IV Non-Small Cell Lung Cancer.
M V Verschueren, H Abedian Kalkhoran, M Deenen, B E E M van den Borne, J Zwaveling, L E Visser, L T Bloem, B J M Peters, E M W van de Garde
JCO Clinical Cancer Informatics. Published October 1, 2024 (Epub October 4, 2024). DOI: 10.1200/CCI.24.00053. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11469628/pdf/
Citations: 0
Abstract
Purpose: The objective was to develop a text mining algorithm for prospectively capturing disease progression in electronic health record (EHR) data of patients with metastatic non-small cell lung cancer (mNSCLC) treated with immunochemotherapy, and to evaluate its portability.
Methods: This study used EHR data from patients with mNSCLC receiving immunochemotherapy (between October 1, 2018, and December 31, 2022) in four Dutch hospitals. A text mining algorithm for capturing disease progression was developed in hospitals 1 and 2 and then transferred to hospitals 3 and 4 to evaluate portability. Performance metrics were calculated by comparing the algorithm's outcomes with manual chart review. In addition, data were simulated to become available over time to assess performance in real-time applications. Median progression-free survival (PFS) was calculated using the Kaplan-Meier method to compare text mining with manual chart review.
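The Kaplan-Meier step named in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names `kaplan_meier` and `median_pfs`, the input layout (one follow-up duration and one progression flag per patient), and the convention that the median is the first time at which estimated survival drops to 0.5 or below are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def kaplan_meier(durations: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    durations: follow-up time per patient (hypothetical unit, e.g. months).
    events: True if progression was observed, False if the patient was censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        progressed = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            progressed += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if progressed > 0:
            # Product-limit update: S(t) = S(t-) * (1 - d / n).
            survival *= 1.0 - progressed / at_risk
            curve.append((t, survival))
        # Both progressed and censored patients leave the risk set.
        at_risk -= removed
    return curve


def median_pfs(durations: List[float], events: List[bool]) -> Optional[float]:
    """Smallest time at which estimated survival is <= 0.5, or None if never reached."""
    for t, s in kaplan_meier(durations, events):
        if s <= 0.5:
            return t
    return None
```

Running `median_pfs` once on progression dates from text mining and once on dates from manual chart review would yield the two medians that the study compares; the sketch itself makes no claim about the study's actual values.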
Results: During development and portability evaluation, the text mining algorithm performed well in capturing disease progression, with all performance scores >90%. When real-time performance was simulated, the performance scores in all four hospitals exceeded 90% from week 15 after the start of follow-up. The exact progression dates differed between text mining and manual chart review in 46 of 157 patients with progressive disease; the numbers of patients labeled with progression too early (n = 24) and too late (n = 22) were well balanced, with discrepancies ranging from -116 to 384 days. Nevertheless, the PFS curves constructed with text mining and manual chart review were highly similar for each hospital.
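The tally of early and late progression labels can be sketched as follows. This is an illustrative reconstruction, not the study's method: the function name `classify_discrepancies`, the input layout, and the sign convention (text mining date minus chart review date, so negative values mean progression was labeled too early) are assumptions.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def classify_discrepancies(
    pairs: List[Tuple[date, date]],
) -> Dict[str, object]:
    """Summarize date differences between text mining and manual chart review.

    pairs: (text_mining_date, chart_review_date) per patient with progression.
    Assumed sign convention: negative difference = text mining was too early.
    """
    diffs = [(tm - ref).days for tm, ref in pairs]
    nonzero = [d for d in diffs if d != 0]
    span: Optional[Tuple[int, int]] = (min(nonzero), max(nonzero)) if nonzero else None
    return {
        "exact": sum(1 for d in diffs if d == 0),
        "too_early": sum(1 for d in diffs if d < 0),
        "too_late": sum(1 for d in diffs if d > 0),
        "range_days": span,
    }


# Made-up dates for illustration only; they are not study data.
example = [
    (date(2021, 3, 1), date(2021, 3, 1)),    # same date
    (date(2021, 5, 1), date(2021, 5, 15)),   # text mining 14 days early
    (date(2022, 1, 10), date(2022, 1, 3)),   # text mining 7 days late
]
summary = classify_discrepancies(example)
```

A roughly equal count of early and late labels, as reported in the Results, means the errors tend to cancel out in the survival curve rather than shift it in one direction.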
Conclusion: In this study, an accurate text mining algorithm for capturing disease progression in the EHR data of patients with mNSCLC was developed. The algorithm was portable across different hospitals, and the performance over time was good, making this an interesting approach for prospective follow-up of multicenter cohorts.