Purpose: The expanding presence of the electronic health record (EHR) underscores the necessity for improved interoperability. To test the interoperability within the field of oncology research, our team at Vanderbilt University Medical Center (VUMC) enabled our Epic-based EHR to be compatible with the Minimal Common Oncology Data Elements (mCODE), which is a Fast Healthcare Interoperability Resources (FHIR)-based consensus data standard created to facilitate the transmission of EHRs for patients with cancer.
Methods: Our approach used an extract, transform, load tool for converting EHR data from the VUMC Epic Clarity database into mCODE-compatible profiles. We established a sandbox environment on Microsoft Azure for data migration, deployed a FHIR server to handle application programming interface (API) requests, and mapped VUMC data to align with mCODE structures. In addition, we constructed a web application to demonstrate the practical use of mCODE profiles in health care.
Results: We developed an end-to-end pipeline that converted EHR data into mCODE-compliant profiles, as well as a web application that visualizes genomic data and provides cancer risk assessments. Despite the complexities of aligning traditional EHR databases with mCODE standards and the limitations of FHIR APIs in supporting advanced statistical methodologies, this project successfully demonstrates the practical integration of mCODE standards into existing health care infrastructures.
Conclusion: This study provides a proof of concept for the interoperability of mCODE within a major health care institution's EHR system, highlighting both the potential and the current limitations of FHIR APIs in supporting complex data analysis for oncology research.
Minimal Common Oncology Data Elements Genomics Pilot Project: Enhancing Oncology Research Through Electronic Health Record Interoperability at Vanderbilt University Medical Center. Yanwei Li, Jiarong Ye, Yuxin Huang, Jiayi Wu, Xiaohan Liu, Shun Ahmed, Travis Osterman. JCO Clinical Cancer Informatics 8:e2300249, published 2024-06-01. DOI: 10.1200/CCI.23.00249. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371088/pdf/
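The transform step of the pipeline described in the Methods above (EHR rows converted into mCODE-compatible FHIR resources and sent to a FHIR server) can be pictured with a minimal Python sketch. It is not the authors' tool: the input row layout (`patient_id`, `sex`, `birth_date`) and both function names are hypothetical, and the profile URL is assumed to be the canonical one published in the mCODE implementation guide.

```python
# Assumed canonical URL of the mCODE CancerPatient profile (check the mCODE guide).
MCODE_CANCER_PATIENT = (
    "http://hl7.org/fhir/us/mcode/StructureDefinition/mcode-cancer-patient"
)


def row_to_mcode_patient(row: dict) -> dict:
    """Map one hypothetical, already-extracted EHR row to a FHIR Patient resource."""
    gender_map = {"F": "female", "M": "male"}
    return {
        "resourceType": "Patient",
        "id": str(row["patient_id"]),
        # Declaring the profile lets a FHIR server validate against mCODE.
        "meta": {"profile": [MCODE_CANCER_PATIENT]},
        # FHIR administrative gender codes: male, female, other, unknown.
        "gender": gender_map.get(row.get("sex"), "unknown"),
        "birthDate": row["birth_date"],  # FHIR dates are written YYYY-MM-DD
    }


def to_transaction_bundle(resources: list) -> dict:
    """Wrap resources in a FHIR transaction Bundle for a single API request."""
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
            {
                "resource": r,
                # PUT to ResourceType/id creates or updates the resource.
                "request": {"method": "PUT", "url": f"{r['resourceType']}/{r['id']}"},
            }
            for r in resources
        ],
    }


if __name__ == "__main__":
    patient = row_to_mcode_patient(
        {"patient_id": 42, "sex": "F", "birth_date": "1960-05-17"}
    )
    bundle = to_transaction_bundle([patient])
    print(bundle["entry"][0]["request"]["url"])
```

A real mapping would cover many more mCODE profiles (conditions, genomic variants, staging) and would need terminology mapping to standard code systems, which this sketch leaves out.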
Marc L Berger, Patricia A Ganz, Kelly H Zou, Sheldon Greenfield
Randomized trials provide high-quality, internally consistent data on selected clinical questions, but lack generalizability for the aging population who are most often diagnosed with cancer and have comorbid conditions that may affect the interpretation of treatment benefit. The need for high-quality, relevant, and timely data is greater than ever. Promising solutions lie in the collection and analysis of real-world data (RWD), which can potentially provide timely insights about the patient's course during and after initial treatment and the outcomes of important subgroups such as the elderly, rural populations, children, and patients with greater social health needs. However, to inform practice and policy, real-world evidence must be created from trustworthy and comprehensive sources of RWD; these may include pragmatic clinical trials, registries, prospective observational studies, electronic health records (EHRs), administrative claims, and digital technologies. There are unique challenges in oncology since key parameters (eg, cancer stage, biomarker status, genomic assays, imaging response, side effects, quality of life) are not recorded, siloed in inaccessible documents, or available only as free text or unstructured reports in the EHR. Advances in analytics, such as artificial intelligence, may greatly enhance the ability to obtain more granular information from EHRs and support integrated diagnostics; however, they will need to be validated purpose by purpose. We recommend a commitment to standardizing data across sources and building infrastructures that can produce fit-for-purpose RWD that will provide timely understanding of the effectiveness of individual interventions.
When Will Real-World Data Fulfill Its Promise to Provide Timely Insights in Oncology? JCO Clinical Cancer Informatics 8:e2400039, published 2024-06-01. DOI: 10.1200/CCI.24.00039.
Ravi N Sharaf, Natalia Udaltsova, Dan Li, Rish K Pai, Soham Sinha, Zixuan Li, Douglas A Corley
Purpose: Identification of those at risk of hereditary cancer syndromes using electronic health record (EHR) data sources is important for clinical care, quality improvement, and research. We describe diagnostic processes, previously seldom reported, for a common hereditary cancer syndrome, Lynch syndrome (LS), using EHR data within a community-based, multicenter, demographically diverse health system.
Methods: Within a retrospective cohort enrolled between 2015 and 2020 at Kaiser Permanente Northern California, we assessed electronic diagnostic domains for LS including (1) family history of LS-associated cancer; (2) personal history of LS-associated cancer; (3) LS screening via mismatch repair deficiency (MMRD) testing of newly diagnosed malignancy; (4) germline genetic test results; and (5) clinician-entered diagnostic codes for LS. We calculated proportions and overlap for each diagnostic domain descriptively.
Results: Among 5.8 million individuals, (1) 28,492 (0.49%) had a family history of LS-associated cancer of whom 3,635 (13%) underwent genetic testing; (2) 100,046 (1.7%) had a personal history of an LS-associated cancer; and (3) 8,711 (0.1%) were diagnosed with colorectal cancer of whom 7,533 (86%) underwent MMRD screening and of the positive screens (486), 130 (27%) underwent germline testing. One thousand seven hundred and fifty-seven (0.03%) were diagnosed with endometrial cancer of whom 1,613 (92%) underwent MMRD screening and of the 195 who screened positive, 55 (28%) underwent genetic testing. (4) 30,790 (0.5%) had LS germline genetic testing with 707 (0.01%) testing positive; and (5) 1,273 (0.02%) had a clinician-entered diagnosis of LS.
Conclusion: It is feasible to electronically characterize the diagnostic processes of LS. No single data source comprehensively identifies all LS carriers. There is underutilization of LS genetic testing for those eligible and underdiagnosis of LS. Our work informs similar efforts in other settings for hereditary cancer syndromes.
Population-Level Identification of Patients With Lynch Syndrome for Clinical Care, Quality Improvement, and Research. JCO Clinical Cancer Informatics 8:e2300157, published 2024-06-01. DOI: 10.1200/CCI.23.00157.
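The descriptive analysis in the Lynch syndrome abstract above (proportions per diagnostic domain and the overlap between domains) can be illustrated with a short Python sketch. The two counts used for the proportions come from the abstract, with 5.8 million treated as the rounded cohort size; the patient identifier sets and both function names are hypothetical and serve only to show the overlap computation.

```python
from itertools import combinations


def proportion(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


def pairwise_overlap(domains: dict) -> dict:
    """Count the individuals shared by each pair of diagnostic domains."""
    return {
        (a, b): len(domains[a] & domains[b])
        for a, b in combinations(sorted(domains), 2)
    }


if __name__ == "__main__":
    total = 5_800_000  # rounded cohort size reported in the abstract

    # Family history of LS-associated cancer, as a share of the cohort.
    print(round(proportion(28_492, total), 2))  # 0.49
    # Genetic testing among those with a family history.
    print(round(proportion(3_635, 28_492)))  # 13

    # Hypothetical identifier sets, one per diagnostic domain.
    domains = {
        "family_history": {1, 2, 3, 4},
        "germline_positive": {3, 4, 5},
        "diagnosis_code": {4, 5, 6},
    }
    print(pairwise_overlap(domains))
    # Individuals found by at least one domain; no single set covers them all.
    print(len(set().union(*domains.values())))
```

Comparing each domain's size with the size of the union is one simple way to show that no single data source identifies every carrier.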
Samuel Smith, Kate Drummond, Anthony Dowling, Iwan Bennett, David Campbell, Ronnie Freilich, Claire Phillips, Elizabeth Ahern, Simone Reeves, Robert Campbell, Ian M Collins, Julie Johns, Megan Dumas, Wei Hong, Peter Gibbs, Lucy Gately
Purpose: Real-world data (RWD) collected on patients treated as part of routine clinical care form the basis of cancer clinical registries. Capturing accurate death data can be challenging, with inaccurate survival data potentially compromising the integrity of registry-based research. Here, we explore the utility of data linkage (DL) to state-based registries to enhance the capture of survival outcomes.
Methods: We identified consecutive adult patients with brain tumors treated in the state of Victoria from the Brain Tumour Registry Australia: Innovation and Translation (BRAIN) database, who had no recorded date of death and no follow-up within the last 6 months. Full name and date of birth were used to match patients in the BRAIN registry with those in the Victorian Births, Deaths and Marriages (BDM) registry. Overall survival (OS) outcomes were compared pre- and post-DL.
Results: Of the 7,346 clinical registry patients, 5,462 (74%) had no date of death and no follow-up recorded within the last 6 months. Of the 5,462 patients, 1,588 (29%) were matched with a date of death in BDM. Factors associated with an increased number of matches were poor prognosis tumors, older age, and social disadvantage. OS was significantly overestimated pre-DL compared with post-DL for the entire cohort (pre- v post-DL: hazard ratio, 1.43; P < .001; median, 29.9 months v 16.7 months) and for most individual tumor types. This finding was present independent of the tumor prognosis.
Conclusion: As revealed by linkage with BDM, a high proportion of patients in a brain cancer clinical registry had missing death data, contributed to by informative censoring, inflating OS calculations. DL to pertinent registries on an ongoing basis should be considered to ensure accurate reporting of survival data and interpretation of RWD outcomes.
Improving Clinical Registry Data Quality via Linkage With Survival Data From State-Based Population Registries. JCO Clinical Cancer Informatics 8:e2400025, published 2024-06-01. DOI: 10.1200/CCI.24.00025.
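The linkage step in the registry abstract above (matching on full name and date of birth) can be sketched in Python as follows. This is only an illustration of deterministic matching, not the procedure the authors ran: the record layout, the normalization rules, and both function names are assumptions.

```python
import unicodedata
from datetime import date


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, turn non-letters into spaces, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    letters = "".join(c if c.isalpha() else " " for c in no_accents.lower())
    return " ".join(letters.split())


def link_deaths(registry: list, deaths: list) -> dict:
    """Return {registry id: date of death} for exact (name, date of birth) matches."""
    # Index death records by the normalized linkage key.
    # Note: a later record with the same key silently replaces an earlier one;
    # a production linkage would flag such ambiguous keys for manual review.
    index = {
        (normalize_name(d["full_name"]), d["dob"]): d["date_of_death"]
        for d in deaths
    }
    linked = {}
    for patient in registry:
        key = (normalize_name(patient["full_name"]), patient["dob"])
        if key in index:
            linked[patient["id"]] = index[key]
    return linked


if __name__ == "__main__":
    registry = [
        {"id": "R1", "full_name": "José  García", "dob": date(1950, 1, 2)},
        {"id": "R2", "full_name": "Ann Lee", "dob": date(1960, 3, 4)},
    ]
    deaths = [
        {"full_name": "jose garcia", "dob": date(1950, 1, 2),
         "date_of_death": date(2020, 5, 6)},
    ]
    print(link_deaths(registry, deaths))
```

Patients left unmatched stay censored at their last follow-up, so the number of matches directly changes the survival estimate computed afterwards.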
Saverio D'Amico, Lorenzo Dall'Olio, Cesare Rollo, Patricia Alonso, Iñigo Prada-Luengo, Daniele Dall'Olio, Claudia Sala, Elisabetta Sauta, Gianluca Asti, Luca Lanino, Giulia Maggioni, Alessia Campagna, Elena Zazzetti, Mattia Delleani, Maria Elena Bicchieri, Pierandrea Morandini, Victor Savevski, Borja Arroyo, Juan Parras, Lin Pierre Zhao, Uwe Platzbecker, Maria Diez-Campelo, Valeria Santini, Pierre Fenaux, Torsten Haferlach, Anders Krogh, Santiago Zazo, Piero Fariselli, Tiziana Sanavia, Matteo Giovanni Della Porta, Gastone Castellani
Purpose: Rare cancers constitute over 20% of human neoplasms, often affecting patients with unmet medical needs. The development of effective classification and prognostication systems is crucial to improve the decision-making process and drive innovative treatment strategies. We have created and implemented MOSAIC, an artificial intelligence (AI)-based framework designed for multimodal analysis, classification, and personalized prognostic assessment in rare cancers. Clinical validation was performed on myelodysplastic syndrome (MDS), a rare hematologic cancer with clinical and genomic heterogeneities.
Methods: We analyzed 4,427 patients with MDS divided into training and validation cohorts. Deep learning methods were applied to integrate and impute clinical/genomic features. Clustering was performed by combining Uniform Manifold Approximation and Projection for Dimension Reduction + Hierarchical Density-Based Spatial Clustering of Applications with Noise (UMAP + HDBSCAN) methods, compared with the conventional Hierarchical Dirichlet Process (HDP). Linear and AI-based nonlinear approaches were compared for survival prediction. Explainable AI (Shapley Additive Explanations approach [SHAP]) and federated learning were used to improve the interpretation and the performance of the clinical models, integrating them into distributed infrastructure.
Results: UMAP + HDBSCAN clustering obtained a more granular patient stratification than HDP, achieving a higher average silhouette coefficient (0.16 v 0.01) and higher balanced accuracy in cluster classification by Random Forest (92.7% ± 1.3% v 85.8% ± 0.8%). AI methods for survival prediction outperformed conventional statistical techniques and the reference prognostic tool for MDS. Nonlinear Gradient Boosting Survival stood out in both the internal (Concordance-Index [C-Index], 0.77; SD, 0.01) and external validation (C-Index, 0.74; SD, 0.02). SHAP analysis revealed that similar features drove patients' subgroups and outcomes in both training and validation cohorts. Federated implementation improved the accuracy of developed models.
Conclusion: MOSAIC provides an explainable and robust framework to optimize classification and prognostic assessment of rare cancers. AI-based approaches demonstrated superior accuracy in capturing genomic similarities and providing individual prognostic information compared with conventional statistical methods. Its federated implementation ensures broad clinical application, guaranteeing high performance and data protection.
MOSAIC: An Artificial Intelligence-Based Framework for Multimodal Analysis, Classification, and Personalized Prognostic Assessment in Rare Cancers. JCO Clinical Cancer Informatics 8:e2400008, published 2024-06-01. DOI: 10.1200/CCI.24.00008. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371092/pdf/
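The survival models in the MOSAIC abstract above are compared by concordance index (C-Index). As a reading aid, here is a minimal pure-Python sketch of Harrell's C-index for right-censored data. It is not the MOSAIC code; the function name is hypothetical, and for simplicity the sketch skips pairs with tied survival times.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       observed follow-up time per patient
    events:      1 if the event (death) was observed, 0 if censored
    risk_scores: model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event received the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [5, 10, 15, 20]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.7, 0.4, 0.2]
    print(concordance_index(times, events, risks))  # 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.77 and 0.74 should be read.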
Erratum: Phenotyping Hepatic Immune-Related Adverse Events in the Setting of Immune Checkpoint Inhibitor Therapy. JCO Clinical Cancer Informatics 8:e2400125, published 2024-06-01. DOI: 10.1200/CCI.24.00125.
Advancements in variant curation challenges: minority representation and incomplete data reporting.
Addressing the Historic Challenges of BRCA1 and BRCA2 Variant Curation and the Need for More Diverse Representation. Yanin Chavarri-Guerra, Jeffrey N Weitzel. JCO Clinical Cancer Informatics 8:e2400076, published 2024-06-01. DOI: 10.1200/CCI.24.00076.
Future of Cancer Treatment Guidelines: Integrating Real-World Insights for Equitable Cancer Care. Rebecca A Miksad, Gregory S Calip. JCO Clinical Cancer Informatics 8:e2400081, published 2024-06-01. DOI: 10.1200/CCI.24.00081.
Xu Zuo, Ashok Kumar, Shuhan Shen, Jianfu Li, Grace Cong, Edward Jin, Qingxia Chen, Jeremy L Warner, Ping Yang, Hua Xu
Purpose: The RECIST guidelines provide a standardized approach for evaluating the response of cancer to treatment, allowing for consistent comparison of treatment efficacy across different therapies and patients. However, collecting such information from electronic health records manually can be extremely labor-intensive and time-consuming because of the complexity and volume of clinical notes. The aim of this study is to apply natural language processing (NLP) techniques to automate this process, minimizing manual data collection efforts, and improving the consistency and reliability of the results.
Methods: We proposed a complex, hybrid NLP system that automates the process of extracting, linking, and summarizing anticancer therapy and associated RECIST-like responses from narrative clinical text. The system consists of multiple machine learning-/deep learning-based and rule-based modules for diverse NLP tasks such as named entity recognition, assertion classification, relation extraction, and text normalization, to address different challenges associated with anticancer therapy and response information extraction. We then evaluated the system's performance on two independent test sets from different institutions to demonstrate its effectiveness and generalizability.
Results: The system used domain-specific language models, BioBERT and BioClinicalBERT, for high-performance identification of therapy mentions and for extraction and categorization of RECIST responses. The best-performing model achieved a 0.66 score in linking therapy and RECIST response mentions, with end-to-end performance peaking at 0.74 after relation normalization, indicating substantial efficacy with room for improvement.
Conclusion: We developed, implemented, and tested an information extraction system from clinical notes for cancer treatment and efficacy assessment information. We expect this system will support future cancer research, particularly oncologic studies that focus on efficiently assessing the effectiveness and reliability of cancer therapeutics.
Extracting Systemic Anticancer Therapy and Response Information From Clinical Notes Following the RECIST Definition. JCO Clinical Cancer Informatics 8:e2300166, published 2024-06-01. DOI: 10.1200/CCI.23.00166.
Akshay Swaminathan, Alexander L Ren, Janet Y Wu, Aarohi Bhargava-Shah, Ivan Lopez, Ujwal Srivastava, Vassilis Alexopoulos, Rebecca Pizzitola, Brandon Bui, Layth Alkhani, Susan Lee, Nathan Mohit, Noel Seo, Nicholas Macedo, Winson Cheng, William Wang, Edward Tran, Reena Thomas, Olivier Gevaert
Purpose: Data on lines of therapy (LOTs) for cancer treatment are important for clinical oncology research, but LOTs are not explicitly recorded in electronic health records (EHRs). We present an efficient approach for clinical data abstraction and a flexible algorithm to derive LOTs from EHR-based medication data on patients with glioblastoma multiforme (GBM).
Methods: Nonclinicians were trained to abstract the diagnosis of GBM from EHRs, and their accuracy was compared with abstraction performed by clinicians. The resulting data were used to build a cohort of patients with confirmed GBM diagnosis. An algorithm was developed to derive LOTs using structured medication data, accounting for the addition and discontinuation of therapies and drug class. Descriptive statistics were calculated and time-to-next-treatment (TTNT) analysis was performed using the Kaplan-Meier method.
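The line-of-therapy derivation described in the Methods can be illustrated with a minimal sketch. The abstract does not state the algorithm's rules, so everything below is an assumption made for illustration: the `TherapyLine` class, the `derive_lines` function, the 28-day regimen window, and the 90-day gap threshold are hypothetical, and drug class is ignored for brevity. In this sketch, a new line starts when a previously unseen drug is added after the initial regimen window, or when therapy resumes after a gap long enough to count as discontinuation.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TherapyLine:
    number: int
    start: date
    last_admin: date
    drugs: set = field(default_factory=set)


def derive_lines(records, regimen_window_days=28, gap_days=90):
    """Group (administration_date, drug) records for one patient into lines.

    Both thresholds are hypothetical defaults, not values from the study.
    """
    lines = []
    for day, drug in sorted(records):
        current = lines[-1] if lines else None
        starts_new_line = (
            current is None
            # Discontinuation: no administration for longer than gap_days.
            or (day - current.last_admin).days > gap_days
            # Addition: an unseen drug arrives after the initial regimen window.
            or (drug not in current.drugs
                and (day - current.start).days > regimen_window_days)
        )
        if starts_new_line:
            number = current.number + 1 if current else 1
            lines.append(TherapyLine(number, day, day, {drug}))
        else:
            # Same line: record the drug and extend the last administration date.
            current.drugs.add(drug)
            current.last_admin = day
    return lines


# Made-up medication records for a single patient.
example = [
    (date(2020, 1, 1), "temozolomide"),
    (date(2020, 1, 29), "temozolomide"),
    (date(2020, 6, 1), "bevacizumab"),  # arrives after a 124-day gap
    (date(2020, 6, 15), "bevacizumab"),
]
for line in derive_lines(example):
    print(line.number, line.start, sorted(line.drugs))
```

Under these assumed rules, the interval between the start of one line and the start of the next gives a time to next treatment for that patient; patients with no later line would be treated as censored in a Kaplan-Meier analysis.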
Results: With clinician abstraction as the gold standard, nonclinicians abstracted GBM diagnosis with a sensitivity of 0.98, specificity of 1.00, positive predictive value of 1.00, and negative predictive value of 0.90, suggesting that nonclinician abstraction of GBM diagnosis was comparable with clinician abstraction. Of 693 patients with a confirmed diagnosis of GBM, 246 had structured information about the types of medications received. Of these 246 patients, 165 (67.1%) received temozolomide as first-line therapy (1L), and the median TTNT from the start of 1L was 179 days.
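The four agreement measures reported above follow from a 2x2 comparison of nonclinician labels against the clinician gold standard. The sketch below shows the standard formulas; the function name `diagnostic_metrics` and the counts are hypothetical, chosen only for illustration, and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return agreement measures for a test compared with a gold standard.

    tp/fp/fn/tn are true-positive, false-positive, false-negative,
    and true-negative counts.
    """
    return {
        "sensitivity": tp / (tp + fn),  # confirmed cases correctly identified
        "specificity": tn / (tn + fp),  # non-cases correctly excluded
        "ppv": tp / (tp + fp),          # positive calls that are correct
        "npv": tn / (tn + fn),          # negative calls that are correct
    }


# Made-up counts, not taken from the study.
metrics = diagnostic_metrics(tp=45, fp=2, fn=5, tn=48)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are conditioned on the gold-standard label, while the predictive values are conditioned on the abstractor's call, so the predictive values also depend on how common the diagnosis is in the reviewed sample.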
Conclusion: We described a workflow for extracting diagnosis of GBM and LOT from EHR data that combines nonclinician abstraction with algorithmic processing, demonstrating comparable accuracy with clinician abstraction and highlighting the potential for scalable and efficient EHR-based oncology research.
Akshay Swaminathan, Alexander L Ren, Janet Y Wu, Aarohi Bhargava-Shah, Ivan Lopez, Ujwal Srivastava, Vassilis Alexopoulos, Rebecca Pizzitola, Brandon Bui, Layth Alkhani, Susan Lee, Nathan Mohit, Noel Seo, Nicholas Macedo, Winson Cheng, William Wang, Edward Tran, Reena Thomas, Olivier Gevaert. Extraction of Unstructured Electronic Health Records to Evaluate Glioblastoma Treatment Patterns. JCO Clinical Cancer Informatics. 2024-06-01;8:e2300091. doi:10.1200/CCI.23.00091. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11371099/pdf/