Pub Date: 2026-01-01 | Epub Date: 2026-01-06 | DOI: 10.1200/CCI-25-00194
Peter May, Sina Nokodian, Christoph Nuernbergk, Manuel Knauer, Maike Hefter, Aaron Becker von Rose, Florian Bassermann, Johannes Jung
Purpose: In high-risk specialties such as oncology, errors in clinical documentation can have severe consequences, highlighting a need for enhanced safety checks. We therefore aimed to evaluate the capability of frontier large language models (LLMs) to identify and correct errors in complex clinical documentation in oncology.
Methods: We conducted a two-phase evaluation. First, we assessed LLMs (GPT o4-mini and Gemini 2.5 Pro) on 1,000 synthetic clinical hematology/oncology vignettes with controlled errors, benchmarking against human expert data for error flag detection and sentence localization. Second, we evaluated advanced LLMs and a local LLM (Gemma 3 27B) against six clinicians in detecting single, predefined, and clinically relevant errors, such as wrong risk classifications or omission of critical medication within 90 synthetic discharge summaries from oncologic patients.
Results: LLMs outperformed the human benchmark in error flag and sentence localization tasks, with Gemini 2.5 Pro achieving top accuracies of 0.928 and 0.915, respectively. Results were robust across subgroups and scalable, with simultaneous processing of up to 50 vignettes. Within complex discharge summaries, Gemini 2.5 Pro and GPT o4-mini-high identified 97.8% and 87.8% of injected errors, respectively, substantially exceeding the 47.8% average detection rate of human specialists. Gemma 3 27B detected 35.6% of errors. Analysis of error detection overlap revealed a synergistic potential for hybrid human-artificial intelligence (AI) systems.
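The idea behind the overlap analysis can be illustrated with a minimal sketch. The error IDs and detection sets below are made-up toy values, not study data, and `detection_rate` is a hypothetical helper; the only point is that the union of two reviewers' detections can exceed what either reviewer finds alone.

```python
def detection_rate(detected, injected):
    """Fraction of injected errors that a reviewer flagged."""
    return len(detected & injected) / len(injected)


# Toy setup (hypothetical): 10 summaries, one injected error each, keyed by ID.
injected = set(range(10))
model_hits = {0, 1, 2, 3, 4, 5, 6, 7}   # made-up model detections
clinician_hits = {0, 2, 4, 8, 9}        # made-up clinician detections

model_rate = detection_rate(model_hits, injected)
clinician_rate = detection_rate(clinician_hits, injected)

# A hybrid reviewer catches an error if either party catches it.
hybrid_rate = detection_rate(model_hits | clinician_hits, injected)

# Errors that only the clinician caught are where the human adds value.
only_clinician = clinician_hits - model_hits
```

In this toy case the clinician finds fewer errors overall but contributes two that the model missed, which is the kind of complementarity a hybrid workflow would rely on.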
Conclusion: Frontier LLMs exhibit superior error-detection capabilities and speed compared with both local models and human specialists, who are inherently time-constrained. Although synthetic data provide a controlled testbed, real-world evaluation across diverse errors and documentation styles remains critical. Advanced LLMs can serve as powerful assistants for clinical documentation reviews, substantially reducing the risk of oversight and clinician workload. Integrating LLM-driven error flagging into electronic health record workflows offers a promising strategy for enhancing documentation accuracy, treatment quality, and patient safety in oncology.
Title: Artificial Intelligence-Assisted Error Detection in Complex Clinical Documentation: Leveraging Large Language Models to Enhance Patient Safety in Oncology. Journal: JCO Clinical Cancer Informatics, volume 10, e2500194. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12794695/pdf/
Pub Date: 2026-01-01 | Epub Date: 2026-01-28 | DOI: 10.1200/CCI-25-00190
Catherine Ning, Dimitris Bertsimas, Per Eystein Lønning, Federico N Auecio, Richard Burkhart, Felix Balzer, Stefan Buettner, Hideo Baba, Itaru Endo, Georgios Stasinos, Johan Gagnière, Cornelis Verhoef, Martin E Kreis, Georgios Antonios Margonis
Purpose: We explore whether survival model performance in underrepresented high- and low-risk subgroups (regions of the prognostic spectrum where clinical decisions are most consequential) can be improved through targeted restructuring of the training data set. Rather than modifying model architecture, we propose a novel risk-stratified sampling method that addresses imbalances in prognostic subgroup density to support more reliable learning in underrepresented tail strata.
Methods: We introduce a novel methodology that partitions patients by baseline prognostic risk and applies matching within each stratum to equalize representation across the risk distribution. We implement this framework on a cohort of 1,799 patients with resected colorectal liver metastases (CRLM), including 1,197 who received adjuvant chemotherapy and 602 who did not. All models used in this study are Cox proportional hazards models trained on the same set of selected variables. Model performance is assessed via Harrell's C index and Integrated Calibration Index, with internal validation using Efron's bias-corrected bootstrapping. External validation is conducted on two independent CRLM data sets.
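A rough sketch of the rebalancing idea, under loud assumptions: the abstract does not specify the matching procedure, so the code below substitutes plain random downsampling of each risk stratum to the size of the smallest one. The function names, the risk cutoffs, and the toy cohort are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def balance_by_stratum(patients, stratum_of, seed=0):
    """Downsample every risk stratum to the size of the smallest stratum.

    NOTE: random downsampling is a stand-in chosen for illustration; it is
    not the matching procedure used in the study.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for patient in patients:
        strata[stratum_of(patient)].append(patient)
    target = min(len(members) for members in strata.values())
    balanced = []
    for key in sorted(strata):
        balanced.extend(rng.sample(strata[key], target))
    return balanced


def risk_stratum(patient):
    """Hypothetical cutoffs on a baseline risk score in [0, 1]."""
    if patient["risk"] < 0.2:
        return "low"
    if patient["risk"] > 0.8:
        return "high"
    return "mid"


# Toy cohort (made-up): 2 low-risk, 6 mid-risk, 2 high-risk patients.
risks = [0.05, 0.10, 0.30, 0.40, 0.45, 0.50, 0.60, 0.70, 0.85, 0.95]
cohort = [{"id": i, "risk": r} for i, r in enumerate(risks)]

balanced = balance_by_stratum(cohort, risk_stratum)
counts = Counter(risk_stratum(p) for p in balanced)
```

After balancing, each stratum contributes the same number of patients, so a model fitted on `balanced` no longer sees the middle of the risk distribution three times as often as the tails.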
Results: Cox models trained on risk-balanced cohorts showed consistent improvements in internal validation compared with models trained on the full data set. The proposed approach preserved overall model calibration while noticeably improving stratified C index values in underrepresented high- and low-risk strata of the external cohorts.
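For readers unfamiliar with the metric, Harrell's C index can be sketched in a few lines. This is a plain pairwise implementation for illustration, assuming no tied event times; the toy follow-up times, event flags, and risk scores are made up and unrelated to the study cohorts.

```python
def harrell_c(times, events, risk_scores):
    """Share of comparable pairs in which the higher-risk patient fails first.

    A pair (i, j) is comparable when patient i has an observed event
    (events[i] == 1) strictly before patient j's follow-up time.
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy data (made-up): follow-up in months, 1 = event observed, 0 = censored.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.6, 0.1]

c_index = harrell_c(times, events, risk_scores)
```

A stratified C index, as reported for the high- and low-risk strata, would apply the same function to the subset of patients in each stratum.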
Conclusion: Our findings suggest that survival model performance in observational oncology cohorts can be meaningfully improved through targeted rebalancing of the training data across prognostic risk strata. This approach offers a practical and model-agnostic complement to existing methods, especially in applications where predictive reliability across the full risk continuum is critical to downstream clinical decisions.
Title: Improving Survival Models in Health Care by Balancing Imbalanced Cohorts: A Novel Approach. Journal: JCO Clinical Cancer Informatics, volume 10, e2500190. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12854512/pdf/
Pub Date: 2026-01-01 | Epub Date: 2026-01-14 | DOI: 10.1200/CCI-25-00177
Joshi Hogenboom, Varsha Gouthamchand, Charlotte Cairns, Silvie H M Janssen, Kirsty Way, Andre L A J Dekker, Winette T A van der Graaf, Anne-Sophie Darlington, Olga Husson, Leonard Y L Wee, Johan van Soest, Aiara Lobo Gomes
Purpose: Rare diseases are difficult to fully capture, and regularly call for large, geographically dispersed initiatives. Such initiatives are often met with data harmonization challenges. These challenges render data incompatible and impede successful realization. The STRONG AYA project is such an initiative, specifically focusing on adolescents and young adults (AYAs) with cancer. STRONG AYA is setting up a federated data infrastructure containing data in varying formats. Here, we elaborate on how we used health care-agnostic semantic web technologies to overcome such challenges.
Methods: We structured the STRONG AYA case-mix and core outcome measures concepts and their properties as knowledge graphs. Having identified the corresponding standard terminologies, we developed a semantic map on the basis of the knowledge graphs and the annotation helper plugin for Flyover, which is introduced here. Flyover is a tool that converts structured data into resource description framework (RDF) triples and enables semantic interoperability. As a demonstration, we mapped data that are to be included in the STRONG AYA infrastructure.
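To make the triple-based approach concrete, here is a minimal sketch of turning one structured record into RDF triples and then annotating a local column with a standard concept. It does not use or imitate Flyover's actual interface; the `example.org` namespaces, the helper names, the toy record, and the choice of `skos:exactMatch` as the linking predicate are all assumptions made for illustration.

```python
BASE = "http://example.org/data/"  # hypothetical namespace, not the project's
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"


def literal(value):
    """Serialize a value as a quoted N-Triples string literal."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_triples(row_id, row):
    """Turn one record into (subject, predicate, object) triples.

    Column names and values are carried over unchanged.
    """
    subject = f"<{BASE}row/{row_id}>"
    return [
        (subject, f"<{BASE}column/{column}>", literal(value))
        for column, value in row.items()
    ]


def annotate(column, concept_uri):
    """Link a local column name to a standard concept (assumed predicate)."""
    return (f"<{BASE}column/{column}>", f"<{SKOS_EXACT_MATCH}>", f"<{concept_uri}>")


def to_ntriples(triples):
    """Render triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Toy record (made-up) with a non-English column name and a local code.
triples = row_to_triples(1, {"geslacht": "v"})
triples.append(annotate("geslacht", "http://example.org/terminology/sex"))
document = to_ntriples(triples)
```

The source value `"v"` and the column name `geslacht` are never rewritten; the annotation triple is what lets a query written against the standard concept reach the local data.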
Results: The knowledge graphs provided a comprehensive overview of the large number of STRONG AYA concepts. The semantic terminology mapping and annotation helper allowed us to query data with incomprehensible terminologies, without changing them. Both the knowledge graphs and semantic map were made available on a Hugo webpage for increased transparency and understanding.
Conclusion: The use of semantic web technologies, such as RDF and knowledge graphs, is a viable solution to overcome challenges regarding data interoperability and reusability for a federated AYA cancer data infrastructure without being bound to rigid standardized schemas. The linkage of semantically meaningful concepts to otherwise incomprehensible data elements demonstrates how these domain-agnostic technologies made nonstandardized health care data interoperable.
Title: Knowledge Representation of a Multicenter Adolescent and Young Adult Cancer Infrastructure: Development of the STRONG AYA Knowledge Graph. Journal: JCO Clinical Cancer Informatics, volume 10, e2500177. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12834280/pdf/
Pub Date: 2026-01-01 | Epub Date: 2026-01-06 | DOI: 10.1200/CCI-25-00255
William McGahan, Nick Butler, Thomas O'Rourke, Bernard Mark Smithers, David Cavallucci
Purpose: To address incomplete and inconsistent classification of pancreatic cancer resectability according to International Association of Pancreatology (IAP) anatomic, biologic, and conditional criteria.
Materials and methods: We designed, implemented, and evaluated an interoperable, web-based platform that captured structured pretreatment data and performed algorithm-driven resectability classification. Linked modules supported referral to and discussion at multidisciplinary team meetings (MDTMs) at two quaternary hospitals (June 2021-February 2022) and populated downstream documentation. In a pre-post study, Pearson χ² test and multivariable logistic regression (odds ratios [ORs] with 95% CI) compared data completeness (primary end point), as well as the distribution of IAP-defined resectability and treatment intent (secondary end points). In the postintervention cohort, overall survival (OS) was stratified by IAP resectability using Kaplan-Meier curves and compared using the log-rank test. Hazard ratios (HRs) with 95% CIs and log-rank statistics were calculated for individual resectability criteria using Cox models. All tests were two-sided with nominal significance (P < .05). An embedded module evaluated workflow integration and user experience.
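As a reminder of how an odds ratio and its 95% CI arise from a pre-post comparison, the sketch below computes an unadjusted OR with a Wald interval from a 2x2 table. This is a simplification: the study reports ORs from multivariable logistic regression, which this does not reproduce. The counts are made up (only the group size of 71 per period echoes the abstract) and `odds_ratio_ci` is a hypothetical helper.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a, b: post-intervention patients with / without the item documented
    c, d: pre-intervention patients with / without the item documented
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts: 60 of 71 documented after, 30 of 71 documented before.
or_value, lower, upper = odds_ratio_ci(60, 11, 30, 41)
```

With these toy counts the interval lies entirely above 1, which is the pattern that would indicate improved documentation after the intervention.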
Results: Ninety-five patients with pancreatic cancer were referred to MDTMs during the intervention period, of whom 71 were eligible. Compared with 71 preintervention patients, the system improved documentation of tumor-vessel relationships (OR, 9.39 [95% CI, 4.43 to 21.7]), locoregional lymphadenopathy (OR, 30.5 [95% CI, 11.1 to 102]), and performance status (PS; OR, 3.34 [95% CI, 1.67 to 6.85]), reducing the number with unknown resectability (OR, 0.10 [95% CI, 0.03 to 0.25]). PS ≥ 2 (HR, 2.16 [95% CI, 1.06 to 4.43]) and serum CA19.9 ≥ 500 U/mL (HR, 1.94 [95% CI, 1.03 to 3.63]) were significantly associated with OS, whereas anatomic criteria were not.
Discussion: A synoptic intervention integrated into MDTM workflows across multiple sites improved structured data capture, reduced unknown resectability, and highlighted the relevance of biologic and conditional criteria in addition to tumor anatomy.
Title: Synoptic Multidisciplinary Team Meeting Workflows to Promote Guideline-Based Classification of Resectability in Pancreatic Cancer: A Multicenter Prospective Study. Journal: JCO Clinical Cancer Informatics, volume 10, e2500255.
Pub Date: 2026-01-01 | Epub Date: 2026-01-07 | DOI: 10.1200/CCI-25-00286
Jiasheng Wang, Kirti Arora, David M Swoboda, Aziz Nazha
Purpose: Clinical guidelines are essential for evidence-based oncology care but are often long, complex, and difficult to navigate. We developed a multiagent artificial intelligence (AI) system to accurately retrieve and interpret guideline content in response to guideline-based clinical questions.
Methods: We included 34 ASCO guidelines published between January 2021 and December 2024. Using a multiagent framework, we assigned distinct roles to AI agents: a Coordinator Agent selected the relevant guideline, specialized Tumor Board Agents extracted information from text, tables, and figures, and a Reviewer Agent synthesized a final answer. A total of 100 open-ended questions were created on the basis of the guideline content. The system's performance was compared with GPT-4o, Claude 3.7, Gemini 2.5 flash, DeepSeek-R1, and the ASCO Guidelines Assistant.
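The division of roles described above can be sketched as a simple pipeline. Everything here is a stand-in: the agents are plain functions rather than language model calls, the Coordinator picks a guideline by naive word overlap, and the guideline titles and contents are placeholders, not ASCO material. The sketch only shows how the three stages hand work to each other.

```python
from dataclasses import dataclass


@dataclass
class Guideline:
    title: str
    text: str
    tables: str
    figures: str


def coordinator(question, guidelines):
    """Select the guideline whose title shares the most words with the question."""
    q_words = set(question.lower().split())
    return max(guidelines, key=lambda g: len(q_words & set(g.title.lower().split())))


# Specialist agents: each one reads a single modality of the chosen guideline.
def text_agent(guideline, question):
    return f"text: {guideline.text}"


def table_agent(guideline, question):
    return f"tables: {guideline.tables}"


def figure_agent(guideline, question):
    return f"figures: {guideline.figures}"


def reviewer(findings):
    """Combine the specialists' findings into one answer (stubbed as a join)."""
    return " | ".join(findings)


def answer(question, guidelines):
    chosen = coordinator(question, guidelines)
    findings = [agent(chosen, question) for agent in (text_agent, table_agent, figure_agent)]
    return chosen.title, reviewer(findings)


# Placeholder guidelines (hypothetical titles and contents).
library = [
    Guideline("breast cancer screening", "placeholder A", "placeholder B", "placeholder C"),
    Guideline("colorectal cancer follow-up", "placeholder D", "placeholder E", "placeholder F"),
]
selected, final_answer = answer("What follow-up is advised after colorectal cancer surgery?", library)
```

Separating selection from extraction makes each failure mode visible on its own: a wrong `selected` value is a Coordinator error, while a wrong `final_answer` for the right guideline points at the specialists or the Reviewer.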
Results: The multiagent system achieved 94% (95% CI, 89.3 to 98.7) accuracy in selecting the correct guidelines and 90% (95% CI, 84.1 to 95.9) accuracy in answering questions. This significantly outperformed GPT-4o (48%), Claude 3.7 (49%), Gemini 2.5 flash (50%), DeepSeek-R1 (58%), and the ASCO Guidelines Assistant (67%; all P < .01, McNemar's test). Most errors were due to incorrect guideline selection or misinterpretation; no hallucinated answers were observed. Removing the Coordinator Agent reduced accuracy to 40%, and excluding tables and figures reduced accuracy to 51%.
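The reported confidence intervals can be checked with a short calculation. The abstract does not say which interval method was used; the normal-approximation (Wald) interval is assumed here because, with the 100 questions described in the Methods, it reproduces the reported bounds. `wald_ci` is a hypothetical helper name.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def as_percent(interval):
    """Express an interval as percentages rounded to one decimal place."""
    return tuple(round(100 * bound, 1) for bound in interval)


selection_ci = as_percent(wald_ci(94, 100))   # guideline selection accuracy
answer_ci = as_percent(wald_ci(90, 100))      # question answering accuracy
```

For proportions this close to 100%, a Wilson or exact interval would give somewhat different, asymmetric bounds, so the match with the Wald interval is informative about how the figures were likely derived but is not confirmation.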
Conclusion: By assigning specialized tasks to AI agents and incorporating visual elements from clinical guidelines, our system outperformed existing tools in accurately answering oncology questions. Although this pilot study was limited to ASCO guidelines, such a system may improve access to guideline-based care.
Title: Tumor Board-Inspired Multiagent Artificial Intelligence System for Interpreting Oncology Guidelines. Journal: JCO Clinical Cancer Informatics, volume 10, e2500286.
Pub Date: 2026-01-01 | Epub Date: 2026-01-07 | DOI: 10.1200/CCI-24-00334
Anobel Y Odisho, Andrew W Liu, William A Pace, Marvin N Carlisle, Robert Krumm, Janet E Cowan, Peter R Carroll, Matthew R Cooperberg
Purpose: Radiology reports are stored as plain text in most electronic health records, rendering the data computationally inaccessible. Large language models are powerful tools for analyzing unstructured text but relatively untested in urologic oncology. We aimed to develop a pipeline to extract data from plain text prostate magnetic resonance imaging (MRI) reports using GPT-4.0 and compare the accuracy to manually abstracted data.
Methods: We developed a data pipeline using a secure, enterprise-wide deployment of OpenAI's GPT-4.0 to automatically extract data elements from the text of prostate MRI reports. Identical prompts and reports were sent multiple times to determine response variability. We extracted 15 data elements per report and compared accuracy to a manually abstracted gold standard.
Results: Across 424 prostate MRI reports, GPT-4.0 response accuracy was consistently above 95%. Individual field accuracies were 98.3% (96.3%-99.3%) for prostate-specific antigen density, 97.4% (95.4%-98.7%) for extracapsular extension, and 98.1% (96.3%-99.2%) for TNM stage. Across all fields, the median accuracy was 98.1% (96.3%-99.2%) and the mean was 97.2% (95.2%-98.3%), with a range from 99.8% (98.7%-100.0%) for number of suspicious lesions to 87.7% (84.2%-90.7%) for identification of lesion location in the base of the prostate. Response variability over five repeated runs ranged from 0.14% to 3.61%, differed based on the data element extracted (P < .001), and was inversely correlated with accuracy (P < .001). In disagreements between manual and GPT-4.0 extracted data, GPT-4.0 responses were more often deemed correct by an additional reviewer.
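Accuracy and run-to-run variability for a single extracted field can be sketched as follows. The abstract does not define response variability, so the definition used here (the share of reports for which repeated runs did not all return the same value) is an assumption, as are the helper names and the toy labels.

```python
def accuracy(predictions, gold):
    """Share of reports where the extracted value matches the gold standard."""
    matches = sum(1 for p, g in zip(predictions, gold) if p == g)
    return matches / len(gold)


def variability(runs):
    """Share of reports whose repeated runs disagree with each other.

    ASSUMED definition for illustration; the study may have used another.
    """
    n_reports = len(runs[0])
    unstable = sum(
        1 for i in range(n_reports) if len({run[i] for run in runs}) > 1
    )
    return unstable / n_reports


# Toy data (made-up): one yes/no field, four reports, three repeated runs.
gold = ["yes", "no", "yes", "yes"]
runs = [
    ["yes", "no", "yes", "no"],
    ["yes", "no", "yes", "no"],
    ["yes", "no", "no", "no"],
]

first_run_accuracy = accuracy(runs[0], gold)
field_variability = variability(runs)
```

In the toy data the fourth report is consistently wrong across runs while the third is unstable, which shows that the two measures capture different failure modes even when they are correlated in aggregate.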
Conclusion: GPT-4.0 extracted data points from prostate cancer MRI reports with high accuracy, low variability, and low upfront programming requirements. This represents an effective tool to expedite medical data extraction for clinical and research use cases.
Title: Generative Artificial Intelligence Successfully Automates Data Extraction From Unstructured Magnetic Resonance Imaging Reports: Feasibility in Prostate Cancer Care. Journal: JCO Clinical Cancer Informatics, volume 10, e2400334.
Pub Date: 2026-01-01 | Epub Date: 2026-01-14 | DOI: 10.1200/CCI-25-00148
Emily C Liang, Yein Jeon, Yang Qiao, Xiancheng Wu, Jennifer J Huang, Andrew J Portuguese, Ryan Basom, Aiko Torkelson, Delaney Kirchmeier, Kristina Braathen, Andrew J Cowan, Mazyar Shadman, Alexandre V Hirayama, Brian G Till, Erik L Kimble, Qian Wu, Jordan Gauthier
Purpose: Immune effector cell-associated hematotoxicity (ICAHT) is a major cause of nonrelapse mortality after chimeric antigen receptor (CAR) T-cell therapy. We hypothesized that unsupervised time-series clustering could identify archetypal patterns of early hematotoxicity better than the early ICAHT (eICAHT) grading system.
Methods: We applied unsupervised k-means time-series clustering based on Euclidean distances to longitudinal absolute neutrophil count (ANC) data from days +0 through +30 post-CAR T-cell infusion in 691 patients treated at our center (training set: n = 483, 70%; test set: n = 208, 30%).
Results: Within our training set, we identified an optimal cluster solution based on four ANC recovery clusters, which were labeled as very good, good, poor, and very poor. We trained a random forest (RF) model including the top five most important features (day +3, +4, +5, +26, and +27 ANC values) to predict the cluster assignments. Within our test set, we applied the RF model to predict cluster assignments. Compared with the eICAHT criteria, the RF-predicted clusters were more compact and better separated (Dunn index: 0.078 v 0.034; average silhouette width: 0.12 v 0.010). In addition, the RF model identified patients in the good recovery cluster with intermediate overall survival (hazard ratio [HR], 1.70 [95% CI, 1.05 to 2.74]; P = .029; reference, very good), which was not captured by grade 2 eICAHT (HR, 1.37 [95% CI, 0.80 to 2.35]; P = .25; reference, grade 0-1).
Conclusion: Unsupervised time-series clustering identified distinct and clinically relevant patterns of hematotoxicity after CAR T-cell therapy. We trained and tested an RF model that accurately predicted cluster assignments using only five features. Predictions can be generated using our online web application.
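The Results compare cluster quality using the Dunn index and the average silhouette width, where higher values indicate more compact, better-separated clusters. The sketch below shows one common way to compute both with Euclidean distances. The function names and the four toy trajectories are hypothetical and are not taken from the study data or the authors' code.

```python
from itertools import combinations
from math import dist


def _group(points, labels):
    """Collect points into a dict keyed by cluster label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by largest within-cluster diameter."""
    clusters = _group(points, labels)
    max_diameter = max(
        dist(a, b)
        for members in clusters.values()
        for a, b in combinations(members, 2)
    )
    min_separation = min(
        dist(a, b)
        for (_, first), (_, second) in combinations(clusters.items(), 2)
        for a in first
        for b in second
    )
    return min_separation / max_diameter


def average_silhouette_width(points, labels):
    """Mean of (b - a) / max(a, b) over all points."""
    clusters = _group(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy two-time-point trajectories (made-up values, not patient ANC data).
toy_points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
tight_labels = [0, 0, 1, 1]   # groups nearby trajectories together
mixed_labels = [0, 1, 0, 1]   # splits nearby trajectories apart

print(dunn_index(toy_points, tight_labels), dunn_index(toy_points, mixed_labels))
print(average_silhouette_width(toy_points, tight_labels),
      average_silhouette_width(toy_points, mixed_labels))
```

On the toy data, the labeling that keeps nearby trajectories together scores higher on both measures, which is the direction of the comparison reported for the RF-predicted clusters against eICAHT grades.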
Title: Time-Series Clustering Captures Patterns of Early Immune Effector Cell-Associated Hematotoxicity That Are Predictable Using Tree-Based Models. Journal: JCO Clinical Cancer Informatics, volume 10, e2500148 (2026-01-01). Impact factor: 2.8. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12810859/pdf/
Pub Date: 2026-01-01 | Epub Date: 2026-01-21 | DOI: 10.1200/CCI-25-00207
Jyoti Malhotra, Shilpa Viswanathan, Shivani K Mhatre, Inderjit K Dhillon, Riddhi Patel, Nicole Yohn, Furaha Kariburyo-Yay, Xinye Li, Biagio Ricciuti
Purpose: This study examined real-world overall survival (rwOS) in patients with advanced or metastatic non-small cell lung cancer (a/mNSCLC) who received combination immunotherapy (IO) and platinum chemotherapy in the first line (1L), followed by non-IO, nonplatinum chemotherapy in the second line or beyond (2L+). It also explored the association between real-world duration of response (rwDOR) to 1L treatment and rwOS on the 2L+ treatment.
Methods: This study used two US-based data sets: ConcertAI Patient360 NSCLC data set (ConcertAI) and the Flatiron Health Research Database (FHRD), and included adults with a/mNSCLC diagnosed from January 1, 2018, to March 31, 2023 (data cutoff: March 31, 2024). Kaplan-Meier and multivariate Cox regression analyses estimated rwOS for the index regimen by rwDOR to 1L.
Results: The proportions of patients with 1L rwDOR ≤6 months (≈60%) v >6 months (≈40%) were similar in the ConcertAI (n = 596) and FHRD (n = 1,094) cohorts. Across the ConcertAI data set/FHRD, 52.6%/55.7% of patients achieved complete/partial response as real-world best overall response to 1L combination IO and platinum chemotherapy and 17.8%/19.1% had stable disease. The median rwOS on 2L+ treatment was 8.3 v 5.2 months (P = .001; ConcertAI) and 8.3 v 5.1 months (P < .001; FHRD) for patients with 1L rwDOR >6 v ≤6 months. The adjusted hazard ratio for patients with 1L rwDOR >6 v ≤6 months was 0.74 (95% CI, 0.61 to 0.90; P = .002) and 0.76 (95% CI, 0.67 to 0.88; P < .001) in the ConcertAI data set and FHRD, respectively.
Conclusion: Our findings demonstrate that patients with rwDOR >6 months on 1L combination IO and platinum chemotherapy exhibit longer rwOS on subsequent treatments. This emphasizes the need for 1L treatments that extend DOR and delay the onset of acquired resistance, which remains an unmet need for the approximately 60% of patients who do not achieve a sustained response in clinical practice.
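The Methods name Kaplan-Meier estimation as the way median rwOS was obtained. The sketch below is a minimal product-limit estimator, assuming right-censored follow-up times. The function names and the six toy observations are hypothetical and do not come from the ConcertAI or FHRD data.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each time with at least one event.

    durations: follow-up time per patient
    events:    1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        time = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == time:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((time, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


def median_survival(curve):
    """First time at which estimated survival is 50% or lower, else None."""
    for time, survival in curve:
        if survival <= 0.5:
            return time
    return None


# Toy follow-up in months (made-up values): two patients are censored.
toy_months = [2, 3, 3, 5, 8, 9]
toy_events = [1, 1, 0, 1, 1, 0]

toy_curve = kaplan_meier(toy_months, toy_events)
print(toy_curve)
print(median_survival(toy_curve))
```

Censored patients reduce the number at risk without lowering the survival estimate, which is why the median is read from the step curve and not from the raw follow-up times.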
Title: Impact of Real-World Response to First-Line Immunotherapy and Chemotherapy on Subsequent Treatment Outcomes in Patients With Advanced or Metastatic Non-Small Cell Lung Cancer. Journal: JCO Clinical Cancer Informatics, volume 10, e2500207 (2026-01-01). Impact factor: 2.8. DOI link: https://doi.org/10.1200/CCI-25-00207
Pub Date: 2026-01-01 | Epub Date: 2026-01-09 | DOI: 10.1200/CCI-24-00265
Kaleem S Ahmed, Clayton T Marcinak, Muhammad Maisam Ali, Sheriff M Issaka, Yonghe Yan, Gabriel McMahan, Thomas Callaci, Noelle K LoConte, Andrea Shiefelbein, Sharon Weber, Majid Afshar, Matthew M Churpek, Jomol Mathew, Syed Nabeel Zafar
Purpose: Clinical research in pancreatic cancer (PC) has been limited because of a lack of granular data in national data sets. An electronic health record (EHR)-based data set specifically designed for PC has immense potential to advance research. This study describes the creation of an EHR-based data commons for patients with PC.
Methods: We generated an index cohort of adult patients at our institution diagnosed with PC (International Classification of Diseases for Oncology, codes C25.0-25.9) between January 1, 2010, and December 31, 2023. To develop the Pancreatic Cancer Data Commons (PCDC), we linked six data sources: (1) institutional EHR data, (2) cancer-specific data from the North American Association of Central Cancer Registries, (3) surgical outcomes from the National Surgical Quality Improvement Program, (4) community-level data from the American Community Survey, (5) national mortality data from Obituary.com, and (6) genomic data from the cBioPortal for Cancer Genomics. We evaluated the feasibility of using the Observational Medical Outcomes Partnership common data model. The data set is stored on a cloud-based, Health Insurance Portability and Accountability Act-secure, and National Institute of Standards and Technology-compliant server.
Results: The PCDC currently includes data on 3,542 unique patients. The mean age at diagnosis is 66.6 ± 11.7 years; 53.3% are male, and 92.2% are White. Linkage to six national data sets increased the completeness of cancer-specific data from 31.3% to 71.6%. Most patients presented at stage IV (43.6%), followed by stage I (22.6%). As of the latest update, 1,074 (30.3%) patients were still alive.
Conclusion: The PCDC is a centralized resource that fills a gap in PC research. The ability to securely link and analyze protected patient data is a strategic step toward enhancing clinical research and optimizing care for patients with PC. Our future work includes expanding the PCDC to multiple centers using common data models.
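The reported gain in completeness comes from linking several sources on the same patients. As a rough illustration only, the sketch below merges records by a shared patient key and measures the share of required fields that are filled. The key, field names, record values, and merge rule (first non-missing value wins) are all assumptions and do not describe the actual PCDC pipeline.

```python
def link_sources(sources):
    """Merge per-source records keyed by patient id.

    sources: list of dicts mapping patient id -> {field: value or None}
    The first non-missing value seen for a field is kept.
    """
    linked = {}
    for records in sources:
        for patient_id, fields in records.items():
            merged = linked.setdefault(patient_id, {})
            for name, value in fields.items():
                if merged.get(name) is None and value is not None:
                    merged[name] = value
    return linked


def completeness(linked, required_fields):
    """Fraction of required fields that are filled across all linked patients."""
    filled = sum(
        1
        for record in linked.values()
        for name in required_fields
        if record.get(name) is not None
    )
    return filled / (len(linked) * len(required_fields))


# Hypothetical records: the EHR extract alone is missing half of the fields.
toy_ehr = {"p1": {"stage": None, "age": 66}, "p2": {"stage": "IV", "age": None}}
toy_registry = {"p1": {"stage": "I"}, "p2": {"age": 70}}
required = ["stage", "age"]

before = completeness(link_sources([toy_ehr]), required)
after = completeness(link_sources([toy_ehr, toy_registry]), required)
print(before, after)
```

In practice, linkage across institutions usually needs more than an exact key match, so this is best read as a description of the completeness metric and not of the linkage method.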
Title: Novel Electronic Health Record-Based Data Commons for Pancreatic Cancer. Journal: JCO Clinical Cancer Informatics, volume 10, e2400265 (2026-01-01). Impact factor: 2.8. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12795310/pdf/
Purpose: The Pediatric Relapse Prediction and Risk Evaluation for Acute Lymphoblastic Leukemia (PREPARE-ALL) tool aims to predict relapse in pediatric ALL by integrating clinical expertise with artificial intelligence and machine learning (ML), particularly Extreme Gradient Boosting (XGBoost). PREPARE-ALL demonstrates that multicenter, protocol-driven clinical and laboratory data can be used through ML to generate reproducible relapse predictions with greater sensitivity than individual clinician assessments.
Methods: PREPARE-ALL was developed using data from the ICiCLe ALL-14 pretrial cohort across five centers, incorporating 33 clinical and laboratory features.
Results: Among 2,252 patients enrolled in the study, 565 (25.1%) relapsed. Using an 80:20 train-test split, XGBoost achieved a sensitivity of 68.5% (245/447 relapses detected). Additional metrics included a positive predictive value of 31.3%, a negative predictive value of 82.8%, an accuracy of 54.8%, and a specificity of 50.3%. Key predictors of relapse included high hyperdiploidy, BCR-ABL1 fusion positivity, positive measurable residual disease status at the end of induction, sex, age, highest presenting WBC, and final risk group. Three clinicians scored the validation data set; the developed model achieved a higher recall (68.5%) compared with clinical judgment (approximately 31%-36%).
Conclusion: PREPARE-ALL identified roughly twice as many relapses as clinicians and may serve as a practical decision-support tool for early relapse triage and treatment planning, enabling timely therapeutic adjustments and improved outcomes in pediatric ALL.
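The Results list sensitivity, specificity, predictive values, and accuracy, all of which derive from a single confusion matrix. The sketch below shows the standard formulas. The abstract does not report the confusion matrix, so the four counts used here are an assumption, back-calculated so that the rounded metrics match the reported percentages for a 447-patient test set. With these assumed counts, 245/447 corresponds to overall accuracy (54.8%), while sensitivity is 76/111.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # recall: relapses correctly flagged
        "specificity": tn / (tn + fp),   # non-relapses correctly cleared
        "ppv": tp / (tp + fp),           # flagged patients who relapsed
        "npv": tn / (tn + fn),           # cleared patients who did not relapse
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Assumed counts (not reported in the abstract), chosen to reproduce the
# reported percentages after rounding to one decimal place.
assumed_counts = {"tp": 76, "fn": 35, "fp": 167, "tn": 169}

metrics = classification_metrics(**assumed_counts)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

The combination of moderate sensitivity with low positive predictive value reflects the trade-off that comes with a relapse rate near 25%: flagging more true relapses also flags many patients who do not relapse.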
Title: PREPARE ALL: An Artificial Intelligence Tool for Predicting Relapse in Children With Acute Lymphoblastic Leukemia. Authors: Subikksha Saravanan, Raghunathan Rengaswamy, Gaurav Narula, Sameer Bakhshi, Rachna Seth, Nandana Das, Manash Pratim Gogoi, Shripad Banavali, Prasanth Srinivasan, Gargi Das, T K Balaji, Shekar Krishnan, Vaskar Saha, Vijayalakshmi Ramshankar, Venkatraman Radhakrishnan. Journal: JCO Clinical Cancer Informatics, volume 10, e2500222 (2026-01-01). Impact factor: 2.8. DOI: 10.1200/CCI-25-00222