Pub Date: 2026-03-01; Epub Date: 2026-03-03; DOI: 10.1200/CCI-25-00173
Suvd Zulbayar, Jennifer Brooks, Arian Aminoleslami, Shana Kim, Renzo Jose Carlos Calderon Anyosa, Feifan Xiang, Julia Knight, Andrea Eisen, Geoffrey M Anderson
Purpose: Early-stage breast cancer (ESBC) in women younger than 50 years often presents with tumor features, including grade, hormone receptor status, and human epidermal growth factor receptor 2 (HER2) status, that differ from those seen in older women. Machine learning clustering techniques can reveal underlying patterns in the inter-relationships of these features and provide novel insights to inform and guide decision making by patients and providers.
Methods: Partitioning around medoids (PAM) was applied to SEER data from 67,746 women age 18-49 years diagnosed with ESBC. PAM clustering based on tumor size (T), nodal status (N), grade, and receptor status identified 10 distinct clusters. The PAM clusters and American Joint Committee on Cancer (AJCC) anatomic and prognostic stages were compared in terms of their tumor features and their association with chemotherapy and survival.
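To make the clustering step concrete, the following is a minimal Python sketch of the swap phase of partitioning around medoids on categorical tumor features. The abstract does not state the dissimilarity measure, the initialization, or the software used, so the Hamming distance, the naive initialization from the first k records, the function names, and the toy records are all assumptions for illustration and not the authors' pipeline.

```python
from itertools import product


def hamming(a, b):
    """Number of features on which two records differ (assumed dissimilarity)."""
    return sum(x != y for x, y in zip(a, b))


def total_cost(points, medoids):
    """Sum over all records of the distance to the nearest medoid."""
    return sum(min(hamming(p, points[m]) for m in medoids) for p in points)


def pam_swap(points, k):
    """Greedy swap phase: repeatedly apply the single best medoid swap.

    Starts from the first k records (a naive stand-in for PAM's BUILD phase)
    and stops when no swap lowers the total cost.
    """
    medoids = list(range(k))
    cost = total_cost(points, medoids)
    while True:
        best_cost, best_medoids = cost, None
        for i, cand in product(range(k), range(len(points))):
            if cand in medoids:
                continue
            trial = medoids[:i] + [cand] + medoids[i + 1:]
            trial_cost = total_cost(points, trial)
            if trial_cost < best_cost:
                best_cost, best_medoids = trial_cost, trial
        if best_medoids is None:
            return medoids, cost
        medoids, cost = best_medoids, best_cost


# Hypothetical records: (T, N, grade, receptor status).
records = [
    ("T1", "N0", "G1", "HR+"),
    ("T1", "N0", "G1", "HR+"),
    ("T1", "N0", "G2", "HR+"),
    ("T2", "N1", "G3", "HR-"),
    ("T2", "N1", "G3", "HR-"),
    ("T3", "N1", "G3", "HR-"),
]
medoids, cost = pam_swap(records, k=2)
print([records[m] for m in medoids], cost)
```

Because each medoid is an actual record, every cluster can be described by a real combination of tumor features, which is one reason medoid-based methods suit categorical clinical data.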
Results: AJCC anatomic and prognostic stages are primarily defined by T and N. PAM clusters were primarily defined by receptor status and grade. PAM clusters align closely with luminal A, luminal B, triple-negative, or HER2-overexpressing treatment-related subtypes. PAM clusters better discriminated chemotherapy treatment, with C-statistic 0.839 (95% CI, 0.836 to 0.842), than either anatomic, with C-statistic 0.770 (95% CI, 0.767 to 0.773), or prognostic staging, with C-statistic 0.796 (95% CI, 0.794 to 0.800). PAM clusters were better predictors of 5-year overall survival, with C-statistic 0.733 (95% CI, 0.727 to 0.739), than anatomic stages, with C-statistic 0.721 (95% CI, 0.715 to 0.728), but not as predictive as prognostic stages, with C-statistic 0.759 (95% CI, 0.753 to 0.764).
Conclusion: Data-driven PAM clusters provide novel insights into the inter-relationship of tumor features and their association with hormonal, targeted, and chemotherapy treatment and with survival outcomes in women younger than 50 years with ESBC. An online application was created so that the PAM clusters could be used as alternatives or in addition to traditional AJCC staging to inform and guide patients and providers.
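For a binary outcome such as chemotherapy receipt, a C-statistic is the probability that a randomly chosen patient with the event is assigned a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. The sketch below assumes binary outcomes and no censoring; a survival C-statistic usually needs a censoring-aware variant, which is not shown. The function name and toy data are hypothetical.

```python
def c_statistic(scores, outcomes):
    """Concordance for binary outcomes: P(score_event > score_nonevent), ties = 0.5."""
    concordant = ties = pairs = 0
    for s_event, y_event in zip(scores, outcomes):
        if y_event != 1:
            continue
        for s_non, y_non in zip(scores, outcomes):
            if y_non != 0:
                continue
            pairs += 1
            if s_event > s_non:
                concordant += 1
            elif s_event == s_non:
                ties += 1
    if pairs == 0:
        raise ValueError("need at least one event and one non-event")
    return (concordant + 0.5 * ties) / pairs


# Hypothetical predicted risks and observed outcomes (1 = received chemotherapy).
risk = [0.9, 0.8, 0.3, 0.3, 0.1]
received = [1, 1, 1, 0, 0]
print(round(c_statistic(risk, received), 3))  # 5 concordant + 1 tie over 6 pairs -> 0.917
```

A value of 0.5 means the grouping discriminates no better than chance, and 1.0 means every event is ranked above every non-event.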
Early-Stage Breast Cancer in Women Younger Than 50 Years: Comparing American Joint Committee on Cancer Anatomic and Prognostic Stages With Partitioning Around Medoids Clusters in SEER Data. JCO Clinical Cancer Informatics, vol. 10, e2500173. Published 2026-03-01. DOI: 10.1200/CCI-25-00173. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12959580/pdf/
Purpose: Comprehensive genomic profiling (CGP) is a key strategy in precision medicine for lung cancer, yet its clinical implementation remains limited, partly because of the uncertainty in identifying druggable mutations in individual patients. In this study, we investigated the potential of an artificial intelligence (AI)-based tool to predict the probability of identifying druggable mutations before CGP (pretest probability).
Methods: We developed an eXtreme Gradient Boosting (XGBoost) prediction model trained on pre-CGP clinical variables from 3,470 patients with lung cancer (June 2019-November 2023) to estimate the probability of identifying druggable mutations. The key predictors were identified using explainable artificial intelligence (XAI) analysis. The refined model was deployed as a web application and evaluated in a temporally independent test cohort of 1,307 patients (December 2023-November 2024), with Brier score as the primary end point.
Results: The prediction model achieved an area under the receiver operating characteristic curve (AUROC) of 0.85 (95% CI, 0.82 to 0.89) in the overall validation cohort and 0.79 (95% CI, 0.74 to 0.84) in patients for whom a driver mutation had not been identified through companion diagnostic testing. The XAI analysis identified sex, smoking history, histology, and metastatic sites as important predictors. Even among patients who underwent tissue CGP, bone (P = .011) and lung (P < .001) metastases were significantly associated with a higher druggable mutation detection rate. The deployed model achieved Brier scores of 0.19 in the overall independent test cohort and 0.16 in patients for whom a driver mutation had not been identified through companion diagnostic testing.
Conclusion: These findings indicate that an AI-based tool using pre-CGP clinical data may support broader CGP implementation and improve access to targeted therapies.
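The Brier score used as the primary end point is the mean squared difference between predicted probabilities and observed 0/1 outcomes, so lower is better and a constant prediction of 0.5 scores 0.25. A minimal sketch follows; the function name and example values are hypothetical and unrelated to the study data.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and binary outcomes."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


# Hypothetical pretest probabilities and whether a druggable mutation was found.
predicted = [0.9, 0.2, 0.6]
found = [1, 0, 0]
print(round(brier_score(predicted, found), 3))  # (0.01 + 0.04 + 0.36) / 3 -> 0.137
```

Unlike AUROC, which only reflects ranking, the Brier score also penalizes poorly calibrated probabilities, which matters when the output is shown to clinicians as a pretest probability.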
Modeling the Pretest Probability of Identifying Druggable Mutations in Lung Cancer Using Nationwide Comprehensive Genomic Profiling Data. Hiroaki Ikushima, Kousuke Watanabe, Aya Shinozaki-Ushiku, Satoshi Kodera, Norihiko Takeda, Katsutoshi Oda, Hidenori Kage. JCO Clinical Cancer Informatics, vol. 10, e2500269. Published 2026-03-01. DOI: 10.1200/CCI-25-00269.
Pub Date: 2026-03-01; Epub Date: 2026-03-16; DOI: 10.1200/CCI-25-00252
Maria S Rosito, Aleck E Cervantes, Christine Hong, Joseph D Bonner, Bita Nehoray, Alison Schwartz Levine, Danna Rosenberg, Isabel Anez-Bruzual, Giovanni Parmigiani, Christopher I Amos, Judy E Garber, Stephen B Gruber, Danielle Braun
Purpose: Studying rare genetic conditions often requires multicenter research to gather sufficient data. However, data from multiple institutions may include relatives from the same family enrolled at different sites, increasing the likelihood of duplicate records. This issue is compounded by the use of deidentified data, which limits direct linkage through personal identifiers. These redundancies can bias family-based genetic studies, underscoring the need for robust methods for pedigree deduplication. We propose an interpretable, active learning-based approach to efficiently identify duplicate records in genetic studies, with specific application to families with TP53 mutations in the Li-Fraumeni and TP53: Understanding and Progress (LiFT UP) study.
Materials and methods: Our approach combines heuristic labeling with graph-based features and a machine learning model to iteratively refine duplicate detection. We first generate a partially labeled data set leveraging mutation variant diversity and family characteristics. A random forest classifier is then trained to predict duplicate pairs, with active learning guiding iterative refinement. This method is applied to real-world pedigree data from the LiFT UP study to assess its effectiveness in a multicenter setting.
Results: Our method labeled pedigree pairs in data from the LiFT UP study with a high degree of automation, achieving 99.95% automated processing in the deduplication workflow. By prioritizing likely duplicates for human review, it minimized manual effort while aiming for high specificity. This automated approach avoids dependence on rule-based filters, such as identifier matching, which ultimately require manual confirmation, offering a more scalable solution for improving data quality in risk estimation.
Conclusion: Interpretable active learning provides an effective solution for pedigree deduplication. Future work will explore refinements in identifying potential duplicates and evaluate its generalizability across other genetic data sets.
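One way to picture the review step is as a triage over classifier scores: pedigree pairs with a low predicted duplicate probability are labeled automatically, and the remainder are queued for a human reviewer with the most likely duplicates first. The sketch below is an assumption-laden illustration, not the authors' procedure. The threshold of 0.5, the function name, and the pair scores are hypothetical, and the scores stand in for the output of a trained classifier such as the random forest the abstract mentions.

```python
def triage_pairs(pair_probs, review_threshold=0.5):
    """Split scored pedigree pairs into auto-labeled non-duplicates and a review queue.

    pair_probs maps (pedigree_a, pedigree_b) -> predicted duplicate probability.
    Returns (auto_non_duplicates, review_queue, automation_rate).
    """
    if not pair_probs:
        raise ValueError("no pairs to triage")
    auto_non_duplicates = [pair for pair, p in pair_probs.items() if p < review_threshold]
    # Likely duplicates go to a human, highest probability first.
    review_queue = sorted(
        (pair for pair, p in pair_probs.items() if p >= review_threshold),
        key=lambda pair: pair_probs[pair],
        reverse=True,
    )
    automation_rate = len(auto_non_duplicates) / len(pair_probs)
    return auto_non_duplicates, review_queue, automation_rate


# Hypothetical scores for four pedigree pairs.
scores = {("A", "B"): 0.02, ("A", "C"): 0.91, ("B", "C"): 0.10, ("C", "D"): 0.65}
auto, queue, rate = triage_pairs(scores)
print(queue, rate)  # [('A', 'C'), ('C', 'D')] 0.5
```

In an active learning loop, the reviewer's decisions on the queued pairs would be added to the labeled set and the classifier retrained before the next round.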
Interpretable Active Learning for Pedigree Data Deduplication in Cancer Genetics. JCO Clinical Cancer Informatics, vol. 10, e2500252. Published 2026-03-01. DOI: 10.1200/CCI-25-00252. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13001896/pdf/
Pub Date: 2026-03-01; Epub Date: 2026-03-25; DOI: 10.1200/CCI-25-00171
Uriel Kim, Siran Koroukian, Johnie Rose
Purpose: Population-based cancer registries are a key data resource for catchment area informatics, but their utility for quantifying differences in cancer burden by socioeconomic status is limited. Here, we describe an approach that estimates cancer incidence along income gradients, leveraging a newly validated method called weighting by income probabilities (WIP).
Methods: We estimated income-specific colorectal cancer incidence, stratified by sex and race/ethnicity, in a catchment area (Ohio) as a case study. Income-specific numerator data (number of cancer cases per income bracket) were estimated using WIP, whereas denominators (population at risk by income bracket) were derived from US Census data.
Results: In the case study of the 52,257 patients with invasive colorectal cancer diagnosed in the catchment area of Ohio between 2010 and 2019, lower income was generally associated with higher incidence rates, except in non-Hispanic (NH) White female individuals. The highest incidence was observed in NH Black male individuals at 0-149% of the Federal Poverty Level, with 113.7 cases per 100,000 (95% CI, 99.6 to 129.3) in 2010-2012, compared with 57.8 (95% CI, 54.7 to 61.2) in their NH White counterparts. Sensitivity analyses showed that income-specific incidence statistics were robust to sources of error in numerator and denominator estimation, with incidence estimates varying by no more than 1.98% from the reference estimates.
Conclusion: The approach described here accurately estimates cancer incidence along income gradients and can be expanded to estimate income-specific survival and mortality. The case study of colorectal cancer in Ohio demonstrates important insights into the burden of cancer by income. These granular income-specific data can enhance our understanding of the relationship between cancer burden and socioeconomic status and inform cancer surveillance, prevention, and control efforts.
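The numerator and denominator logic can be sketched as follows, assuming that weighting by income probabilities means each case contributes its probability of belonging to each income bracket, so that the expected case count in a bracket is the sum of those probabilities. That reading, the function name, the bracket labels, and all numbers are assumptions for illustration; the abstract does not give the estimation details.

```python
def income_specific_rates(case_probs, population, per=100_000):
    """Expected cases per bracket divided by population at risk, scaled to `per`.

    case_probs: one dict per case mapping bracket -> probability (summing to 1).
    population: dict mapping bracket -> population at risk (e.g., from census data).
    """
    expected_cases = {bracket: 0.0 for bracket in population}
    for probs in case_probs:
        for bracket, p in probs.items():
            expected_cases[bracket] += p
    return {b: expected_cases[b] / population[b] * per for b in population}


# Hypothetical data: three cases, two income brackets.
cases = [
    {"low": 0.7, "high": 0.3},
    {"low": 0.2, "high": 0.8},
    {"low": 0.5, "high": 0.5},
]
people = {"low": 2_000, "high": 4_000}
print(income_specific_rates(cases, people))  # roughly 70 and 40 per 100,000
```

Here the low bracket has 1.4 expected cases among 2,000 people and the high bracket has 1.6 among 4,000, giving rates of about 70 and 40 per 100,000.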
Weighting by Income Probabilities as a Novel Approach to Quantifying Differences in the Burden of Cancer by Income: A Case Study of Colorectal Cancer in Ohio. JCO Clinical Cancer Informatics, vol. 10, e2500171. Published 2026-03-01. DOI: 10.1200/CCI-25-00171.
Pub Date: 2026-03-01; Epub Date: 2026-03-16; DOI: 10.1200/CCI-25-00215
Melissa Estevez, Nisha Singh, Lauren Dyson, Blythe Adamson, Konstantin Krismer, Kelly Magee, Qianyu Yuan, Megan W Hildner, Erin Fidyk, Tori Williams, Olive Mbah, Farhad Khan, Kathi Seidl-Rathkopf, Aaron B Cohen
Large language models (LLMs) are increasingly used to extract clinical data from electronic health records, offering significant improvements in scalability and efficiency for real-world data (RWD) curation in oncology. However, the adoption of LLMs introduces new challenges in ensuring the reliability, accuracy, and fairness of extracted data, which are essential for research, regulatory, and clinical applications. Existing quality assurance frameworks for RWD and artificial intelligence (AI) do not fully address the unique error modes and complexities associated with LLM-extracted data. In this paper, we propose a comprehensive framework for evaluating the quality of clinical data extracted by LLMs. The framework integrates variable-level performance benchmarking against expert human abstraction, verification checks for internal consistency and plausibility, and replication analyses comparing LLM-extracted data to human-abstracted data sets or external standards. This multidimensional approach enables the identification of variables most in need of improvement, systematic detection of latent errors, and confirmation of data set fitness-for-purpose in real-world research. Additionally, the framework supports bias assessment by stratifying across demographic subgroups. By providing a rigorous and transparent method for assessing LLM-extracted RWD, this framework advances industry standards and supports the trustworthy use of AI-powered evidence generation in oncology research and practice.
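As a rough illustration of variable-level benchmarking with subgroup stratification, the sketch below computes the agreement rate between LLM-extracted and human-abstracted values for each variable within each demographic subgroup. The record layout, field names, and example values are hypothetical, and simple agreement is only one of several metrics such a framework might use.

```python
from collections import defaultdict


def agreement_by_variable_and_subgroup(records):
    """Share of records where the LLM value matches the human-abstracted value.

    records: iterable of dicts with keys 'variable', 'subgroup', 'llm', 'human'.
    Returns a dict keyed by (variable, subgroup).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        key = (r["variable"], r["subgroup"])
        totals[key] += 1
        hits[key] += int(r["llm"] == r["human"])
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical paired extractions.
paired = [
    {"variable": "stage", "subgroup": "A", "llm": "II", "human": "II"},
    {"variable": "stage", "subgroup": "A", "llm": "III", "human": "II"},
    {"variable": "stage", "subgroup": "B", "llm": "I", "human": "I"},
    {"variable": "biomarker", "subgroup": "A", "llm": "positive", "human": "positive"},
]
print(agreement_by_variable_and_subgroup(paired))
```

Comparing the per-subgroup rates for the same variable is a simple way to flag variables whose extraction quality differs across groups.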
Ensuring Reliability of Curated Electronic Health Record-Derived Data: The Validation of Accuracy for Large Language Model-/Machine Learning-Extracted Information and Data (VALID) Framework. JCO Clinical Cancer Informatics, vol. 10, e2500215. Published 2026-03-01. DOI: 10.1200/CCI-25-00215. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13001894/pdf/
Pub Date: 2026-03-01; Epub Date: 2026-03-06; DOI: 10.1200/CCI-25-00230
Roberto Buonaiuto, Aldo Caltavituro, Rossana Di Rienzo, Angela Grieco, Federica P Mangiacotti, Alessandra Longobardi, Vincenza Cantile, Vittoria Molinaro, Martina Pagliuca, Giuseppe Buono, Pietro De Placido, Erica Pietroluongo, Valeria Forestieri, Claudia Martinelli, Vincenzo di Lauro, Luigi Leo, Massimiliano D'Aiuto, Giampaolo Bianchini, Carmen Criscitiello, Roberto Bianco, Lucia Del Mastro, Michelino De Laurentiis, Grazia Arpino, Carmine De Angelis, Mario Giuliano
Purpose: To assess the ability of GPT-4o in adjuvant treatment decision making in hormone receptor-positive (HR+)/human epidermal growth factor receptor 2-negative (HER2-) early breast cancer by comparing its recommendations with those of clinicians including Oncotype DX data, and to explore its potential as a decision-support tool in routine clinical practice.
Methods: We compared clinician and GPT-4o recommendations in patients tested with Oncotype DX in routine practice at the University of Naples Federico II (n = 607, cohort 1 [C1]) and within the prospective, multicenter PRO BONO study (n = 237, cohort 2 [C2]). Pre- and post-Oncotype DX treatment recommendations were categorized as chemotherapy (CT) + endocrine therapy (ET) or ET alone. Concordance between clinician and GPT-4o recommendations was assessed using agreement rates and Cohen's kappa. The accuracy of Oncotype DX results was evaluated using the AUC metric.
Results: The agreement between clinicians and GPT-4o in pretest recommendations was 68% (kappa, 0.381 [95% CI, 0.31 to 0.45], P < .001) in C1 and 70% (0.401 [95% CI, 0.29 to 0.52], P < .001) in C2. Before Oncotype DX, clinicians recommended CT more frequently than GPT-4o for C1 (58% v 38%) and C2 (53% v 43%). Post-test agreement increased to 93% (0.814 [95% CI, 0.76 to 0.87], P < .001) in C1 and 90% (0.741 [95% CI, 0.64 to 0.84], P < .001) in C2. The agreement between pre- and post-Oncotype DX treatment recommendations for clinicians was 56% and 63% versus 68% and 60% for GPT-4o in C1 and C2, respectively. GPT-4o showed higher accuracy in predicting low than high genomic risk in postmenopausal patients (87% v 43% in C1; 85% v 45% in C2, P < .001) and low versus intermediate and high risk in premenopausal patients in both cohorts (P < .001).
Conclusion: The agreement between clinicians and GPT-4o in pretest recommendations was modest but improved post-test, highlighting the importance of multigene testing and the potential of large language models in clinical decision making.
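Cohen's kappa corrects the raw agreement rate for the agreement expected by chance given each rater's label frequencies: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch follows, with hypothetical recommendation lists that are not the study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical recommendations: CT = chemotherapy + endocrine therapy, ET = endocrine alone.
clinician = ["CT", "CT", "CT", "CT", "ET", "ET", "ET", "ET", "ET", "ET"]
model = ["CT", "CT", "CT", "ET", "ET", "ET", "ET", "ET", "ET", "CT"]
print(round(cohens_kappa(clinician, model), 3))  # (0.80 - 0.52) / 0.48 -> 0.583
```

In this toy case the raters agree on 80% of patients, yet kappa is only about 0.58, which shows why a 68% raw agreement can correspond to a kappa near 0.38.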
Analysis of Large Language Model Decision Making in Hormone Receptor-Positive/Human Epidermal Growth Factor Receptor 2-Negative Early Breast Cancer. JCO Clinical Cancer Informatics, vol. 10, e2500230. Published 2026-03-01. DOI: 10.1200/CCI-25-00230. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12986038/pdf/
Pub Date: 2026-03-01 Epub Date: 2026-03-09 DOI: 10.1200/CCI-25-00321
Lauren B Davis Rivera, Lauren Mitchell, Muhammad Danyal Ahsan, Isabelle Chandler, Emily S Epstein, Emerson P Borsato, Caitlin Allen, Kimberly A Kaphingst, Richard L Bradshaw, Guilherme Del Fiol, Kensaku Kawamoto, Anne C Madeo, Ravi N Sharaf, Melissa K Frey
Purpose: Cascade genetic testing enables identification of relatives at risk of hereditary cancer syndromes, creating opportunities for early detection and prevention. However, uptake of cascade testing remains low, with approximately one-third of eligible relatives completing testing, largely because of reliance on patient-mediated communication. Although clinician-mediated outreach has demonstrated improved efficacy, it is often limited by resource demands. Scalable digital health tools are a promising strategy to address this gap in testing uptake.
Methods: In this quality improvement initiative, we developed a digital cascade chatbot to deliver gene-specific education and facilitate access to genetic services among at-risk relatives. Between October 2024 and January 2025, 100 consecutive probands with a hereditary cancer pathogenic variant seen in a gynecologic oncology clinic were offered a cascade chatbot to share with their relatives. The primary outcome was proband acceptance of the cascade chatbot. Secondary outcomes included sharing of the cascade chatbot with at-risk relatives and relatives' subsequent utilization of genetic services. Outcomes were evaluated through telephone follow-up at 2 weeks and 3 months after chatbot introduction.
Results: Fifty-nine of 100 probands reported having relatives who had not undergone genetic testing. Among this group, 58 (98.3%) accepted the cascade chatbot. At 2-week follow-up, 44 of 58 probands (75.9%) had shared the cascade chatbot with at least one relative, and an additional eight (13.8%) reported plans to share. At 3-month follow-up, 48 of 58 probands (82.8%) had shared the cascade chatbot with at least one relative. A total of 122 relatives received the cascade chatbot, and 96 (78.7%) were reached for 3-month follow-up. Among the 96 relatives reached, 49 (51.0%) had scheduled or completed a genetics appointment, of whom 36 (73.5%) had completed testing.
Conclusion: A cascade chatbot was highly acceptable to probands and effectively engaged relatives. Scalable digital health tools may enhance cascade testing and support precision cancer prevention.
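The percentages in the Results can be checked against their counts with a few lines of arithmetic. The sketch below is only a consistency check under stated assumptions: the denominators for the "plans to share" and 3-month sharing figures are taken to be the 58 probands who accepted the chatbot, which the abstract implies but does not state for those two figures. The helper name `percent` and the labels are illustrative.

```python
# (label, numerator, denominator, percentage reported in the abstract)
reported = [
    ("probands accepting the chatbot", 58, 59, 98.3),
    ("probands who had shared at 2 weeks", 44, 58, 75.9),
    ("probands planning to share at 2 weeks", 8, 58, 13.8),    # denominator assumed
    ("probands who had shared at 3 months", 48, 58, 82.8),     # denominator assumed
    ("relatives reached at 3 months", 96, 122, 78.7),
    ("reached relatives with an appointment", 49, 96, 51.0),
    ("relatives with an appointment who tested", 36, 49, 73.5),
]


def percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Collect any row whose recomputed percentage differs from the reported one.
mismatches = [
    (label, percent(num, den), stated)
    for label, num, den, stated in reported
    if abs(percent(num, den) - stated) >= 0.05
]

for label, num, den, stated in reported:
    print(f"{label}: {num}/{den} = {percent(num, den)}% (reported {stated}%)")
print("mismatches:", mismatches)
```

Under these assumed denominators every recomputed value matches the reported percentage to one decimal place.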
{"title":"Cascade Chatbot: A Scalable Approach to Family-Based Genetic Testing for Hereditary Cancer Syndromes.","authors":"Lauren B Davis Rivera, Lauren Mitchell, Muhammad Danyal Ahsan, Isabelle Chandler, Emily S Epstein, Emerson P Borsato, Caitlin Allen, Kimberly A Kaphingst, Richard L Bradshaw, Guilherme Del Fiol, Kensaku Kawamoto, Anne C Madeo, Ravi N Sharaf, Melissa K Frey","doi":"10.1200/CCI-25-00321","DOIUrl":"https://doi.org/10.1200/CCI-25-00321","url":null,"abstract":"<p><strong>Purpose: </strong>Cascade genetic testing enables identification of relatives at risk of hereditary cancer syndromes, creating opportunities for early detection and prevention. However, uptake of cascade testing remains low, with approximately one-third of eligible relatives completing testing, largely because of reliance on patient-mediated communication. Although clinician-mediated outreach has demonstrated improved efficacy, it is often limited by resource demands. Scalable digital health tools are a promising strategy to address this gap in testing uptake.</p><p><strong>Methods: </strong>In this quality improvement initiative, we developed a digital cascade chatbot to deliver gene-specific education and facilitate access to genetic services among at-risk relatives. Between October 2024 and January 2025, 100 consecutive probands with a hereditary cancer pathogenic variant seen in a gynecologic oncology clinic were offered a cascade chatbot to share with their relatives. The primary outcome was proband acceptance of the cascade chatbot. Secondary outcomes included sharing of the cascade chatbot with at-risk relatives and relatives' subsequent utilization of genetic services. Outcomes were evaluated through telephone follow-up at 2 weeks and 3 months after chatbot introduction.</p><p><strong>Results: </strong>Fifty-nine of 100 probands reported having relatives who had not undergone genetic testing. Among this group, 58 (98.3%) accepted the cascade chatbot. 
At 2-week follow-up, 44 of 58 probands (75.9%) had shared the cascade chatbot with at least one relative, and an additional eight (13.8%) reported plans to share. At 3-month follow-up with probands, 48 (82.8%) probands had shared the cascade chatbot with at least one relative. A total of 122 relatives received the cascade chatbot and 96 (78.7%) were reached for 3-month follow-up. Among the 96 relatives reached, 49 (51.0%) had scheduled or completed a genetics appointment, and of them, 36 (73.5%) had completed testing.</p><p><strong>Conclusion: </strong>A cascade chatbot was highly acceptable to probands and effectively engaged relatives. Scalable digital health tools may enhance cascade testing and support precision cancer prevention.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"10 ","pages":"e2500321"},"PeriodicalIF":2.8,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147391620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-03-01 Epub Date: 2026-03-18 DOI: 10.1200/CCI-25-00159
Matthias Vanderkerken, Koen Van Eygen, Veerle Galle, Annelies Verbiest, Ann Janssens, Imke Masuy, Kristof Theys, Tine Cuppens, Katoo Muylle, Ann De Becker
Purpose: Chronic lymphocytic leukemia (CLL) treatment paradigms have evolved significantly, yet real-world evidence (RWE) on guideline implementation and patient characteristics remains limited.
Materials and methods: This multicenter retrospective study leveraged artificial intelligence (AI) to analyze structured and unstructured data from four Belgian hospitals (January 1, 2018-October 31, 2021). Structured data including diagnosis codes, laboratory results, treatment records, and national registries were standardized using the Observational Medical Outcomes Partnership (OMOP) Common Data Model. Unstructured clinical notes and reports were processed using a transformer-based natural language processing (NLP) pipeline. We examined clinical characteristics, diagnostic testing, and treatment patterns among patients with newly diagnosed CLL.
Results: Of 22 variable groups analyzed, 50.0% were derived from structured data only, 36.4% from unstructured data only (NLP-extracted), and 13.6% from mixed sources. Five hundred eighty-six patients with CLL were identified, with a median age of 74 years. One hundred seventy-four patients (29.7%) initiated first-line (1L) treatment, and 41 progressed to second-line treatment. Of 1L-treated patients, 68.4% had at least one prespecified comorbidity, including 12.1% with significant cardiovascular disease. TP53/del17p testing was documented in 34.3% of patients before 1L treatment, with aberrations detected in 42.8%. Bruton's tyrosine kinase inhibitors (BTKi; 35.6%) were the most common 1L treatment, followed by chemoimmunotherapy (CIT; 25.9%). CIT use declined (30.6% to 17.5%), whereas BTKi use remained stable (34.2% to 38.1%) between 2018 and 2021.
Conclusion: This AI-augmented study demonstrates the feasibility and scalability of combining NLP-derived insights with OMOP-standardized structured data to generate reproducible RWE in hematology. Our results highlight an elderly CLL population with significant comorbidities and a shift toward targeted therapies. While treatment patterns aligned with guidelines, data quality depended on source documentation accessibility. Improved integration of molecular testing into electronic health records is essential for enhancing clinical decision making, patient outcomes, and future research.
{"title":"Leveraging Digital Technology and Artificial Intelligence to Describe the Real-World Belgian Chronic Lymphocytic Leukemia Patient Population: The BE-CLLEAR Study.","authors":"Matthias Vanderkerken, Koen Van Eygen, Veerle Galle, Annelies Verbiest, Ann Janssens, Imke Masuy, Kristof Theys, Tine Cuppens, Katoo Muylle, Ann De Becker","doi":"10.1200/CCI-25-00159","DOIUrl":"10.1200/CCI-25-00159","url":null,"abstract":"<p><strong>Purpose: </strong>Chronic lymphocytic leukemia (CLL) treatment paradigms have evolved significantly, yet real-world evidence (RWE) on guideline implementation and patient characteristics remains limited.</p><p><strong>Materials and methods: </strong>This multicenter retrospective study leveraged artificial intelligence (AI) to analyze structured and unstructured data from four Belgian hospitals (January 1, 2018-October 31, 2021). Structured data including diagnosis codes, laboratory results, treatment records, and national registries were standardized using the Observational Medical Outcomes Partnership (OMOP) Common Data Model. Unstructured clinical notes and reports were processed using a transformer-based natural language processing (NLP) pipeline. We examined clinical characteristics, diagnostic testing, and treatment patterns among patients with newly diagnosed CLL.</p><p><strong>Results: </strong>Of 22 variable groups analyzed, 50.0% was derived from structured data only, 36.4% from unstructured data only (NLP-extracted), and 13.6% from mixed sources. Five hundred eighty-six patients with CLL were identified, with a median age of 74 years. One hundred seventy-four patients (29.7%) initiated first-line (1L) treatment, and 41 progressed to second-line treatment. Of 1L treated patients, 68.4% had at least one prespecified comorbidity, including 12.1% with significant cardiovascular disease. <i>TP53</i>/del17p testing was documented in 34.3% of patients before 1L treatment, with aberrations detected in 42.8%. 
Bruton's tyrosine kinase inhibitors (BTKi; 35.6%) were the most common 1L treatment, followed by chemoimmunotherapy (CIT; 25.9%). CIT use declined (30.6% to 17.5%), whereas BTKi use remained stable (34.2% to 38.1%) between 2018 and 2021.</p><p><strong>Conclusion: </strong>This AI-augmented study demonstrates the feasibility and scalability of combining NLP-derived insights with OMOP-standardized structured data to generate reproducible RWE in hematology. Our results highlight an elderly CLL population with significant comorbidities and a shift toward targeted therapies. While treatment patterns aligned with guidelines, data quality depended on source documentation accessibility. Improved integration of molecular testing into electronic health records is essential for enhancing clinical decision making, patient outcomes, and future research.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"10 ","pages":"e2500159"},"PeriodicalIF":2.8,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13003938/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147482349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-03-01 Epub Date: 2026-03-10 DOI: 10.1200/CCI-26-00031
Christine Adams
Large language models (LLMs) and artificial intelligence systems possess the transformative potential to revolutionize cancer care. However, their integration into oncology presents both extraordinary opportunities and challenges. Clinically, these tools can extract actionable insights from pathology reports, radiology imaging, and genomic sequencing at previously impossible scales. They also enhance the patient-facing dimension by providing accurate informational support and improving patient-clinical trial matching. In translational research, LLMs accelerate informatics analysis for single-cell transcriptomics, spatial omics, and computational pathology, thereby improving support for precision oncology. However, ethical concerns regarding trust, equity, privacy, transparency, non-maleficence, and accountability call for caution. Implementation challenges include hallucination risks, high computational costs, and the potential to exacerbate existing healthcare disparities. Furthermore, developers must navigate a fragmented regulatory landscape consisting of an evolving patchwork of federal, state, and international rules. Responsible implementation requires appropriate skepticism, rigorous validation, and a commitment to patient welfare to navigate this rapidly evolving landscape.
{"title":"Opportunities and Challenges in Implementing Large Language Models (LLMs) in Oncology.","authors":"Christine Adams","doi":"10.1200/CCI-26-00031","DOIUrl":"https://doi.org/10.1200/CCI-26-00031","url":null,"abstract":"<p><p>Large language models (LLMs) and artificial intelligence systems possess the transformative potential to revolutionize cancer care. However, their integration into oncology presents both extraordinary opportunities and challenges. Clinically, these tools can extract actionable insights from pathology reports, radiology imaging, and genomic sequencing at previously impossible scales. They also enhance the patient-facing dimension by providing accurate informational support and improving patient-clinical trial matching. In translational research, LLMs accelerate informatics analysis for single-cell transcriptomics, spatial omics, and computational pathology, thereby improving support for precision oncology. However, ethical concerns regarding trust, equity, privacy, transparency, non-maleficence, and accountability call for caution. Implementation challenges include hallucination risks, high computational costs, and the potential to exacerbate existing healthcare disparities. Furthermore, developers must navigate a fragmented regulatory landscape consisting of an evolving patchwork of federal, state, and international rules. 
Responsible implementation requires appropriate skepticism, rigorous validation, and a commitment to patient welfare to navigate this rapidly evolving landscape.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"10 ","pages":"e2600031"},"PeriodicalIF":2.8,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147437517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-03-01 Epub Date: 2026-03-20 DOI: 10.1200/CCI-26-00005
David H Noyd
{"title":"Empowering Children and Adolescents With Cancer Through Novel, Electronic Health Record-Embedded Symptom Management Tools.","authors":"David H Noyd","doi":"10.1200/CCI-26-00005","DOIUrl":"https://doi.org/10.1200/CCI-26-00005","url":null,"abstract":"","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"10 ","pages":"e2600005"},"PeriodicalIF":2.8,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147492261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}