Ji-Eun Irene Yum, Syed Arsalan Ahmed Naqvi, Ben Zhou, Irbaz Bin Riaz
The emergence of state-of-the-art large language models (LLMs), which can generalize to diverse natural language processing tasks, has created new opportunities in health care. Oncology is especially well suited to leverage these resources because the journeys of patients with cancer inherently yield extensive, longitudinal data sets comprising clinical narratives, pathology and radiology reports, and genomic sequencing reports. This review begins with an overview of the fundamental concepts behind LLMs, including their definitions, architecture, training paradigm, and performance optimization through prompt engineering and retrieval-augmented generation. We also explore the newly emerging paradigm of LLMs in a multiagent framework. We then synthesize current research on how LLMs may benefit stakeholders in oncology practice, including patients, oncologists, researchers, and learners. Finally, we address the limitations and risks of LLMs, including hallucinations, inherent biases, threats to patient privacy, and clinician deskilling. While research thus far shows significant potential for LLMs to transform cancer care, necessary future directions include studies emphasizing patient perspectives on incorporating LLMs into clinical workflows, the development of relevant clinical benchmarks for LLM evaluation, a greater focus on real-world prospective testing, and deeper exploration of LLM reasoning capabilities.
Reimagining Cancer Care With Generative Artificial Intelligence: The Promise of Large Language Models. JCO Clinical Cancer Informatics, vol. 9, e2500134. Publication date: 2025-12-01. DOI: https://doi.org/10.1200/CCI-25-00134
Pub Date: 2025-12-01; Epub Date: 2025-12-23; DOI: 10.1200/CCI-25-00163
Nicole Rademacher, Connor Sisk, Joshua S Richman, Kristy Broman, Changzhen Wang
Purpose: The Commission on Cancer (CoC) seeks to expand access to high-quality care through community engagement standards that target centers' catchment areas and through efforts to accredit centers in more areas, including rural hospitals. Little is known about the social, environmental, and geographic characteristics of these catchment areas. To support future investigation into the impact of CoC-accredited centers, this study compares characteristics of cancer care utilization-based catchment areas, termed Cancer Service Areas (CSAs), with and without CoC-accredited centers.
Methods: Geocoded CoC-accredited centers and cancer care patient flows extracted from Medicare claims data were used to delineate CSAs with a spatially constrained community detection method. Characteristics including the environmental justice index (EJI), social vulnerability index (SVI), rurality, travel time, and localization index (LI, the proportion of patients' cancer care received within their own CSA) were aggregated by CSA. A logistic regression model was used to evaluate characteristics associated with the presence of a CoC-accredited center within a CSA.
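A minimal sketch of how a localization index can be computed from patient flows, assuming LI is the share of a CSA's residents' cancer care encounters that take place in that same CSA. The CSA labels, encounter counts, and the function name `localization_index` are hypothetical; the abstract does not specify the exact formula or the unit of care counted.

```python
from collections import defaultdict

def localization_index(flows):
    """Compute LI per CSA from (patient_csa, provider_csa, encounters) records.

    LI = encounters delivered inside the patient's own CSA
         / all encounters for patients residing in that CSA.
    """
    local = defaultdict(int)
    total = defaultdict(int)
    for patient_csa, provider_csa, encounters in flows:
        total[patient_csa] += encounters
        if provider_csa == patient_csa:
            local[patient_csa] += encounters
    return {csa: local[csa] / total[csa] for csa in total}

# Hypothetical flows: (CSA where the patient lives, CSA where care was received, encounter count)
flows = [
    ("A", "A", 80), ("A", "B", 20),
    ("B", "B", 30), ("B", "A", 70),
]
print(localization_index(flows))  # {'A': 0.8, 'B': 0.3}
```

In this toy example, CSA A keeps most of its patients' care local (high LI), whereas CSA B exports most of its care to A (low LI).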
Results: A total of 668 CSAs were defined, 511 of which had at least one CoC-accredited center. CSAs with CoC-accredited centers had lower health vulnerability (odds ratio [OR], 0.65 [95% CI, 0.427 to 0.993]) and lower racial and ethnic minority status vulnerability (OR, 0.61 [95% CI, 0.424 to 0.886]) but did not differ on other components of the EJI or SVI. These CSAs also had higher LIs, meaning that patients tended to remain in their local CSA for care (OR, 9.00 [95% CI, 2.408 to 33.640] for high v low LIs).
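Odds ratios such as those above are exponentiated logistic regression coefficients, and a Wald 95% CI is symmetric on the log scale: exp(β ± 1.96·SE). The sketch below, assuming the reported intervals are Wald intervals (the abstract does not say), recovers β and its SE from the health vulnerability interval; the helper name `log_scale_summary` is illustrative only.

```python
import math

def log_scale_summary(or_low, or_high, z=1.959964):
    """Recover the log-odds coefficient and its standard error from a reported 95% CI.

    Assumes the CI was built as exp(beta +/- z * SE), the usual Wald interval
    for a logistic regression coefficient.
    """
    log_low, log_high = math.log(or_low), math.log(or_high)
    beta = (log_low + log_high) / 2       # the CI is symmetric on the log scale
    se = (log_high - log_low) / (2 * z)   # half-width on the log scale divided by z
    return beta, se

# Health vulnerability estimate reported above: OR 0.65 (95% CI, 0.427 to 0.993)
beta, se = log_scale_summary(0.427, 0.993)
print(f"OR = {math.exp(beta):.2f}, beta = {beta:.3f}, SE = {se:.3f}")
# prints: OR = 0.65, beta = -0.429, SE = 0.215
```

Recovering an OR of 0.65 from the interval endpoints is a quick consistency check on a reported estimate; the very wide interval for the LI comparison (2.408 to 33.640) likewise corresponds to a large SE on the log scale.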
Conclusion: Minority and comorbid populations may have more difficulty accessing cancer center care, further exacerbating observed variations in cancer outcomes. Cancer centers may address this by broadening their outreach into at-risk catchment areas.
Geospatial Analysis of Commission on Cancer-Accredited Centers Within Cancer Care Utilization-Based Catchment Areas. JCO Clinical Cancer Informatics, vol. 9, e2500163. Publication date: 2025-12-01. DOI: https://doi.org/10.1200/CCI-25-00163
Pub Date: 2025-11-01; Epub Date: 2025-11-07; DOI: 10.1200/CCI-25-00086
Jesse Persily, Steven L Chang, Chen Chen, Yassamin Neshatvar, Siri Desiraju, Rajesh Ranganath, Katie Murray, Adam Feldman, Douglas Dahl, Samir S Taneja, William C Huang, Madhur Nayan
Purpose: Partial nephrectomy has been advocated over total nephrectomy as the preferred surgical approach for small kidney tumors. However, partial nephrectomy is associated with increased perioperative risk. Estimating renal function after nephrectomy can facilitate personalized patient counseling, guide the choice of surgical approach, and identify patients who could benefit from perioperative interventions. Existing prediction models have several limitations, including the lack of external validation or of a user-friendly tool or application, and most have used traditional statistical methods.
Methods: We used data from two academic medical institutions and machine learning (ML) methods to develop and externally validate renal function after nephrectomy-machine learning (RFAN-ML), a model to estimate long-term renal function after partial or total nephrectomy. Boruta feature selection was used to select four routinely available clinical features: age, BMI, preoperative renal function, and nephrectomy type. In the training set of 1,932 patients, we compared six ML regression models spanning both ensemble and nonensemble algorithms, optimizing for root mean squared error (RMSE). The models were evaluated in a test set of 1,995 patients, and the best-performing model was selected as RFAN-ML.
Results: We compared RFAN-ML with existing renal function prediction benchmarks and found that RFAN-ML outperformed or performed competitively with the benchmarks on RMSE (16.6 [95% CI, 15.6 to 17.5]), R², and mean absolute error.
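The three metrics named above have standard definitions, sketched here in plain Python with hypothetical renal function values (e.g., eGFR). The abstract does not say how the RMSE confidence interval was obtained; the percentile bootstrap shown is one common approach and is an assumption here, as are all function names.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def bootstrap_ci(metric, y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample patients with replacement and recompute the metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

# Hypothetical observed vs predicted postoperative renal function values
observed = [60, 75, 90, 45, 82, 70, 55, 95]
predicted = [65, 70, 85, 50, 80, 72, 60, 88]
print(round(rmse(observed, predicted), 2), mae(observed, predicted), round(r_squared(observed, predicted), 3))
# prints: 4.77 4.5 0.914
print(bootstrap_ci(rmse, observed, predicted))
```

RMSE penalizes large errors more heavily than MAE, which is one reason a model tuned for RMSE may be reported alongside MAE and R².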
Conclusion: We developed and externally validated RFAN-ML, an ML model to predict renal function after nephrectomy, and have deployed the model online. RFAN-ML has the potential to improve care and outcomes for patients with kidney tumors by informing personalized patient counseling and guiding surgical planning.
Development, External Validation, and Deployment of RFAN-ML: A Machine Learning Model to Estimate Renal Function After Nephrectomy. JCO Clinical Cancer Informatics, vol. 9, e2500086. Publication date: 2025-11-01. DOI: https://doi.org/10.1200/CCI-25-00086
Purpose: To examine the geospatial distribution of melanoma incidence in Pennsylvania (PA), quantify its association with agricultural practices and patterns, and consider its relevance for cancer control.
Methods: The study used an ecologic design with county-level PA data on the 2017-2021 incidence of invasive melanoma among adults 50 years and older, as well as agricultural patterns and practices, ultraviolet radiation (UVR), and demographics/socioeconomics. Spatial clustering was examined using local indicators of spatial association and Getis-Ord Gi*. Separate adjacency-weighted Conway-Maxwell-Poisson models, adjusted for UVR and social vulnerability, quantified the association between melanoma and (1) cultivated and pasture/hay acreage and (2) herbicide-, insecticide-, fungicide-, and manure-treated acreage.
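The Getis-Ord Gi* statistic compares the sum of values in each area's neighborhood (including the area itself) with the sum expected if values were spread at random; large positive z-scores mark hot spots and large negative ones cold spots. A minimal sketch with binary contiguity weights follows. The six-county chain, rates, and adjacency are hypothetical, and the study's actual weighting scheme is not given in the abstract; production analyses typically rely on a spatial statistics library such as PySAL's esda package.

```python
import math

def getis_ord_gi_star(values, neighbors):
    """Gi* z-score for each area using binary weights, with each area included in its own neighborhood."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        local = set(neighbors[i]) | {i}   # w_ij = 1 for neighbors and for i itself, 0 otherwise
        w_sum = len(local)                # with 0/1 weights this equals both sum(w) and sum(w**2)
        numerator = sum(values[j] for j in local) - mean * w_sum
        # Undefined (division by zero) if a neighborhood covers every area, i.e., w_sum == n.
        denominator = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores

# Hypothetical counties 0-5 along a chain; incidence per 100,000
rates = [50, 48, 52, 20, 22, 18]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
z = getis_ord_gi_star(rates, adjacency)
print([round(v, 2) for v in z])  # [1.47, 2.22, 0.74, -0.54, -2.22, -1.57]
```

At the conventional |z| > 1.96 cutoff, county 1 is flagged as a hot spot and county 4 as a cold spot; counties at the ends of the chain have fewer neighbors, so their local sums carry less evidence.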
Results: Melanoma incidence was 57.1% greater in a 15-county cluster (P < .05) in South Central PA; eight of these counties were designated as metropolitan. Compared with noncluster counties, cluster counties had significantly more cultivated land (mean 19.8% v 6.9%, P < .001) and herbicide-treated land (16.8% v 6.5%, P < .001). In adjusted models, a 10% increase in cultivated land and a 9% increase in herbicide-treated acreage each independently corresponded to a 14% increase in incidence.
Conclusion: Melanoma incidence clustered in South Central PA, an area with a substantial agricultural industry. However, most counties in the cluster were designated as metropolitan, challenging the assumption that agriculture is primarily an industry of nonmetropolitan (rural) counties. Agricultural practices and patterns were associated with incidence, suggesting that cancer control should adopt an integrated One Health approach that concurrently addresses occupational, environmental, and behavioral risks. The cluster lay entirely within the 28-county catchment area of the Penn State Cancer Institute, demonstrating the utility of geospatial data and analysis for cancer control by a cancer center.
Benjamin J Marks, Jiangang Liao, Charlene Lam, Camille Moeckel, Eugene J Lengerich. Harvesting Risk: An Ecologic Study of Agricultural Practices and Patterns and Melanoma Incidence in Pennsylvania. JCO Clinical Cancer Informatics, vol. 9, e2500160. Publication date: 2025-11-01. DOI: https://doi.org/10.1200/CCI-25-00160. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12629121/pdf/
Purpose: Family caregivers of patients with terminal cancer need psychospiritual care, but assessing their psychospiritual distress is challenging. An automated system could detect psychospiritual distress in the large volume of text stored in electronic medical records and help health care providers assess distress accurately. This study aimed to develop an artificial intelligence system that automatically detects psychological and spiritual distress in the families of patients with terminal cancer from unstructured text data in electronic medical records.
Methods: This retrospective study collected medical records (n = 1,554,736) from the month before each participant's death. The participants (n = 808) died at Tohoku University Hospital in Japan between January 1, 2018, and December 31, 2019. We randomly selected 10,000 physician and nursing records and split the data set into training and testing sets at a 70:30 ratio. We evaluated model performance using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). We used ELI5 ("explain like I'm 5") to identify expressions important for detecting psychospiritual distress.
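AUROC and AUPRC can be computed directly from model scores. The sketch below uses the pairwise (Mann-Whitney) form of AUROC and average precision, a common AUPRC estimator; the abstract does not state which AUPRC estimator was used, and the toy labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """AUPRC summarized as average precision: mean precision at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits

# Hypothetical records: 1 = distress documented, 0 = not documented
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1]
print(auroc(labels, scores), round(average_precision(labels, scores), 4))  # 0.75 0.8333
```

A random classifier scores an AUROC of 0.5, whereas its expected AUPRC equals the prevalence of the positive class, so AUPRC values such as 0.62 and 0.41 are best read against how rare distress-positive records are.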
Results: The best-performing model for detecting psychological distress had an AUROC of 0.92 and an AUPRC of 0.62. The best-performing model for detecting spiritual distress had an AUROC of 0.92 and an AUPRC of 0.41. For psychological distress, the expressions with the highest importance values were anxiety, worry, and tears. For spiritual distress, they were want, me, and how.
Conclusion: This study demonstrated the application of machine learning models to detect psychospiritual distress among family caregivers of patients with terminal cancer from electronic medical records.
Kento Masukawa, Ryusho Suzuki, Momoka Tanno, Masaharu Nakayama, Mitsunori Miyashita. Artificial Intelligence System for Psychospiritual Distress in Family Caregivers of Patients With Terminal Cancer: A Retrospective Study. JCO Clinical Cancer Informatics, vol. 9, e2500129. Publication date: 2025-11-01. DOI: https://doi.org/10.1200/CCI-25-00129
Pub Date: 2025-11-01; Epub Date: 2025-11-14; DOI: 10.1200/CCI-25-00220
Nikhil Gautam Thaker, Navid Redjal, Adam Dicker, Arturo Loaiza-Bonilla, Trevor Royce, Vivek Subbiah, Vikash Deendyal, Jonathan R Gabriel, Neena Shetty, Ajay Choudhri, Gautam H Thaker
Purpose: Large language models (LLMs) show promise in assisting knowledge-intensive fields such as oncology, where up-to-date information and multidisciplinary expertise are critical. Traditional LLMs risk hallucinations and reliance on static, possibly outdated data that lack domain-specific context. Retrieval-augmented generation (RAG) has emerged as a strategy to address these issues by incorporating domain-specific information from external knowledge repositories.
Methods: We evaluated 15 LLMs, including Meta Llama-2/3, generative pretrained transformer (GPT)-3.5/4/4o variants, Claude-3, Gemini-2.0, and DeepSeek-R1. In a zero-shot workflow, each LLM answered 298 scorable questions from the 2021 American College of Radiology in-training examination. We implemented a RAG pipeline (Iridium Model) that transforms user prompts into vector embeddings, queries a specialized radiation oncology database, and merges relevant text with the original prompt to form an augmented query. We compared zero-shot versus RAG-augmented performance.
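The retrieve-then-augment flow described above can be sketched in a few lines. This is not the Iridium Model: as a labeled simplification, a bag-of-words counter stands in for a learned embedding model and an in-memory list stands in for the radiation oncology database, and the passages and function names are hypothetical placeholders.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': lowercase word counts (stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query."""
    q = embed(query)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

def augment(query, passages, k=2):
    """Merge retrieved passages with the original prompt into one augmented query."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return f"Use the context below to answer.\nContext:\n{context}\n\nQuestion: {query}"

# Hypothetical placeholder passages standing in for a radiation oncology knowledge base
passages = [
    "Passage on photon beam dosimetry and depth dose curves.",
    "Passage on tumor hypoxia and its effect on radiosensitivity.",
    "Passage on contouring guidelines for head and neck cancer.",
]
print(augment("How does hypoxia change radiosensitivity?", passages, k=1))
```

In a real pipeline the augmented string would be sent to the LLM in place of the original prompt, and the retrieved passages can be cited in the answer.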
Results: Larger-parameter LLMs had higher zero-shot accuracy, with six models outscoring graduating residents (P < .01). The top scorers were the reasoning models GPT-4o1, o3-mini, and DeepSeek-R1, which achieved 91.6%, 86.6%, and 91.6% without RAG, respectively. With RAG, Gemini-2.0 improved by 6.7% (to 79.2%), Llama-3-70b by 8.4% (to 75.8%), and GPT-4o by 5.7% (to 85.6%). The top-scoring reasoning models surpassed graduating resident averages by 17.7% to 20% (P < .01) but showed either no improvement or a decline with RAG. Gains occurred in the clinical, biology, and physics question domains. Majority voting boosted aggregate accuracy when individual model performance exceeded 50%. RAG workflows and reasoning models incurred higher computational costs.
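The 50% threshold for majority voting matches a classic result, Condorcet's jury theorem: if an odd number n of voters are independent and each is correct with probability p, the majority is more likely than a single voter to be correct only when p > 0.5. The sketch below computes that probability. Independence and a simple correct/incorrect framing are strong assumptions here, since LLMs share training data and exam questions are multiple choice; the function names are illustrative.

```python
from collections import Counter
from math import comb

def majority_vote(answers):
    """Plurality vote across models' answers; Counter breaks count ties by first appearance."""
    return Counter(answers).most_common(1)[0][0]

def p_majority_correct(p, n):
    """Probability that a strict majority of n independent voters, each correct with probability p, is correct."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))

print(majority_vote(["B", "A", "B", "C"]))  # B
for p in (0.4, 0.6, 0.8):
    print(p, round(p_majority_correct(p, 5), 4))
# 0.4 0.3174
# 0.6 0.6826
# 0.8 0.9421
```

Below 50% individual accuracy, pooling five voters makes the ensemble worse than any one of them (0.32 vs 0.40); above it, pooling helps (0.68 vs 0.60).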
Conclusion: A radiation oncology-specific RAG pipeline enhances the performance of nonreasoning LLMs in radiation oncology by integrating domain-specific evidence, whereas it does not improve the performance of reasoning models. These findings demonstrate that RAG can elevate clinical decision support by enabling simpler, cost-effective nonreasoning models to tackle complex tasks through retrieval, an efficient alternative to extensive model training that also yields citable, evidence-based explanations.
RadOncRAG: A Novel Retrieval-Augmented Generation Framework Improves Large Language Model Benchmark Performance in Radiation Oncology. JCO Clinical Cancer Informatics, vol. 9, e2500220. Publication date: 2025-11-01. DOI: https://doi.org/10.1200/CCI-25-00220
Pub Date: 2025-11-01; Epub Date: 2025-11-17; DOI: 10.1200/CCI-24-00310
Soheil Hashtarkhani, Brianna M White, Benyamin Hoseini, David L Schwartz, Arash Shaban-Nejad
Purpose: Lung cancer (LC) is a leading cause of cancer-related mortality in the United States. Accurate prediction of LC mortality rates is crucial for guiding targeted interventions and addressing health disparities. Although traditional regression-based models have been commonly used, explainable machine learning models may offer enhanced predictive accuracy and deeper insights into the factors influencing LC mortality.
Methods: This study applied three models (random forest [RF], gradient boosting regression [GBR], and linear regression [LR]) to predict county-level LC mortality rates across the United States. Model performance was evaluated using R-squared and root mean squared error (RMSE). Shapley Additive Explanations (SHAP) values were used to determine variable importance and the direction of each variable's effect. Geographic disparities in LC mortality were analyzed through Getis-Ord (Gi*) hotspot analysis.
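SHAP values are Shapley values from cooperative game theory applied to a model's prediction. For a handful of features they can be computed exactly by enumerating feature coalitions, as in this sketch. It assumes that features outside a coalition are replaced by a single baseline value (the SHAP library instead averages over background data and uses faster approximations), and the model, its three features, and their values are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance x.

    Features outside a coalition are set to their baseline value, a
    simplification of SHAP's averaging over background data.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalitions of size 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical model over three hypothetical features, with one interaction term
def model(z):
    return 10 + 2 * z[0] + 0.5 * z[1] + 0.1 * z[0] * z[2]

x = [20, 10, 5]           # county being explained
baseline = [15, 12, 10]   # reference values (e.g., feature means)
phi = shapley_values(model, x, baseline)
print([round(v, 2) for v in phi])  # [13.75, -1.0, -8.75]
```

By construction the values sum to model(x) minus model(baseline) (the efficiency property), which is what lets SHAP split a county's predicted rate into signed per-feature contributions; the interaction term's effect is shared between features 0 and 2.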
Results: The RF model outperformed both GBR and LR, achieving an R2 value of 41.9% and an RMSE of 12.8. SHAP analysis identified smoking rate as the most important predictor, followed by median home value and the percentage of the Hispanic ethnic population. Spatial analysis revealed significant clusters of elevated LC mortality in the mid-eastern counties of the United States.
Conclusion: The RF model demonstrated superior predictive performance for LC mortality rates, emphasizing the critical roles of smoking prevalence, housing values, and the percentage of the Hispanic ethnic population. These findings offer actionable insights for designing targeted interventions, promoting screening, and addressing health disparities in the regions of the United States most affected by LC.
Comparative Evaluation of Explainable Machine Learning Versus Linear Regression for Predicting County-Level Lung Cancer Mortality Rate in the United States. JCO Clinical Cancer Informatics. 2025;9:e2400310. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12643560/pdf/
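The Getis-Ord Gi* statistic used for the lung cancer hotspot analysis is a z-score. It compares a weighted local sum of values, including the location itself, with the sum expected if values were spread at random.

The sketch below implements the standard Gi* formula with a user-supplied weight matrix. The toy one-dimensional "counties", the binary adjacency weights, and the function name are assumptions, since the abstract does not state how neighbors were defined. By convention, scores above about 1.96 are read as hotspots at the 5% level, before any correction for multiple testing.

```python
import math
from typing import Sequence


def getis_ord_gi_star(values: Sequence[float], weights: Sequence[Sequence[float]]) -> list[float]:
    """Getis-Ord Gi* z-score for every location.

    values[j]     attribute at location j (e.g., a county mortality rate)
    weights[i][j] spatial weight between locations i and j; for Gi* the
                  location itself is included, so weights[i][i] is nonzero.

    Gi* = (sum_j w_ij x_j - mean * sum_j w_ij)
          / (S * sqrt((n * sum_j w_ij**2 - (sum_j w_ij)**2) / (n - 1)))
    where S = sqrt(sum_j x_j**2 / n - mean**2).
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean * mean)
    scores = []
    for row in weights:
        sum_w = sum(row)
        sum_w2 = sum(w * w for w in row)
        local_sum = sum(w * x for w, x in zip(row, values))  # weighted neighborhood sum
        numerator = local_sum - mean * sum_w  # observed minus expected
        denominator = s * math.sqrt((n * sum_w2 - sum_w**2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Toy example (hypothetical data): six locations in a row, where each
# location's neighbors are itself and the adjacent positions.
values = [10.0, 12.0, 11.0, 30.0, 32.0, 31.0]
weights = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(6)] for i in range(6)]
z = getis_ord_gi_star(values, weights)
print([round(v, 2) for v in z])
```

In this toy series the high-valued end produces a hotspot (z ≈ 2.23 at index 4). The low-valued end produces a mirror-image cold spot (z ≈ −2.23 at index 1).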
Pub Date: 2025-11-01; Epub Date: 2025-11-19; DOI: 10.1200/CCI-25-00098
Peter P Yu, W Scott Campbell, Eric B Durbin, Lawrence N Shulman, Jeremy L Warner
The National Cancer Policy Forum workshop "Enabling 21st Century Applications for Cancer Surveillance Through Enhanced Registries and Beyond" examined the current state of cancer registries. It also examined how registries might evolve to extend their missions to national health priorities: improving patient and health economic outcomes, equitable access to care, the quality of health care, and health system operational efficiency. Session 3 of the workshop focused on medical informatics as a driver of improvement in cancer registry data quality and interoperability. Data quality begins with precise data definitions, as codified in controlled vocabularies and ontologies. Established and evolving oncology data dictionaries are described. The harmonization of these data dictionaries is outlined, through representation in Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and in hierarchical classification systems within Common Data Models. Interoperability requires transmission standards that facilitate data exchange between data sources, registries, and data consumers. Highly structured data capture and representation support semantically appropriate data use. However, the considerable effort required for data capture and the accompanying rigidity of the data structure are challenges to implementation. Artificial intelligence may provide alternative paths for extracting and representing cancer registry data. Higher-fidelity cancer data and greater data interoperability, combined with data governance, will help realize a Learning Health System for oncology. However, economic benefits need to be shared to support the infrastructure costs incurred by health care systems.
Informatics Perspectives on the National Cancer Policy Forum Workshop "Enabling 21st Century Applications for Cancer Surveillance Through Enhanced Registries and Beyond". JCO Clinical Cancer Informatics. 2025;9:e2500098.
Pub Date: 2025-11-01; Epub Date: 2025-11-21; DOI: 10.1200/CCI-24-00287
Daniel Kates-Harbeck, Hans Kreipe, Oleg Gluz, Matthias Christgen, Sherko Kuemmel, Monika Graeser, Ulrike Nitz, Sven Mahner, Doris Mayr, Rachel Wuerstlein, Akinori Mitani, Jingbin Zhang, Hans Pinckaers, Gijs Smit, Yi Ren, Songwan Joun, Jacqueline Griffin, Nancy Lin, Felix Feng, Andre Esteva, Ronald Kates, Nadia Harbeck
Purpose: Prognostic assessment in hormone receptor-positive (HR+)/human epidermal growth factor receptor 2-negative (HER2-) early breast cancer (EBC) remains challenging, given relatively low rates of disease progression. Modern artificial intelligence (AI)-based techniques have provided advanced prognostic tools in cancer.
Patients and methods: The Artera multimodal AI (MMAI) platform, using digital histopathology and clinical data, was applied to develop and test a prognostic risk assessment algorithm in HR+/HER2- EBC. Hematoxylin and eosin (H&E) slides from pretreatment breast biopsy and surgical specimens were digitized from the WSG PlanB and ADAPT trials. Patients with available images and complete data (n = 5,259) were stratified by trial, treatment, and distant metastasis (DM) into training (development: 60%) and internal validation (holdout: 40%) cohorts. The algorithm provided prognostic DM risk scores on the basis of image data and clinical variables (age, T and N stages, and tumor size). Univariable and multivariable Fine-Gray models were used to assess performance on the test cohort; subdistribution hazard ratios (sHR) are reported per standard deviation increase of the model scores. Prespecified prognostic subgroups for analysis were defined by nodal status, menopausal status, and tumor grade.
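The split described above (stratified by trial, treatment, and DM status into 60% development and 40% holdout) can be sketched as follows:

1. Group patients on the combined stratification key.
2. Divide each group separately.

Every stratum, including the relatively rare DM events, then appears in both cohorts in roughly the same proportion. The field names, treatment labels, and function below are hypothetical. The abstract does not describe the exact procedure, such as how per-stratum rounding was handled.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable


def stratified_split(
    records: list[dict],
    key: Callable[[dict], Hashable],
    train_frac: float = 0.6,
    seed: int = 0,
) -> tuple[list[dict], list[dict]]:
    """Split records so that each stratum is divided train_frac / (1 - train_frac)."""
    strata: dict = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, holdout = [], []
    for members in strata.values():
        shuffled = list(members)  # copy before shuffling
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_frac)
        train.extend(shuffled[:cut])
        holdout.extend(shuffled[cut:])
    return train, holdout


# Hypothetical cohort: trial, treatment arm, and distant-metastasis flag.
patients = [
    {"id": i, "trial": ("PlanB", "ADAPT")[i % 2], "treatment": ("A", "B")[(i // 2) % 2], "dm": i % 5 == 0}
    for i in range(200)
]
train, holdout = stratified_split(patients, key=lambda p: (p["trial"], p["treatment"], p["dm"]))
print(len(train), len(holdout), sum(p["dm"] for p in train), sum(p["dm"] for p in holdout))
```

For this toy cohort the call prints 120 80 24 16: the overall 60/40 ratio holds, and it also holds within the DM events.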
Results: The trained MMAI score was significantly associated with risk of DM in the test cohort as a whole (sHR, 2.3 [95% CI, 2.0 to 2.8]) and across subgroups. The score remained significant after adjusting for clinical prognostic factors (sHR, 2.2 [95% CI, 1.7 to 2.8]). The MMAI image component alone had significant prognostic value in the test cohort (sHR, 1.6 [95% CI, 1.3 to 1.9]). It also had significant prognostic value separately within the G2 and G3 subgroups, with an sHR of 1.5 per standard deviation increase, and in most of the other predefined clinical subgroups.
Conclusion: MMAI using digital pathology from H&E slides provides enhanced prognostic quality in HR+/HER2- EBC and could help to advance personalized breast cancer management.
Multimodal Artificial Intelligence Model From Baseline Histopathology Adds Prognostic Information for Distant Recurrence Assessment in Hormone Receptor-Positive/Human Epidermal Growth Factor Receptor 2-Negative Early Breast Cancer. JCO Clinical Cancer Informatics. 2025;9:e2400287.
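The breast cancer sHRs are reported per standard deviation (SD) increase of the model score, so they depend on the score's scale. If a Fine-Gray model assigns a log-subhazard coefficient β per raw score unit, the sHR per SD is exp(β·SD). Under the model's log-linear assumption, a shift of k SDs multiplies the subdistribution hazard by sHR^k. That hazard describes the subdistribution hazard, not directly a ratio of cumulative incidences.

The sketch below does only this arithmetic; it does not fit a Fine-Gray model. The function names and the example coefficient and SD are hypothetical.

```python
import math


def shr_per_sd(beta_per_unit: float, score_sd: float) -> float:
    """Convert a per-unit log-subhazard coefficient into an sHR per SD of the score."""
    return math.exp(beta_per_unit * score_sd)


def shr_for_shift(shr_sd: float, n_sd: float) -> float:
    """Subdistribution hazard ratio for a score difference of n_sd SDs."""
    return shr_sd**n_sd


# With the reported sHR of 2.3 per SD, a score 2 SDs higher corresponds to
# 2.3 ** 2, i.e. about 5.3 times the subdistribution hazard of DM.
print(round(shr_for_shift(2.3, 2), 2))
# Hypothetical coefficient: beta = 0.1 per raw unit with a score SD of 8 units.
print(round(shr_per_sd(0.1, 8.0), 3))
```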