Pub Date: 2025-11-01; Epub Date: 2025-11-14; DOI: 10.1200/CCI-25-00220
Nikhil Gautam Thaker, Navid Redjal, Adam Dicker, Arturo Loaiza-Bonilla, Trevor Royce, Vivek Subbiah, Vikash Deendyal, Jonathan R Gabriel, Neena Shetty, Ajay Choudhri, Gautam H Thaker
Purpose: Large language models (LLMs) show promise in assisting knowledge-intensive fields such as oncology, where up-to-date information and multidisciplinary expertise are critical. Traditional LLMs risk hallucinations and reliance on static, possibly outdated data that lack domain-specific context. Retrieval-augmented generation (RAG) has emerged as a strategy to address these issues by incorporating domain-specific information from external knowledge repositories.
Methods: We evaluated 15 LLMs, including Meta Llama-2/3, generative pretrained transformer (GPT)-3.5/4/4o variants, Claude-3, Gemini-2.0, and DeepSeek-R1. In a zero-shot workflow, each LLM answered 298 scorable questions from the 2021 American College of Radiology in-training examination. We implemented a RAG pipeline (Iridium Model) that transforms user prompts into vector embeddings, queries a specialized radiation oncology database, and merges relevant text with the original prompt to form an augmented query. We compared zero-shot versus RAG-augmented performance.
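The retrieval step described in the Methods (embed the prompt, query a knowledge store, merge the retrieved text with the prompt) can be sketched in a few lines. This is a minimal, hypothetical illustration, not the Iridium Model. It assumes a bag-of-words term-count vector in place of a learned embedding model, an in-memory list of passages in place of the radiation oncology database, and cosine similarity for ranking. The passages and the helper names `embed`, `retrieve`, and `augment_prompt` are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = embed(query)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]


def augment_prompt(query: str, passages: list[str], k: int = 2) -> str:
    """Merge the retrieved passages with the original question into one prompt."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using the context above."


# Toy knowledge base (illustrative passages, not the study's database).
KNOWLEDGE_BASE = [
    "Cobalt-60 emits gamma rays with a mean energy of about 1.25 MeV.",
    "The alpha/beta ratio for late-responding tissues is often taken as about 3 Gy.",
    "Iridium-192 is commonly used as the source in HDR brachytherapy.",
]

if __name__ == "__main__":
    print(augment_prompt("Which isotope is used for HDR brachytherapy?", KNOWLEDGE_BASE, k=1))
```

In a production pipeline the term-count vectors would be replaced by dense embeddings and the sorted list by a vector index, but the flow (embed, rank, merge) is the same.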
Results: Larger-parameter LLMs had higher zero-shot accuracy, with six models outscoring graduating residents (P < .01). The top scorers were the reasoning models GPT-4o1, o3-mini, and DeepSeek-R1, which achieved 91.6%, 86.6%, and 91.6%, respectively, without RAG. With RAG, Gemini-2.0 improved by 6.7 percentage points (to 79.2%), Llama-3-70b by 8.4 percentage points (to 75.8%), and GPT-4o by 5.7 percentage points (to 85.6%). The top-scoring reasoning models surpassed graduating resident averages by 17.7%-20% (P < .01) but showed no improvement, or a decline, with RAG. Gains occurred in the clinical, biology, and physics domains. Majority voting boosted aggregate accuracy when individual model accuracy exceeded 50%. RAG workflows and reasoning models incurred higher computational costs.
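The majority-voting result is consistent with a classic idealization, the Condorcet jury theorem: if n independent voters each answer a two-option question correctly with probability p, the majority is more accurate than a single voter when p > 0.5 and less accurate when p < 0.5. The sketch below assumes independent errors and binary questions, which LLMs answering multiple-choice items do not satisfy exactly; the first-seen tie-break in `majority_vote` is likewise an arbitrary choice, not the study's rule.

```python
from collections import Counter
from math import comb


def majority_vote(answers):
    """Most common answer across models; ties go to the answer seen first (arbitrary rule)."""
    counts = Counter(answers)
    top = max(counts.values())
    return next(a for a in answers if counts[a] == top)


def majority_accuracy(p, n):
    """P(majority of n independent voters is right) when each is right with probability p.

    Idealized binary setting with odd n (Condorcet jury theorem).
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


if __name__ == "__main__":
    # Accuracy rises with more voters when p > 0.5 and falls when p < 0.5.
    for p in (0.4, 0.6):
        print(p, [round(majority_accuracy(p, n), 3) for n in (1, 3, 5, 7)])
```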
Conclusion: A radiation oncology-specific retrieval-augmented generation pipeline enhances nonreasoning LLM performance in radiation oncology by integrating domain-specific evidence, whereas it does not improve the performance of reasoning models. These findings demonstrate that RAG can elevate clinical decision support by enabling simpler, cost-effective nonreasoning models to tackle complex tasks through retrieval, an efficient alternative to extensive model training that also yields citable, evidence-based explanations.
Title: RadOncRAG: A Novel Retrieval-Augmented Generation Framework Improves Large Language Model Benchmark Performance in Radiation Oncology. JCO Clinical Cancer Informatics 9:e2500220.
Purpose: Family caregivers of patients with terminal cancer need psychospiritual care, but assessing their psychospiritual distress is challenging. An automated system could detect psychospiritual distress in large volumes of electronic medical records and help health care providers assess distress accurately. This study aimed to develop an artificial intelligence system that automatically detects the psychological and spiritual distress of family members of patients with terminal cancer from unstructured text in electronic medical records.
Methods: This retrospective study collected medical records (n = 1,554,736) written during the month before each participant's death. The participants (n = 808) died at Tohoku University Hospital in Japan between January 1, 2018, and December 31, 2019. We randomly selected 10,000 records from physician and nursing notes and split the data set into training and testing sets at a ratio of 70:30. We used the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) to evaluate model performance. We used ELI5 ("explain it like I am 5") to identify important expressions for detecting psychospiritual distress.
Results: The best-performing model for detecting psychological distress had an AUROC of 0.92 and an AUPRC of 0.62. The best-performing model for detecting spiritual distress had an AUROC of 0.92 and an AUPRC of 0.41. For psychological distress, the expressions with the highest importance values were anxiety, worry, and tears. For spiritual distress, they were want, me, and how.
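The evaluation in this study rests on a 70:30 split and on AUROC and AUPRC. A dependency-free sketch of those three pieces follows. It uses the rank (Mann-Whitney) formulation of AUROC and average precision as the AUPRC estimate, which is one common choice among several; the function names are illustrative, not the authors' code. Because a random classifier's AUPRC equals the prevalence of positive records, AUPRC values well below AUROC (such as 0.41 vs 0.92) are not unusual when distress is documented in only a small share of records.

```python
import random


def split_70_30(records, seed=0):
    """Shuffle records reproducibly and return (train, test) at a 70:30 ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(0.7 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPRC estimate: mean precision at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(auroc(labels, scores), average_precision(labels, scores))
```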
Conclusion: This study showed the application of machine learning models for the detection of psychospiritual distress among family caregivers of patients with terminal cancer from electronic medical records.
Title: Artificial Intelligence System for Psychospiritual Distress in Family Caregivers of Patients With Terminal Cancer: A Retrospective Study. JCO Clinical Cancer Informatics 9:e2500129. Authors: Kento Masukawa, Ryusho Suzuki, Momoka Tanno, Masaharu Nakayama, Mitsunori Miyashita; Pub Date: 2025-11-01; DOI: 10.1200/CCI-25-00129.
Pub Date: 2025-11-01; Epub Date: 2025-11-17; DOI: 10.1200/CCI-24-00310
Soheil Hashtarkhani, Brianna M White, Benyamin Hoseini, David L Schwartz, Arash Shaban-Nejad
Purpose: Lung cancer (LC) is a leading cause of cancer-related mortality in the United States. Accurate prediction of LC mortality rates is crucial for guiding targeted interventions and addressing health disparities. Although traditional regression-based models have been commonly used, explainable machine learning models may offer enhanced predictive accuracy and deeper insights into the factors influencing LC mortality.
Methods: This study applied three models to predict county-level LC mortality rates across the United States: random forest (RF), gradient boosting regression (GBR), and linear regression (LR). Model performance was evaluated using R-squared (R²) and root mean squared error (RMSE). Shapley Additive Explanations (SHAP) values were used to determine each variable's importance and the direction of its effect. Geographic disparities in LC mortality were analyzed through Getis-Ord (Gi*) hotspot analysis.
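The two performance metrics named in the Methods have standard definitions, sketched below on made-up toy values. An R² of 41.9% means the model explains about 42% of the between-county variance in mortality rates; RMSE is expressed in the same units as the mortality rate itself.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the outcome's own units."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.0, 2.0, 3.0, 5.0]
    print(r_squared(observed, predicted), rmse(observed, predicted))
```

A model that always predicts the mean scores R² = 0, which is why R² is a natural yardstick when comparing RF, GBR, and LR on the same counties.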
Results: The RF model outperformed both GBR and LR, achieving an R² of 41.9% and an RMSE of 12.8. SHAP analysis identified smoking rate as the most important predictor, followed by median home value and the percentage of the Hispanic ethnic population. Spatial analysis revealed significant clusters of elevated LC mortality in the mid-eastern counties of the United States.
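The hotspot clusters come from the Getis-Ord Gi* statistic, a z-score comparing each county's weighted neighborhood sum (including the county itself) with what would be expected if values were spatially random. The sketch below implements the standard Gi* formula on a hypothetical chain of five counties with binary adjacency weights; the study's actual spatial weights are not described in the abstract.

```python
import math


def getis_ord_gi_star(values, weights):
    """Return a Gi* z-score per location.

    values:  observed value at each location
    weights: weights[i][j] is the spatial weight between i and j; the Gi*
             variant includes the self-weight weights[i][i].
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    z_scores = []
    for row in weights:
        w_sum = sum(row)
        w_sq_sum = sum(w * w for w in row)
        numerator = sum(w * x for w, x in zip(row, values)) - mean * w_sum
        # Undefined if every location has weight 1 (denominator becomes 0).
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores


# Hypothetical chain of five counties; each is adjacent to itself and its immediate neighbors.
rates = [10.0, 10.0, 10.0, 30.0, 30.0]
adjacency = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(5)] for i in range(5)]
z = getis_ord_gi_star(rates, adjacency)

if __name__ == "__main__":
    print([round(v, 3) for v in z])
```

Positive z-scores mark hot spots (high values surrounded by high values), negative ones cold spots; significance is then judged against standard normal critical values.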
Conclusion: The RF model demonstrated superior predictive performance for LC mortality rates, emphasizing the critical roles of smoking prevalence, housing values, and the percentage of the Hispanic ethnic population. These findings offer actionable insights for designing targeted interventions, promoting screening, and addressing health disparities in the regions of the United States most affected by LC.
Title: Comparative Evaluation of Explainable Machine Learning Versus Linear Regression for Predicting County-Level Lung Cancer Mortality Rate in the United States. JCO Clinical Cancer Informatics 9:e2400310. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12643560/pdf/
Pub Date: 2025-11-01; Epub Date: 2025-11-19; DOI: 10.1200/CCI-25-00098
Peter P Yu, W Scott Campbell, Eric B Durbin, Lawrence N Shulman, Jeremy L Warner
The National Cancer Policy Forum workshop "Enabling 21st Century Applications for Cancer Surveillance Through Enhanced Registries and Beyond" examined the current state of cancer registries and how they might evolve to extend registry missions to national health priorities: improving patient and health economic outcomes, ensuring equitable access to care, and improving health care quality and health system operational efficiency. Session 3 of the workshop focused on medical informatics as a driver of improvement in cancer registry data quality and interoperability. Data quality begins with precise data definitions, as codified in controlled vocabularies and ontologies. Established and evolving oncology data dictionaries are described. Harmonization of these data dictionaries through representation in Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) and through hierarchical classification systems within Common Data Models is outlined. Interoperability requires transmission standards that facilitate exchange of data among data sources, registries, and data consumers. Although highly structured data capture and representation support semantically appropriate data use, the substantial effort required for data capture and the accompanying rigidity of the data structure are challenges to implementation. Artificial intelligence may provide alternative paths for the extraction and representation of cancer registry data. Higher-fidelity cancer data and greater data interoperability, combined with data governance, will help realize a Learning Health System for oncology, but economic benefits need to be shared to support the infrastructure costs incurred by health care systems.
Title: Informatics Perspectives on the National Cancer Policy Forum Workshop "Enabling 21st Century Applications for Cancer Surveillance Through Enhanced Registries and Beyond". JCO Clinical Cancer Informatics 9:e2500098.
Pub Date: 2025-11-01; Epub Date: 2025-11-21; DOI: 10.1200/CCI-24-00287
Daniel Kates-Harbeck, Hans Kreipe, Oleg Gluz, Matthias Christgen, Sherko Kuemmel, Monika Graeser, Ulrike Nitz, Sven Mahner, Doris Mayr, Rachel Wuerstlein, Akinori Mitani, Jingbin Zhang, Hans Pinckaers, Gijs Smit, Yi Ren, Songwan Joun, Jacqueline Griffin, Nancy Lin, Felix Feng, Andre Esteva, Ronald Kates, Nadia Harbeck
Purpose: Prognostic assessment in hormone receptor-positive (HR+)/human epidermal growth factor receptor 2-negative (HER2-) early breast cancer (EBC) remains challenging, given relatively low rates of disease progression. Modern artificial intelligence (AI)-based techniques have provided advanced prognostic tools in cancer.
Patients and methods: The Artera multimodal AI (MMAI) platform, using digital histopathology and clinical data, was applied to develop and test a prognostic risk assessment algorithm in HR+/HER2- EBC. Hematoxylin and eosin (H&E) slides from pretreatment breast biopsy and surgical specimens were digitized from the WSG PlanB and ADAPT trials. Patients with available images and complete data (n = 5,259) were stratified by trial, treatment, and distant metastasis (DM) into training (development: 60%) and internal validation (holdout: 40%) cohorts. The algorithm provided prognostic DM risk scores on the basis of image data and clinical variables (age, T and N stages, and tumor size). Univariable and multivariable Fine-Gray models were used to assess performance on the test cohort; subdistribution hazard ratios (sHR) are reported per standard deviation increase of the model scores. Prespecified prognostic subgroups for analysis were defined by nodal status, menopausal status, and tumor grade.
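Stratifying the 60:40 development/holdout split by trial, treatment, and DM status keeps those factors balanced across cohorts. One way to do this, sketched below with hypothetical field names (`trial`, `treatment`, `dm`), is to shuffle within each stratum and allocate 60% of each stratum to development. The study's exact procedure is not described in the abstract, so this is an assumption-labeled illustration only.

```python
import random
from collections import defaultdict


def stratified_split(patients, stratum_of, train_fraction=0.6, seed=0):
    """Split patients so that each stratum contributes about train_fraction to training."""
    strata = defaultdict(list)
    for patient in patients:
        strata[stratum_of(patient)].append(patient)
    rng = random.Random(seed)
    train, holdout = [], []
    for key in sorted(strata):  # fixed iteration order for reproducibility
        members = strata[key]
        rng.shuffle(members)
        n_train = round(train_fraction * len(members))
        train.extend(members[:n_train])
        holdout.extend(members[n_train:])
    return train, holdout


def by_trial_treatment_dm(patient):
    """Hypothetical stratum key built from trial, treatment arm, and DM status."""
    return (patient["trial"], patient["treatment"], patient["dm"])
```

Because allocation happens within each stratum, the DM event rate in the holdout cohort closely matches that of the development cohort, which matters when events are relatively rare.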
Results: The trained MMAI score was significantly associated with risk of DM in the test cohort (sHR, 2.3 [95% CI, 2.0 to 2.8]) as a whole and across subgroups. The score remained significant (sHR, 2.2 [95% CI, 1.7 to 2.8]) after adjusting for clinical prognostic factors. The MMAI image component alone had significant prognostic value (sHR, 1.6 [95% CI, 1.3 to 1.9]) in the test cohort; it also had significant prognostic value separately within the G2 and G3 subgroups, with sHR of 1.5 per standard deviation increase, and in most of the other predefined clinical subgroups.
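Because the sHRs are reported per standard deviation of the model score, a worked relation may help with interpretation. Assuming the usual log-linear covariate effect of the Fine-Gray model, with $\beta$ the coefficient on the raw score and $\sigma$ the score's standard deviation:

```latex
\mathrm{sHR}_{1\,\mathrm{SD}} = e^{\beta\sigma},
\qquad
\mathrm{sHR}_{k\,\mathrm{SD}} = e^{k\beta\sigma} = \left(\mathrm{sHR}_{1\,\mathrm{SD}}\right)^{k}
```

For example, under the proportional subdistribution hazards assumption, the test-cohort sHR of 2.3 per SD would correspond to roughly $2.3^2 \approx 5.3$ for a 2-SD difference in score.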
Conclusion: MMAI using digital pathology from H&E slides provides enhanced prognostic quality in HR+/HER2- EBC and could help to advance personalized breast cancer management.
Title: Multimodal Artificial Intelligence Model From Baseline Histopathology Adds Prognostic Information for Distant Recurrence Assessment in Hormone Receptor-Positive/Human Epidermal Growth Factor Receptor 2-Negative Early Breast Cancer. JCO Clinical Cancer Informatics 9:e2400287.
Pub Date: 2025-11-01; Epub Date: 2025-11-17; DOI: 10.1200/CCI-25-00140
L Lee Dupuis, Terrence Lo, Martin Yi, Lillian Sung, Mina Tadrous, Cherry Chu
Purpose: Direct pediatric evidence on chemotherapy emetogenicity is limited, so the basis for antiemetic selection in pediatric patients is uncertain. This study classified the acute emetogenicity of chemotherapy regimens in pediatric patients using data extracted from the electronic health record (EHR).
Methods: This retrospective, single-institution study extracted data from the EHR of patients aged 0 to 18 years who received chemotherapy during an inpatient admission from July 1, 2018, through February 29, 2024. Data, including patient demographics; date, time, and route of chemotherapy and antiemetic administration; and date and time of vomiting, were organized by patient and chemotherapy block. When at least 30 patients received the same chemotherapy and antiemetics during a chemotherapy block, we determined the proportion of chemotherapy blocks in which patients experienced complete, partial, or failed control of chemotherapy-induced vomiting. Chemotherapy regimen emetogenicity was assigned using a revised version of an accepted pediatric chemotherapy emetogenicity classification framework that adjusts for antiemetic administration.
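The grouping rule described in the Methods (summarize only chemotherapy and antiemetic combinations received by at least 30 patients, then compute the proportion of blocks with complete, partial, or failed vomiting control) can be sketched as follows. The field names and the episode cut points in `control_category` are hypothetical placeholders, since the abstract does not state the definitions used; the subsequent emetogenicity assignment under the revised framework is not reproduced here.

```python
from collections import Counter, defaultdict

CATEGORIES = ("complete", "partial", "failed")


def control_category(vomiting_episodes, partial_max=2):
    """Map vomiting episodes in a block to a control category (HYPOTHETICAL cut points)."""
    if vomiting_episodes == 0:
        return "complete"
    if vomiting_episodes <= partial_max:
        return "partial"
    return "failed"


def summarize_control(blocks, min_patients=30):
    """Proportion of blocks in each control category, per (regimen, antiemetics) pair."""
    groups = defaultdict(list)
    for block in blocks:
        groups[(block["regimen"], block["antiemetics"])].append(block)
    summary = {}
    for key, group in groups.items():
        if len({b["patient_id"] for b in group}) < min_patients:
            continue  # too few distinct patients to classify this combination
        counts = Counter(control_category(b["vomiting_episodes"]) for b in group)
        summary[key] = {c: counts[c] / len(group) for c in CATEGORIES}
    return summary
```

Counting distinct patients (rather than blocks) for the 30-patient threshold mirrors the wording of the Methods, since one patient can contribute several chemotherapy blocks.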
Results: Seven thousand two hundred ninety-six chemotherapy blocks in 1,386 patients were identified. The emetogenicity of 25 chemotherapy regimens was classified as high (7), moderate (5), low (10), or minimal (3). For 19 of these, no direct pediatric information was previously available. For five, our findings confirmed the previous pediatric emetogenicity classification. Relative to emetogenicity classifications for adults, our classifications were higher for seven regimens, lower for one, and the same for four.
Conclusion: We have applied a novel method, EHR data extraction, to provide direct pediatric evidence to classify chemotherapy emetogenicity. Increasing the certainty of chemotherapy emetogenicity facilitates effective antiemetic selection for pediatric patients. This method may be applied in multi-institution studies to increase the number of chemotherapy regimens whose emetogenicity is classified using direct pediatric evidence.
Title: Using Real-World Data to Determine Acute Chemotherapy Emetogenicity in Pediatric Patients. JCO Clinical Cancer Informatics 9:e2500140.
Pub Date: 2025-11-01; Epub Date: 2025-11-19; DOI: 10.1200/CCI-25-00152
Ilana Graetz, Sara Arshad, Clara Cai, Samuel Hernandez, Tamar Sapir, Jeffrey Carter, Cherilyn Heggen, Kelly E McKinnon, Freddie Yang, Gelareh Sadigh, Jane Meisel
Purpose: Cyclin-dependent kinase 4 and 6 inhibitors (CDKIs) are effective breast cancer therapies but pose adherence challenges because of cost, side effects, and complexity of medication schedule. We assessed the feasibility and usability of a smart label-enabled remote therapeutic monitoring (RTM) mHealth intervention for women with breast cancer prescribed a CDKI. Exploratory adjusted analyses examined factors associated with usability and CDKI adherence.
Methods: Participants were recruited from a comprehensive cancer center between April and August 2024. For 3 months, participants used Tappt smart labels and a web app to record CDKI doses, receive missed-dose reminders, report symptoms biweekly, and complete baseline and follow-up surveys. Alerts were sent to oncology teams for nonadherence (>20% missed doses) or moderate-to-severe symptoms. Feasibility was defined as ≥70% of participants using the smart label for >30 days and completing the follow-up survey. Usability was assessed using the System Usability Scale (SUS), with a benchmark score of ≥68. Linear regression was used to examine factors associated with usability and CDKI adherence.
Results: Of 168 patients screened, 107 were eligible and reached; 75.7% (81/107) consented, 90.1% (73/81) completed the follow-up survey, and 88.9% (72/81) used the intervention for >30 days. Most participants self-identified as White (69.9%) and were privately insured (72.6%); 58.9% had early-stage breast cancer and 58.9% had depression or anxiety. The mean usability score was 75.8; participants who self-identified as Black reported usability scores 12.0 points higher than those who self-identified as White (P = .03). Mean CDKI adherence was 92.8%. A history of anxiety or depression was associated with an 8.6 percentage-point lower CDKI adherence rate (P = .02).
Conclusion: A smart label-enabled RTM mHealth intervention exceeded feasibility and usability benchmarks and showed promise for supporting CDKI adherence and symptom management.
Title: Feasibility of a Smart Label-Enabled Remote Therapeutic Monitoring Intervention to Support Cyclin-Dependent Kinase 4/6 Inhibitor Adherence in Breast Cancer Care. JCO Clinical Cancer Informatics. 2025;9:e2500152.
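The Graetz et al. abstract names three numeric rules: a SUS benchmark of ≥68, an alert when >20% of doses are missed, and a ≥70% feasibility target. The sketch below shows how these could be checked. The SUS scoring is the standard rule for the 10-item questionnaire. The adherence definition (doses recorded over doses prescribed), the function names, and the example dose counts are assumptions, because the abstract does not give the study's formulas. The abstract reports the two feasibility components (survey completion, >30 days of use) separately, so the sketch also computes the smallest possible number of participants meeting both, using inclusion-exclusion.

```python
# Sketch of the scoring and threshold rules described in the Graetz et al. abstract.
# SUS scoring follows the standard 10-item rule; the adherence formula and all
# function names are illustrative assumptions, not the study's code.

SUS_BENCHMARK = 68.0        # usability benchmark stated in the abstract
MISSED_DOSE_ALERT = 0.20    # alert when >20% of doses are missed
FEASIBILITY_TARGET = 70.0   # % of participants required for feasibility


def sus_score(responses):
    """Return a 0-100 SUS score from ten responses on a 1-5 Likert scale."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS requires ten responses on a 1-5 scale")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded (r - 1); even items negatively worded (5 - r).
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


def adherence(doses_recorded, doses_prescribed):
    """Assumed definition: share of prescribed doses recorded as taken."""
    return doses_recorded / doses_prescribed


def needs_nonadherence_alert(doses_recorded, doses_prescribed):
    """True when the missed-dose fraction exceeds the 20% alert threshold."""
    return 1 - adherence(doses_recorded, doses_prescribed) > MISSED_DOSE_ALERT


def pct(numerator, denominator):
    """Percentage rounded to one decimal, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)


def min_overlap(count_a, count_b, n):
    """Smallest possible number of people meeting both criteria (inclusion-exclusion)."""
    return max(0, count_a + count_b - n)


if __name__ == "__main__":
    # Enrollment funnel from the abstract.
    print(pct(81, 107), pct(73, 81), pct(72, 81))              # 75.7 90.1 88.9
    both = min_overlap(72, 73, 81)                               # at least 64 of 81
    print(pct(both, 81), pct(both, 81) >= FEASIBILITY_TARGET)    # 79.0 True
    print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))             # 75.0
    print(needs_nonadherence_alert(21, 28))                      # True (25% missed)
```

Even in the worst case, at least 64 of 81 participants (79.0%) met both feasibility components. This is consistent with the abstract's conclusion that the feasibility benchmark was exceeded.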
Pub Date: 2025-11-01; Epub Date: 2025-11-03; DOI: 10.1200/CCI-25-00283
A Jay Holmgren
Title: Rapid Growth in Patient Portal Messages Underscores the Need for Actionable Paths Forward. JCO Clinical Cancer Informatics. 2025;9:e2500283.
Pub Date: 2025-11-01; Epub Date: 2025-11-06; DOI: 10.1200/CCI-24-00311
Paul Windisch, Fabio Dennstädt, Julia Weyrich, Christina Schröder, Daniel R Zwahlen, Robert Förster
Purpose: Chain-of-thought prompting is a technique that makes large language models generate intermediate reasoning steps when solving a complex problem. OpenAI's o1 preview and GPT-5 have been trained to produce such a chain of thought internally before responding, and they have been reported to excel on various benchmarks that require complex reasoning. This study evaluated their performance in oncology text mining.
Methods: Six hundred trials published in high-impact medical journals were classified according to whether they allowed the inclusion of patients with localized and/or metastatic disease. GPT-4o, o1 preview, and GPT-5 at different reasoning effort settings were instructed to perform the same classification based on the publications' abstracts.
Results: For predicting whether patients with localized disease were enrolled, GPT-4o and o1 preview achieved F1 scores of 0.80 (0.76-0.83) and 0.91 (0.89-0.94), respectively. For predicting whether patients with metastatic disease were enrolled, GPT-4o and o1 preview achieved F1 scores of 0.97 (0.95-0.98) and 0.99 (0.99-1.00), respectively. For GPT-5, F1 scores for predicting the eligibility of patients with localized disease increased from 0.84 to 0.93 and 0.94 as reasoning effort increased; the corresponding F1 scores for metastatic disease were 0.97, 0.99, and 0.99.
Conclusion: o1 preview outperformed GPT-4o in determining, from a trial's abstract, whether patients with localized and/or metastatic disease were eligible. GPT-5 at high reasoning effort settings outperformed both GPT-4o and o1 preview, supporting the notion that reasoning models could become the new standard for text mining in medicine.
Title: Reasoning Models for Text Mining in Oncology: A Comparison Between o1 Preview, GPT-4o, and GPT-5 at Different Reasoning Levels. JCO Clinical Cancer Informatics. 2025;9:e2400311.
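The Windisch et al. results are F1 scores, computed separately for two binary labels per trial: whether localized disease was allowed and whether metastatic disease was allowed. A minimal sketch of the F1 calculation for one such label follows. The abstract does not say how the parenthetical intervals were obtained. The percentile bootstrap over trials shown here is one common choice and is an assumption, as are the function names and the toy labels.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for the positive class of one binary label (e.g. 'localized disease allowed')."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(t and p for t, p in zip(y_true, y_pred))        # correctly flagged trials
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))  # flagged but not eligible
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))  # eligible but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(y_true, y_pred, n_resamples=1000, alpha=0.05, seed=0):
    """Assumed method: percentile bootstrap interval for F1, resampling trials."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample trials with replacement
        scores.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    truth = [True, True, True, False, False]  # toy labels, not study data
    pred = [True, True, False, True, False]
    print(round(f1_score(truth, pred), 3))    # 0.667
```

Because F1 ignores true negatives, a label where most trials are positive (such as metastatic disease allowed) can reach high F1 even with a few misclassified negatives. This is one reason to report scores per label, as the abstract does.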