Ji Hyun Chang, Amir Ashraf-Ganjouei, Isabel Friesner, Ryzen Benson, Travis Zack, Sumi Sinha, Jason Chan, Steve Braunstein, Amy Lin, Lisa Singer, Julian C Hong
Purpose: The increasing use of patient portal messages has enhanced patient-provider communication. However, the high volume of these messages has also contributed to physician burnout.
Methods: Patient-generated portal messages sent to a single cancer center from 2011 to 2023 were extracted. BERTopic, a natural language processing topic modeling technique based on large language models, was optimized. For further categorization, the topic words were labeled using GPT-4, and the labels were reviewed by two oncologists. Uniform Manifold Approximation and Projection (UMAP) was used for dimensionality reduction and topic visualization. Message volume changes over time were assessed using a Student's t test.
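The abstract reports a Student's t test on message volume but does not say which monthly counts were compared. As a minimal sketch, assuming two groups of monthly message counts (for example, the months of an early year versus a late year), the pooled-variance two-sample t statistic can be computed as follows. The function name and inputs are illustrative, not the authors' code.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance.

    Assumes equal variances in both groups (the classic Student's test).
    Returns (t, degrees_of_freedom); a positive t means group_a has the
    larger mean. Converting t to a P value needs a t-distribution CDF,
    which is omitted here.
    """
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance uses the n - 1 (sample) denominator
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, n_a + n_b - 2
```

If SciPy is available, `scipy.stats.ttest_ind(group_a, group_b, equal_var=True)` should return the same statistic together with a P value.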
Results: A total of 2,280,851 messages were analyzed. The monthly average number of messages increased from 2,071 in 2012 to 43,430 in 2022 (P < .001). There was a significant rise in message volume after the COVID-19 pandemic, with a posterior probability of a causal effect of 96.4% (P = .04). Scheduling-related messages were the most frequent across departments, whereas symptoms and health concerns were the second or third most common topics. Topics on prescriptions and medications were more common in medical oncology and surgical oncology than in radiation oncology and gynecologic oncology. Despite concurrent institutional changes in self-scheduling systems, scheduling-related messages did not decrease over time.
Conclusion: The substantial increase in patient portal messages, particularly scheduling-related inquiries, underscores the need for streamlined communication to reduce the burden on health care providers. These findings call for strategies to manage message volume and mitigate physician burnout, laying the groundwork for future artificial intelligence-driven triage systems to improve message management and patient care.
Unsupervised Large Language Models to Identify Topics in Cancer Center Patient Portal Messages. JCO Clinical Cancer Informatics. 2025;9:e2500102. DOI: 10.1200/CCI-25-00102. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12490804/pdf/
Pub Date: 2025-10-01; Epub Date: 2025-10-29; DOI: 10.1200/CCI-24-00259
Johannes Mammen, Calin-Petru Manta, Sarah Richter, Nora Liebers, Tobias Roider, Felix Czernilofsky, Katharina Kriegsmann, Carsten Müller-Tidow, Michael Hundemer, Sascha Dietrich
Purpose: Flow cytometry is a key diagnostic technique in hematology that provides protein information at the single-cell level. Although flow cytometry data have traditionally been interpreted manually in a sequence of two-dimensional plots, automated analysis techniques have grown in significance in both research and clinical practice, improving interrater reliability and speeding up analysis. However, published tools usually require a specific diagnostic setup, which hinders widespread implementation.
Methods: In this paper, we present the development of a software package and web app (diagnFlow) for the automated analysis of any in-house clinical flow cytometry data set. We exemplify the application of this classifier and its clinical benefit in lymphoma diagnosis and other settings.
Results: Routine performance on the focused diagnostic task was evaluated in a blinded, single-examiner setup. Multiple custom workflows that solve the task automatically were designed using diagnFlow, and each improved on the performance of manual interpretation. The most easily interpretable and computationally efficient workflow outperformed more complex approaches and was made available as an easy-to-use web app. Same-sample wet laboratory data further elucidated the biological signal on which the classifier is based. In additional data sets, the web app approach was validated and outperformed a competition-winning clustering-based approach.
Conclusion: diagnFlow provides a valuable data set-agnostic approach for the automated analysis of flow cytometry data sets not previously leveraged for this purpose, while maintaining interpretability and resource efficiency.
Machine Learning Designed for Any Hematologic Flow Cytometry Data Set. JCO Clinical Cancer Informatics. 2025;9:e2400259. DOI: 10.1200/CCI-24-00259
Pub Date: 2025-10-01; Epub Date: 2025-10-16; DOI: 10.1200/CCI-24-00239
Melissa K Greene, Gloria Broadwater, Donna Niedzwiecki, Thomas W LeBlanc, Jessica E Ma, David J Casarett, Brittany A Davidson
Purpose: Goals-of-care (GOC) discussions during advanced serious illness and end-of-life (EOL) care are critical. Institutions are increasingly tracking the frequency and timing of GOC documentation, but large-scale content assessments have been limited. We aimed to use natural language processing (NLP) to assess GOC documentation quality and associations with EOL care for patients with cancer.
Methods: This retrospective review included patients at a single US center who died with cancer between 2018 and 2022 and had GOC notes documented in the last 12 months of life. Eight GOC components were identified: current understanding of illness, information preferences, prognostic disclosure, goals, fears, acceptable function, trade-offs, and family involvement. NLP software searched the extracted GOC notes for the presence of these components, aggregated at the patient level. We evaluated associations between these eight components and receipt of aggressive EOL care (chemotherapy within 14 days of death, no hospice care, or hospice admission ≤3 days before death).
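The abstract does not describe the NLP software or its lexicon. The sketch below illustrates only the patient-level aggregation of component presence and the composite aggressive-EOL-care outcome, using simple keyword matching as a stand-in for the unspecified NLP. Every trigger phrase in the hypothetical `GOC_PHRASES` table is a placeholder, not the study's vocabulary.

```python
import re

# Hypothetical trigger phrases for the eight GOC components. The study's NLP
# software and lexicon are not described in the abstract; these are placeholders.
GOC_PHRASES = {
    "understanding_of_illness": ["understanding of", "understands"],
    "information_preferences": ["how much information", "wants to know", "prefers to know"],
    "prognostic_disclosure": ["prognosis", "life expectancy"],
    "goals": ["goal is", "hopes to"],
    "fears": ["afraid", "fear", "worried"],
    "acceptable_function": ["quality of life", "unacceptable", "bedbound"],
    "trade_offs": ["trade-off", "tradeoff", "willing to go through"],
    "family_involvement": ["daughter", "son", "wife", "husband", "family"],
}


def components_present(notes):
    """Return the GOC components mentioned anywhere in one patient's notes.

    Notes are pooled first, so each component counts at most once per patient
    (aggregate presence at the patient level).
    """
    text = " ".join(notes).lower()
    found = set()
    for component, phrases in GOC_PHRASES.items():
        # Word boundaries keep "son" from matching inside "person", for example
        if any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in phrases):
            found.add(component)
    return found


def documentation_group(n_components):
    """Dichotomize the component count as in the reported regression (<=6 v >=7)."""
    return "<=6" if n_components <= 6 else ">=7"


def aggressive_eol_care(chemo_days_before_death, hospice_days_before_death):
    """Composite outcome from the abstract; None means the event never occurred.

    Aggressive if chemotherapy was given within 14 days of death, if the patient
    never entered hospice, or if hospice admission was <=3 days before death.
    """
    if chemo_days_before_death is not None and chemo_days_before_death <= 14:
        return True
    if hospice_days_before_death is None:  # no hospice care
        return True
    return hospice_days_before_death <= 3
```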
Results: Two thousand thirty-one patients met inclusion criteria. The most common GOC component addressed was family involvement (75.0%), and the least common was fears (21.1%). Only 5.4% had all eight components documented. More comprehensive GOC notes were associated with lower rates of aggressive EOL care: 73.2% received aggressive care when 0/8 components were documented, compared with 56.8% and 50.3% when six or seven components were documented, respectively. In multivariable logistic regression, the number of GOC components documented (≤6 v ≥7: OR, 2.13; P < .0001) and primary tumor site (lymphoma: OR, 2.86; P < .0001) were independent predictors of aggressive EOL care.
Conclusion: More comprehensive, higher-quality GOC documentation is associated with a lower likelihood of receiving aggressive EOL care. Efforts to improve the quality and documentation of GOC conversations may affect EOL care for patients with cancer.
Using Natural Language Processing to Assess Goals-of-Care Conversations for Patients With Cancer. JCO Clinical Cancer Informatics. 2025;9:e2400239. DOI: 10.1200/CCI-24-00239
Pub Date: 2025-10-01; Epub Date: 2025-10-02; DOI: 10.1200/CCI-25-00035
Jose Peña, Sebastián Santana, Juan Cristobal Morales, Natalie Pinto, Mariano Suárez, Carola Sánchez, Juan Carlos Opazo, Rodrigo Villarroel, Claudio Montenegro, Bruno Nervi, Richard Weber
Purpose: Lung cancer is a leading cause of death in Chile, where late-stage diagnoses and high mortality rates prevail. Here, we describe the development of OncovigIA, a novel digital tool powered by natural language processing that enhances the identification of potential lung cancer cases by surveilling computed tomography (CT) reports in a large public hospital in Santiago, Chile.
Materials and methods: We combined natural language processing and large language models with state-of-the-art machine learning techniques and approaches for handling imbalanced data sets to determine the best solution to implement in OncovigIA. Focusing on key sections of the reports and evaluating various machine learning models, including a balanced Random Forest, the tool achieved an accuracy of 0.90 and an F1 score of 0.84 on the test set.
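Accuracy alone can look strong on an imbalanced data set, which is one reason F1 is reported alongside it. A minimal sketch of both metrics for a binary classifier, assuming F1 is computed for the positive (suspected lung cancer) class; the abstract does not say whether a per-class or averaged F1 was used, and the function name is illustrative.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 for the positive class) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1
```

With 2 positives among 10 reports, a model that finds one of them scores 0.9 accuracy but an F1 of about 0.67, and a model that predicts no positives still reaches 0.8 accuracy with an F1 of 0. The abstract does not name an implementation of the balanced Random Forest; one publicly available option is `BalancedRandomForestClassifier` in the imbalanced-learn package.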
Results: When applied to 13,326 chest CT reports from 2022, the tool identified 377 CT scans of patients with suspected lung cancer who had not previously been detected or managed by the local multidisciplinary lung cancer team.
Conclusion: This study underscores the potential of artificial intelligence in early cancer detection and highlights the importance of its integration into local health care ecosystems. By promptly increasing the number of patients referred for specialized management, the tool OncovigIA offers a promising path toward improving lung cancer survival rates in Chile and beyond. Moreover, this article outlines avenues for broader implementation, extending the approach to other cancer types and/or health care-related texts for continuous surveillance, with the aim of early referral and treatment of cancer in low-resource settings.
OncovigIA: Artificial Intelligence for Early Lung Cancer Detection and Referral in a Chilean Public Hospital. JCO Clinical Cancer Informatics. 2025;9:e2500035. DOI: 10.1200/CCI-25-00035
Pub Date: 2025-10-01; Epub Date: 2025-10-17; DOI: 10.1200/CCI-25-00151
Jinani C Jayasekera, Oliver W A Wilson, Clyde Schechter, Jennifer L Caswell Jin, Kaitlyn M Wojcik, Nicolien T van Ravesteyn, Jonathan Wall, Jacob Schneider, Lia L D'Addario, Janise M Roh, Swarnavo Sarkar, Lisa Cadmus-Bertram, John P Pierce, Amy Trentham-Dietz, Lawrence H Kushi, Charles E Matthews
Purpose: Clinical guidelines recommend offering individualized physical activity prescriptions to cancer survivors. However, there are limited tools to support individualized physical activity discussions and prescriptions. We developed and validated a simulation model-based tool to estimate individualized survival outcomes for postdiagnosis physical activity among postmenopausal breast cancer survivors.
Methods: We adapted an established simulation modeling approach developed within the Cancer Intervention and Surveillance Modeling Network to estimate breast cancer-specific and all-cause survival associated with postdiagnosis physical activity for 50- to 75-year-old (postmenopausal) women with stage I to III invasive breast cancer. Model estimates were generated for 60,480 subgroups based on age, weight status (BMI), stage, tumor subtype, treatment, aerobic (<30 min/wk [no/minimal], ≥30 to <150 min/wk [insufficient], ≥150 to <300 min/wk [active], ≥300 min/wk [highly active]), and muscle-strengthening (<2 or ≥2 d/wk) activity. The outcomes were 10-year survival and absolute survival benefits for different levels of physical activity by individual characteristics and treatment. Model inputs were derived from trials, cohort studies, registry, and surveillance data. External validation used independent data.
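The activity categories in the Methods translate directly into thresholds. A minimal sketch, assuming weekly aerobic minutes and muscle-strengthening days are already measured; the cut points follow the abstract, while the function names and labels are illustrative. The four aerobic levels and two strengthening levels give 8 activity profiles, which are then crossed with age, BMI, stage, tumor subtype, and treatment (the abstract does not give that breakdown of the 60,480 subgroups).

```python
from itertools import product

AEROBIC_LEVELS = ["no/minimal", "insufficient", "active", "highly active"]
STRENGTH_LEVELS = ["<2 d/wk", ">=2 d/wk"]


def aerobic_category(minutes_per_week):
    """Map weekly aerobic minutes to the abstract's four levels."""
    if minutes_per_week < 30:
        return "no/minimal"
    if minutes_per_week < 150:
        return "insufficient"
    if minutes_per_week < 300:
        return "active"
    return "highly active"


def strength_category(days_per_week):
    """Map weekly muscle-strengthening days to the abstract's two levels."""
    return "<2 d/wk" if days_per_week < 2 else ">=2 d/wk"


# Every combination of aerobic and muscle-strengthening level
ACTIVITY_PROFILES = list(product(AEROBIC_LEVELS, STRENGTH_LEVELS))
```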
Results: Survival rates and absolute benefits for physical activity varied by age, weight status, stage, tumor subtype, and amount and type of activity. For example, for a 65- to 69-year-old woman with obesity and stage II, hormone receptor-positive, human epidermal growth factor receptor 2-negative breast cancer who engaged in no/minimal activity, the 10-year breast cancer-specific and all-cause survival rates were 79.2% and 72.2%, respectively. Increasing aerobic activity from no/minimal to insufficient, with <2 d/wk of muscle-strengthening, was associated with absolute increases of 2.8 and 3.4 percentage points in 10-year breast cancer-specific and all-cause survival, respectively. The model closely replicated survival rates in independent data.
Conclusion: Simulation model-based estimates could support clinical tools for guideline-recommended individualized discussions and physical activity prescriptions for breast cancer survivors.
Development and Validation of a Simulation Model-Based Tool to Support Individualized Physical Activity Discussions and Prescriptions for Breast Cancer Survivors. JCO Clinical Cancer Informatics. 2025;9:e2500151. DOI: 10.1200/CCI-25-00151. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12543000/pdf/
Pub Date: 2025-10-01; Epub Date: 2025-10-14; DOI: 10.1200/CCI-25-00108
Yining Pan, Yanfei Wang, Guangyu Wang, Jing Su, Umit Topaloglu, Qianqian Song
Purpose: Cancer remains a leading cause of death worldwide. The growing volume of high-throughput single-cell and spatial transcriptomic data sets, particularly those related to cancer, offers immense opportunities but also poses analytical challenges for effective data analysis and interpretation. Large language models (LLMs), pretrained on vast data sets and capable of various biomedical tasks, offer a promising solution. This review explores the application of LLMs in cancer research from both cellular and pathologic perspectives, aiming to showcase their potential in advancing precision oncology.
Materials and methods: We systematically review current LLMs used to analyze single-cell RNA sequencing, spatial transcriptomic, and histology image data, emphasizing their relevance to cancer biology and translational research.
Results: A total of 24 LLMs, published or in preprint between 2022 and 2025, were selected for review. In single-cell transcriptomics, LLMs have primarily been used for cell type annotation, batch integration, and drug-response prediction. In spatial transcriptomics, LLMs support multislide and multimodal spatial data integration, gene expression imputation, niche and region label prediction, spatial domain identification, cell-cell communication inference, and marker gene detection. In computational pathology, LLMs have been applied to cancer subtyping, detection of rare malignancies, genomic mutation prediction, image segmentation, and cross-modal retrieval. Despite these advances, many models remain underoptimized for cancer-specific applications, highlighting the need for domain-specific fine-tuning and scalable adaptation strategies.
Conclusion: LLMs have the potential to significantly advance cancer research by providing scalable and effective tools for analyzing and interpreting single-cell, spatial transcriptomic, and pathology data. Future efforts should prioritize tailoring these models to cancer-specific contexts to enhance their utility in uncovering disease mechanisms, identifying biomarkers, and informing therapeutic strategies.
Large Language Models for Translational Cancer Informatics. JCO Clinical Cancer Informatics. 2025;9:e2500108. DOI: 10.1200/CCI-25-00108
Pub Date: 2025-10-01; Epub Date: 2025-10-28; DOI: 10.1200/CCI-25-00218
Fekede Asefa Kumsa, Christopher L Brett, Soheil Hashtarkhani, Rezaur Rashid, Lokesh Chinthala, Janet A Zink, Robert L Davis, Arash Shaban-Nejad, David L Schwartz
Purpose: Unplanned treatment interruptions represent an important care quality shortfall for patients undergoing cancer radiotherapy. This study aimed to evaluate the use of a centralized electronic health record warehouse and large language model-based data preprocessing to facilitate identification of risk factors for radiation therapy interruptions (RTI).
Methods: We analyzed demographic, behavioral, clinical, and neighborhood-level data for 2,130 patients treated with radiotherapy at the University of Tennessee Medical Center in Knoxville. Treatment interruptions were measured as missed days, adjusted for weekends and holidays. Multinomial logistic regression was used to identify factors associated with moderate (2-4 days) and severe (≥5 days) RTI.
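The Methods measure interruptions as "missed days, adjusted for weekends and holidays." The abstract does not give the exact rule, so the Python sketch below shows one plausible way to apply it. It assumes that a missed day is a weekday between the first and last delivered fraction that is neither a holiday nor a treatment day. It also assumes that 0 to 1 missed days counts as no interruption. The function names, the holiday set, and the example dates are hypothetical.

```python
from datetime import date, timedelta


def missed_days(fraction_dates: list[date], holidays: set[date]) -> int:
    """Count weekdays between the first and last fraction with no treatment.

    Assumption: weekends and listed holidays are never counted as missed.
    """
    delivered = set(fraction_dates)
    day, last = min(delivered), max(delivered)
    missed = 0
    while day <= last:
        # weekday() returns 0-4 for Monday-Friday
        if day.weekday() < 5 and day not in holidays and day not in delivered:
            missed += 1
        day += timedelta(days=1)
    return missed


def classify_rti(missed: int) -> str:
    """Moderate = 2-4 missed days, severe = 5 or more (thresholds from the Methods)."""
    if missed >= 5:
        return "severe"
    if missed >= 2:
        return "moderate"
    return "none"  # assumption: 0-1 missed days is not treated as an interruption


# Hypothetical course: Thu 11 and Fri 12 Jan 2024 skipped; Mon 15 Jan treated as a holiday.
course = [date(2024, 1, d) for d in (8, 9, 10, 16, 17, 18, 19)]
holidays = {date(2024, 1, 15)}
print(classify_rti(missed_days(course, holidays)))
```

With the holiday excluded, the two skipped weekdays classify the course as a moderate interruption. Under this sketch, breaks that occur after the last delivered fraction, or breaks the physician planned, would need additional rules.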
Results: Moderate RTI occurred in 15.8% of patients, while 7.7% experienced severe RTI. Moderate delays were associated with genitourinary cancer (adjusted odds ratio (AOR), 3.81; 95% CI, 1.24 to 11.66), prostate cancer (AOR, 2.44; 95% CI, 1.34 to 4.46), and Medicaid coverage (AOR, 2.22; 95% CI, 1.32 to 3.73). Severe RTI was associated with marital status (AOR for divorced or separated patients, 1.86; 95% CI, 1.18 to 2.94), head and neck cancer (AOR, 2.31; 95% CI, 1.10 to 4.87), gynecologic cancer (AOR, 2.97; 95% CI, 1.30 to 6.79), Medicaid insurance (AOR, 3.43; 95% CI, 1.77 to 6.64), daily dose of ≤225 cGy (AOR, 2.55; 95% CI, 1.21 to 5.37), and a total dose of ≥6,000 cGy (AOR, 2.30; 95% CI, 1.09 to 4.88). Severe interruptions were also significantly associated with high neighborhood social vulnerability (AOR, 2.60; 95% CI, 1.32 to 5.09).
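The results are reported as adjusted odds ratios with 95% CIs. Suppose these are standard Wald intervals on the log-odds scale, which the abstract does not state. In that case, each AOR is exp(beta) for a model coefficient beta, and the coefficient and its standard error can be recovered from the reported interval. The minimal Python sketch below does that back-calculation and adds an approximate two-sided p-value. The function names are hypothetical, and rounding in the published figures makes the results approximate.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def aor_with_ci(beta: float, se: float) -> tuple[float, float, float]:
    """Exponentiate a log-odds coefficient and its Wald 95% CI."""
    return math.exp(beta), math.exp(beta - Z_95 * se), math.exp(beta + Z_95 * se)


def coef_from_ci(lower: float, upper: float) -> tuple[float, float]:
    """Recover (beta, se) from a 95% CI assumed symmetric on the log scale."""
    log_l, log_u = math.log(lower), math.log(upper)
    beta = (log_l + log_u) / 2
    se = (log_u - log_l) / (2 * Z_95)
    return beta, se


def wald_p_value(beta: float, se: float) -> float:
    """Two-sided p-value for H0: beta = 0 under a normal approximation."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


# Medicaid insurance and severe RTI: reported AOR 3.43 (95% CI, 1.77 to 6.64)
beta, se = coef_from_ci(1.77, 6.64)
aor, lo, hi = aor_with_ci(beta, se)
p = wald_p_value(beta, se)
print(f"AOR ~ {aor:.2f}, 95% CI {lo:.2f} to {hi:.2f}, approx. p = {p:.1g}")
```

The interval midpoint on the log scale gives an AOR close to the reported 3.43. This is a quick consistency check on published numbers and does not replace the original model output.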
Conclusion: Automated data preprocessing permitted efficient identification of treatment course length, marital status, disease site, Medicaid coverage, and socially vulnerable locations as significant factors associated with RTI. These findings underscore the need for data-driven risk assessment and intervention strategies to maintain cancer treatment quality at scale.
Title: Leveraging Centralized Health System Data Management and Large Language Model-Based Data Preprocessing to Identify Predictors for Radiation Therapy Interruption. JCO Clinical Cancer Informatics, vol. 9, e2500218. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12558007/pdf/
Pub Date: 2025-10-01; Epub Date: 2025-10-24; DOI: 10.1200/CCI-25-00112
Ryzen Benson, Clodagh Kenny, Amir Ashraf Ganjouei, Michelle Zhao, Rami Darawsheh, Alexander Qian, Julian C Hong
Over the past 5 years, large language models (LLMs) have emerged and steadily improved in their generative abilities; they can now generate human-understandable text and perform complex data analyses. As their capabilities grow, they are increasingly used to support population oncology, including clinical information extraction, cancer care education, and clinical decision support. This narrative review gives a high-level description of the use of LLMs in cancer, with an overview of the current literature and research gaps. Despite increasing interest in using LLMs for cancer care, prevention, and research, applied methods in cancer still lag behind advances published in the computer science literature. Therefore, we recommend that cancer-focused LLM research and applications better incorporate technical advancements and techniques from the computer science literature. Additionally, standardized evaluation metrics and approaches need to be better studied and adopted in oncology, along with data governance and computational infrastructure to support state-of-the-art model integration and the use of real-world data. Finally, we describe the need for researchers to incorporate principles and frameworks from implementation and dissemination science to promote LLM-based tool adaptation, effectiveness, fit, and sustainability.
Title: Large Language Models in Population Oncology: A Contemporary Review on the Use of Large Language Models to Support Data Collection, Aggregation, and Analysis in Cancer Care and Research. JCO Clinical Cancer Informatics, vol. 9, e2500112. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12707173/pdf/
Pub Date: 2025-10-01; Epub Date: 2025-10-08; DOI: 10.1200/CCI-25-00238
Katherine E Reeder-Hayes, Emily M Ray, Jennifer Elston Lafata
Title: Challenges and Opportunities in Optimizing Social Risk and Need Screening for 21st Century Cancer Care. JCO Clinical Cancer Informatics, vol. 9, e2500238. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12571040/pdf/
Pub Date: 2025-10-01; Epub Date: 2025-10-21; DOI: 10.1200/CCI-25-00144
Thao-Nguyen Pham, Julie Coupey, Samuel Valable, Juliette Thariat
Purpose: Radiotherapy can affect the immune system through lymphocyte depletion, which in turn can influence tumor control. Accurately predicting radiation-induced lymphopenia (RIL) is key to optimizing treatment strategies. We aimed to evaluate mathematical model structures capable of capturing the kinetics of lymphocyte concentrations after irradiation, with attention to the saturation effect observed during fractionated radiotherapy.
Materials and methods: A meta-analysis of aggregate data from patients with solid tumors treated with specific radiation doses was conducted. We extracted lymphocyte counts and patient and treatment characteristics, including tumor type/location, radiation modality (X-rays v protons), and baseline counts. Several model structures (exponential, extended exponential, exponential-quadratic, and saturation) were tested for their ability to predict lymphocyte kinetics.
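The abstract names the candidate model families but does not give their equations. The Python sketch below writes down one plausible parameterization of three of them as functions of cumulative dose, plus a least-squares AIC for comparing fits. It covers the exponential, exponential-quadratic, and saturation models. The functional forms, parameter names, and the use of AIC are all assumptions for illustration. The "extended exponential" model is omitted because its form is not stated. In this parameterization, a positive linear term in the exponential-quadratic model reproduces the kind of early rise the authors report for head and neck/CNS cases.

```python
import math


def exponential(dose: float, l0: float, a: float) -> float:
    """Counts fall exponentially with cumulative dose, toward zero."""
    return l0 * math.exp(-a * dose)


def exponential_quadratic(dose: float, l0: float, a: float, b: float) -> float:
    """With a > 0, counts rise briefly before the -b * dose**2 term drives them down."""
    return l0 * math.exp(a * dose - b * dose ** 2)


def saturation(dose: float, l0: float, l_min: float, k: float) -> float:
    """Counts fall toward a nonzero plateau l_min rather than toward zero."""
    return l_min + (l0 - l_min) * math.exp(-k * dose)


def aic_least_squares(observed: list[float], predicted: list[float], n_params: int) -> float:
    """AIC for a least-squares fit: n * ln(RSS / n) + 2k; lower is better."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(rss / n) + 2 * n_params


# Hypothetical parameters only (not fitted values), dose in Gy.
for d in (0, 10, 20, 40, 60):
    print(
        d,
        round(exponential(d, 1.5, 0.03), 3),
        round(exponential_quadratic(d, 1.5, 0.02, 0.0008), 3),
        round(saturation(d, 1.5, 0.4, 0.06), 3),
    )
```

In practice, each model would be fitted with a nonlinear least-squares routine such as `scipy.optimize.curve_fit`. AIC values are only comparable between models fitted to the same observations.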
Results: Data from 29 studies covering brain, nasopharyngeal, oropharyngeal, esophageal, non-small cell lung, hepatocarcinoma, cervical, pancreatic, and rectal cancers and soft tissue sarcoma were analyzed. Lymphocyte depletion rates varied, with cervical cancer showing the greatest reduction, followed by esophageal cancer, hepatocarcinoma, and other sites. Post-treatment lymphocyte recovery depended heavily on baseline counts and time since completion of radiotherapy. Saturation models fit most cancer types best, but for head and neck/central nervous system cancer without nodal involvement, the exponential-quadratic model performed better, reflecting a distinctive early lymphocyte increase before RIL.
Conclusion: The choice of lymphocyte depletion model should align with cancer type and location. Standardized prospective studies are needed to refine these models and improve radiotherapy strategies.
Title: Tumor Site-Specific Radiation-Induced Lymphocyte Depletion Models After Fractionated Radiotherapy: Considerations of Model Structure From an Aggregate Data Meta-Analysis. JCO Clinical Cancer Informatics, vol. 9, e2500144.