Pub Date: 2025-10-01 | Epub Date: 2025-10-16 | DOI: 10.1200/CCI-24-00254
Susan Chimonas, Charlie White, Kenneth Seier, Fernanda Polubriaginof, Chelsea Michael, Chasity Walters, Allison Lipitz-Snyderman, Gilad Kuperman
Purpose: Access to clinical notes enhances patient engagement and trust, and the 21st Century Cures Act enabled immediate electronic patient access in April 2021. Yet, technological advances may perpetuate disparities, which remain understudied. Understanding whether inequities in note access exist in oncology would highlight challenges around making this foundational health information available to all patients receiving ongoing, complex medical care.
Materials and methods: This study at a high-volume specialty cancer center explored disparities around clinical notes posted to patients' portal accounts from September 1, 2021, to August 31, 2022, and accessed by March 19, 2024. Logistic and Poisson regression were used to identify patient characteristics associated with note access and note opening rates.
Results: The study included 124,554 patients and 815,104 clinical notes, of which 43.7% (356,290) were accessed. Although modest differences in access rates emerged around sex, age, and marital status, larger disparities appeared for ethnicity, race, and language: Black patients (odds ratio [OR], 0.63 [95% CI, 0.60 to 0.66]; P < .001; incidence rate ratio [IRR], 0.74 [95% CI, 0.73 to 0.76]; P < .001), Hispanic patients (OR, 0.85 [95% CI, 0.80 to 0.90]; P < .001; IRR, 0.90 [95% CI, 0.89 to 0.92]; P < .001), and non-English-preferred language speakers (OR, 0.78 [95% CI, 0.72 to 0.84]; P < .001; IRR, 0.82 [95% CI, 0.80 to 0.84]; P < .001) were 37%, 15%, and 22% less likely to open at least one note, and opened 26%, 10%, and 18% fewer notes, compared with white, non-Hispanic, and English-preferred patients, respectively.
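To make the reported effect sizes concrete, an odds ratio and its Wald 95% CI can be computed from a 2x2 table of patients who did or did not open at least one note. This is a minimal sketch; the counts below are invented for illustration and are not the study's data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Wald odds ratio and 95% CI from a 2x2 table:
    a/b = group 1 with/without the outcome, c/d = group 2 with/without."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Illustrative counts only (not from the paper)
or_, lo, hi = odds_ratio_ci(300, 700, 450, 550)
print(f"OR {or_:.2f} (95% CI, {lo:.2f} to {hi:.2f})")
```

In practice the study adjusted for covariates via logistic regression, so its ORs are not raw 2x2 ratios like this one.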
Conclusion: This analysis highlighted disparities by race, ethnicity, and language in cancer patients' access to clinical notes. Tailored interventions are crucial to ensure that diverse groups benefit from digital health care resources.
Which Patients With Cancer Access Their Clinical Notes? A Disparities Analysis. JCO Clinical Cancer Informatics 9:e2400254. DOI: 10.1200/CCI-24-00254
Pub Date: 2025-10-01 | Epub Date: 2025-10-15 | DOI: 10.1200/CCI-25-00069
Eric Ababio Anyimadu, Yaohua Wang, Amy C Moreno, Clifton David Fuller, Xinhua Zhang, G Elisabeta Marai, Guadalupe M Canahuate
Purpose: This study aims to improve survival modeling in head and neck cancer (HNC) by integrating patient-reported outcomes (PROs) using dimensionality reduction techniques. PROs capture symptom severity across the treatment timeline and offer key insights for personalized care. However, their high dimensionality poses challenges such as overfitting and computational complexity. This work focuses on transforming and incorporating PRO data to enhance model performance in HNC.
Materials and methods: We analyzed retrospective data of 923 patients with HNC treated at the University of Texas MD Anderson Cancer Center between 2010 and 2021. Baseline clinical data including demographic, treatment, and disease characteristics were used to build a reference survival model. PRO data, capturing symptom ratings, were integrated using dimensionality reduction techniques: principal component analysis (PCA), autoencoders (AEs), and patient clustering. These reduced representations, combined with clinical data, were input into Cox proportional hazards models to predict overall survival (OS) and progression-free survival (PFS). Model performance was assessed using the concordance index, time-dependent AUC, Brier score for calibration, and hazard ratios for predictor significance.
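The PCA step described above can be sketched with NumPy's SVD. This is only a sketch: the matrix size, rating scale, and component count below are placeholders, not the study's configuration.

```python
import numpy as np

def pca_reduce(X, k):
    """Project a patients-by-symptom-items ratings matrix onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                          # center each symptom item
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # singular values sorted descending
    return Xc @ Vt[:k].T                             # component scores, shape (n_patients, k)

# Placeholder data: 100 patients rating 28 symptom items on a 0-10 scale
rng = np.random.default_rng(0)
ratings = rng.integers(0, 11, size=(100, 28)).astype(float)
Z = pca_reduce(ratings, 5)  # the k scores would join clinical covariates in the Cox model
```

The reduced scores replace the raw high-dimensional PRO vector, which is what mitigates the overfitting risk the abstract mentions.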
Results: Cox models incorporating PCA and AE outperformed the clinical-only reference model for both OS and PFS. The PCA-based model achieved the highest C-indices (0.74 for OS and 0.64 for PFS), followed by the AE model (0.73 and 0.63) and the clustering model (0.72 and 0.62). Time-dependent AUCs reinforced these results, with PCA showing the highest average AUC over 36 months. All models were well-calibrated, with low Brier scores. Key predictors included age, disease stage, and tumor subsite.
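The concordance index reported here is Harrell's C; a minimal pure-Python sketch of the O(n^2) pairwise form, with toy inputs only:

```python
def concordance_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher-risk patient fails earlier.
    events[i] is 1 if times[i] is an observed event, 0 if censored."""
    concordant = tied = comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # a pair is comparable when i has an observed event before j's time
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / comparable

# Perfectly ranked toy data: higher risk always fails earlier
print(concordance_index([2, 4, 6, 8], [1, 1, 1, 0], [0.9, 0.7, 0.5, 0.1]))
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 0.74/0.64 values above sit.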
Conclusion: Dimensionality reduction techniques improve survival prediction in patients with HNC by effectively incorporating PRO data, potentially providing greater insights into more personalized treatment strategies.
Evaluating Dimensionality Reduction for Patient-Reported Outcome-Based Survival Modeling in Patients With Head and Neck Cancer. JCO Clinical Cancer Informatics 9:e2500069. DOI: 10.1200/CCI-25-00069. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12529990/pdf/
Purpose: Health insurance claims comprising diagnosis and treatment information offer insights into clinical practice and medical care costs. However, inaccurate diagnosis codes listed in claims and the absence of staging information limit the understanding of colorectal cancer (CRC)-related clinical practice. We developed and validated an algorithm to accurately identify incident CRC cases and their progression phases using claims data.
Methods: We conducted a retrospective study using claims data from three Japanese institutions, including two designated cancer care hospitals (DCCHs), between April 2016 and August 2022. An algorithm that uses CRC-associated diagnostic codes and claim codes for CRC-specific treatments was developed to identify incident CRC cases and classify patients into three progression phases (treatment-sequenced groups: endoscopic, surgical, and noncurative). The algorithm was refined using cohorts from two DCCHs in April-September 2017 and April-September 2019 to enhance performance metrics, with validity tested at these hospitals during different periods and at another hospital. The performance metrics of the algorithm included positive predictive value (PPV), sensitivity in identifying incident CRC, and accuracy in determining progression phases.
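The treatment-sequenced phase assignment can be pictured as a precedence rule over a patient's claim codes. The code sets and the precedence order below are hypothetical stand-ins for illustration, not the study's actual Japanese claim codes or logic.

```python
# Hypothetical claim-code sets -- illustrative only, not the study's actual codes
ENDOSCOPIC_CODES = {"ENDO_EMR", "ENDO_ESD"}
SURGICAL_CODES = {"SURG_COLECTOMY", "SURG_LAR"}
NONCURATIVE_CODES = {"CHEMO_FOLFOX", "PALLIATIVE_RT"}

def classify_phase(patient_codes):
    """Assign a CRC progression phase from a patient's set of treatment claim codes."""
    codes = set(patient_codes)
    # Assumed precedence: noncurative treatment dominates, then surgery, then endoscopy
    if codes & NONCURATIVE_CODES:
        return "noncurative"
    if codes & SURGICAL_CODES:
        return "surgical"
    if codes & ENDOSCOPIC_CODES:
        return "endoscopic"
    return "unclassified"

print(classify_phase({"ENDO_ESD"}))                  # endoscopic resection only
print(classify_phase({"SURG_LAR", "CHEMO_FOLFOX"}))  # noncurative takes precedence
```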
Results: The performance metrics of the algorithm were enhanced by filtering prevalent cases, selecting CRC-specific treatments, and targeting invasive CRC cases. The algorithm for identifying incident invasive CRC achieved high PPVs (91.2% [95% CI, 89.5 to 92.7] and 94.4% [95% CI, 87.6 to 97.6]), sensitivities (94.6% [95% CI, 93.1 to 95.7] and 100.0% [95% CI, 95.7 to 100.0]), and progression phase accuracies (91.5% [95% CI, 89.7 to 93.0] and 97.6% [95% CI, 91.8 to 99.4]) in two validation cohorts.
Conclusion: The developed algorithm accurately identified incident invasive CRC cases and determined their progression phases using claims data. Application of this algorithm could contribute to research on real-world practices and medical care costs associated with CRC.
Development and Validation of a Claims-Based Algorithm for Identifying Incident Colorectal Cancer and Determining Progression Phases. Nobukazu Agatsuma, Takahiro Utsumi, Takahiro Inoue, Yukari Tanaka, Yoshitaka Nishikawa, Takahiro Horimatsu, Yuki Nakanishi, Mitsuhiro Nikaido, Takeshi Seta, Nobuaki Hoshino, Yoshimitsu Takahashi, Takeo Nakayama, Hiroshi Seno. JCO Clinical Cancer Informatics 9:e2500107. Pub Date: 2025-10-01. DOI: 10.1200/CCI-25-00107
Purpose: Large language models (LLMs) have demonstrated remarkable versatility in oncology applications, such as cancer staging and survival analysis. Despite their potential, ethical concerns such as data privacy breaches, bias in training data, lack of transparency, and risks associated with erroneous outputs pose significant challenges to their adoption in high-stakes oncology settings. Therefore, we aim to explore the ethical challenges associated with LLM-based applications in oncology and evaluate emerging techniques designed to address these issues.
Methods: Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses framework, a systematic review was conducted to evaluate publications related to ethical issues of LLMs in oncology across eight academic databases (eg, PubMed, Web of Science, and Embase) between January 1, 2019, and December 31, 2024.
Results: The search retrieved 4,319 published articles, of which 65 publications were retained and included in our analysis. We identified six prevalent ethical challenges in oncology: trust, equity, privacy, transparency, nonmaleficence, and accountability. We then evaluated emerging technical solutions to mitigate these challenges and summarized the metrics used to assess their effectiveness.
Conclusion: This review provides actionable recommendations for responsibly deploying LLMs in oncology, ensuring adherence to ethical guidelines, and fostering improved patient outcomes. By bridging technical and clinical perspectives, this review offers a foundational framework for advancing ethical artificial intelligence applications in oncology and highlights areas for future research.
Mitigating Ethical Issues for Large Language Models in Oncology: A Systematic Review. Shuang Zhou, Xingyi Liu, Zidu Xu, Zaifu Zhan, Meijia Song, Jun Wang, Shiao Liu, Hua Xu, Rui Zhang. JCO Clinical Cancer Informatics 9:e2500076. Pub Date: 2025-09-01. DOI: 10.1200/CCI-25-00076
Pub Date: 2025-09-01 | Epub Date: 2025-09-05 | DOI: 10.1200/CCI-25-00011
Kun-Han Lu, Sina Mehdinia, Kingson Man, Chi Wah Wong, Allen Mao, Zahra Eftekhari
Purpose: Recent advances in retrieval-augmented generation (RAG) and large language models (LLMs) have revolutionized the extraction of real-world evidence from unstructured electronic health records (EHRs) in oncology. This study aims to enhance RAG's effectiveness by implementing a retriever encoder specifically designed for oncology EHRs, with the goal of improving the precision and relevance of retrieved clinical notes for oncology-related queries.
Methods: Our model was pretrained with more than six million oncology notes from 209,135 patients at City of Hope. The model was subsequently fine-tuned into a sentence transformer model using 12,371 query-passage training pairs. Specifically, the passages were obtained from actual patient notes, whereas the query was synthesized by an LLM. We evaluated the retrieval performance of our model by comparing it with six widely used embedding models on 50 oncology questions across 10 categories based on Normalized Discounted Cumulative Gain (NDCG), Precision, and Recall.
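The NDCG metric used in this retrieval comparison can be sketched as follows; the relevance list is a toy example, not data from the study:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: relevance discounted by log2 of (rank + 1), ranks from 1."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the retrieved ranking divided by DCG of the ideal ranking."""
    ideal = sorted(relevances, reverse=True)
    denom = dcg(ideal[:k])
    return dcg(relevances[:k]) / denom if denom > 0 else 0.0

# 1 = relevant note, 0 = irrelevant, listed in retrieved order (toy example)
print(round(ndcg_at_k([1, 0, 1, 1, 0], 5), 3))
```

Precision@10 and Recall@10, the other two metrics, are the usual fractions of relevant notes among the top 10 retrieved and of all relevant notes recovered in the top 10.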
Results: In our test data set comprising 53 patients, our model exceeded the performance of the runner-up model by 9% for NDCG (evaluated at the top 10 results), 7% for Precision (top 10), and 6% for Recall (top 10). Our model showed exceptional retrieval performance across all metrics for oncology-specific categories, including biomarkers assessed, current diagnosis, disease status, laboratory results, tumor characteristics, and tumor staging.
Conclusion: Our findings highlight the effectiveness of pretrained contextual embeddings and sentence transformers in retrieving pertinent notes from oncology EHRs. The innovative use of LLM-synthesized query-passage pairs for data augmentation was proven to be effective. This fine-tuning approach holds significant promise in specialized fields like health care, where acquiring annotated data is challenging.
Enhancing Oncology-Specific Question Answering With Large Language Models Through Fine-Tuned Embeddings With Synthetic Data. JCO Clinical Cancer Informatics 9:e2500011. DOI: 10.1200/CCI-25-00011
Pub Date: 2025-09-01 | Epub Date: 2025-09-10 | DOI: 10.1200/CCI-25-00256
Ann M Nguyen, Adriana Waldron-Corredor, Feng-Yi Liu, Xiaoling Yun, Jose Nova, Anita Y Kinney, Joel C Cantor, Jennifer Tsui
Erratum: Breast, Cervical, and Colorectal Cancer Screening Among New Jersey Medicaid Enrollees: 2017-2022. JCO Clinical Cancer Informatics 9:e2500256. DOI: 10.1200/CCI-25-00256
Pub Date: 2025-09-01 | Epub Date: 2025-09-22 | DOI: 10.1200/CCI-25-00130
Nicolas Wagneur, Olivier Capitain, Stéphane Supiot, Florent Le Borgne, François Bocquet, Mario Campone, Tanguy Perennec
Purpose: This study presents a new method based on regular expressions (ReGex) and artificial intelligence for extracting relevant medical data from clinical reports. This hybrid approach is designed to address the limitations of each technique. The pipeline is evaluated for its effectiveness in extracting key clinical information from prostate cancer medical reports.
Methods: We developed a hybrid pipeline that combines ReGex for initial data extraction with a Natural Language Inference model for classification. This approach was retrospectively applied to 1,000 reports randomly selected among all consultation reports of patients with prostate cancer treated at the institute, focusing on identifying key clinical information such as rectal bleeding, dysuria, pollakiuria, and hematuria. The model's performance was evaluated using precision, recall, accuracy, F1-score, and Cohen's kappa coefficient.
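The ReGex stage can be pictured as a prefilter that surfaces candidate sentences for the downstream NLI classifier, which then judges whether the symptom is asserted or negated. The patterns below are hypothetical English-language stand-ins, not the study's lexicon.

```python
import re

# Hypothetical symptom patterns -- illustrative stand-ins, not the study's lexicon
PATTERNS = {
    "rectal_bleeding": re.compile(r"rectal bleeding|rectorrhagia", re.I),
    "dysuria": re.compile(r"dysuria|painful urination", re.I),
    "pollakiuria": re.compile(r"pollakiuria|urinary frequency", re.I),
    "hematuria": re.compile(r"ha?ematuria|blood in (the )?urine", re.I),
}

def prefilter(report):
    """Map each symptom label to candidate sentences for downstream NLI classification."""
    sentences = re.split(r"(?<=[.!?])\s+", report)
    hits = {}
    for sentence in sentences:
        for label, pattern in PATTERNS.items():
            if pattern.search(sentence):
                hits.setdefault(label, []).append(sentence)
    return hits

note = "No rectal bleeding reported. Patient describes mild dysuria since last visit."
print(sorted(prefilter(note)))  # the NLI model then decides presence vs. negation
```

This split of labor is what keeps the pipeline lightweight: the expensive model only sees the few sentences the patterns flag.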
Results: The pipeline demonstrated high performance, with precision scores ranging from 0.778 to 0.954 and recall consistently high at 0.920 to 1.00. F1-scores indicated balanced accuracy across symptoms, and Cohen's kappa values (0.871 to 0.951) reflected strong agreement with physician-labeled data.
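Cohen's kappa against physician-labeled data can be computed as below, with toy labels only:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n          # observed agreement
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Toy binary labels: model output vs. physician annotation
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

Values in the 0.871 to 0.951 range reported above are conventionally read as almost-perfect agreement.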
Conclusion: The proposed pipeline is fast and computationally lightweight. It achieves high accuracy in extracting medical data from clinical reports, making it an effective and practical tool for clinical research and health care applications.
Hybrid ReGex and Natural Language Inference Model as a Zero-Shot Classifier for Extracting Data From Medical Reports. JCO Clinical Cancer Informatics 9:e2500130. DOI: 10.1200/CCI-25-00130
Pub Date: 2025-09-01 | Epub Date: 2025-09-26 | DOI: 10.1200/CCI-25-00093
Peiling Yu, Weixing Chen, Nan Liu, Yang Yu, Hongyu Guo, Yinan Yuan, Weilin Guo, Yini Alatan, Jinming Zhao, Hongbo Su, Siru Nie, Xiaoyu Cui, Yuan Miao
Purpose: Accurately identifying gene mutations in lung cancer is crucial for treatment, but molecular diagnostic methods are time-consuming and complex. This study aims to develop an advanced deep learning model to address this issue.
Methods: In this study, the ResNeXt101 model framework was established to predict gene mutation status in lung adenocarcinoma. The model was trained and validated using data from two cohorts: cohort 1, comprising 144 patients from the First Affiliated Hospital of China Medical University, and cohort 2, comprising 69 patients from The Cancer Genome Atlas-Lung Adenocarcinoma public database. The model was trained and validated on each of the two data sets, and each served as an external test set for the other to further verify performance. Additionally, we tested the trained model on a metastatic cancer data set, which included metastases to organs outside the lungs. The performance of the model was evaluated using the AUC, accuracy, precision, recall, and F1 score.
Results: In cohort 1, the model achieved an AUC ranging from 0.93 to 1. In the external test on cohort 2, it performed well in predicting five of the six genes (AUC = 0.85-1). When tested on the metastatic cancer data set, it successfully predicted mutations of three of the six genes (AUC = 0.72-0.80).
Conclusion: The artificial intelligence model developed in this study has a high accuracy in predicting gene mutations in lung adenocarcinoma, which is conducive to improving the management of patients with lung adenocarcinoma and promoting precision medicine.
Artificial Intelligence-Based Model Exploiting Hematoxylin and Eosin Images to Predict Rare Gene Mutations in Patients With Lung Adenocarcinoma.
Pub Date : 2025-09-01Epub Date: 2025-09-17DOI: 10.1200/CCI-25-00067
Nadia S Siddiqui, Yazan Bouchi, Syed Jawad Hussain Shah, Saeed Alqarni, Suraj Sood, Yugyung Lee, John Park, John Kang
Advancements in artificial intelligence (AI) and machine learning are accelerating in oncology. The complexity and multidisciplinary nature of oncology necessitate a cautious approach to evaluating AI models, and the surge in development of AI tools highlights a need for organized evaluation methods. Currently, widely accepted guidelines are aimed at developers and do not provide the necessary technical background for clinicians. Additionally, published guides introducing clinicians to AI in medicine often lack user-friendly evaluation tools or specificity to oncology. This paper provides background on model development and proposes a yes/no checklist and an open-ended questionnaire designed to help oncologists effectively assess AI models. The yes/no checklist is intended as an efficient scan of whether a model conforms to published best standards; the open-ended questionnaire supports a more in-depth survey. Both tools were developed by clinical and AI researchers: initial discussions identified broad domains, gradually narrowing to model development points relevant to clinical practice, and the development process included two literature searches to align with current best practices, integrating insights from 24 articles to refine the questionnaire and the checklist. The tools are intended for clinicians in oncology looking to evaluate AI models. Four cases of AI applications in oncology are analyzed, demonstrating utility in real-world scenarios and enhancing case-based learning for clinicians. These tools highlight the interdisciplinary nature of effective AI integration in oncology.
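The "efficient scan" semantics of a yes/no checklist — a model conforms only when every item is answered "yes" — can be sketched in a few lines. This is a hypothetical illustration only; the paper's actual checklist items and wording are not reproduced here:

```python
# Hypothetical sketch of a yes/no conformance checklist like the one
# described in the abstract. Item wording is illustrative, not the paper's.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChecklistItem:
    question: str
    answer: Optional[bool] = None  # None means not yet assessed

def conforms(items):
    """Efficient scan: conforms only if every item is answered 'yes'."""
    return all(item.answer is True for item in items)

items = [
    ChecklistItem("Was the model validated on an external data set?"),
    ChecklistItem("Is the intended clinical use in oncology clearly stated?"),
]
for item in items:
    item.answer = True
```

An unanswered (`None`) or "no" item fails the scan, flagging the model for the more in-depth open-ended questionnaire.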
Clinician's Artificial Intelligence Checklist and Evaluation Questionnaire: Tools for Oncologists to Assess Artificial Intelligence and Machine Learning Models.
Pub Date : 2025-09-01Epub Date: 2025-09-10DOI: 10.1200/CCI-25-00042
Conner Ganjavi, Ethan Layne, Francesco Cei, Karanvir Gill, Vasileios Magoulianitis, Andre Abreu, Mitchell Goldenberg, Mihir M Desai, Inderbir Gill, Giovanni E Cacciamani
Purpose: To evaluate a generative artificial intelligence (GAI) framework for creating readable lay abstracts and summaries (LASs) of urologic oncology research that maintain accuracy, completeness, and clarity, and to assess their comprehension and perception among patients and caregivers.
Methods: Forty original abstracts (OAs) on prostate, bladder, kidney, and testis cancers from leading journals were selected. LASs were generated using a free GAI tool, with three versions per abstract for consistency. Readability was compared with OAs using validated metrics. Two independent reviewers assessed accuracy, completeness, and clarity and identified AI hallucinations. A pilot study was conducted with 277 patients and caregivers randomly assigned to receive either OAs or LASs and complete comprehension and perception assessments.
Results: Mean GAI-generated LAS generation time was <10 seconds. Across the 600 sections generated, readability and quality metrics were consistent (P > .05). Quality scores ranged from 85% to 100%, with hallucinations in 1% of sections. LASs showed significantly better readability (68.9 v 25.3; P < .001), grade level, and text metrics compared with OAs. Methods sections had slightly lower accuracy (85% v 100%; P = .03) and trifecta achievement (82.5% v 100%; P = .01), but other sections retained high quality (≥92.5%; P > .05). GAI-generated LAS recipients scored significantly better on comprehension and most perception-based questions (P < .001), with LAS being the only consistently significant predictor (P < .001).
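The 0-100 readability scores quoted (68.9 v 25.3, higher = easier) are consistent with the Flesch Reading Ease scale, one of the validated metrics commonly used for this purpose — an assumption, as the abstract does not name the metric. The formula itself is standard; because syllable counting requires a heuristic, this sketch takes pre-tallied counts:

```python
# Sketch of the Flesch Reading Ease formula (assumed metric; the abstract
# does not name which validated readability metrics were used).
# Scores run roughly 0-100; higher scores indicate easier text.

def flesch_reading_ease(total_words, total_sentences, total_syllables):
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

On this scale, scores near 70 correspond to plain English readable by a general audience, while scores in the 20s indicate dense academic prose — matching the direction of the reported LAS-versus-OA gap.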
Conclusion: GAI-generated LASs for urologic oncology research are highly readable and generally preserve the quality of the OAs. Patients and caregivers demonstrated improved comprehension and more favorable perceptions of LASs compared with OAs. Human oversight remains essential to ensure the accurate, complete, and clear representations of the original research.
Enhancing Readability of Lay Abstracts and Summaries for Urologic Oncology Literature Using Generative Artificial Intelligence: BRIDGE-AI 6 Randomized Controlled Trial.