JCO Clinical Cancer Informatics: Latest Articles, Page 6

RadOncRAG: A Novel Retrieval-Augmented Generation Framework Improves Large Language Model Benchmark Performance in Radiation Oncology.

IF 2.8 Q2 ONCOLOGY

JCO Clinical Cancer Informatics

Pub Date : 2025-11-01 Epub Date: 2025-11-14 DOI: 10.1200/CCI-25-00220

Nikhil Gautam Thaker, Navid Redjal, Adam Dicker, Arturo Loaiza-Bonilla, Trevor Royce, Vivek Subbiah, Vikash Deendyal, Jonathan R Gabriel, Neena Shetty, Ajay Choudhri, Gautam H Thaker

Purpose: Large language models (LLMs) show promise in assisting knowledge-intensive fields such as oncology, where up-to-date information and multidisciplinary expertise are critical. Traditional LLMs risk hallucinations and reliance on static, possibly outdated data that lack domain-specific context. Retrieval-augmented generation (RAG) has emerged as a strategy to address these issues by incorporating domain-specific information from external knowledge repositories.

Methods: We evaluated 15 LLMs, including Meta Llama-2/3, generative pretrained transformer (GPT)-3.5/4/4o variants, Claude-3, Gemini-2.0, and DeepSeek-R1. In a zero-shot workflow, each LLM answered 298 scorable questions from the 2021 American College of Radiology in-training examination. We implemented a RAG pipeline (Iridium Model) that transforms user prompts into vector embeddings, queries a specialized radiation oncology database, and merges relevant text with the original prompt to form an augmented query. We compared zero-shot versus RAG-augmented performance.
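The retrieve-then-augment steps described above can be illustrated with a minimal sketch. This is not the Iridium Model. It assumes a toy bag-of-words "embedding" with cosine similarity in place of a neural embedding model and vector database. The passages and the names `embed`, `retrieve`, and `build_augmented_prompt` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base standing in for a specialized
# radiation oncology database (illustrative sentences only).
PASSAGES = [
    "Brachytherapy places a radioactive source inside or next to the tumor.",
    "The linear-quadratic model relates dose per fraction to cell survival.",
    "Iridium-192 is a common source for high-dose-rate brachytherapy.",
]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase token counts (a real pipeline would
    call a neural embedding model here)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query embedding."""
    q = embed(query)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]


def build_augmented_prompt(question: str, passages: list[str], k: int = 2) -> str:
    """Merge retrieved context with the original question."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


print(build_augmented_prompt("Which isotope is used for brachytherapy?", PASSAGES))
```

A production system would replace the token counts with dense embeddings and an approximate nearest-neighbor index. The overall shape stays the same: embed the query, rank the stored passages, then prepend the best matches to the prompt.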

Results: Larger-parameter LLMs had higher zero-shot accuracy, with six models outscoring graduating residents (P < .01). Top scorers were reasoning models GPT-4o1, o3-mini, and DeepSeek-R1, which achieved 91.6%, 86.6%, and 91.6% without RAG, respectively. Gemini-2.0 improved 6.7% (to 79.2%), Llama-3-70b 8.4% (to 75.8%), and GPT-4o 5.7% (to 85.6%) with RAG. Top-scoring reasoning models surpassed graduating resident averages by 17.7%-20% (P < .01), but had no improvement or detriment with RAG. Gains occurred in the clinical, biology, and physics domains. Majority voting boosted aggregate accuracy when individual model performance exceeded 50%. RAG workflows and reasoning models incurred higher computational costs.
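Majority voting across models, and why it helps only when individual accuracy exceeds 50%, can be sketched as follows. The sketch makes two simplifying assumptions. First, each model answers independently. Second, each answer is treated as simply correct or incorrect (a Condorcet jury-theorem simplification). Real model errors are correlated, so this is illustrative only. Function names are hypothetical.

```python
from collections import Counter
from math import comb


def majority_vote(answers: list[str]) -> str:
    """Return the most common answer; ties go to the answer seen first."""
    counts = Counter(answers)
    best = max(counts.values())
    return next(a for a in answers if counts[a] == best)


def p_majority_correct(p: float, n: int) -> float:
    """Probability that more than half of n independent voters, each
    correct with probability p, are correct (use odd n to avoid ties)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1)
    )


print(majority_vote(["B", "C", "B", "A", "B"]))  # B
for p in (0.4, 0.6, 0.8):
    print(p, round(p_majority_correct(p, 5), 3))
```

Under these assumptions, five voters who are each 60% accurate produce a correct majority about 68% of the time. Voters at 40% accuracy produce a correct majority only about 32% of the time. This pattern is consistent with the abstract's observation that voting helped only above 50% individual accuracy.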

Conclusion: A radiation oncology-specific retrieval-augmented generation pipeline enhances the performance of nonreasoning LLMs in radiation oncology by integrating domain-specific evidence, whereas it does not improve the performance of reasoning models. These findings demonstrate that RAG can elevate clinical decision support by enabling simpler, cost-effective nonreasoning models to tackle complex tasks through retrieval capabilities, an efficient alternative to extensive model training that also yields citable, evidence-based explanations.

Volume 9, e2500220

Citations: 0

Artificial Intelligence System for Psychospiritual Distress in Family Caregivers of Patients With Terminal Cancer: A Retrospective Study.

IF 2.8 Q2 ONCOLOGY

JCO Clinical Cancer Informatics

Pub Date : 2025-11-01 Epub Date: 2025-11-05 DOI: 10.1200/CCI-25-00129

Kento Masukawa, Ryusho Suzuki, Momoka Tanno, Masaharu Nakayama, Mitsunori Miyashita

Purpose: Family caregivers of patients with terminal cancer need psychospiritual care, but assessing their psychospiritual distress is challenging. An automated system can be used to detect psychospiritual distress in the large volume of records held in electronic medical records (EMRs) and help health care providers assess distress accurately. This study aimed to develop an artificial intelligence system that automatically detects the psychological and spiritual distress of the families of patients with terminal cancer from unstructured text data in EMRs.

Methods: This retrospective study collected medical records (n = 1,554,736) covering the month before each participant's death. The participants (n = 808) died at Tohoku University Hospital in Japan between January 1, 2018, and December 31, 2019. We randomly selected 10,000 records from physician and nursing records and split the data set into training and testing sets at a ratio of 70:30. We used the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) to evaluate model performance. We used ELI5 ("explain like I'm 5") to identify important expressions for detecting psychospiritual distress.
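The 70:30 split and the two evaluation metrics can be sketched in standard-library Python. This is an illustration, not the authors' code. AUROC is computed with the Mann-Whitney (pairwise ranking) formulation. AUPRC is approximated by average precision, which is one common estimator; the abstract does not state which estimator was used. The labels and scores are invented.

```python
import random


def train_test_split(records: list, test_frac: float = 0.3, seed: int = 0):
    """Shuffle a copy of the records and split them (70:30 by default)."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def auroc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random positive outscores a random negative
    (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: list[int], scores: list[float]) -> float:
    """Mean precision at the rank of each positive (an AUPRC estimate)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1, 0.05, 0.5]
train_ids, test_ids = train_test_split(list(range(10)))
print(len(train_ids), len(test_ids))  # 7 3
print(round(auroc(labels, scores), 3), round(average_precision(labels, scores), 3))
```

The AUPRC baseline equals the prevalence of the positive class. A rare label therefore tends to produce an AUPRC well below the AUROC, which may partly explain why the results pair AUROCs of 0.92 with AUPRCs of 0.62 and 0.41.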

Results: The best-performing model for detecting psychological distress had AUROC and AUPRC values of 0.92 and 0.62, respectively; the best-performing model for detecting spiritual distress had values of 0.92 and 0.41, respectively. For psychological distress, the expressions with the highest importance values were "anxiety," "worry," and "tears." For spiritual distress, they were "want," "me," and "how."

Conclusion: This study demonstrated the application of machine learning models to detect psychospiritual distress among family caregivers of patients with terminal cancer from electronic medical records.

Volume 9, e2500129

Citations: 0

Comparative Evaluation of Explainable Machine Learning Versus Linear Regression for Predicting County-Level Lung Cancer Mortality Rate in the United States.

IF 2.8 Q2 ONCOLOGY

JCO Clinical Cancer Informatics

Pub Date : 2025-11-01 Epub Date: 2025-11-17 DOI: 10.1200/CCI-24-00310

Soheil Hashtarkhani, Brianna M White, Benyamin Hoseini, David L Schwartz, Arash Shaban-Nejad

Purpose: Lung cancer (LC) is a leading cause of cancer-related mortality in the United States. Accurate prediction of LC mortality rates is crucial for guiding targeted interventions and addressing health disparities. Although traditional regression-based models have been commonly used, explainable machine learning models may offer enhanced predictive accuracy and deeper insights into the factors influencing LC mortality.

Methods: This study applied three models (random forest [RF], gradient boosting regression [GBR], and linear regression [LR]) to predict county-level LC mortality rates across the United States. Model performance was evaluated using R-squared (R²) and root mean squared error (RMSE). Shapley Additive Explanations (SHAP) values were used to determine the importance and directional impact of each variable. Geographic disparities in LC mortality were analyzed through Getis-Ord (Gi*) hotspot analysis.
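For reference, the two performance metrics can be computed as follows. This is a minimal sketch using the standard definitions. The observed and predicted rates are invented placeholders, not study data.

```python
import math


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error, in the units of the outcome."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


observed = [40.0, 55.0, 62.0, 48.0]   # hypothetical county mortality rates
predicted = [44.0, 52.0, 58.0, 50.0]  # hypothetical model predictions
print(round(r_squared(observed, predicted), 3), round(rmse(observed, predicted), 3))
```

Read this way, an R² of 41.9% means the model explained roughly 42% of the county-to-county variance in LC mortality. An RMSE of 12.8 is expressed in the same units as the mortality rate.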

Results: The RF model outperformed both GBR and LR, achieving an R² value of 41.9% and an RMSE of 12.8. SHAP analysis identified smoking rate as the most important predictor, followed by median home value and the percentage of the Hispanic ethnic population. Spatial analysis revealed significant clusters of elevated LC mortality in the mid-eastern counties of the United States.
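SHAP builds on Shapley values from cooperative game theory. The sketch below computes exact Shapley values by enumerating every coalition of features for a tiny, hypothetical linear model. Features that are absent from a coalition are fixed at a baseline value, which is one common convention. The SHAP library itself uses faster estimators, such as TreeSHAP for tree ensembles like RF. Feature names, weights, and values are arbitrary and illustrative only.

```python
from itertools import combinations
from math import factorial

FEATURES = ["smoking_rate", "median_home_value", "pct_hispanic"]
# Hypothetical weights; not estimates from the study.
WEIGHTS = {"smoking_rate": 2.0, "median_home_value": -0.5, "pct_hispanic": -0.3}


def model(x: dict) -> float:
    """Toy linear model standing in for a fitted predictor."""
    return 30.0 + sum(WEIGHTS[f] * x[f] for f in FEATURES)


def shapley_values(x: dict, baseline: dict) -> dict:
    """Exact Shapley values: weighted average marginal contribution of each
    feature over all coalitions, with absent features set to the baseline."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for size in range(n):
            for coalition in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_f = {g: x[g] if g in coalition or g == f else baseline[g] for g in FEATURES}
                without_f = {g: x[g] if g in coalition else baseline[g] for g in FEATURES}
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


x = {"smoking_rate": 25.0, "median_home_value": 10.0, "pct_hispanic": 5.0}
baseline = {"smoking_rate": 15.0, "median_home_value": 20.0, "pct_hispanic": 10.0}
phi = shapley_values(x, baseline)
print({k: round(v, 3) for k, v in phi.items()})
```

For a linear model, each Shapley value reduces to weight × (value - baseline). The values sum to the difference between the prediction and the baseline prediction; this is the efficiency property. Global SHAP importance rankings are typically built from the mean absolute attribution across all observations, here counties.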

Conclusion: The RF model demonstrated superior predictive performance for LC mortality rates, emphasizing the critical roles of smoking prevalence, housing values, and the percentage of the Hispanic ethnic population. These findings offer valuable, actionable insights for designing targeted interventions, promoting screening, and addressing health disparities in the regions of the United States most affected by LC.

Volume 9, e2400310. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12643560/pdf/

Citations: 0

Reimagining Evidence: Artificial Intelligence Synthetic Data Generation for Cancer Research.

IF 2.8 Q2 ONCOLOGY

JCO Clinical Cancer Informatics

Pub Date : 2025-11-01 Epub Date: 2025-11-14 DOI: 10.1200/CCI-25-00304

Guergana Savova, Shan Chen, Jiarui Yao, Danielle Bitterman

Citations: 0

Informatics Perspectives on the National Cancer Policy Forum Workshop "Enabling 21st Century Applications for Cancer Surveillance Through Enhanced Registries and Beyond".

IF 2.8 Q2 ONCOLOGY

JCO Clinical Cancer Informatics

Pub Date : 2025-11-01 Epub Date: 2025-11-19 DOI: 10.1200/CCI-25-00098

Peter P Yu, W Scott Campbell, Eric B Durbin, Lawrence N Shulman, Jeremy L Warner

The National Cancer Policy Forum workshop "Enabling 21st Century Applications for Cancer Surveillance Through Enhanced Registries and Beyond" examined the current state of cancer registries and how they might evolve to extend registry missions to national health priorities: improving patient and health economic outcomes, equitable access to care, health care quality, and health system operational efficiency. Session 3 of the workshop focused on medical informatics as a driver of improvement in cancer registry data quality and interoperability. Data quality begins with precise data definitions, codified in controlled vocabularies and ontologies. Oncology data dictionaries that have been established or are evolving are described. Harmonization of various data dictionaries through representation in Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and in hierarchical classification systems within Common Data Models is outlined. Interoperability requires transmission standards that facilitate data exchange among data sources, registries, and data consumers. Although highly structured data capture and representation support semantically appropriate data use, the considerable effort required for data capture and the resulting rigidity of the data structure are challenges to implementation. Artificial intelligence may provide alternative paths for extracting and representing cancer registry data. Higher-fidelity cancer data and greater data interoperability, combined with data governance, will help realize a Learning Health System for oncology, but economic benefits need to be shared to support the infrastructure costs incurred by health care systems.

Volume 9, e2500098

Citations: 0

Multimodal Artificial Intelligence Model From Baseline Histopathology Adds Prognostic Information for Distant Recurrence Assessment in Hormone Receptor-Positive/Human Epidermal Growth Factor Receptor 2-Negative Early Breast Cancer.

IF 2.8 Q2 ONCOLOGY

JCO Clinical Cancer Informatics

Pub Date : 2025-11-01 Epub Date: 2025-11-21 DOI: 10.1200/CCI-24-00287

Daniel Kates-Harbeck, Hans Kreipe, Oleg Gluz, Matthias Christgen, Sherko Kuemmel, Monika Graeser, Ulrike Nitz, Sven Mahner, Doris Mayr, Rachel Wuerstlein, Akinori Mitani, Jingbin Zhang, Hans Pinckaers, Gijs Smit, Yi Ren, Songwan Joun, Jacqueline Griffin, Nancy Lin, Felix Feng, Andre Esteva, Ronald Kates, Nadia Harbeck

Purpose: Prognostic assessment in hormone receptor-positive (HR+)/human epidermal growth factor receptor 2-negative (HER2-) early breast cancer (EBC) remains challenging, given relatively low rates of disease progression. Modern artificial intelligence (AI)-based techniques have provided advanced prognostic tools in cancer.

Patients and methods: The Artera multimodal AI (MMAI) platform, using digital histopathology and clinical data, was applied to develop and test a prognostic risk assessment algorithm in HR+/HER2- EBC. Hematoxylin and eosin (H&E) slides of pretreatment breast biopsy and surgical specimens from the WSG PlanB and ADAPT trials were digitized. Patients with available images and complete data (n = 5,259) were stratified by trial, treatment, and distant metastasis (DM) into training (development: 60%) and internal validation (holdout: 40%) cohorts. The algorithm provided prognostic DM risk scores on the basis of image data and clinical variables (age, T and N stages, and tumor size). Univariable and multivariable Fine-Gray models were used to assess performance on the test cohort; subdistribution hazard ratios (sHRs) are reported per standard deviation increase in the model scores. Prespecified prognostic subgroups for analysis were defined by nodal status, menopausal status, and tumor grade.
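The stratified 60:40 split can be sketched as follows. Patients are grouped by a stratum key (trial, treatment, and distant-metastasis status), and each stratum is split separately so that both cohorts keep similar proportions. The record fields, the per-stratum rounding rule, and the synthetic cohort are assumptions for illustration; the study's actual procedure may differ.

```python
import random
from collections import defaultdict


def stratified_split(patients: list[dict], train_frac: float = 0.6, seed: int = 42):
    """Split patients into training and holdout cohorts separately within
    each (trial, treatment, distant metastasis) stratum."""
    strata = defaultdict(list)
    for p in patients:
        strata[(p["trial"], p["treatment"], p["dm"])].append(p)
    rng = random.Random(seed)
    train, holdout = [], []
    for key in sorted(strata):  # deterministic stratum order
        group = strata[key][:]
        rng.shuffle(group)
        n_train = round(len(group) * train_frac)
        train.extend(group[:n_train])
        holdout.extend(group[n_train:])
    return train, holdout


# Synthetic cohort: trial labels echo the abstract; everything else is invented.
patients = [
    {
        "id": i,
        "trial": "PlanB" if i % 2 else "ADAPT",
        "treatment": "chemo" if i % 3 else "endocrine",
        "dm": i % 10 == 0,
    }
    for i in range(100)
]
train, holdout = stratified_split(patients)
print(len(train), len(holdout))  # 60 40
```

Splitting within strata keeps rare events, such as the 10 synthetic DM cases here, represented in both cohorts rather than leaving their allocation to chance.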

Results: The trained MMAI score was significantly associated with risk of DM in the test cohort as a whole (sHR, 2.3 [95% CI, 2.0 to 2.8]) and across subgroups. The score remained significant after adjustment for clinical prognostic factors (sHR, 2.2 [95% CI, 1.7 to 2.8]). The MMAI image component alone had significant prognostic value in the test cohort (sHR, 1.6 [95% CI, 1.3 to 1.9]); it was also separately prognostic within the G2 and G3 subgroups (sHR, 1.5 per standard deviation increase) and in most of the other predefined clinical subgroups.
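Reporting an sHR "per standard deviation increase" means the log subdistribution hazard coefficient is rescaled to a one-SD change in the score. The sketch below shows that conversion and a Wald-type 95% CI. The coefficient, its standard error, and the score SD are hypothetical inputs, not values from the study. Fitting the Fine-Gray model itself would require a dedicated survival-analysis package.

```python
import math


def shr_per_sd(beta_per_unit: float, se_per_unit: float, score_sd: float, z: float = 1.96):
    """Convert a per-unit log-sHR and its standard error into an sHR per
    one-SD increase, with a Wald-type confidence interval."""
    beta_sd = beta_per_unit * score_sd
    se_sd = se_per_unit * score_sd
    return math.exp(beta_sd), math.exp(beta_sd - z * se_sd), math.exp(beta_sd + z * se_sd)


# Hypothetical: log-sHR of 0.05 per score point, SE 0.01, score SD of 8 points.
shr, lo, hi = shr_per_sd(0.05, 0.01, 8.0)
print(f"sHR per SD {shr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

The interval is symmetric on the log scale, so after exponentiation it is asymmetric around the point estimate. This matches the shape of the intervals reported in the abstract.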

Conclusion: MMAI using digital pathology from H&E slides provides enhanced prognostic quality in HR+/HER2- EBC and could help to advance personalized breast cancer management.

Volume 9, e2400287

Citations: 0

Using Real-World Data to Determine Acute Chemotherapy Emetogenicity in Pediatric Patients.

IF 2.8 Q2 ONCOLOGY

JCO Clinical Cancer Informatics

Pub Date : 2025-11-01 Epub Date: 2025-11-17 DOI: 10.1200/CCI-25-00140

L Lee Dupuis, Terrence Lo, Martin Yi, Lillian Sung, Mina Tadrous, Cherry Chu

Purpose: Direct pediatric information on chemotherapy emetogenicity is limited, so the framework for antiemetic selection in pediatric patients is uncertain. This study classified the acute emetogenicity of chemotherapy regimens in pediatric patients using data extracted from the electronic health record (EHR).

Methods: This retrospective, single-institution study extracted EHR data for patients age 0 to 18 years who received chemotherapy during an inpatient admission from July 1, 2018, through February 29, 2024. Data were organized by patient and chemotherapy block and included patient demographics; the date, time, and route of chemotherapy and antiemetic administration; and the date and time of vomiting. When at least 30 patients received the same chemotherapy and antiemetics during a chemotherapy block, we determined the proportion of chemotherapy blocks in which patients experienced complete, partial, or failed control of chemotherapy-induced vomiting. Chemotherapy regimen emetogenicity was assigned using a revision of an accepted pediatric chemotherapy emetogenicity classification framework that adjusts for antiemetic administration.
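The block-level tallying might look like the following sketch, which rests on several assumptions. Complete control is taken to mean no vomiting, partial control one or two episodes, and failure more than two. Any-vomiting rates are then mapped onto four emetogenicity levels using the conventional incidence thresholds (above 90%, 30%-90%, 10%-30%, below 10%). These definitions, the hypothetical field names, and the decision rule are all assumptions. The study's revised framework also adjusts for the antiemetics given, which this sketch does not attempt.

```python
from collections import Counter

MIN_PATIENTS = 30  # minimum number of patients per regimen, as stated in the abstract


def control_category(vomiting_episodes: int) -> str:
    """Assumed definitions of acute vomiting control for one block."""
    if vomiting_episodes == 0:
        return "complete"
    if vomiting_episodes <= 2:
        return "partial"
    return "failure"


def classify_regimen(blocks: list[dict]):
    """blocks: one dict per chemotherapy block with hypothetical fields
    'patient_id' and 'vomiting_episodes'. Returns (proportions, level),
    or None when fewer than MIN_PATIENTS distinct patients contributed."""
    if len({b["patient_id"] for b in blocks}) < MIN_PATIENTS:
        return None
    counts = Counter(control_category(b["vomiting_episodes"]) for b in blocks)
    n = len(blocks)
    props = {c: counts[c] / n for c in ("complete", "partial", "failure")}
    vomiting_rate = 1 - props["complete"]  # blocks with any vomiting
    if vomiting_rate > 0.9:
        level = "high"
    elif vomiting_rate >= 0.3:
        level = "moderate"
    elif vomiting_rate >= 0.1:
        level = "low"
    else:
        level = "minimal"
    return props, level


episodes = [0] * 60 + [1] * 15 + [4] * 5  # 80 synthetic blocks
blocks = [{"patient_id": i % 40, "vomiting_episodes": e} for i, e in enumerate(episodes)]
print(classify_regimen(blocks))
```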

Results: Seven thousand two hundred ninety-six chemotherapy blocks in 1,386 patients were identified. The emetogenicity of 25 chemotherapy regimens was classified as highly (7), moderately (5), low (10), or minimally (3) emetogenic. For 19 of these, no direct pediatric information was previously available; for five, our findings confirmed the previous pediatric emetogenicity classification. Compared with adult emetogenicity classifications, our classifications were higher for seven regimens, lower for one, and the same for four.

Conclusion: We have applied a novel method, EHR data extraction, to provide direct pediatric evidence to classify chemotherapy emetogenicity. Increasing the certainty of chemotherapy emetogenicity facilitates effective antiemetic selection for pediatric patients. This method may be applied in multi-institution studies to increase the number of chemotherapy regimens whose emetogenicity is classified using direct pediatric evidence.

Volume: 9 Pages: e2500140

Feasibility of a Smart Label-Enabled Remote Therapeutic Monitoring Intervention to Support Cyclin-Dependent Kinase 4/6 Inhibitor Adherence in Breast Cancer Care.

IF 2.8 Q2 ONCOLOGY

JCO Clinical Cancer Informatics

Pub Date : 2025-11-01 Epub Date: 2025-11-19 DOI: 10.1200/CCI-25-00152

Ilana Graetz, Sara Arshad, Clara Cai, Samuel Hernandez, Tamar Sapir, Jeffrey Carter, Cherilyn Heggen, Kelly E McKinnon, Freddie Yang, Gelareh Sadigh, Jane Meisel

Purpose: Cyclin-dependent kinase 4 and 6 inhibitors (CDKIs) are effective breast cancer therapies but pose adherence challenges because of cost, side effects, and complexity of medication schedule. We assessed the feasibility and usability of a smart label-enabled remote therapeutic monitoring (RTM) mHealth intervention for women with breast cancer prescribed a CDKI. Exploratory adjusted analyses examined factors associated with usability and CDKI adherence.

Methods: Participants were recruited from a comprehensive cancer center between April and August 2024. For 3 months, participants used Tappt smart labels and a web app to record CDKI doses, receive missed-dose reminders, report symptoms biweekly, and complete baseline and follow-up surveys. Alerts were sent to oncology teams for nonadherence (>20% missed doses) or moderate-to-severe symptoms. Feasibility was defined as ≥70% of participants using the smart label for >30 days and completing the follow-up survey. Usability was assessed using the System Usability Scale, with a benchmark score of ≥68. Linear regression was used to examine factors associated with usability and CDKI adherence.
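
The monitoring and evaluation rules in the Methods can be expressed as a few small functions. This is a minimal sketch, assuming missed doses are counted against the doses expected in a monitoring window and symptoms are reported on a none/mild/moderate/severe scale; those details and all function names are hypothetical. The System Usability Scale scoring follows the standard rule (odd items contribute the response minus 1, even items contribute 5 minus the response, and the sum is multiplied by 2.5), while the 68-point benchmark, the 20% missed-dose threshold, and the 70% feasibility threshold come from the abstract.

```python
SUS_BENCHMARK = 68.0          # usability benchmark stated in the abstract
MISSED_DOSE_ALERT = 0.20      # alert when more than 20% of doses are missed
FEASIBILITY_THRESHOLD = 0.70  # at least 70% of participants


def sus_score(responses):
    """System Usability Scale score (0 to 100) from ten 1-5 Likert responses."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs ten responses on a 1-5 scale")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


def meets_usability_benchmark(score):
    return score >= SUS_BENCHMARK


def needs_adherence_alert(doses_expected, doses_recorded):
    """Hypothetical rule: flag when the missed-dose fraction exceeds 20%."""
    if doses_expected <= 0:
        return False
    missed = max(doses_expected - doses_recorded, 0)
    return missed / doses_expected > MISSED_DOSE_ALERT


def needs_symptom_alert(severity):
    # Assumed severity labels; the app's actual symptom scale is not described.
    return severity in {"moderate", "severe"}


def meets_feasibility(n_meeting_criterion, n_participants):
    """True when the share of participants meeting a criterion is at least 70%."""
    return n_meeting_criterion / n_participants >= FEASIBILITY_THRESHOLD


print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
print(needs_adherence_alert(28, 22))              # True: 6 of 28 doses missed
```

Keeping each threshold as a named constant makes it easy to see which values are stated in the study and to adjust the hypothetical parts (dose window, severity scale) without touching the scoring logic.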

Results: Among 168 screened, 107 were eligible and reached; 75.7% (81/107) consented; 90.1% (73/81) completed the follow-up survey, and 88.9% (72/81) used the intervention for >30 days. Most participants self-identified as White (69.9%), were privately insured (72.6%), and had early-stage breast cancer (58.9%) and depression or anxiety (58.9%). The mean usability score was 75.8; participants who self-identified as Black reported 12.0 points higher usability than those who self-identified as White (P = .03). Mean CDKI adherence was 92.8%. A history of anxiety or depression was associated with an 8.6 percentage-point lower CDKI adherence rate (P = .02).
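
The reported percentages can be re-derived from the counts given in the Results, and both feasibility components clear the 70% threshold. The check below uses only numbers stated in the abstract; treating the two components (follow-up completion and use for more than 30 days) as separate criteria, each measured against the 81 consented participants, is an interpretation of the feasibility definition.

```python
# Counts as reported in the Results.
eligible, consented = 107, 81
completed_follow_up, used_over_30_days = 73, 72


def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * numerator / denominator, 1)


consent_rate = pct(consented, eligible)               # 75.7
follow_up_rate = pct(completed_follow_up, consented)  # 90.1
use_rate = pct(used_over_30_days, consented)          # 88.9

# Feasibility benchmark: at least 70% for each component.
feasible = follow_up_rate >= 70 and use_rate >= 70
print(consent_rate, follow_up_rate, use_rate, feasible)
```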

Conclusion: A smart label-enabled RTM mHealth intervention exceeded feasibility and usability benchmarks and showed promise for supporting CDKI adherence and symptom management.

Volume: 9 Pages: e2500152

Rapid Growth in Patient Portal Messages Underscores the Need for Actionable Paths Forward.

IF 2.8 Q2 ONCOLOGY

JCO Clinical Cancer Informatics

Pub Date : 2025-11-01 Epub Date: 2025-11-03 DOI: 10.1200/CCI-25-00283

A Jay Holmgren

Reasoning Models for Text Mining in Oncology: A Comparison Between o1 Preview, GPT-4o, and GPT-5 at Different Reasoning Levels.

IF 2.8 Q2 ONCOLOGY

JCO Clinical Cancer Informatics

Pub Date : 2025-11-01 Epub Date: 2025-11-06 DOI: 10.1200/CCI-24-00311

Paul Windisch, Fabio Dennstädt, Julia Weyrich, Christina Schröder, Daniel R Zwahlen, Robert Förster

Purpose: Chain-of-thought prompting is a method to make large language models generate intermediate reasoning steps when solving a complex problem. OpenAI's o1 preview and GPT-5 have been trained to create such a chain of thought internally before giving a response and have been claimed to surpass various benchmarks requiring complex reasoning. The purpose of this study was to evaluate their performance in text mining in oncology.

Methods: Six hundred trials from high-impact medical journals were classified depending on whether they allowed for the inclusion of patients with localized and/or metastatic disease. GPT-4o, o1 preview, and GPT-5 at different reasoning effort settings were instructed to do the same classification based on the publications' abstracts.

Results: For predicting whether patients with localized disease were enrolled, GPT-4o and o1 preview achieved F1 scores of 0.80 (0.76-0.83) and 0.91 (0.89-0.94), respectively. For predicting whether patients with metastatic disease were enrolled, GPT-4o and o1 preview achieved F1 scores of 0.97 (0.95-0.98) and 0.99 (0.99-1.00), respectively. For GPT-5, F1 scores for predicting the eligibility of patients with localized disease rose from 0.84 to 0.93 and then 0.94 as reasoning effort increased; the corresponding F1 scores for metastatic disease were 0.97, 0.99, and 0.99.
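
The F1 scores above treat each label (localized disease allowed, metastatic disease allowed) as a separate binary task scored against a reference classification. A minimal sketch of that evaluation follows. The abstract does not say how the confidence intervals were computed, so the percentile bootstrap over trials used here is an assumption, as are the function names and the synthetic labels in the usage example.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for the positive class of one binary label,
    e.g. 'trial allowed patients with localized disease'."""
    tp = sum(bool(t) and bool(p) for t, p in zip(y_true, y_pred))
    fp = sum((not t) and bool(p) for t, p in zip(y_true, y_pred))
    fn = sum(bool(t) and (not p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # convention when there are no true positives
    return 2 * tp / (2 * tp + fp + fn)


def bootstrap_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for F1 (assumed method), resampling trials with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int((alpha / 2) * n_resamples)]
    hi = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Usage with synthetic labels for one binary task.
y_true = [True, True, False, True, False, False, True, True]  # reference labels
y_pred = [True, False, False, True, True, False, True, True]  # model output
print(f1_score(y_true, y_pred))  # 0.8
print(bootstrap_ci(y_true, y_pred))
```

Scoring the two labels separately, rather than as one four-way class, matches the way the abstract reports distinct F1 scores for localized and metastatic disease.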

Conclusion: o1 preview outperformed GPT-4o in extracting, from a trial's abstract, whether people with localized and/or metastatic disease were eligible. GPT-5 at high reasoning effort settings outperformed both GPT-4o and o1 preview, supporting the notion that reasoning models could become the new standard for text mining in medicine.

Volume: 9 Pages: e2400311
