Purpose: To evaluate the appropriateness of a cure model when analyzing right-censored end points of a randomized clinical trial (RCT) in malignancy in the presence of long-term survivors. We aim to show how the ratio estimation of censored cured subjects (RECeUS), previously proposed for a homogeneous population, can be extended for use in RCTs.
Methods: Based on the RECeUS method, four decision rules were considered to assess the appropriateness of a cure model, requiring the eligibility conditions to be met in both randomized arms, in at least one randomized arm, in the entire sample, or on average across the conditions, respectively. A simulation study was performed to evaluate their performance and the impact of the link function when assessing the appropriateness of cure models. We also illustrate the method using two real data examples from RCTs conducted in patients with acute leukemia and COVID-19.
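As a rough illustration of applying a per-arm criterion, the sketch below (Python, lifelines) flags an arm as compatible with a cured subgroup when its Kaplan-Meier curve plateaus and enough censored observations fall beyond the last observed event; this heuristic, its thresholds, and the column names are illustrative assumptions, not the published RECeUS statistic.

```python
# Illustrative per-arm plateau check (NOT the published RECeUS statistic).
# Assumes a pandas DataFrame with columns: time, event (1 = event, 0 = censored), arm.
import pandas as pd
from lifelines import KaplanMeierFitter

def arm_supports_cure(df, plateau_tol=0.01, min_late_censored=0.2):
    """Heuristic: survival curve flat near its tail and a sizeable share of
    censored subjects observed after the last event time."""
    kmf = KaplanMeierFitter().fit(df["time"], df["event"])
    surv = kmf.survival_function_.iloc[:, 0]
    # Plateau: little drop over the last 10% of the follow-up window.
    tail_start = surv.index[int(0.9 * (len(surv) - 1))]
    plateau = (surv.loc[:tail_start].iloc[-1] - surv.iloc[-1]) < plateau_tol
    # Share of censored subjects followed beyond the last observed event.
    last_event = df.loc[df["event"] == 1, "time"].max()
    censored = df[df["event"] == 0]
    frac_late = (censored["time"] > last_event).mean() if len(censored) else 0.0
    return plateau and frac_late >= min_late_censored

def cure_model_reasonable(data, rule="at_least_one_arm"):
    # Apply the per-arm check separately in each randomized arm.
    flags = [arm_supports_cure(g) for _, g in data.groupby("arm")]
    return any(flags) if rule == "at_least_one_arm" else all(flags)
```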
Results: Simulation results suggest that the decision rule best suited to all considered treatment-effect scenarios is to check the criteria in at least one randomized arm. Regardless of the rule applied, the cure model appeared appropriate in both RCT data sets.
Conclusion: When analyzing survival data from RCTs, a cure model may be considered when the survival curves display a plateau. To ensure that such a plateau is a reliable indicator of cured patients in the population, the RECeUS method should be applied to each randomized arm separately, with the criteria met in at least one randomized arm.
{"title":"Detecting the Cure Model Appropriateness in Randomized Clinical Trials With Long-Term Survivors.","authors":"Cheryl Kouadio, Subodh Selukar, Megan Othus, Sylvie Chevret","doi":"10.1200/CCI-25-00084","DOIUrl":"https://doi.org/10.1200/CCI-25-00084","url":null,"abstract":"<p><strong>Purpose: </strong>To evaluate the appropriateness of a cure model when analyzing right-censored end points of a randomized clinical trial (RCT) in malignancy in the presence of long-term survivors. We aim to derive how the ratio estimation of censored cured subjects (RECeUS), previously proposed for a homogeneous population, could be extended for use in RCTs.</p><p><strong>Methods: </strong>Based on the RECeUS method, four decision rules were considered to assess the appropriateness of a cure model. They considered the eligibility conditions to be met: in both arms, in at least one randomized arm, in the entire sample, or when only considering an average of the conditions, respectively. A simulation study was performed to evaluate their performance and the impact of the link function when considering the appropriateness of cure models. We also illustrate the method using two real data examples from two RCTs conducted in patients with acute leukemia and COVID-19 disease.</p><p><strong>Results: </strong>Simulation results show that the best decision rule that can be applied in all considered treatment effect scenarios might be to check the criteria in at least one randomized arm. Regardless of the rules, the cure model appeared to be appropriate in both RCT data.</p><p><strong>Conclusion: </strong>When analyzing survival data from RCTs, the appropriateness of a cure model could be considered in the face of a plateau shape of the survival curves. To ensure that the presence of such a plateau in the survival curves is a reliable indicator of the presence of cured patients in the population, the RECeUS method should be used in each randomized arm separately, with criteria met in at least one randomized arm.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2500084"},"PeriodicalIF":2.8,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145764401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yiming Xue, Yunzheng Zhu, Luoting Zhuang, YongKyung Oh, Ricky Taira, Denise R Aberle, Ashley Elizabeth Prosper, William Hsu, Yannan Lin
Purpose: Tobacco use is a major risk factor for diseases such as cancer. Granular quantitative details of smoking (eg, pack-years and years since quitting) are essential for assessing disease risk and determining eligibility for lung cancer screening (LCS). However, existing natural language processing (NLP) tools struggle to extract detailed quantitative smoking data from clinical narratives.
Methods: We cross-validated four pretrained Bidirectional Encoder Representations from Transformers (BERT)-based models (BERT, BioBERT, ClinicalBERT, and MedBERT) by fine-tuning them on 90% of 3,261 sentences mentioning smoking history to extract six quantitative smoking history variables from clinical narratives. The model with the highest cross-validated micro-averaged F1 scores across most variables was selected as the final SmokeBERT model and was further fine-tuned on the 90% training data. Model performance was evaluated on a 10% holdout test set and an external validation set containing 3,191 sentences.
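A minimal fine-tuning sketch, assuming the extraction is framed as token classification with Hugging Face transformers; the checkpoint name, label set, and hyperparameters are assumptions for illustration, not the study's configuration.

```python
# Illustrative fine-tuning setup (token-classification framing assumed); the
# checkpoint, labels, and hyperparameters are placeholders, not SmokeBERT's.
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          Trainer, TrainingArguments)

MODEL = "emilyalsentzer/Bio_ClinicalBERT"  # a public ClinicalBERT checkpoint
LABELS = ["O", "B-PACK_YEARS", "I-PACK_YEARS", "B-QUIT_YEARS", "I-QUIT_YEARS"]

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(MODEL, num_labels=len(LABELS))

args = TrainingArguments(
    output_dir="smokebert",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

train_ds = eval_ds = None  # placeholders: tokenized sentences with aligned label ids
trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()  # run once the 90%/10% splits have been tokenized and label-aligned
```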
Results: ClinicalBERT was selected as the final model based on cross-validation and was fine-tuned on the training data to create the SmokeBERT model. Compared with the state-of-the-art rule-based NLP model and the Generative Pre-trained Transformer Open Source Series 20 billion parameter model, SmokeBERT demonstrated superior performance in smoking data extraction (overall F1 score, holdout test: 0.97 v 0.88-0.90; external validation: 0.86 v 0.72-0.79) and in identifying LCS-eligible patients (97% v 59%-97% for ≥20 pack-years and 100% v 60%-84% for ≤15 years since quitting).
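The eligibility thresholds reported above map to a simple rule over the extracted variables; a minimal sketch is shown below. The age window (50-80 years) follows current USPSTF guidance and is not stated in the abstract, and the field names are illustrative.

```python
# Hedged sketch: applying LCS eligibility thresholds to extracted smoking values.
# Thresholds for pack-years and years since quitting match the abstract; the age
# bounds are an assumption taken from USPSTF guidance, not from the study.
def lcs_eligible(pack_years, years_since_quit, current_smoker, age):
    """Age 50-80, >=20 pack-years, and current smoker or quit <=15 years ago."""
    if pack_years is None or age is None:
        return False
    quit_ok = current_smoker or (years_since_quit is not None and years_since_quit <= 15)
    return 50 <= age <= 80 and pack_years >= 20 and quit_ok

print(lcs_eligible(pack_years=30, years_since_quit=10, current_smoker=False, age=62))  # True
```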
Conclusion: We developed SmokeBERT, a fine-tuned BERT-based model optimized for extracting detailed quantitative smoking histories. Future work includes evaluating performance on larger clinical data sets and developing a multilingual, language-agnostic version of SmokeBERT.
{"title":"SmokeBERT: A Bidirectional Encoder Representations From Transformers-Based Model for Quantitative Smoking History Extraction From Clinical Narratives to Improve Lung Cancer Screening.","authors":"Yiming Xue, Yunzheng Zhu, Luoting Zhuang, YongKyung Oh, Ricky Taira, Denise R Aberle, Ashley Elizabeth Prosper, William Hsu, Yannan Lin","doi":"10.1200/CCI-25-00223","DOIUrl":"10.1200/CCI-25-00223","url":null,"abstract":"<p><strong>Purpose: </strong>Tobacco use is a major risk factor for diseases such as cancer. Granular quantitative details of smoking (eg, pack years and years since quitting) are essential for assessing disease risk and determining eligibility for lung cancer screening (LCS). However, existing natural language processing (NLP) tools struggle to extract detailed quantitative smoking data from clinical narratives.</p><p><strong>Methods: </strong>We cross-validated four pretrained Bidirectional Encoder Representations from Transformers (BERT)-based models-BERT, BioBERT, ClinicalBERT, and MedBERT-by fine-tuning them on 90% of 3,261 sentences mentioning smoking history to extract six quantitative smoking history variables from clinical narratives. The model with the highest cross-validated micro-averaged F1 scores across most variables was selected as the final SmokeBERT model and was further fine-tuned on the 90% training data. Model performance was evaluated on a 10% holdout test set and an external validation set containing 3,191 sentences.</p><p><strong>Results: </strong>ClinicalBERT was selected as the final model based on cross-validation and was fine-tuned on the training data to create the SmokeBERT model. Compared with the state-of-the-art rule-based NLP model and the Generative Pre-trained Transformer Open Source Series 20 billion parameter model, SmokeBERT demonstrated superior performance in smoking data extraction (overall F1 score, holdout test: 0.97 <i>v</i> 0.88-0.90; external validation: 0.86 <i>v</i> 0.72-0.79) and in identifying LCS-eligible patients (97% <i>v</i> 59%-97% for ≥20 pack-years and 100% <i>v</i> 60%-84% for ≤15 years since quitting).</p><p><strong>Conclusion: </strong>We developed SmokeBERT, a fine-tuned BERT-based model optimized for extracting detailed quantitative smoking histories. Future work includes evaluating performance on larger clinical data sets and developing a multilingual, language-agnostic version of SmokeBERT.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2500223"},"PeriodicalIF":2.8,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145656443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-01. Epub Date: 2025-12-05. DOI: 10.1200/CCI-25-00132
David N Karp, Khaldoun Hamade, Christopher M McNair, Amy E Leader
Purpose: Cancer centers and health systems are tasked with deciding where to deploy community interventions to reduce the burden of cancer within their catchment areas. Few methods exist to prioritize communities in a systematic manner, considering features of individuals, populations, systems, and policies. We developed a geographically informed index to prioritize census tracts based on community need, with an initial focus on identifying communities in need of breast cancer screening (BCS) interventions.
Methods: This study used publicly available data to select variables known to be associated with disparities in BCS rates. Variables were identified from five categories: economic stability, education access and quality, neighborhood and built environment, social and community context, and health status and health care access and quality. Data were analyzed at the census tract level across the Sidney Kimmel Comprehensive Cancer Center catchment (N = 1,216). Principal component analysis was applied to 23 variables, and five principal components were selected to construct a composite measure using a weighted sum. The resulting index values were used to stratify the data set for further analysis and mapped for visualization.
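A minimal sketch of this kind of PCA-based composite (Python, scikit-learn) follows; weighting the five components by explained variance and rescaling to 0-1 are assumptions for illustration, not necessarily the study's exact recipe.

```python
# Sketch of a PCA-based composite index over census-tract variables; the weighting
# scheme and 0-1 rescaling are illustrative assumptions.
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA

def community_need_index(tracts: pd.DataFrame, n_components: int = 5) -> pd.Series:
    """tracts: one row per census tract, numeric need-related variables (23 in the study)."""
    z = StandardScaler().fit_transform(tracts)              # standardize variables
    pca = PCA(n_components=n_components).fit(z)
    scores = pca.transform(z)                               # (n_tracts, n_components)
    weights = pca.explained_variance_ratio_                 # assumed component weights
    raw = scores @ weights                                   # weighted sum of components
    scaled = MinMaxScaler().fit_transform(raw.reshape(-1, 1)).ravel()  # rescale to 0-1
    return pd.Series(scaled, index=tracts.index, name="CNPI_BCS")

# Highest-need quintile (Q5), as stratified in the analysis above:
# cnpi = community_need_index(df); q5 = cnpi[cnpi >= cnpi.quantile(0.8)]
```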
Results: The analysis produced the Community Need Priority Index (CNPI)-BCS, with values ranging from 0 to 1 (mean, 0.259; standard deviation [SD], 0.161). The top quintile (Q5, n = 243) represented the highest-need communities. Q5 tracts were primarily concentrated in Philadelphia, Camden, and Delaware counties. Philadelphia County had the highest average (mean, 0.364; SD, 1.78) and the most tracts in the top quintile (45%, n = 175). Montgomery County had the lowest average (mean, 0.169; SD, 0.092).
Conclusion: This novel methodological approach considered the complex nature of multiple, intersectional barriers to good health to identify priority areas of need within cancer center catchment areas.
{"title":"Development of a Composite Measure to Identify Priority Areas of Need for Cancer Screening Interventions.","authors":"David N Karp, Khaldoun Hamade, Christopher M McNair, Amy E Leader","doi":"10.1200/CCI-25-00132","DOIUrl":"https://doi.org/10.1200/CCI-25-00132","url":null,"abstract":"<p><strong>Purpose: </strong>Cancer centers and health systems are tasked with deciding where to deploy community interventions to reduce the burden of cancer within their catchment areas. Few methods exist to prioritize communities in a systematic manner, considering features of individuals, populations, systems, and policies. We developed a geographically informed index to prioritize census tracts based on community need, with an initial focus on identifying communities in need of breast cancer screening (BCS) interventions.</p><p><strong>Methods: </strong>This study used publicly available data to select variables known to be associated with disparities in BCS rates. Variables were identified from five categories: economic stability, education access and quality, neighborhood and built environment, social and community context, and health status and health care access and quality. Data were analyzed at the census tract level across the Sidney Kimmel Comprehensive Cancer Center catchment (N = 1,216). Principal component analysis was applied to 23 variables, and five principal components were selected to construct a composite measure using a weighted sum. The resulting index values were used to stratify the data set for further analysis and mapped for visualization.</p><p><strong>Results: </strong>The analysis produced the Community Need Priority Index (CNPI)-BCS, with values ranging from 0 to 1 (mean, 0.259; standard deviation [SD], 0.161). The top quintile (Q5, n = 243) represented the highest-need communities. Q5 tracts were primarily concentrated in Philadelphia, Camden, and Delaware counties. Philadelphia County had the highest average (mean, 0.364; SD, 1.78) and the most tracts in the top quintile (45%, n = 175). Montgomery county had the lowest average (mean, 0.169; SD, 0.092).</p><p><strong>Conclusion: </strong>This novel methodological approach considered the complex nature of multiple, intersectional barriers to good health to identify priority areas of need within cancer center catchment areas.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2500132"},"PeriodicalIF":2.8,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145688619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-01. Epub Date: 2025-12-05. DOI: 10.1200/CCI-25-00233
Eunyoung Im, Bomi Kim, Sunghoon Kang, Hyeoneui Kim
Purpose: The rapid expansion of scientific literature has made it increasingly challenging for clinicians and researchers to efficiently identify relevant evidence. While large language models (LLMs) offer promising solutions for automating literature review tasks, few tools support integrated workflows that also enable trend analysis. This study aimed to develop and evaluate Rapid Clinical Evidence eXplorer (RaCE-X), a Generative Pre-trained Transformer (GPT)-based automated pipeline designed to streamline abstract screening, extract structured information, and visualize key trends in clinical research.
Methods: We used GPT-4.1 mini to screen 865 PubMed abstracts based on predefined screening criteria. Structured information was then extracted from the 87 relevant abstracts based on a predefined information model covering nine fields. A gold standard data set was created through expert review to assess model performance. The extracted information was visualized through an interactive dashboard. Usability was evaluated using the Post-Study System Usability Questionnaire (PSSUQ) and open-ended feedback from five clinical research coordinators.
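A minimal sketch of LLM-based abstract screening of this kind, assuming the OpenAI Python SDK (1.x) and the "gpt-4.1-mini" model id; the prompt wording and JSON output schema are illustrative, not the study's.

```python
# Hedged sketch of abstract screening with an OpenAI chat model; the model id,
# prompt, and output fields are assumptions for illustration.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SCREENING_PROMPT = (
    "You screen PubMed abstracts for a clinical evidence review. "
    "Given the abstract, answer with JSON: "
    '{"relevant": true/false, "reason": "<one sentence>"}'
)

def screen_abstract(abstract: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[
            {"role": "system", "content": SCREENING_PROMPT},
            {"role": "user", "content": abstract},
        ],
        temperature=0,
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)
```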
Results: RaCE-X demonstrated high screening performance (precision = 0.954, recall = 0.988, F1 = 0.971) and achieved strong average performance in information extraction (precision = 0.977, recall = 0.989, F1 = 0.983), with no hallucinations identified. Usability testing indicated generally positive feedback (overall PSSUQ score = 2.8), with users noting that RaCE-X was intuitive and effective for data interpretation.
Conclusion: RaCE-X enables efficient GPT-based abstract screening, structured information extraction, and research trend exploration, thereby facilitating the summary of clinically relevant evidence from the biomedical literature. This study demonstrates the feasibility of using LLMs to reduce manual workload and accelerate evidence-based research practices.
{"title":"Rapid Clinical Evidence Explorer: A Generative Pre-Trained Transformer-Powered Tool for Automated Oncology Evidence Extraction.","authors":"Eunyoung Im, Bomi Kim, Sunghoon Kang, Hyeoneui Kim","doi":"10.1200/CCI-25-00233","DOIUrl":"https://doi.org/10.1200/CCI-25-00233","url":null,"abstract":"<p><strong>Purpose: </strong>The rapid expansion of scientific literature has made it increasingly challenging for clinicians and researchers to efficiently identify relevant evidence. While large language models (LLMs) offer promising solutions for automating literature review tasks, few tools support integrated workflows that enable trend analysis as well. This study aimed to develop and evaluate Rapid Clinical Evidence eXplorer (<i>RaCE-X</i>), a Generative Pre-trained Transformer (GPT)-based automated pipeline designed to streamline abstract screening, extract structured information, and visualize key trends in clinical research.</p><p><strong>Methods: </strong>We used GPT-4.1 mini to screen 865 PubMed abstracts based on predefined screening criteria. Structured information was then extracted from the 87 relevant abstracts based on a predefined information model covering nine fields. A gold standard data set was created through expert review to assess model performance. The extracted information was visualized through an interactive dashboard. Usability was evaluated using the Post-Study System Usability Questionnaire (PSSUQ) and open-ended feedback from five clinical research coordinators.</p><p><strong>Results: </strong>RaCE-X demonstrated high screening performance (precision = 0.954, recall = 0.988, F1 = 0.971) and achieved strong average performance in information extraction (precision = 0.977, recall = 0.989, F1 = 0.983), with no hallucinations identified. Usability testing indicated generally positive feedback (overall PSSUQ score = 2.8), with users noting that RaCE-X was intuitive and effective for data interpretation.</p><p><strong>Conclusion: </strong>RaCE-X enables efficient GPT-based abstract screening, structured information extraction, and research trend exploration, thereby facilitating the summary of clinically relevant evidence from the biomedical literature. This study demonstrates the feasibility of using LLMs to reduce manual workload and accelerate evidence-based research practices.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2500233"},"PeriodicalIF":2.8,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145688642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-01. Epub Date: 2025-12-22. DOI: 10.1200/CCI-25-00297
Mahima Akula, Ryan W Huey, Arthur S Hong
{"title":"Dissonance in the Sole Quality Measure for Outpatient Chemotherapy, OP-35.","authors":"Mahima Akula, Ryan W Huey, Arthur S Hong","doi":"10.1200/CCI-25-00297","DOIUrl":"10.1200/CCI-25-00297","url":null,"abstract":"","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2500297"},"PeriodicalIF":2.8,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12724631/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145812123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-01. Epub Date: 2025-12-11. DOI: 10.1200/CCI-25-00310
Ning Liao, Cheukfai Li, Charles M Balch
{"title":"Reply to: Critical Role of Model Selection in Evaluating AI Performance for Tumor Board Decision Making.","authors":"Ning Liao, Cheukfai Li, Charles M Balch","doi":"10.1200/CCI-25-00310","DOIUrl":"https://doi.org/10.1200/CCI-25-00310","url":null,"abstract":"","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2500310"},"PeriodicalIF":2.8,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145745684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-01. Epub Date: 2025-12-19. DOI: 10.1200/CCI-25-00176
Jiasheng Wang, David M Swoboda, Aziz Nazha
Purpose: Analyzing complex medical data sets requires specialized expertise and is time-consuming. This study aimed to develop and evaluate a novel multiagent artificial intelligence (AI) framework for automating medical data analysis workflows and to compare its performance against nonagent-based approaches using large language models (LLMs).
Methods: A six-party AI agent system was developed on the AutoGen platform, with specialized agents for planning, data retrieval, cleaning, statistical analysis, and review, each powered by OpenAI GPT-4o. This framework was applied to deidentified single patient-level data sets from 20 recent studies in the field of bone marrow transplantation (2021-2023). The primary objective was to evaluate its accuracy in replicating published primary outcomes, benchmarked against direct use of the web-based ChatGPT (GPT-4o).
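A schematic of such a group-chat agent team, assuming the pyautogen 0.2-style API (interfaces differ across AutoGen versions); the agent names, system messages, and settings are illustrative, not the study's configuration.

```python
# Hedged sketch of a multiagent analysis team in a pyautogen 0.2-style API; names,
# prompts, and round limits are placeholders, not the published framework.
import autogen

llm_config = {"config_list": [{"model": "gpt-4o"}]}  # API key supplied via environment/config

def make_agent(name, role):
    return autogen.AssistantAgent(name=name, system_message=role, llm_config=llm_config)

planner   = make_agent("planner",   "Decompose the analysis request into concrete steps.")
retriever = make_agent("retriever", "Load the requested variables from the curated data set.")
cleaner   = make_agent("cleaner",   "Handle missing values and recode variables as needed.")
analyst   = make_agent("analyst",   "Write and run the statistical analysis code.")
reviewer  = make_agent("reviewer",  "Check results against the plan and flag inconsistencies.")

executor = autogen.UserProxyAgent(
    name="executor", human_input_mode="NEVER",
    code_execution_config={"work_dir": "analysis", "use_docker": False},
)

chat = autogen.GroupChat(
    agents=[executor, planner, retriever, cleaner, analyst, reviewer],
    messages=[], max_round=20,
)
manager = autogen.GroupChatManager(groupchat=chat, llm_config=llm_config)

# executor.initiate_chat(manager, message="Replicate the primary outcome of study X.")
```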
Results: The multiagent framework successfully replicated 53.3% (95% CI, 40.7 to 66.0) of primary outcomes, significantly outperforming ChatGPT (35.0% [95% CI, 22.9 to 47.1]; P = .04). The agent framework's failures were predominantly due to data transformation issues (46.4%) and analysis code errors (21.4%). In contrast, ChatGPT failures largely stemmed from incorrect application of statistical methods (38.4%) and data transformation issues (45.6%); the model often attempted to resolve code errors by switching to alternative, incorrect statistical methods. Hallucinations of variables or results were not observed in the multiagent approach.
Conclusion: Our multiagent AI framework demonstrated superior accuracy and robustness in automating biomedical data analysis compared with a generalized LLM.
{"title":"Autonomous Analysis of Curated Patient Data Using a Large Language Model-Based Multiagent Framework.","authors":"Jiasheng Wang, David M Swoboda, Aziz Nazha","doi":"10.1200/CCI-25-00176","DOIUrl":"https://doi.org/10.1200/CCI-25-00176","url":null,"abstract":"<p><strong>Purpose: </strong>Analyzing complex medical data sets is specialized and time-consuming. This study aimed to develop and evaluate a novel multiagent artificial intelligence (AI) framework for automating medical data analysis workflows and to compare its performance against nonagent-based approaches using large language models (LLMs).</p><p><strong>Methods: </strong>A six-party AI agent system was developed using the AutoGen platform, with specialized agents for planning, data retrieval, cleaning, statistical analysis, and review, powered by OpenAI gpt-4o. This framework was applied to deidentified single patient-level data sets from 20 recent studies in the field of bone marrow transplantation (2021-2023). The primary objective was to evaluate its accuracy in replicating published primary outcomes, benchmarked against direct use of the Web site-based ChatGPT 4o.</p><p><strong>Results: </strong>The multiagent framework successfully replicated 53.3% (95% CI, 40.7 to 66.0) of primary outcomes, significantly outperforming ChatGPT 4o (35.0% [95% CI, 22.9 to 47.1]; <i>P</i> = .04). The agent framework's failures were predominantly due to data transformation issues (46.4%) and analysis code errors (21.4%). In contrast, ChatGPT 4o failures largely stemmed from incorrect statistical method application (38.4%) and data transformation (45.6%), often attempting to resolve code errors by switching to alternative, incorrect statistical methods. Hallucinations of variables or results were not observed in the multiagent approach.</p><p><strong>Conclusion: </strong>Our multiagent AI framework demonstrated superior accuracy and robustness in automating biomedical data analysis compared with a generalized LLM.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2500176"},"PeriodicalIF":2.8,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145795137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-01. Epub Date: 2025-12-10. DOI: 10.1200/CCI-25-00178
Kyle M Nolla, Maja Kuharic, Nicola Lancki, Callie L Walsh-Bailey, Ann Marie Flores, Sofia F Garcia, Roxanne E Jensen, Yingbao Wang, Quan Mai, Ambrosine M Mercer, Justin Dean Smith, Alexandra M Psihogios, Kimberly A Webster, Sheetal M Kircher, Patricia D Franklin, David Cella, Betina R Yanez
Purpose: Electronic patient portals can promote patient-centered care, but determinants of engagement remain underexplored in oncology. This study examines sociodemographic and clinical factors associated with engagement with four portal features, including invitations to complete patient-reported outcome (PRO) measures before appointments.
Methods: We performed a secondary analysis of the Northwestern University IMproving the Management of symPtoms during and following Cancer Treatment (NU IMPACT) study, a stepped-wedge cluster randomized trial promoting symptom management using PROs in adult oncology care. For each enrolled participant, we examined portal usage across 1 year.
Results: A total of 3,457 patients were enrolled between April 2020 and April 2023 from 30 Northwestern Medicine ambulatory oncology clinics. Patients were 65% female, 85% White, and 85% non-Hispanic/Latino, with a mean age of 60.8 years. Cancer diagnoses were 30% breast, 12% lymphoma, and all other types accounted for <10% of the sample. Patients accessed laboratory results most frequently (median 23 days in the year), followed by messaging (median 11 days) and physician notes (median 2 days). A total of 62.6% of patients completed at least one invited PRO. Controlling for sociodemographic factors, patient characteristics that were associated with greater engagement across three or more features included more oncology appointments, high health literacy, high anxiety, one or more severe physical symptoms, and high shared decision making with their health care team. Black race, Hispanic/Latino ethnicity, and Medicaid insurance were associated with lower portal engagement. Patients who used any other portal features were more likely to complete PROs. In contrast to other portal features, patients with at least one severe physical symptom were less likely to complete PROs (incidence rate ratio, 0.87 [95% CI, 0.81 to 0.93]; P < .001).
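For readers unfamiliar with the metric, an incidence rate ratio such as the 0.87 above is typically estimated with a Poisson (or negative binomial) regression on counts; a minimal sketch with synthetic data and assumed variable names follows.

```python
# Hedged sketch of estimating an incidence rate ratio (IRR) for PRO completion via
# Poisson regression with an offset for invitations; data and variable names are synthetic.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "severe_symptom": rng.integers(0, 2, n),      # 1 = at least one severe physical symptom
    "age": rng.normal(60, 10, n),
    "n_invited": rng.integers(1, 6, n),           # PRO invitations received
})
rate = np.exp(-0.3 - 0.14 * df["severe_symptom"])  # true IRR ~ exp(-0.14) ~ 0.87
df["pro_completed"] = rng.poisson(rate * df["n_invited"])

model = smf.glm(
    "pro_completed ~ severe_symptom + age",
    data=df,
    family=sm.families.Poisson(),
    offset=np.log(df["n_invited"]),
).fit()

irr = np.exp(model.params["severe_symptom"])
lo, hi = np.exp(model.conf_int().loc["severe_symptom"])
print(f"IRR = {irr:.2f} (95% CI, {lo:.2f} to {hi:.2f})")
```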
Conclusion: Portal use among patients with cancer varies by sociodemographic and clinical characteristics. Findings suggest a need for targeted interventions to promote equitable use among under-represented groups and promote portal-based PRO completion for patients with higher symptom burden.
{"title":"Patient Portal Engagement in Oncology: Results From the NU IMPACT Study in a Large Health Care System.","authors":"Kyle M Nolla, Maja Kuharic, Nicola Lancki, Callie L Walsh-Bailey, Ann Marie Flores, Sofia F Garcia, Roxanne E Jensen, Yingbao Wang, Quan Mai, Ambrosine M Mercer, Justin Dean Smith, Alexandra M Psihogios, Kimberly A Webster, Sheetal M Kircher, Patricia D Franklin, David Cella, Betina R Yanez","doi":"10.1200/CCI-25-00178","DOIUrl":"10.1200/CCI-25-00178","url":null,"abstract":"<p><strong>Purpose: </strong>Electronic patient portals can promote patient-centered care, but determinants of engagement remain underexplored in oncology. This study examines sociodemographic and clinical factors associated with engagement with four portal features, including invitations to complete patient-reported outcome (PRO) measures before appointments.</p><p><strong>Methods: </strong>Secondary analysis of the Northwestern University IMproving the Management of symPtoms during and following Cancer Treatment study, a stepped-wedge cluster randomized trial to promote symptom management using PROs in adult oncology care was performed. For each enrolled participant, we examined portal usage across 1 year.</p><p><strong>Results: </strong>A total of 3,457 patients were enrolled between April 2020 and April 2023 from 30 Northwestern Medicine ambulatory oncology clinics. Patients were 65% female, 85% White, and 85% non-Hispanic/Latino, with a mean age of 60.8 years. Cancer diagnoses were 30% breast, 12% lymphoma, and all other types accounted for <10% of the sample. Patients accessed laboratory results most frequently (median 23 days in the year), followed by messaging (median 11 days) and physician notes (median 2 days). A total of 62.6% of patients completed at least one invited PRO. Controlling for sociodemographic factors, patient characteristics that were associated with greater engagement across three or more features included more oncology appointments, high health literacy, high anxiety, one or more severe physical symptoms, and high shared decision making with their health care team. Black race, Hispanic/Latino ethnicity, and Medicaid insurance were associated with lower portal engagement. Patients who used any other portal features were more likely to complete PROs. In contrast to other portal features, patients with at least one severe physical symptom were less likely to complete PROs (incidence rate ratio, 0.87 [95% CI, 0.81 to 0.93]; <i>P</i> < .001).</p><p><strong>Conclusion: </strong>Portal use among patients with cancer varies by sociodemographic and clinical characteristics. Findings suggest a need for targeted interventions to promote equitable use among under-represented groups and promote portal-based PRO completion for patients with higher symptom burden.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2500178"},"PeriodicalIF":2.8,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12698109/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145727129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-01. Epub Date: 2025-12-02. DOI: 10.1200/CCI-24-00308
Nazmul Islam, Justin L Dale, Jamie S Reuben, Karan Sapiah, James W Coates, Frank R Markson, Jingjing Zhang, Lezhou Wu, Maura Gasparetto, Brett M Stevens, Sarah E Staggs, William M Showers, Monica R Ransom, Jairav Desai, Ujjwal V Kulkarni, Krysta L Engel, Craig T Jordan, Michael Boyiadzis, Clayton A Smith
Purpose: The objective of this study was to develop a flexible risk stratification strategy for AML that is specific for venetoclax plus azacitidine (ven/aza), addresses real-world data (RWD) issues, and is also adaptable to different use cases.
Methods: A series of tunable risk models (RMs) were generated from a dynamic counterfactual machine learning (ML) strategy. These used a range of features from diagnostic AML samples and were tested using objective metrics on a single-institution cohort of 316 newly diagnosed patients treated with ven/aza. RM performance was tested using various model assumptions, data elements, and end points and with applications to an external AML real-world cohort (RWC).
Results: Favorable, intermediate, and adverse risk groups were identified in a series of ML-based RMs using different assumptions, for genetic-only or genetic-plus-phenotypic data elements and with overall survival and event-free survival as end points. Most RMs demonstrated equitable patient distribution (approximately 20%-40% in each risk group), significant separation between risk strata (log-rank-based P values <0.001), and predictive ability, with time-dependent survival AUC values of 0.60-0.70. Similar performance was observed when the proposed RM strategy was adapted and compared with the European LeukemiaNet 2022 classification using the external RWC.
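A minimal sketch of checking stratum balance and separation of this kind, using lifelines; the column names and three-group coding are assumptions for illustration, not the study's code.

```python
# Hedged sketch: distribution across risk strata, log-rank separation, and per-stratum
# Kaplan-Meier fits; column names (os_months, os_event, risk_group) are assumed.
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import multivariate_logrank_test

def evaluate_risk_strata(df: pd.DataFrame):
    """df columns: os_months, os_event (1 = death), risk_group in {favorable, intermediate, adverse}."""
    # Distribution of patients across strata (roughly 20%-40% per group in the study).
    print(df["risk_group"].value_counts(normalize=True).round(2))

    # Multivariate log-rank test for separation between the strata's survival curves.
    result = multivariate_logrank_test(df["os_months"], df["risk_group"], df["os_event"])
    print(f"log-rank p = {result.p_value:.4g}")

    # Kaplan-Meier estimate per stratum, e.g. for plotting.
    fits = {g: KaplanMeierFitter().fit(sub["os_months"], sub["os_event"], label=g)
            for g, sub in df.groupby("risk_group")}
    return result, fits
```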
Conclusion: The proposed ML strategy addresses a variety of RWD considerations and is readily tunable through coding and parameter updates for different contexts and use case needs. This strategy represents a novel approach to developing more effective RMs for AML and possibly other diseases.
{"title":"Development of a Dynamic Counterfactual Risk Stratification Strategy for Newly Diagnosed Patients With AML Treated With Venetoclax and Azacitidine.","authors":"Nazmul Islam, Justin L Dale, Jamie S Reuben, Karan Sapiah, James W Coates, Frank R Markson, Jingjing Zhang, Lezhou Wu, Maura Gasparetto, Brett M Stevens, Sarah E Staggs, William M Showers, Monica R Ransom, Jairav Desai, Ujjwal V Kulkarni, Krysta L Engel, Craig T Jordan, Michael Boyiadzis, Clayton A Smith","doi":"10.1200/CCI-24-00308","DOIUrl":"10.1200/CCI-24-00308","url":null,"abstract":"<p><strong>Purpose: </strong>The objective of this study was to develop a flexible risk stratification strategy for AML that is specific for venetoclax plus azacitidine (ven/aza), addresses real-world data (RWD) issues, and is also adaptable to different use cases.</p><p><strong>Methods: </strong>A series of tunable risk models (RMs) were generated from a dynamic counterfactual machine learning (ML) strategy. These used a range of features from diagnostic AML samples and were tested using objective metrics on a single-institution cohort of 316 newly diagnosed patients treated with ven/aza. RM performance was tested using various model assumptions, data elements, and end points and with applications to an external AML real-world cohort (RWC).</p><p><strong>Results: </strong>Favorable, intermediate, and adverse risk groups were identified in a series of ML-based RMs using different assumptions, for genetic-only or genetic-plus-phenotypic data elements and with overall survival and event-free survival as end points. Most RMs demonstrated equitable patient distribution (approximately 20%-40% in each risk group), significant separation between risk strata (log-rank-based <i>P</i> values <0.001), and predictability computed by time-dependent survival AUC values of 0.60-0.70. Similar performance was observed when the proposed RM strategy was adapted and compared with the European Leukemia Net 2022 using the external RWC.</p><p><strong>Conclusion: </strong>The proposed ML strategy addresses a variety of RWD considerations and is readily tunable through coding and parameter updates for different contexts and use case needs. This strategy represents a novel approach to developing more effective RMs for AML and possibly other diseases.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"9 ","pages":"e2400308"},"PeriodicalIF":2.8,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12685322/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145662677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}