JCO Clinical Cancer Informatics: Latest Articles

Development and Assessment of a Pipeline for Extracting Structured Data From Free-Text Medical Reports Using a Large Language Model.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-02-01 Epub Date: 2026-02-18 DOI: 10.1200/CCI-25-00133
Enzo Joseph, Paul Vallee, Tanguy Perennec, Nicolas Wagneur, Jean-Sébastien Frenel, Mario Campone, François Bocquet, Florent Le Borgne

Purpose: Medical free texts such as pathology reports contain valuable clinical data but are challenging to structure at scale. Traditional natural language processing approaches require extensive annotated data and training. We investigate the use of a large language model (LLM) such as Mistral to automatically extract three breast cancer (BC) biomarkers from pathology reports.

Materials and methods: We developed and evaluated a pipeline combining Mistral Large LLM and a postprocessing phase. The pipeline's performance was assessed both at document and patient levels. For evaluation, two data sets were used: a data set of 1,152 pathology reports associated with 150 patients with BC focused solely on biomarker values and a gold standard database containing 101 patients with metastatic BC, enriched with detailed patient and tumor characteristics and double-blind validated by clinical research assistants. We also explored the pipeline's performance according to the use of a confidence prompt (CP), a chain of thought (CoT), and few-shot examples.

Results: Our extraction pipeline achieved F1 scores of more than 95% and both recall and precision of more than 94% for each biomarker of interest (ie, estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 status and score) at the document level. At the patient level, the F1 score decreased to between 87% and 90%, with a greater drop in recall (to between 83% and 87%) than in precision, which remained >90%. The results were similar whether the pipeline included a CP, CoT, or few-shot examples.

Conclusion: Our study provides strong evidence of the potential of LLMs like Mistral Large for extracting structured BC biomarker data from pathology reports and the potential of such methods for broader digital transformation of health care documents.
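
To make the relationship between the metrics reported above concrete, the following minimal sketch computes precision, recall, and F1 from raw counts. The function name and the counts are illustrative assumptions; they are not taken from the study or its code.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive,
    and false-negative counts. Hypothetical helper, not the authors' code."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only (not figures from the paper).
p, r, f1 = precision_recall_f1(tp=85, fp=7, fn=15)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it sits closer to the lower of its two inputs, which is consistent with a patient-level F1 that falls when recall drops even though precision stays above 90%.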

JCO Clinical Cancer Informatics, volume 10, e2500133. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12928813/pdf/
Citations: 0
Validation of Claims-Based Algorithms to Classify Thoracic Radiation Therapy Courses.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-02-01 Epub Date: 2026-01-16 DOI: 10.1200/CCI-25-00266
Shane S Neibart, Nicholas Lin, Jacob Hogan, Shalini Moningi, Benjamin H Kann, Raymond H Mak, Miranda Lam

Purpose: Routinely collected administrative data provide insights into health care utilization and outcomes but lack detailed clinical information, such as the specific site and intent of radiation therapy (RT). This study aimed to validate claims-based algorithms to accurately identify thoracic RT (TRT) and curative-intent RT in administrative databases.

Methods: Patients at our institution with lung cancer and any RT Current Procedural Terminology (CPT) code from October 2015 to January 2024 were analyzed. RT claims were organized by treatment episode, and RT details were manually abstracted from the electronic health record to classify episodes as TRT or non-TRT and curative or noncurative. A priori algorithms were defined as the presence of respiratory motion management codes, >14 treatment codes (except for stereotactic body RT [SBRT] courses), with or without exclusive thoracic malignancy diagnosis codes. Positive predictive value (PPV) was computed for each episode, stratified by modality (three-dimensional conformal RT [3DCRT], intensity-modulated RT [IMRT], and SBRT). Algorithms were considered acceptable if the lower bound of the Clopper-Pearson 95% CI for PPV exceeded 70%.

Results: A total of 3,846 RT episodes were analyzed. The primary a priori TRT algorithm achieved a PPV of 97% (95% CI, 96 to 98) for IMRT, 99% (95% CI, 97 to 99) for SBRT, and 87% (95% CI, 81 to 92) for 3DCRT. Performance declined when exclusive thoracic malignancy diagnosis codes were excluded. For curative-intent RT, PPVs were 87% for IMRT, 90% for SBRT, and 55% for 3DCRT.

Conclusion: Clinically informed algorithms can accurately identify TRT in claims data, achieving high PPVs particularly for IMRT and SBRT courses. These algorithms can be applied in claims databases to assess RT toxicity and effectiveness. External validation across diverse data sets will be important to confirm generalizability.
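
The acceptance rule in the Methods (lower bound of the Clopper-Pearson 95% CI for PPV above 70%) can be sketched as follows. This is a standard-library illustration under stated assumptions: the helper names and the example counts are hypothetical, and the direct binomial summation is only practical for modest sample sizes, not for thousands of episodes.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), by direct summation (modest n only)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root(f, iters: int = 60) -> float:
    """Bisection on [0, 1] for a function that rises from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) interval for a binomial proportion k/n."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root(lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _root(lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


def is_acceptable(true_pos: int, flagged: int, threshold: float = 0.70) -> bool:
    """Rule from the Methods: the lower 95% bound of the PPV must exceed 70%."""
    lower, _ = clopper_pearson(true_pos, flagged)
    return lower > threshold


# Hypothetical counts: 90 of 100 algorithm-flagged episodes confirmed on chart review.
print(clopper_pearson(90, 100), is_acceptable(90, 100))
```

Judging the lower bound rather than the point estimate means that an algorithm validated on few episodes can fail the rule even when its observed PPV is above 70%.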

JCO Clinical Cancer Informatics, volume 10, e2500266. Journal Article.
Citations: 0
Self-Supervised Transformer-Based Pipeline for Liver Tumor Segmentation and Type Classification.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-02-01 Epub Date: 2026-01-30 DOI: 10.1200/CCI-25-00135
Ramtin Mojtahedi, Mohammad Hamghalam, Jacob J Peoples, William R Jarnagin, Richard K G Do, Amber L Simpson

Purpose: It is essential to detect and segment liver tumors to guide treatment and track disease progression. To reduce the need for large annotated data sets, we present an end-to-end pipeline that uses self-supervised pretraining to improve segmentation and then classifies tumor types with a separate pretrained classifier applied to the segmented tumor regions.

Methods: First, we pretrained the encoder of a transformer-based network using a self-supervised approach on unlabeled abdominal computed tomography images. Subsequently, we fine-tuned the segmentation network to segment the liver and tumors, and the tumor regions were classified using a pretrained convolutional neural network (Inception-v3 architecture) as intrahepatic cholangiocarcinoma (ICC), hepatocellular carcinoma (HCC), or colorectal liver metastases (CRLMs). We evaluated 459 images (155 HCC, 107 ICC, 197 CRLM). For external testing, we used an independent public data set (n = 40).

Results: Averaged across HCC, ICC, and CRLM, in comparison with a supervised baseline (no pretraining), self-supervised pretraining improved the liver Dice similarity coefficient (DSC) by 6.4 percentage points and reduced the 95th-percentile Hausdorff distance (HD95) by 32.97 mm. For tumors, the DSC increased by 6.0 percentage points and the HD95 decreased by 3.2 mm. Tumor type classification achieved AUC 0.98 (95% CI, 0.96 to 1.00) and accuracy 96% (95% CI, 92% to 99%). Segmentation performance on the external data was close to the internal cohort with tumor DSC 0.73, intersection over union (IoU) 0.60, and HD95 30.98 mm and liver DSC 0.91, IoU 0.83, and HD95 29.67 mm.

Conclusion: The proposed self-supervised, end-to-end pipeline improves liver tumor segmentation and provides accurate tumor type classification, supporting reliable radiologic assessment, treatment planning, and improved prognostication for patients with liver cancer.
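
For readers less familiar with the overlap metrics quoted in the Results, the sketch below computes the Dice similarity coefficient and intersection over union for a pair of binary masks. It is a plain-Python illustration on flattened masks; the function name and the toy masks are assumptions, and HD95 is not covered here.

```python
def dice_and_iou(pred, truth):
    """Dice similarity coefficient and intersection over union for two
    flat binary masks (equal-length sequences of 0/1). Hypothetical helper."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    n_pred = sum(1 for p in pred if p)
    n_truth = sum(1 for t in truth if t)
    union = n_pred + n_truth - inter
    if union == 0:
        # Both masks empty: treated as perfect agreement (a convention, not a rule).
        return 1.0, 1.0
    dice = 2 * inter / (n_pred + n_truth)
    iou = inter / union
    return dice, iou


# Toy 4-voxel masks, for illustration only.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
print(dice, iou)
```

For a single pair of masks the two metrics are linked by IoU = DSC / (2 - DSC). Averages taken over many cases need not satisfy that identity exactly, so cohort-level DSC and IoU are best read as separate summaries.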

JCO Clinical Cancer Informatics, volume 10, e2500135. Journal Article. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12866948/pdf/
Citations: 0
Application of the Population Health Institute Model of Health for Identifying Cancer Catchment Area Priorities.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-02-01 Epub Date: 2026-02-05 DOI: 10.1200/CCI-25-00126
Amy Trentham-Dietz, Thomas P Lawler, Ronald E Gangnon, Allison R Dahlke, Noelle K LoConte, Earlise C Ward, Christine P Muganda, Shaneda Warren Andersen, Marjory L Givens

Purpose: The University of Wisconsin Population Health Institute (PHI) Model of Health, grounded in models developed over a decade ago, provides a framework for prioritizing health-related investments including setting agendas, implementing policies, and sharing resources for improving community health and health equity. The model includes multiple determinants of health and two broad health outcomes (length and quality of life). We adapted the PHI Model of Health to cancer outcomes.

Methods: Using county-level publicly available data, health factor summary measures were derived in three areas: health infrastructure including health promotion and clinical care, physical environment, and social and economic factors. A composite health factor z-score was calculated as the weighted (40%, 15%, and 45%, respectively) average of the summary measures for each county, and k-means clustering was used to create unequally sized county groups with lower (healthier) to higher (less healthy) z-scores. We fit age-adjusted negative binomial regression models to estimate rate ratios and 95% CI for cancer mortality in relation to county health factor cluster.

Results: Age-adjusted cancer mortality rates increased across the 10 county health factor clusters for all cancers combined as well as for lung, colorectal, breast, and prostate cancers. Rate ratios generally increased across the 10 health factor clusters for all cancers combined and for specific cancer types. Compared with counties with the most favorable health factor conditions, the counties with the least favorable conditions had an all-cancer mortality rate ratio of 1.49 (95% CI, 1.39 to 1.60).

Conclusion: The PHI model of health adapted to cancer outcomes provides an approach for linking community-specific conditions to the interventions that hold promise to directly address drivers of the cancer burden.
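
The composite score described in the Methods is a weighted average of three summary measures with weights of 40%, 15%, and 45%. A minimal sketch of that step follows; the dictionary keys, county names, and z-score values are hypothetical, and the k-means grouping step is not reproduced.

```python
# Weights as stated in the Methods; the key names are illustrative assumptions.
WEIGHTS = {
    "health_infrastructure": 0.40,
    "physical_environment": 0.15,
    "social_economic": 0.45,
}


def composite_z(summary: dict) -> float:
    """Weighted average of the three summary z-scores for one county.
    Lower values are read as healthier, matching the abstract's convention."""
    return sum(weight * summary[name] for name, weight in WEIGHTS.items())


# Made-up counties and z-scores, for illustration only.
counties = {
    "County A": {"health_infrastructure": -1.0, "physical_environment": -0.5, "social_economic": -1.2},
    "County B": {"health_infrastructure": 0.3, "physical_environment": 0.0, "social_economic": 0.8},
    "County C": {"health_infrastructure": 1.0, "physical_environment": 0.0, "social_economic": -1.0},
}

# Rank counties from healthiest (lowest score) to least healthy (highest score).
ranked = sorted(counties, key=lambda name: composite_z(counties[name]))
print(ranked)
```

Since the weights sum to 1, the composite stays on the same scale as its inputs, and the 45% weight means the social and economic measure moves a county's position the most.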

JCO Clinical Cancer Informatics, volume 10, e2500126. Journal Article.
Citations: 0
End-to-End Pretreatment Prediction of Radiation Pneumonitis in Patients With Non-Small Cell Lung Cancer Using Computed Tomography: A Vision Transformer Approach.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-02-01 Epub Date: 2026-02-25 DOI: 10.1200/CCI-25-00020
Julie Midroni, Felipe S Torres, Jay Hennessy, Tony Tadic, Andrew Hope, Philip Wong, Srinivas Raman

Purpose: Radiation pneumonitis (RP) is the most common toxicity after thoracic radiotherapy. We develop an artificial intelligence model to predict RP in an institutional cohort of patients undergoing radiotherapy for non-small cell lung cancer.

Methods: Data were collected from patients diagnosed between 2002 and 2020. Patients were screened for a known survival/RP outcome, as well as treatment and clinical parameters. A transformer, pretrained on an open-source data set, was first trained to predict abnormal versus normal pulmonary function based on computed tomography (CT) scans. Transfer learning was then used to apply this model to the RP data set. Three clinical-plus-dosimetric variable models were trained. Finally, a model that combined the CT-based risk score and clinical/dosimetric variables was also trained, to explore if the CT-based risk score improved risk stratification. All models were cross-validated.

Results: The RP data set included 1,023 patients and a total of 2,257 pretreatment scans, with a 15% RP rate. The three clinical-plus-dosimetric models reached values of 0.70, 0.70, and 0.71, and the CT-only model reached 0.66. Combining the CT-based risk score and clinical parameters improved the area under the receiver operating characteristic curve to 0.74, averaged across all folds. The combined model also had superior sensitivity at a fixed specificity of 60%. Precision-recall metrics were comparable across models. Activation mapping of the CT-only model showed prioritization of the upper lung and right lung.

Conclusion: In a cohort treated with heterogeneous radiotherapy techniques and doses, combining CT-based risk scores with clinical values enhances the prediction of RP. This suggests that CT scans contain additional information that has the potential to enhance RP predictions. Activation score mapping shows focus on lung structure, upper lung, and right lung. Model code is available online.
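
Assuming the values compared in the Results are areas under the receiver operating characteristic curve, the sketch below shows what that number measures: the probability that a randomly chosen patient with the outcome receives a higher risk score than a randomly chosen patient without it. The function name and scores are hypothetical, and this is not the authors' evaluation code.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison: the share of
    (positive, negative) pairs in which the positive case scores higher,
    counting ties as half. Quadratic in sample size, so a teaching sketch only."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up risk scores: two patients with the outcome, two without.
print(roc_auc([0.9, 0.8], [0.7, 0.85]))
```

On this reading, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for comparing 0.66, 0.70 to 0.71, and 0.74.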

JCO Clinical Cancer Informatics, volume 10, e2500020. Journal Article.
Citations: 0
Evaluating the Readiness of Saudi Oncology Real-World Data for Standardization and Quality Enhancement.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-02-01 Epub Date: 2026-01-16 DOI: 10.1200/CCI-25-00248
Almaha Alfakhri, Ohoud Almadani, Ibrahim Asiri, Nada Alsuhebany, Ahmed Alanazi, Turki Althunian

Purpose: Real-world data (RWD) are increasingly used in oncology research, regulatory decisions, and clinical practice; however, variability in data quality and lack of standardization remain major limitations. This study assessed the readiness of oncology RWD from Saudi health care centers for standardization and evaluated their completeness and accuracy.

Methods: Deidentified electronic health records for adult patients (18 years and older) diagnosed with breast cancer, thyroid cancer, colorectal cancer, gastric cancer, hepatocellular carcinoma, or renal cell carcinoma were extracted from five health care centers within the Saudi Real-World Evidence Network. Readiness for standardization was evaluated by assessing alignment with data elements in the Minimal Common Oncology Data Elements (mCODE) framework, a standardized and clinically focused oncology data model. Data quality was evaluated using two dimensions: completeness, defined as the proportion of patients with at least one entered value for each element; and accuracy, defined as the proportion of correct entries based on verification checks (including plausibility and consistency). Outcomes were calculated at the element level and weighted to generate domain- and center-level proportions.

Results: A total of 20,671 oncology patients were included. Overall weighted alignment with mCODE domains was moderate (62.43%). The patient domain showed the highest alignment (71.43%), whereas the outcome domain exhibited significant gaps. Data completeness was low to moderate (49.02%), with higher levels in common cancers (54.33%) than in rare cancers (51.50%). Data accuracy was high overall (95.03%), with rare cancers showing higher accuracy (98.76%) than common cancers (94.62%).

Conclusion: Saudi oncology RWD show moderate alignment with mCODE, with consistently high accuracy across domains. However, gaps in data completeness highlight the need for broader adoption of standardized data frameworks to support interoperability and enable nationwide research and regulatory use.
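
The two data-quality dimensions defined in the Methods can be illustrated with a short sketch. The record layout, the element names, and the plausibility check are all hypothetical; they are not the actual mCODE elements or the study's verification rules.

```python
def completeness(records, element):
    """Share of patients with at least one entered value for `element`."""
    filled = sum(1 for record in records if record.get(element))
    return filled / len(records)


def accuracy(records, element, is_valid):
    """Share of entered values for `element` that pass a verification check."""
    entries = [value for record in records for value in record.get(element, [])]
    if not entries:
        return None  # nothing entered, so accuracy is undefined
    return sum(1 for value in entries if is_valid(value)) / len(entries)


# Made-up patient records: each element maps to a list of entered values.
records = [
    {"age_at_diagnosis": [54], "stage": ["II"]},
    {"age_at_diagnosis": [430], "stage": []},  # implausible age, missing stage
    {"age_at_diagnosis": [], "stage": ["IV"]},
    {"age_at_diagnosis": [61], "stage": ["I"]},
]


def plausible_adult_age(value):
    """Hypothetical plausibility check for an adult cohort."""
    return 18 <= value <= 120


print(completeness(records, "age_at_diagnosis"))
print(accuracy(records, "age_at_diagnosis", plausible_adult_age))
```

Accuracy is computed only over values that were actually entered, so a data set can pair high accuracy with low completeness, as in the Results above.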

JCO Clinical Cancer Informatics 10:e2500248 (2026-02-01). https://doi.org/10.1200/CCI-25-00248
Citations: 0
Prompt Engineering for Eastern Cooperative Oncology Group Status Extraction: Comparing Large Language Model Techniques.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-02-01 Epub Date: 2026-02-12 DOI: 10.1200/CCI-25-00226
Meenakshi Dubey, Kok Joon Chong, Yuba Raj Pun, Lee Yi Foo, Melissa Ooi, Iain Bee Huat Tan, David Shao Peng Tan, Kee Yuan Ngiam, Hwee Lin Wee

Purpose: Eastern Cooperative Oncology Group (ECOG) performance status is critical for cancer patient management, yet it is often documented only in unstructured clinical notes. This study compares several approaches to extract ECOG status from oncology notes, focusing on advanced prompting techniques for large language models (LLMs).

Methods: We evaluated four ECOG extraction approaches on unstructured clinical notes from patients with non-small cell lung cancer, multiple myeloma, or ovarian cancer (2017-2021). The approaches were a rule-based natural language processing algorithm, simple LLM prompting, and two advanced prompts (chain-of-thought and Double Filtering) using a domain-tuned LLM (LLAMAv3.2). Performance was measured on a binary outcome (any ECOG documented v none) and a three-class outcome (ECOG 0-1 v ≥2 v none) and via an adapted QUEST questionnaire for human evaluation.
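To make the rule-based baseline and the two outcome definitions concrete, here is a minimal sketch. The regular expression and the function names are hypothetical and are not the study's algorithm; real clinical notes contain many more phrasings (ranges, negations, Karnofsky scores) than this pattern handles.

```python
import re
from typing import Optional

# Hypothetical pattern: "ECOG", an optional "PS" / "performance status",
# an optional connector word or separator, then a single digit from 0 to 5.
ECOG_PATTERN = re.compile(
    r"\bECOG\b(?:\s+(?:PS|performance\s+status))?\s*(?:score|of|is|[:=])?\s*([0-5])\b",
    re.IGNORECASE,
)


def extract_ecog(note: str) -> Optional[int]:
    """Return the first ECOG score found in the note, or None if absent."""
    match = ECOG_PATTERN.search(note)
    return int(match.group(1)) if match else None


def binary_outcome(score: Optional[int]) -> str:
    """Binary outcome: any ECOG documented versus none."""
    return "documented" if score is not None else "none"


def three_class_outcome(score: Optional[int]) -> str:
    """Three-class outcome: ECOG 0-1 versus >=2 versus none."""
    if score is None:
        return "none"
    return "0-1" if score <= 1 else ">=2"


notes = [
    "ECOG PS 1, tolerating therapy well.",
    "ECOG performance status: 2",
    "Patient seen for follow-up; no performance status recorded.",
]
labels = [three_class_outcome(extract_ecog(n)) for n in notes]
```

A rule of this kind is cheap and transparent, which is why it serves as a useful comparator for the prompting approaches.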

Results: Both chain-of-thought (CoT) prompting and the double filtering technique (DFT) achieved 94% accuracy, outperforming the rule-based method (91%) and simple prompting (86%). DFT had the highest specificity (0.91) and positive predictive value (PPV; 0.93), whereas CoT attained the highest sensitivity (0.98). In the QUEST evaluation, DFT and CoT scored higher on output quality, reasoning, bias reduction, and user satisfaction than the simple prompt. DFT received the top satisfaction rating. In the three-class analysis, DFT and CoT again performed best (accuracy 0.91 v 0.87) and DFT was most sensitive for ECOG ≥2 cases. Estimates for ECOG ≥2 remained imprecise because of the small sample (n = 20). All methods sometimes hallucinated ECOG status.
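The metrics reported above follow the standard confusion-matrix definitions, sketched below for the binary outcome. The counts are invented for illustration and are not taken from the study.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics for a binary classifier from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # documented ECOG that was found
        "specificity": tn / (tn + fp),  # notes without ECOG correctly left empty
        "ppv": tp / (tp + fp),          # extracted ECOG that was really documented
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for 100 notes (not study data).
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
```

In this framing, a false positive corresponds to an ECOG status reported for a note that documents none, so hallucinated values lower specificity and PPV rather than sensitivity.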

Conclusion: Advanced LLM prompting improved ECOG extraction over basic methods. DFT and CoT each showed specific strengths (DFT had higher PPV and user satisfaction; CoT achieved higher sensitivity). These approaches appear to be generalizable across cancer types. Key implementation considerations include computational cost and human oversight. Overall, advanced prompting can standardize ECOG documentation, accelerate patient cohort identification, and inform personalized treatment planning.

JCO Clinical Cancer Informatics 10:e2500226 (2026-02-01). https://doi.org/10.1200/CCI-25-00226
Citations: 0
Novel R Shiny Tool for Survival Analysis With Time-Varying Covariate in Oncology Studies: Overcoming Biases and Enhancing Collaboration.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-02-01 Epub Date: 2026-01-30 DOI: 10.1200/CCI-25-00225
Yimei Li, Yang Qiao, Fei Gao, Jordan Gauthier, Qiang Ed Zhang, Jenna Voutsinas, Wendy Leisenring, Ted Gooley, Corinne Summers, Alexandre Hirayama, Cameron J Turtle, Rebecca Gardner, Jarcy Zee, Qian Vicky Wu

Purpose: Our study is motivated by the debated role of hematopoietic cell transplantation (HCT) after chimeric antigen receptor T-cell (CAR-T) therapy for acute lymphoblastic leukemia (ALL). Because patients may receive HCT at different times after CAR-T infusion, or never, post-CAR-T HCT should be treated as a time-varying covariate (TVC).

Methods: Standard Cox models and Kaplan-Meier (KM) curves (the naïve method) assume that TVC status is known and fixed at baseline, which can yield biased estimates. Landmark analysis is a popular alternative, but its results depend on the chosen landmark time. The time-dependent (TD) Cox model is better suited to TVCs, although visualizing its survival curves is complex. The newly proposed Smith-Zee method generates appropriate survival curves from TD Cox models.
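The difference between the landmark and TD approaches is largely a difference in how follow-up is laid out before model fitting. The sketch below shows both layouts for a single binary TVC (HCT received or not). It is written in Python for illustration only; the function names and tuple layout are assumptions, and this is not the code of the R Shiny tool described here.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, float, float, int, int]  # (id, start, stop, hct, event)


def split_follow_up(pid: str, event_time: float, event: int,
                    hct_time: Optional[float]) -> List[Row]:
    """Counting-process (start, stop] rows for a time-dependent Cox model.

    Follow-up before HCT is attributed to hct=0 and follow-up after it to
    hct=1, so no patient is treated as transplanted from baseline.
    """
    if hct_time is None or hct_time >= event_time:
        return [(pid, 0.0, event_time, 0, event)]
    return [
        (pid, 0.0, hct_time, 0, 0),           # pre-HCT interval, no event yet
        (pid, hct_time, event_time, 1, event),  # post-HCT interval
    ]


def landmark_row(pid: str, event_time: float, event: int,
                 hct_time: Optional[float], landmark: float):
    """Landmark layout: HCT status is frozen at the landmark time.

    Patients whose follow-up ends at or before the landmark are excluded
    (None); time is re-measured from the landmark.
    """
    if event_time <= landmark:
        return None
    hct_by_landmark = int(hct_time is not None and hct_time <= landmark)
    return (pid, event_time - landmark, event, hct_by_landmark)


td_rows = split_follow_up("p1", event_time=24.0, event=1, hct_time=6.0)
lm_row = landmark_row("p1", event_time=24.0, event=1, hct_time=6.0, landmark=3.0)
```

In this example the patient transplanted at month 6 is classified as HCT-free under a 3-month landmark but contributes 18 months of post-HCT follow-up in the TD layout, which illustrates why landmark results can vary with the landmark time. Rows in the start-stop layout are the usual input to a time-dependent Cox fit, for example `Surv(start, stop, event)` in R's survival package.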

Results: To address these challenges, we developed an open-source R Shiny tool integrating multiple models (naïve Cox, landmark Cox, and TD Cox) and curves (naïve KM, landmark KM, Smith-Zee, and Extended KM) to facilitate TVC analysis. Reanalysis of post-CAR-T HCT's effect on leukemia-free survival (LFS) showed consistent results between naïve and TD Cox models, whereas landmark analyses varied by landmark time. A separate data analysis of chronic graft-versus-host disease and survival showed that substantial differences emerged across statistical methods. Simulations revealed increased bias in naïve methods when TVC changed late and minimal bias when TVC changes occurred early relative to time to events.

Conclusion: We recommend TD Cox models and Smith-Zee curves for robust TVC analysis. Our R Shiny tool supports standardized analyses without requiring data sharing, thereby promoting collaboration across different institutions and providing a practical tool to advance survival analysis in oncology research.

JCO Clinical Cancer Informatics 10:e2500225 (2026-02-01). DOI: 10.1200/CCI-25-00225. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12885575/pdf/
Citations: 0
Integrating Symptom Screening in Pediatrics Into the Epic Electronic Health Record: Development and Acceptability for Pediatric Cancer Patients.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-02-01 Epub Date: 2026-02-12 DOI: 10.1200/CCI-25-00257
Adam P Yan, Emily Saso, Julia Shannon, Heather Laird, Alyssa Ramdeo, Robin Deliva, Samantha Baron, Bren Cardiff, Daniel Rosenfield, Ashley Graham, Mihir Ramnani, Zahra Syed, Denise Connolly, Allison Starr, Priya Patel, L Lee Dupuis, Lillian Sung

Purpose: Calls to implement routine symptom screening among pediatric oncology patients are increasing. Objectives were to develop and evaluate the usability of Symptom Screening in Pediatrics (SSPedi), a validated patient-reported outcome tool, when integrated into the Epic electronic health record.

Methods: We developed self-report and proxy-report SSPedi in Epic's patient portal MyChart and enrolled patients with cancer aged 12-18 years or their parents/guardians, and parents/guardians of patients with cancer aged 2-18 years. Participants were enrolled in three cohorts of 10 participants per cohort. A clinical research associate evaluated the participants' ability to correctly complete eight tasks, including finding and completing SSPedi on a scheduled day and when unscheduled, locating tips to manage symptoms, and viewing past SSPedi reports. Participants self-reported ease or difficulty in completing each task. On the basis of feedback, modifications were made to refine SSPedi in Epic after the enrollment of each cohort of 10 participants.

Results: We enrolled 30 participants, including 21 parents/guardians and nine patients. Overall, 60% were correctly able to find SSPedi on a scheduled reminder day and 33% were able to find SSPedi on an unscheduled day. Once found, 70% of participants could complete SSPedi correctly. Only 33% could correctly view SSPedi trends over time. By self-report, 20 of 30 participants (67%) found SSPedi easy or very easy to use overall. This increased to 100% in the final cohort of 10 participants.

Conclusion: We integrated SSPedi into Epic. Participants can successfully complete SSPedi when scheduled on a reminder day. They found it more challenging to complete SSPedi without a reminder and to view past SSPedi reports. Implementation will require patient and parent or guardian training and support.

JCO Clinical Cancer Informatics 10:e2500257 (2026-02-01). https://doi.org/10.1200/CCI-25-00257
Citations: 0
Clinical Trial Patient Matching: A Real-Time, Common Data Model and Artificial Intelligence-Driven System for Semiautomated Patient Prescreening in Cancer Clinical Trials.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-01-01 Epub Date: 2026-01-09 DOI: 10.1200/CCI-25-00262
Guannan Gong, Jessica Liu, Sameer Pandya, Cristian Taborda, Nathalie Wiesendanger, Nate Price, Will Byron, Andreas Coppi, Patrick Young, Christina Wiess, Haley Dunning, Courtney Barganier, Rachel Brodeur, Neal Fischbach, Patricia LoRusso, Lajos Pusztai, So Yeon Kim, Mariya Rozenblit, Michael Cecchini, Anne Mongiu, Lourdes Mendez, Edward Kaftan, Charles Torre, Harlan Krumholz, Ian Krop, Wade Schulz, Maryam Lustberg, Pamela L Kunz

Purpose: Cancer clinical trial enrollment remains critically low at 5%-7% of adult patients despite exponential growth in available trials. Manual patient-trial matching represents a fundamental bottleneck, whereas current artificial intelligence (AI) and machine learning patient-trial matching systems lack data standardization and compatibility across health systems. We developed and validated a semiautomated clinical trial patient matching (CTPM) tool to improve recruitment efficiency and scalability.

Methods: We created a hybrid rules-based and natural language processing (NLP)-based pipeline that automatically screens patients using structured and unstructured electronic health record data standardized to the Observational Medical Outcomes Partnership (OMOP) common data model. CTPM performance was first evaluated on one metastatic colorectal cancer (CRC) trial by comparing CTPM accuracy and efficiency to manual chart review. Following the single-trial validation, we then implemented the system across 29 clinical trials spanning multiple cancer specialties and phases.
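A hybrid prescreen of this kind can be sketched as a structured rule combined with a text rule. Everything below is hypothetical: the concept identifiers are placeholders rather than real OMOP concept IDs, the keyword patterns stand in for an NLP component, and the eligibility criteria are invented; the actual CTPM pipeline is not described at this level of detail in the abstract.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Patient:
    person_id: int
    age: int
    condition_concept_ids: Set[int]  # placeholder IDs, not real OMOP concepts
    notes: List[str] = field(default_factory=list)


CRC_CONCEPTS = {1001, 1002}  # hypothetical stand-ins for colorectal cancer codes
METASTATIC = re.compile(r"\b(metastatic|metastases|stage\s+IV)\b", re.IGNORECASE)
NEGATION = re.compile(r"\bno (evidence of )?(metastatic|metastases)\b", re.IGNORECASE)


def structured_match(p: Patient) -> bool:
    """Rule on structured fields: adult with a qualifying diagnosis code."""
    return p.age >= 18 and bool(p.condition_concept_ids & CRC_CONCEPTS)


def text_match(p: Patient) -> bool:
    """Crude text rule: a note mentions metastatic disease without negating it."""
    return any(METASTATIC.search(n) and not NEGATION.search(n) for n in p.notes)


def prescreen(p: Patient) -> bool:
    """Flag a candidate for human chart review; this is not an eligibility decision."""
    return structured_match(p) and text_match(p)


cohort = [
    Patient(1, 62, {1001}, ["CT shows liver metastases."]),
    Patient(2, 55, {1001}, ["No evidence of metastases on imaging."]),
    Patient(3, 70, {2000}, ["Metastatic disease noted."]),
]
flagged = [p.person_id for p in cohort if prescreen(p)]
```

Keeping the output as a flag for review, rather than a final decision, matches the semiautomated design in which staff still confirm eligibility by chart review.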

Results: For the single CRC trial, CTPM achieved 94% retrospective and 88% prospective accuracy, matching gold standard clinical chart review with 100% sensitivity. Implementation reduced chart review workload 10-fold and screening time by 41% (3.1 to 1.8 minutes per chart) for those patients who did undergo review. Since September 2022, the system has screened 98,348 patients across 29 trials, identifying 825 eligible candidates and facilitating 117 patient enrollments with 9%-37% consent rates.

Conclusion: This AI and NLP tool demonstrates improved efficiency in clinical trial recruitment by enabling research teams to focus on qualified candidates rather than exhaustive chart reviews. The OMOP-based framework supports scalability across health systems, with potential to address enrollment challenges that limit patient access to clinical trials.

JCO Clinical Cancer Informatics 10:e2500262 (2026-01-01). https://doi.org/10.1200/CCI-25-00262
Citations: 0