
JCO Clinical Cancer Informatics: Latest Articles

Early-Stage Breast Cancer in Women Younger Than 50 Years: Comparing American Joint Committee on Cancer Anatomic and Prognostic Stages With Partitioning Around Medoids Clusters in SEER Data.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-03-01 Epub Date: 2026-03-03 DOI: 10.1200/CCI-25-00173
Suvd Zulbayar, Jennifer Brooks, Arian Aminoleslami, Shana Kim, Renzo Jose Carlos Calderon Anyosa, Feifan Xiang, Julia Knight, Andrea Eisen, Geoffrey M Anderson

Purpose: Early-stage breast cancer (ESBC) in women younger than 50 years often presents with tumor features, including grade, hormone receptor status, and human epidermal growth factor receptor 2 (HER2) status, that differ from those in older women. Machine learning clustering techniques can reveal underlying patterns in the inter-relationships of these features and provide novel insights to inform and guide decision making by patients and providers.

Methods: Partitioning around medoids (PAM) was applied to SEER data from 67,746 women age 18-49 years diagnosed with ESBC. PAM clustering based on tumor size (T), nodal status (N), grade, and receptor status identified 10 distinct clusters. The PAM clusters and American Joint Committee on Cancer (AJCC) anatomic and prognostic stages were compared in terms of their tumor features and their association with chemotherapy and survival.
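
The clustering step described in the Methods can be sketched in a few lines. This is a minimal illustration of partitioning around medoids, not the authors' implementation: the abstract does not state the dissimilarity measure, so a simple count of mismatched features is assumed here, and the six toy records and the helper names (`mismatches`, `total_cost`, `pam`) are hypothetical.

```python
from itertools import product

def mismatches(a, b):
    """Number of features on which two categorical records differ (assumed distance)."""
    return sum(x != y for x, y in zip(a, b))

def total_cost(records, medoids):
    """Sum of each record's distance to its nearest medoid."""
    return sum(min(mismatches(r, records[m]) for m in medoids) for r in records)

def pam(records, k):
    """Tiny PAM: greedy BUILD phase, then SWAP until no swap lowers the cost."""
    n = len(records)
    medoids = []
    # BUILD: add, one at a time, the record that lowers the total cost the most.
    while len(medoids) < k:
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(records, medoids + [i]))
        medoids.append(best)
    # SWAP: try replacing each medoid with each non-medoid record.
    improved = True
    while improved:
        improved = False
        for pos, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            if total_cost(records, trial) < total_cost(records, medoids):
                medoids, improved = trial, True
    # Assign every record to its nearest medoid.
    labels = [min(range(k), key=lambda j: mismatches(r, records[medoids[j]]))
              for r in records]
    return medoids, labels

# Hypothetical records: (T, N, grade, ER, PR, HER2).
records = [
    ("T1", "N0", "G1", "ER+", "PR+", "HER2-"),
    ("T1", "N0", "G1", "ER+", "PR+", "HER2-"),
    ("T2", "N0", "G1", "ER+", "PR+", "HER2-"),
    ("T1", "N0", "G3", "ER-", "PR-", "HER2-"),
    ("T2", "N1", "G3", "ER-", "PR-", "HER2-"),
    ("T2", "N0", "G3", "ER-", "PR-", "HER2-"),
]
medoids, labels = pam(records, k=2)
```

In this toy case the two clusters separate on grade and receptor status rather than on T or N, which mirrors the pattern the Results describe. Each medoid is an actual patient record, which is what makes PAM clusters easy to inspect. This exhaustive swap search is only practical for small inputs.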

Results: AJCC anatomic and prognostic stages are primarily defined by T and N. PAM clusters were primarily defined by receptor status and grade. PAM clusters align closely with luminal A, luminal B, triple-negative, or HER2-overexpressing treatment-related subtypes. PAM clusters better discriminated chemotherapy treatment, with C-statistic 0.839 (95% CI, 0.836 to 0.842), than either anatomic, with C-statistic 0.770 (95% CI, 0.767 to 0.773), or prognostic staging, with C-statistic 0.796 (95% CI, 0.794 to 0.800). PAM clusters were better predictors of 5-year overall survival, with C-statistic 0.733 (95% CI, 0.727 to 0.739), than anatomic stages, with C-statistic 0.721 (95% CI, 0.715 to 0.728), but not as predictive as prognostic stages, with C-statistic 0.759 (95% CI, 0.753 to 0.764).
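
To make the C-statistic comparison concrete, the sketch below computes concordance for a categorical grouping (a stage or a cluster) against a binary outcome such as chemotherapy receipt. It assumes each group is scored by its observed event rate, which the abstract does not specify, and it does not handle censored survival times, for which a Harrell-type concordance would be needed. The function name and toy data are hypothetical.

```python
from collections import defaultdict

def c_statistic(groups, outcomes):
    """Concordance when each group's observed event rate is used as its score."""
    n, events = defaultdict(int), defaultdict(int)
    for g, y in zip(groups, outcomes):
        n[g] += 1
        events[g] += y
    score = {g: events[g] / n[g] for g in n}
    pos = [score[g] for g, y in zip(groups, outcomes) if y == 1]
    neg = [score[g] for g, y in zip(groups, outcomes) if y == 0]
    # Count pairs where the event case scores higher; ties count as one half.
    concordant = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return concordant / (len(pos) * len(neg))

# Hypothetical data: group A is treated 3 times out of 4, group B 1 time out of 4.
groups = ["A"] * 4 + ["B"] * 4
chemo = [1, 1, 1, 0, 0, 0, 0, 1]
c = c_statistic(groups, chemo)  # 0.75
```

A value of 0.5 means the grouping discriminates no better than chance, and 1.0 means every treated patient sits in a higher-rate group than every untreated one. The pairwise loop is quadratic, so it is suitable only for small illustrations.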

Conclusion: Data-driven PAM clusters provide novel insights into the inter-relationship of tumor features and their association with hormonal, targeted, and chemotherapy treatment and with survival outcomes in women younger than 50 years with ESBC. An online application was created so that the PAM clusters could be used as alternatives or in addition to traditional AJCC staging to inform and guide patients and providers.

Volume 10, e2500173. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12959580/pdf/
Citations: 0
Modeling the Pretest Probability of Identifying Druggable Mutations in Lung Cancer Using Nationwide Comprehensive Genomic Profiling Data.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-03-01 Epub Date: 2026-03-19 DOI: 10.1200/CCI-25-00269
Hiroaki Ikushima, Kousuke Watanabe, Aya Shinozaki-Ushiku, Satoshi Kodera, Norihiko Takeda, Katsutoshi Oda, Hidenori Kage

Purpose: Comprehensive genomic profiling (CGP) is a key strategy in precision medicine for lung cancer, yet its clinical implementation remains limited, partly because of the uncertainty in identifying druggable mutations in individual patients. In this study, we investigated the potential of an artificial intelligence (AI)-based tool to predict the probability of identifying druggable mutations before CGP (pretest probability).

Methods: We developed an eXtreme Gradient Boosting (XGBoost) prediction model trained on pre-CGP clinical variables from 3,470 patients with lung cancer (June 2019-November 2023) to estimate the probability of identifying druggable mutations. The key predictors were identified using explainable artificial intelligence (XAI) analysis. The refined model was deployed as a web application and evaluated in a temporally independent test cohort of 1,307 patients (December 2023-November 2024), with Brier score as the primary end point.

Results: The prediction model achieved an area under the receiver operating characteristic curve (AUROC) of 0.85 (95% CI, 0.82 to 0.89) in the overall validation cohort and 0.79 (95% CI, 0.74 to 0.84) in patients for whom a driver mutation had not been identified through companion diagnostic testing. The XAI analysis identified sex, smoking history, histology, and metastatic sites as important predictors. Even among patients who underwent tissue CGP, bone (P = .011) and lung (P < .001) metastases were significantly associated with a higher druggable mutation detection rate. The deployed model achieved Brier scores of 0.19 in the overall independent test cohort and 0.16 in patients for whom a driver mutation had not been identified through companion diagnostic testing.
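
Because the Brier score is the primary end point here, a short worked example may help with reading the reported 0.19 and 0.16. The Brier score is the mean squared difference between predicted probability and the observed 0/1 outcome, so lower is better. The probabilities below are hypothetical and are not taken from the study.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Hypothetical pretest probabilities and whether a druggable mutation was found.
probs = [0.9, 0.2, 0.6, 0.1]
found = [1, 0, 1, 0]
score = brier_score(probs, found)                 # (0.01 + 0.04 + 0.16 + 0.01) / 4
uninformative = brier_score([0.5] * 4, found)     # always predicting 0.5 gives 0.25
```

Predicting 0.5 for every patient yields 0.25, which is a useful reference point: scores well below 0.25 indicate that the probabilities carry information, although the appropriate baseline also depends on how common druggable mutations are in the cohort.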

Conclusion: These findings indicate that an AI-based tool using pre-CGP clinical data may support broader CGP implementation and improve access to targeted therapies.

Volume 10, e2500269.
Citations: 0
Interpretable Active Learning for Pedigree Data Deduplication in Cancer Genetics.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-03-01 Epub Date: 2026-03-16 DOI: 10.1200/CCI-25-00252
Maria S Rosito, Aleck E Cervantes, Christine Hong, Joseph D Bonner, Bita Nehoray, Alison Schwartz Levine, Danna Rosenberg, Isabel Anez-Bruzual, Giovanni Parmigiani, Christopher I Amos, Judy E Garber, Stephen B Gruber, Danielle Braun

Purpose: Studying rare genetic conditions often requires multicenter research to gather sufficient data. However, data from multiple institutions may include relatives from the same family enrolled at different sites, increasing the likelihood of duplicate records. This issue is compounded by the use of deidentified data, which limits direct linkage through personal identifiers. These redundancies can bias family-based genetic studies, underscoring the need for robust methods for pedigree deduplication. We propose an interpretable, active learning-based approach to efficiently identify duplicate records in genetic studies, with specific application to families with TP53 mutations in the Li-Fraumeni and TP53: Understanding and Progress (LiFT UP) study.

Materials and methods: Our approach combines heuristic labeling with graph-based features and a machine learning model to iteratively refine duplicate detection. We first generate a partially labeled data set leveraging mutation variant diversity and family characteristics. A random forest classifier is then trained to predict duplicate pairs, with active learning guiding iterative refinement. This method is applied to real-world pedigree data from the LiFT UP study to assess its effectiveness in a multicenter setting.
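
One way to picture the review-prioritization step is as a triage over the classifier's predicted duplicate probabilities. The sketch below assumes those probabilities already exist (the random forest itself is not reproduced), and the thresholds, pair identifiers, and the `triage` helper are hypothetical rather than taken from the paper.

```python
def triage(pair_probs, low=0.05, high=0.95):
    """Split pedigree pairs into auto-labeled sets and a human-review queue."""
    auto_duplicate, auto_distinct, review = [], [], []
    for pair, p in pair_probs.items():
        if p >= high:
            auto_duplicate.append(pair)
        elif p <= low:
            auto_distinct.append(pair)
        else:
            review.append(pair)
    # Most uncertain pairs (probability closest to 0.5) are reviewed first.
    review.sort(key=lambda pair: abs(pair_probs[pair] - 0.5))
    automated_share = 1 - len(review) / len(pair_probs)
    return auto_duplicate, auto_distinct, review, automated_share

# Hypothetical predicted probabilities that two pedigrees are the same family.
pair_probs = {
    ("A1", "B7"): 0.99,
    ("A1", "C3"): 0.01,
    ("A2", "B9"): 0.55,
    ("A4", "C8"): 0.30,
}
dups, distinct, queue, share = triage(pair_probs)
```

In an active learning loop, the labels a reviewer assigns to the queued pairs would be added to the training set and the classifier refit, so the queue shrinks over iterations. The automated share computed here is only analogous in spirit to the 99.95% figure reported in the Results.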

Results: Our method labeled pedigree pairs in data from the LiFT UP study with a high degree of automation, achieving 99.95% automated processing in the deduplication workflow. By prioritizing likely duplicates for human review, it minimized manual effort while aiming for high specificity. This automated approach avoids dependence on rule-based filters, such as identifier matching, which ultimately require manual confirmation, offering a more scalable solution for improving data quality in risk estimation.

Conclusion: Interpretable active learning provides an effective solution for pedigree deduplication. Future work will explore refinements in identifying potential duplicates and evaluate its generalizability across other genetic data sets.

Volume 10, e2500252. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13001896/pdf/
Citations: 0
Weighting by Income Probabilities as a Novel Approach to Quantifying Differences in the Burden of Cancer by Income: A Case Study of Colorectal Cancer in Ohio.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-03-01 Epub Date: 2026-03-25 DOI: 10.1200/CCI-25-00171
Uriel Kim, Siran Koroukian, Johnie Rose

Purpose: Population-based cancer registries are a key data resource for catchment area informatics, but their utility for quantifying differences in cancer burden by socioeconomic status is limited. Here, we describe an approach that estimates cancer incidence along income gradients, leveraging a newly validated method called weighting by income probabilities (WIP).

Methods: We estimated income-specific colorectal cancer incidence, stratified by sex and race/ethnicity, in a catchment area (Ohio) as a case study. Income-specific numerator data (number of cancer cases per income bracket) were estimated using WIP, whereas denominators (population at risk by income bracket) were derived from US Census data.
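
The numerator and denominator logic can be illustrated as follows. This sketch assumes that WIP assigns each registry case a probability of belonging to each income bracket and that bracket-level case counts are the sums of those probabilities; the abstract does not spell out these details, so this is one reading of it. The bracket names, probabilities, and population counts are hypothetical.

```python
def income_specific_incidence(case_probs, population):
    """Incidence per 100,000 by income bracket from probability-weighted cases."""
    rates = {}
    for bracket, at_risk in population.items():
        # Expected number of cases in this bracket: sum of membership probabilities.
        expected_cases = sum(case.get(bracket, 0.0) for case in case_probs)
        rates[bracket] = 100_000 * expected_cases / at_risk
    return rates

# Hypothetical cases, each with a probability of falling in each income bracket.
case_probs = [
    {"low": 0.7, "high": 0.3},
    {"low": 0.5, "high": 0.5},
    {"low": 0.2, "high": 0.8},
]
# Hypothetical census-derived population at risk per bracket.
population = {"low": 2_000, "high": 5_000}
rates = income_specific_incidence(case_probs, population)
# low: 1.4 expected cases / 2,000 -> 70 per 100,000; high: 1.6 / 5,000 -> 32 per 100,000
```

No case is forced into a single bracket, so uncertainty about an individual's income is carried through to the rate. Age standardization and multi-year averaging, which a registry analysis would normally include, are omitted here.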

Results: In the case study of the 52,257 patients with invasive colorectal cancer diagnosed in the catchment area of Ohio between 2010 and 2019, lower income was generally associated with higher incidence rates, except in non-Hispanic (NH) White female individuals. The highest incidence was observed in NH Black male individuals at 0-149% of the Federal Poverty Level, with 113.7 cases per 100,000 (95% CI, 99.6 to 129.3) in 2010-2012, compared with 57.8 (95% CI, 54.7 to 61.2) in their NH White counterparts. Sensitivity analyses showed that income-specific incidence statistics were robust to sources of error in numerator and denominator estimation, with incidence estimates varying by no more than 1.98% from the reference estimates.

Conclusion: The approach described here accurately estimates cancer incidence along income gradients and can be expanded to estimate income-specific survival and mortality. The case study of colorectal cancer in Ohio demonstrates important insights into the burden of cancer by income. These granular income-specific data can enhance our understanding of the relationship between cancer burden and socioeconomic status and inform cancer surveillance, prevention, and control efforts.

Volume 10, e2500171.
Citations: 0
Ensuring Reliability of Curated Electronic Health Record-Derived Data: The Validation of Accuracy for Large Language Model-/Machine Learning-Extracted Information and Data (VALID) Framework.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-03-01 Epub Date: 2026-03-16 DOI: 10.1200/CCI-25-00215
Melissa Estevez, Nisha Singh, Lauren Dyson, Blythe Adamson, Konstantin Krismer, Kelly Magee, Qianyu Yuan, Megan W Hildner, Erin Fidyk, Tori Williams, Olive Mbah, Farhad Khan, Kathi Seidl-Rathkopf, Aaron B Cohen

Large language models (LLMs) are increasingly used to extract clinical data from electronic health records, offering significant improvements in scalability and efficiency for real-world data (RWD) curation in oncology. However, the adoption of LLMs introduces new challenges in ensuring the reliability, accuracy, and fairness of extracted data, which are essential for research, regulatory, and clinical applications. Existing quality assurance frameworks for RWD and artificial intelligence (AI) do not fully address the unique error modes and complexities associated with LLM-extracted data. In this paper, we propose a comprehensive framework for evaluating the quality of clinical data extracted by LLMs. The framework integrates variable-level performance benchmarking against expert human abstraction, verification checks for internal consistency and plausibility, and replication analyses comparing LLM-extracted data to human-abstracted data sets or external standards. This multidimensional approach enables the identification of variables most in need of improvement, systematic detection of latent errors, and confirmation of data set fitness-for-purpose in real-world research. Additionally, the framework supports bias assessment by stratifying across demographic subgroups. By providing a rigorous and transparent method for assessing LLM-extracted RWD, this framework advances industry standards and supports the trustworthy use of AI-powered evidence generation in oncology research and practice.
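
Two of the framework's components, benchmarking against human abstraction with subgroup stratification and verification checks for plausibility, could look roughly like this in code. The record layout, field names, and both helper functions are hypothetical. The abstract describes the framework only conceptually, so this is an illustration of the idea, not the VALID implementation.

```python
from collections import defaultdict

def accuracy_by_subgroup(records, variable):
    """Share of records where the LLM value matches the human-abstracted value."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["subgroup"]] += 1
        hits[r["subgroup"]] += r["llm"][variable] == r["human"][variable]
    return {g: hits[g] / totals[g] for g in totals}

def implausible_dates(records):
    """Verification check: flag records whose extracted death year precedes diagnosis."""
    return [r["id"] for r in records
            if r["llm"].get("death_year") is not None
            and r["llm"]["death_year"] < r["llm"]["diagnosis_year"]]

# Hypothetical extracted records paired with human abstraction for one variable.
records = [
    {"id": 1, "subgroup": "A",
     "llm": {"stage": "II", "diagnosis_year": 2019, "death_year": None},
     "human": {"stage": "II"}},
    {"id": 2, "subgroup": "A",
     "llm": {"stage": "III", "diagnosis_year": 2020, "death_year": 2018},
     "human": {"stage": "II"}},
    {"id": 3, "subgroup": "B",
     "llm": {"stage": "I", "diagnosis_year": 2021, "death_year": 2023},
     "human": {"stage": "I"}},
]
stage_accuracy = accuracy_by_subgroup(records, "stage")  # {"A": 0.5, "B": 1.0}
flagged = implausible_dates(records)                     # [2]
```

A large gap in accuracy between subgroups would be a signal for the bias assessment the abstract mentions. The plausibility check needs no human labels, so it can run on the full extracted data set.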

Volume 10, e2500215. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13001894/pdf/
Citations: 0
Analysis of Large Language Model Decision Making in Hormone Receptor-Positive/Human Epidermal Growth Factor Receptor 2-Negative Early Breast Cancer.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-03-01 Epub Date: 2026-03-06 DOI: 10.1200/CCI-25-00230
Roberto Buonaiuto, Aldo Caltavituro, Rossana Di Rienzo, Angela Grieco, Federica P Mangiacotti, Alessandra Longobardi, Vincenza Cantile, Vittoria Molinaro, Martina Pagliuca, Giuseppe Buono, Pietro De Placido, Erica Pietroluongo, Valeria Forestieri, Claudia Martinelli, Vincenzo di Lauro, Luigi Leo, Massimiliano D'Aiuto, Giampaolo Bianchini, Carmen Criscitiello, Roberto Bianco, Lucia Del Mastro, Michelino De Laurentiis, Grazia Arpino, Carmine De Angelis, Mario Giuliano

Purpose: To assess the ability of GPT-4o in adjuvant treatment decision making in hormone receptor-positive (HR+)/human epidermal growth factor receptor 2-negative (HER2-) early breast cancer by comparing its recommendations with those of clinicians including Oncotype DX data, and to explore its potential as a decision-support tool in routine clinical practice.

Methods: We compared clinician and GPT-4o recommendations in patients tested with Oncotype DX in routine practice at the University of Naples Federico II (n = 607, cohort 1 [C1]) and within the prospective, multicenter PRO BONO study (n = 237, cohort 2 [C2]). Pre- and post-Oncotype DX treatment recommendations were categorized as chemotherapy (CT) + endocrine therapy (ET) or ET alone. Concordance between clinician and GPT-4o recommendations was assessed using agreement rates and Cohen's kappa. The accuracy of Oncotype DX results was evaluated using the AUC metric.

Results: The agreement between clinicians and GPT-4o in pretest recommendations was 68% (kappa, 0.381 [95% CI, 0.31 to 0.45], P < .001) in C1 and 70% (0.401 [95% CI, 0.29 to 0.52], P < .001) in C2. Before Oncotype DX, clinicians recommended CT more frequently than GPT-4o for C1 (58% v 38%) and C2 (53% v 43%). Post-test agreement increased to 93% (0.814 [95% CI, 0.76 to 0.87], P < .001) in C1 and 90% (0.741 [95% CI, 0.64 to 0.84], P < .001) in C2. The agreement between pre- and post-Oncotype DX treatment recommendations for clinicians was 56% and 63% versus 68% and 60% for GPT-4o in C1 and C2, respectively. GPT-4o showed higher accuracy in predicting low than high genomic risk in postmenopausal patients (87% v 43% in C1; 85% v 45% in C2, P < .001) and low versus intermediate and high risk in premenopausal patients in both cohorts (P < .001).
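
The gap between a 68% agreement rate and a kappa near 0.38 follows from how Cohen's kappa corrects for chance agreement. The sketch below uses an illustrative 2x2 table per 100 patients, reconstructed from the rounded pretest percentages reported for C1 (68% agreement, clinicians recommending CT in 58%, GPT-4o in 38%). The counts are not the study's raw data.

```python
def cohens_kappa(table):
    """table[i][j] = count where rater A chose category i and rater B chose j."""
    n = sum(map(sum, table))
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance if the two raters were independent.
    expected = sum(r * c for r, c in zip(row, col))
    return observed, (observed - expected) / (1 - expected)

# Rows: clinician (CT + ET, ET alone); columns: GPT-4o (CT + ET, ET alone).
# Illustrative counts chosen to match the rounded C1 pretest percentages.
table = [[32, 26],
         [6, 36]]
agreement, kappa = cohens_kappa(table)
# agreement = 0.68; chance agreement = 0.58*0.38 + 0.42*0.62 = 0.4808; kappa is about 0.38
```

With these marginals nearly half of all agreement would be expected by chance alone, which is why kappa lands close to the reported 0.381 even though two thirds of the recommendations match.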

Conclusion: The agreement between clinicians and GPT-4o in pretest recommendations was modest but improved post-test, highlighting the importance of multigene testing and the potential of large language models in clinical decision making.
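
The pretest kappa values can be roughly cross-checked from the percentages reported in the Results. The sketch below is not the authors' analysis: it assumes a binary label (CT + ET versus ET alone), treats the rounded agreement rates and CT recommendation rates as exact proportions, and uses a hypothetical helper name. Because the inputs are rounded, it only approximates the published 0.381 and 0.401.

```python
def cohens_kappa_from_marginals(p_observed, p_a_positive, p_b_positive):
    """Cohen's kappa for two raters and a binary label, from proportions.

    p_observed   -- fraction of cases where both raters agree
    p_a_positive -- fraction rater A labels positive (here: clinicians recommending CT)
    p_b_positive -- fraction rater B labels positive (here: GPT-4o recommending CT)
    """
    # Chance agreement: both say positive, or both say negative.
    p_expected = (p_a_positive * p_b_positive
                  + (1 - p_a_positive) * (1 - p_b_positive))
    return (p_observed - p_expected) / (1 - p_expected)


# Rounded pretest figures taken from the abstract (assumed exact here).
kappa_c1 = cohens_kappa_from_marginals(0.68, 0.58, 0.38)
kappa_c2 = cohens_kappa_from_marginals(0.70, 0.53, 0.43)

print(f"C1 pretest kappa (approx.): {kappa_c1:.3f}")
print(f"C2 pretest kappa (approx.): {kappa_c2:.3f}")
```

Both approximations land within about 0.005 of the reported values, which is the size of discrepancy one would expect from rounding the input percentages.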

Citations: 0
Cascade Chatbot: A Scalable Approach to Family-Based Genetic Testing for Hereditary Cancer Syndromes.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-03-01 Epub Date: 2026-03-09 DOI: 10.1200/CCI-25-00321
Lauren B Davis Rivera, Lauren Mitchell, Muhammad Danyal Ahsan, Isabelle Chandler, Emily S Epstein, Emerson P Borsato, Caitlin Allen, Kimberly A Kaphingst, Richard L Bradshaw, Guilherme Del Fiol, Kensaku Kawamoto, Anne C Madeo, Ravi N Sharaf, Melissa K Frey

Purpose: Cascade genetic testing enables identification of relatives at risk of hereditary cancer syndromes, creating opportunities for early detection and prevention. However, uptake of cascade testing remains low, with approximately one-third of eligible relatives completing testing, largely because of reliance on patient-mediated communication. Although clinician-mediated outreach has demonstrated improved efficacy, it is often limited by resource demands. Scalable digital health tools are a promising strategy to address this gap in testing uptake.

Methods: In this quality improvement initiative, we developed a digital cascade chatbot to deliver gene-specific education and facilitate access to genetic services among at-risk relatives. Between October 2024 and January 2025, 100 consecutive probands with a hereditary cancer pathogenic variant seen in a gynecologic oncology clinic were offered a cascade chatbot to share with their relatives. The primary outcome was proband acceptance of the cascade chatbot. Secondary outcomes included sharing of the cascade chatbot with at-risk relatives and relatives' subsequent utilization of genetic services. Outcomes were evaluated through telephone follow-up at 2 weeks and 3 months after chatbot introduction.

Results: Fifty-nine of 100 probands reported having relatives who had not undergone genetic testing. Among this group, 58 (98.3%) accepted the cascade chatbot. At 2-week follow-up, 44 of 58 probands (75.9%) had shared the cascade chatbot with at least one relative, and an additional eight (13.8%) reported plans to share. At 3-month follow-up, 48 of 58 probands (82.8%) had shared the cascade chatbot with at least one relative. A total of 122 relatives received the cascade chatbot, and 96 (78.7%) were reached for 3-month follow-up. Among the 96 relatives reached, 49 (51.0%) had scheduled or completed a genetics appointment, and of them, 36 (73.5%) had completed testing.

Conclusion: A cascade chatbot was highly acceptable to probands and effectively engaged relatives. Scalable digital health tools may enhance cascade testing and support precision cancer prevention.
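
The percentages in the Results follow directly from the reported counts, and each one uses a different denominator. The short sketch below recomputes them; the step labels and the `pct` helper are illustrative names, not taken from the study.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# (label, numerator, denominator) using counts reported in the abstract.
funnel = [
    ("probands accepting chatbot", 58, 59),
    ("probands who shared by 2 weeks", 44, 58),
    ("probands planning to share at 2 weeks", 8, 58),
    ("probands who shared by 3 months", 48, 58),
    ("relatives reached at 3 months", 96, 122),
    ("reached relatives with genetics appointment", 49, 96),
    ("relatives with appointment who completed testing", 36, 49),
]

results = {label: pct(num, den) for label, num, den in funnel}
for label, value in results.items():
    print(f"{label}: {value}%")

# Derived figure, not stated in the abstract: completed testing as a share
# of all relatives who received the chatbot. Relatives who were not reached
# are counted as untested, so this is a conservative lower bound.
tested_of_all_recipients = pct(36, 122)
print(f"completed testing among all 122 recipients: {tested_of_all_recipients}%")
```

Reading the steps this way makes clear that the 73.5% completion figure applies only to relatives who already had an appointment, not to everyone who received the chatbot.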

Citations: 0
Leveraging Digital Technology and Artificial Intelligence to Describe the Real-World Belgian Chronic Lymphocytic Leukemia Patient Population: The BE-CLLEAR Study.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-03-01 Epub Date: 2026-03-18 DOI: 10.1200/CCI-25-00159
Matthias Vanderkerken, Koen Van Eygen, Veerle Galle, Annelies Verbiest, Ann Janssens, Imke Masuy, Kristof Theys, Tine Cuppens, Katoo Muylle, Ann De Becker

Purpose: Chronic lymphocytic leukemia (CLL) treatment paradigms have evolved significantly, yet real-world evidence (RWE) on guideline implementation and patient characteristics remains limited.

Materials and methods: This multicenter retrospective study leveraged artificial intelligence (AI) to analyze structured and unstructured data from four Belgian hospitals (January 1, 2018-October 31, 2021). Structured data including diagnosis codes, laboratory results, treatment records, and national registries were standardized using the Observational Medical Outcomes Partnership (OMOP) Common Data Model. Unstructured clinical notes and reports were processed using a transformer-based natural language processing (NLP) pipeline. We examined clinical characteristics, diagnostic testing, and treatment patterns among patients with newly diagnosed CLL.

Results: Of 22 variable groups analyzed, 50.0% were derived from structured data only, 36.4% from unstructured data only (NLP-extracted), and 13.6% from mixed sources. Five hundred eighty-six patients with CLL were identified, with a median age of 74 years. One hundred seventy-four patients (29.7%) initiated first-line (1L) treatment, and 41 progressed to second-line treatment. Of 1L-treated patients, 68.4% had at least one prespecified comorbidity, including 12.1% with significant cardiovascular disease. TP53/del17p testing was documented in 34.3% of patients before 1L treatment, with aberrations detected in 42.8%. Bruton's tyrosine kinase inhibitors (BTKi; 35.6%) were the most common 1L treatment, followed by chemoimmunotherapy (CIT; 25.9%). CIT use declined (30.6% to 17.5%), whereas BTKi use remained stable (34.2% to 38.1%) between 2018 and 2021.

Conclusion: This AI-augmented study demonstrates the feasibility and scalability of combining NLP-derived insights with OMOP-standardized structured data to generate reproducible RWE in hematology. Our results highlight an elderly CLL population with significant comorbidities and a shift toward targeted therapies. While treatment patterns aligned with guidelines, data quality depended on source documentation accessibility. Improved integration of molecular testing into electronic health records is essential for enhancing clinical decision making, patient outcomes, and future research.

Citations: 0
Opportunities and Challenges in Implementing Large Language Models (LLMs) in Oncology.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-03-01 Epub Date: 2026-03-10 DOI: 10.1200/CCI-26-00031
Christine Adams

Large language models (LLMs) and artificial intelligence systems have the potential to transform cancer care. However, their integration into oncology presents both extraordinary opportunities and challenges. Clinically, these tools can extract actionable insights from pathology reports, radiology imaging, and genomic sequencing at previously impossible scales. They also enhance the patient-facing dimension by providing accurate informational support and improving patient-clinical trial matching. In translational research, LLMs accelerate informatics analysis for single-cell transcriptomics, spatial omics, and computational pathology, thereby improving support for precision oncology. However, ethical concerns regarding trust, equity, privacy, transparency, non-maleficence, and accountability call for caution. Implementation challenges include hallucination risks, high computational costs, and the potential to exacerbate existing healthcare disparities. Furthermore, developers must navigate a fragmented regulatory landscape consisting of an evolving patchwork of federal, state, and international rules. Responsible implementation requires appropriate skepticism, rigorous validation, and a commitment to patient welfare.

Citations: 0
Empowering Children and Adolescents With Cancer Through Novel, Electronic Health Record-Embedded Symptom Management Tools.
IF 2.8 Q2 ONCOLOGY Pub Date : 2026-03-01 Epub Date: 2026-03-20 DOI: 10.1200/CCI-26-00005
David H Noyd
Citations: 0