Pub Date: 2026-02-01 | Epub Date: 2026-02-18 | DOI: 10.1200/CCI-25-00133
Enzo Joseph, Paul Vallee, Tanguy Perennec, Nicolas Wagneur, Jean-Sébastien Frenel, Mario Campone, François Bocquet, Florent Le Borgne
Purpose: Medical free texts such as pathology reports contain valuable clinical data but are challenging to structure at scale. Traditional natural language processing approaches require extensive annotated data and training. We investigate the use of a large language model (LLM), Mistral, to automatically extract three breast cancer (BC) biomarkers from pathology reports.
Materials and methods: We developed and evaluated a pipeline combining Mistral Large LLM and a postprocessing phase. The pipeline's performance was assessed both at document and patient levels. For evaluation, two data sets were used: a data set of 1,152 pathology reports associated with 150 patients with BC focused solely on biomarker values and a gold standard database containing 101 patients with metastatic BC, enriched with detailed patient and tumor characteristics and double-blind validated by clinical research assistants. We also explored the pipeline's performance according to the use of a confidence prompt (CP), a chain of thought (CoT), and few-shot examples.
Results: Our extraction pipeline achieved F1 scores of more than 95%, and both recall and precision of more than 94%, for each biomarker of interest (ie, estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 status and score) at the document level. At the patient level, the F1 score decreased to between 87% and 90%, with a greater drop in recall (83% to 87%) than in precision, which remained >90%. The results were similar whether the pipeline included a CP, CoT, or few-shot examples.
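The document- and patient-level scores above can be illustrated with a minimal sketch (not the authors' code) of how precision, recall, and F1 might be computed for one extracted biomarker; the pairing convention and counting rules here are assumptions for illustration only.

```python
# Illustrative sketch: precision/recall/F1 for one extracted biomarker.
# Each pair holds (predicted_value, gold_value); None marks "not extracted".
# Counting convention is an assumption: a wrong value counts as a false
# positive only (some evaluations would also count it as a missed gold value).
def extraction_metrics(pairs):
    tp = sum(1 for pred, gold in pairs if pred is not None and pred == gold)
    fp = sum(1 for pred, gold in pairs if pred is not None and pred != gold)
    fn = sum(1 for pred, gold in pairs if pred is None and gold is not None)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: 3 correct extractions, 1 wrong value, 1 missed report.
pairs = [("ER+", "ER+"), ("ER-", "ER-"), ("ER+", "ER+"), ("ER-", "ER+"), (None, "ER-")]
p, r, f1 = extraction_metrics(pairs)  # 0.75, 0.75, 0.75
```

Aggregating predictions from many documents to one patient-level value (as the pipeline does) tends to lower recall, since a single missed or conflicting document can flip the patient-level answer.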
Conclusion: Our study provides strong evidence of the potential of LLMs like Mistral Large for extracting structured BC biomarker data from pathology reports and the potential of such methods for broader digital transformation of health care documents.
{"title":"Development and Assessment of a Pipeline for Extracting Structured Data From Free-Text Medical Reports Using a Large Language Model.","authors":"Enzo Joseph, Paul Vallee, Tanguy Perennec, Nicolas Wagneur, Jean-Sébastien Frenel, Mario Campone, François Bocquet, Florent Le Borgne","doi":"10.1200/CCI-25-00133","DOIUrl":"10.1200/CCI-25-00133","url":null,"abstract":"<p><strong>Purpose: </strong>Medical free texts such as pathology reports contain valuable clinical data but are challenging to structure at scale. Traditional natural language processing approaches require extensive annotated data and training. We investigate the use of large language model (LLM) like Mistral to automatically extract three breast cancer (BC) biomarkers from pathology reports.</p><p><strong>Materials and methods: </strong>We developed and evaluated a pipeline combining Mistral Large LLM and a postprocessing phase. The pipeline's performance was assessed both at document and patient levels. For evaluation, two data sets were used: a data set of 1,152 pathology reports associated with 150 patients with BC focused solely on biomarker values and a gold standard database containing 101 patients with metastatic BC, enriched with detailed patient and tumor characteristics and double-blind validated by clinical research assistants. We also explored the pipeline's performance according to the use of a confidence prompt (CP), a chain of thought (CoT), and few-shot examples.</p><p><strong>Results: </strong>Our extraction pipeline achieved F1 scores of more than 95% and both recall and precision of more than 94% for each biomarker of interest (ie, estrogen receptor, progesterone receptor and human epidermal growth factor receptor 2 status and score) at the document level. At the patient level, the F1 score decreased between 87% and 90% with a greater drop in recall (ranging between 83% and 87%) compared with precision, which remained >90%. 
The results were similar whether the pipeline included a CP, CoT, or few-shot examples.</p><p><strong>Conclusion: </strong>Our study provides strong evidence of the potential of LLMs like Mistral Large for extracting structured BC biomarker data from pathology reports and the potential of such methods for broader digital transformation of health care documents.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"10 ","pages":"e2500133"},"PeriodicalIF":2.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12928813/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146222008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | Epub Date: 2026-01-16 | DOI: 10.1200/CCI-25-00266
Shane S Neibart, Nicholas Lin, Jacob Hogan, Shalini Moningi, Benjamin H Kann, Raymond H Mak, Miranda Lam
Purpose: Routinely collected administrative data provide insights into health care utilization and outcomes but lack detailed clinical information, such as the specific site and intent of radiation therapy (RT). This study aimed to validate claims-based algorithms to accurately identify thoracic RT (TRT) and curative-intent RT in administrative databases.
Methods: Patients at our institution with lung cancer and any RT Current Procedural Terminology (CPT) code from October 2015 to January 2024 were analyzed. RT claims were organized by treatment episode, and RT details were manually abstracted from the electronic health record to classify episodes as TRT or non-TRT and curative or noncurative. A priori algorithms were defined by the presence of respiratory motion management codes and >14 treatment codes (except for stereotactic body RT [SBRT] courses), with or without exclusive thoracic malignancy diagnosis codes. Positive predictive value (PPV) was computed for each episode, stratified by modality (three-dimensional conformal RT [3DCRT], intensity-modulated RT [IMRT], and SBRT). Algorithms were considered acceptable if the lower bound of the Clopper-Pearson 95% CI for PPV exceeded 70%.
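The acceptance rule (Clopper-Pearson 95% CI lower bound for PPV above 70%) can be sketched with the standard library alone; this is an illustration of the statistic, not the study's code, and the counts below are invented.

```python
import math

def clopper_pearson_lower(successes, n, alpha=0.05):
    """Exact (Clopper-Pearson) lower bound of a binomial proportion CI.

    Found by bisection on p such that P(X >= successes | n, p) = alpha / 2.
    Illustrative sketch using only the standard library.
    """
    if successes == 0:
        return 0.0

    def upper_tail(p):  # P(X >= successes) under Binomial(n, p)
        return sum(
            math.comb(n, k) * p**k * (1 - p) ** (n - k)
            for k in range(successes, n + 1)
        )

    lo, hi = 0.0, 1.0
    for _ in range(200):  # bisection; upper_tail is increasing in p
        mid = (lo + hi) / 2
        if upper_tail(mid) < alpha / 2:
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical episode counts: 97 correctly classified out of 100.
lower = clopper_pearson_lower(successes=97, n=100)
acceptable = lower > 0.70  # the study's acceptance threshold
```

With 97/100 correct the exact lower bound is roughly 0.91, comfortably above the 70% threshold; small strata (as with 3DCRT) widen the interval and can fail the rule even at a similar point estimate.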
Results: A total of 3,846 RT episodes were analyzed. The primary a priori TRT algorithm achieved a PPV of 97% (95% CI, 96 to 98) for IMRT, 99% (95% CI, 97 to 99) for SBRT, and 87% (95% CI, 81 to 92) for 3DCRT. Performance declined when exclusive thoracic malignancy diagnosis codes were excluded. For curative-intent RT, PPVs were 87% for IMRT, 90% for SBRT, and 55% for 3DCRT.
Conclusion: Clinically informed algorithms can accurately identify TRT in claims data, achieving high PPVs particularly for IMRT and SBRT courses. These algorithms can be applied in claims databases to assess RT toxicity and effectiveness. External validation across diverse data sets will be important to confirm generalizability.
{"title":"Validation of Claims-Based Algorithms to Classify Thoracic Radiation Therapy Courses.","authors":"Shane S Neibart, Nicholas Lin, Jacob Hogan, Shalini Moningi, Benjamin H Kann, Raymond H Mak, Miranda Lam","doi":"10.1200/CCI-25-00266","DOIUrl":"https://doi.org/10.1200/CCI-25-00266","url":null,"abstract":"<p><strong>Purpose: </strong>Routinely collected administrative data provide insights into health care utilization and outcomes but lack detailed clinical information, such as the specific site and intent of radiation therapy (RT). This study aimed to validate claims-based algorithms to accurately identify thoracic RT (TRT) and curative-intent RT in administrative databases.</p><p><strong>Methods: </strong>Patients at our institution with lung cancer and any RT Current Procedural Terminology (CPT) code from October 2015 to January 2024 were analyzed. RT claims were organized by treatment episode, and RT details were manually abstracted from the electronic health record to classify episodes as TRT or non-TRT and curative or noncurative. A priori algorithms were defined as the presence of respiratory motion management codes, >14 treatment codes (except for stereotactic body RT [SBRT] courses), with or without exclusive thoracic malignancy diagnosis codes. Positive predictive value (PPV) was computed for each episode, stratified by modality (three-dimensional conformal RT [3DCRT], intensity-modulated RT [IMRT], and SBRT). Algorithms were considered acceptable if the lower bound of the Clopper-Pearson 95% CI for PPV exceeded 70%.</p><p><strong>Results: </strong>A total of 3,846 RT episodes were analyzed. The primary a priori TRT algorithm achieved a PPV of 97% (95% CI, 96 to 98) for IMRT, 99% (95% CI, 97 to 99) for SBRT, and 87% (95% CI, 81 to 92) for 3DCRT. Performance declined when exclusive thoracic malignancy diagnosis codes were excluded. 
For curative-intent RT, PPVs were 87% for IMRT, 90% for SBRT, and 55% for 3DCRT.</p><p><strong>Conclusion: </strong>Clinically informed algorithms can accurately identify TRT in claims data, achieving high PPVs particularly for IMRT and SBRT courses. These algorithms can be applied in claims databases to assess RT toxicity and effectiveness. External validation across diverse data sets will be important to confirm generalizability.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"10 ","pages":"e2500266"},"PeriodicalIF":2.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146114914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | Epub Date: 2026-01-30 | DOI: 10.1200/CCI-25-00135
Ramtin Mojtahedi, Mohammad Hamghalam, Jacob J Peoples, William R Jarnagin, Richard K G Do, Amber L Simpson
Purpose: It is essential to detect and segment liver tumors to guide treatment and track disease progression. To reduce the need for large annotated data sets, we present an end-to-end pipeline that uses self-supervised pretraining to improve segmentation and then classifies tumor types with a separate pretrained classifier applied to the segmented tumor regions.
Methods: First, we pretrained the encoder of a transformer-based network using a self-supervised approach on unlabeled abdominal computed tomography images. Subsequently, we fine-tuned the segmentation network to segment the liver and tumors, and the tumor regions were classified using a pretrained convolutional neural network (Inception-v3 architecture) as intrahepatic cholangiocarcinoma (ICC), hepatocellular carcinoma (HCC), or colorectal liver metastases (CRLMs). We evaluated 459 images (155 HCC, 107 ICC, 197 CRLM). For external testing, we used an independent public data set (n = 40).
Results: Averaged across HCC, ICC, and CRLM, in comparison with a supervised baseline (no pretraining), self-supervised pretraining improved the liver Dice similarity coefficient (DSC) by 6.4 percentage points and reduced the 95th-percentile Hausdorff distance (HD95) by 32.97 mm. For tumors, the DSC increased by 6.0 percentage points and the HD95 decreased by 3.2 mm. Tumor type classification achieved AUC 0.98 (95% CI, 0.96 to 1.00) and accuracy 96% (95% CI, 92% to 99%). Segmentation performance on the external data was close to the internal cohort with tumor DSC 0.73, intersection over union (IoU) 0.60, and HD95 30.98 mm and liver DSC 0.91, IoU 0.83, and HD95 29.67 mm.
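The overlap metrics reported above (DSC and IoU) have simple set-based definitions; the sketch below is a toy illustration on voxel-coordinate sets, not the evaluation code used in the study.

```python
def dice(pred, gold):
    """Dice similarity coefficient between two segmentation masks,
    represented here as sets of voxel coordinates. Illustrative sketch."""
    if not pred and not gold:
        return 1.0
    return 2 * len(pred & gold) / (len(pred) + len(gold))

def iou(pred, gold):
    """Intersection over union for the same set representation."""
    union = pred | gold
    return len(pred & gold) / len(union) if union else 1.0

# Toy 2D masks: 3 voxels overlap out of 4 predicted and 4 gold.
pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
gold = {(0, 1), (1, 0), (1, 1), (2, 1)}
# DSC = 2*3/(4+4) = 0.75; IoU = 3/5 = 0.6
```

Note that DSC is always at least as large as IoU for the same masks, which is why the external tumor DSC of 0.73 pairs with an IoU of 0.60.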
Conclusion: The proposed self-supervised, end-to-end pipeline improves liver tumor segmentation and provides accurate tumor type classification, supporting reliable radiologic assessment, treatment planning, and improved prognostication for patients with liver cancer.
{"title":"Self-Supervised Transformer-Based Pipeline for Liver Tumor Segmentation and Type Classification.","authors":"Ramtin Mojtahedi, Mohammad Hamghalam, Jacob J Peoples, William R Jarnagin, Richard K G Do, Amber L Simpson","doi":"10.1200/CCI-25-00135","DOIUrl":"10.1200/CCI-25-00135","url":null,"abstract":"<p><strong>Purpose: </strong>It is essential to detect and segment liver tumors to guide treatment and track disease progression. To reduce the need for large annotated data sets, we present an end-to-end pipeline that uses self-supervised pretraining to improve segmentation and then classifies tumor types with a separate pretrained classifier applied to the segmented tumor regions.</p><p><strong>Methods: </strong>First, we pretrained the encoder of a transformer-based network using a self-supervised approach on unlabeled abdominal computed tomography images. Subsequently, we fine-tuned the segmentation network to segment the liver and tumors, and the tumor regions were classified using a pretrained convolutional neural network (Inception-v3 architecture) as intrahepatic cholangiocarcinoma (ICC), hepatocellular carcinoma (HCC), or colorectal liver metastases (CRLMs). We evaluated 459 images (155 HCC, 107 ICC, 197 CRLM). For external testing, we used an independent public data set (n = 40).</p><p><strong>Results: </strong>Averaged across HCC, ICC, and CRLM, in comparison with a supervised baseline (no pretraining), self-supervised pretraining improved the liver Dice similarity coefficient (DSC) by 6.4 percentage points and reduced the 95th-percentile Hausdorff distance (HD<sub>95</sub>) by 32.97 mm. For tumors, the DSC increased by 6.0 percentage points and the HD<sub>95</sub> decreased by 3.2 mm. Tumor type classification achieved AUC 0.98 (95% CI, 0.96 to 1.00) and accuracy 96% (95% CI, 92% to 99%). 
Segmentation performance on the external data was close to the internal cohort with tumor DSC 0.73, intersection over union (IoU) 0.60, and HD<sub>95</sub> 30.98 mm and liver DSC 0.91, IoU 0.83, and HD<sub>95</sub> 29.67 mm.</p><p><strong>Conclusion: </strong>The proposed self-supervised, end-to-end pipeline improves liver tumor segmentation and provides accurate tumor type classification, supporting reliable radiologic assessment, treatment planning, and improved prognostication for patients with liver cancer.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"10 ","pages":"e2500135"},"PeriodicalIF":2.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12866948/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146094763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | Epub Date: 2026-02-05 | DOI: 10.1200/CCI-25-00126
Amy Trentham-Dietz, Thomas P Lawler, Ronald E Gangnon, Allison R Dahlke, Noelle K LoConte, Earlise C Ward, Christine P Muganda, Shaneda Warren Andersen, Marjory L Givens
Purpose: The University of Wisconsin Population Health Institute (PHI) Model of Health, grounded in models developed over a decade ago, provides a framework for prioritizing health-related investments including setting agendas, implementing policies, and sharing resources for improving community health and health equity. The model includes multiple determinants of health and two broad health outcomes (length and quality of life). We adapted the PHI Model of Health to cancer outcomes.
Methods: Using county-level publicly available data, health factor summary measures were derived in three areas: health infrastructure including health promotion and clinical care, physical environment, and social and economic factors. A composite health factor z-score was calculated as the weighted (40%, 15%, and 45%, respectively) average of the summary measures for each county, and k-means clustering was used to create unequally sized county groups with lower (healthier) to higher (less healthy) z-scores. We fit age-adjusted negative binomial regression models to estimate rate ratios and 95% CI for cancer mortality in relation to county health factor cluster.
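The composite score construction (z-score each domain summary measure across counties, then average with the 40%/15%/45% weights) can be sketched as follows; the county values are invented and the clustering step is omitted.

```python
from statistics import mean, stdev

# Weights reported in the study: health infrastructure 40%, physical
# environment 15%, social and economic factors 45%.
WEIGHTS = {"infrastructure": 0.40, "environment": 0.15, "social_economic": 0.45}

def zscores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def composite(domains):
    """domains: {domain_name: [one value per county]} -> composite per county."""
    per_domain = {d: zscores(vals) for d, vals in domains.items()}
    n = len(next(iter(domains.values())))
    return [sum(WEIGHTS[d] * per_domain[d][i] for d in WEIGHTS) for i in range(n)]

# Hypothetical summary measures for three counties (higher = less healthy).
domains = {
    "infrastructure": [0.2, 0.5, 0.9],
    "environment": [0.1, 0.4, 0.7],
    "social_economic": [0.3, 0.6, 0.8],
}
scores = composite(domains)  # lower composite z-score = healthier county
```

In the study, k-means clustering on these composite z-scores then groups counties into the 10 unequally sized clusters used in the mortality models.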
Results: Age-adjusted cancer mortality rates increased across the 10 county health factor clusters for all cancers combined as well as for lung, colorectal, breast, and prostate cancers. Rate ratios generally increased across the 10 health factor clusters for all cancers combined and for specific cancer types. Compared with counties with the most favorable health factor conditions, the counties with the least favorable conditions had an all-cancer mortality rate ratio of 1.49 (95% CI, 1.39 to 1.60).
Conclusion: The PHI model of health adapted to cancer outcomes provides an approach for linking community-specific conditions to the interventions that hold promise to directly address drivers of the cancer burden.
{"title":"Application of the Population Health Institute Model of Health for Identifying Cancer Catchment Area Priorities.","authors":"Amy Trentham-Dietz, Thomas P Lawler, Ronald E Gangnon, Allison R Dahlke, Noelle K LoConte, Earlise C Ward, Christine P Muganda, Shaneda Warren Andersen, Marjory L Givens","doi":"10.1200/CCI-25-00126","DOIUrl":"https://doi.org/10.1200/CCI-25-00126","url":null,"abstract":"<p><strong>Purpose: </strong>The University of Wisconsin Population Health Institute (PHI) Model of Health, grounded in models developed over a decade ago, provides a framework for prioritizing health-related investments including setting agendas, implementing policies, and sharing resources for improving community health and health equity. The model includes multiple determinants of health and two broad health outcomes (length and quality of life). We adapted the PHI Model of Health to cancer outcomes.</p><p><strong>Methods: </strong>Using county-level publicly available data, health factor summary measures were derived in three areas: health infrastructure including health promotion and clinical care, physical environment, and social and economic factors. A composite health factor z-score was calculated as the weighted (40%, 15%, and 45%, respectively) average of the summary measures for each county, and k-means clustering was used to create unequally sized county groups with lower (healthier) to higher (less healthy) z-scores. We fit age-adjusted negative binomial regression models to estimate rate ratios and 95% CI for cancer mortality in relation to county health factor cluster.</p><p><strong>Results: </strong>Age-adjusted cancer mortality rates increased across the 10 county health factor clusters for all-cancers as well as for lung, colorectal, breast, and prostate cancers. Rate ratios generally increased across the 10 health factor clusters for all cancers combined and for specific cancer types. 
Compared with counties with the most favorable health factor conditions, the counties with the least favorable conditions had an all-cancer mortality rate ratio of 1.49 (95% CI, 1.39 to 1.60).</p><p><strong>Conclusion: </strong>The PHI model of health adapted to cancer outcomes provides an approach for linking community-specific conditions to the interventions that hold promise to directly address drivers of the cancer burden.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"10 ","pages":"e2500126"},"PeriodicalIF":2.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146127245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | Epub Date: 2026-02-25 | DOI: 10.1200/CCI-25-00020
Julie Midroni, Felipe S Torres, Jay Hennessy, Tony Tadic, Andrew Hope, Philip Wong, Srinivas Raman
Purpose: Radiation pneumonitis (RP) is the most common toxicity after thoracic radiotherapy. We develop an artificial intelligence model to predict RP in an institutional cohort of patients undergoing radiotherapy for non-small cell lung cancer.
Methods: Data were collected from patients diagnosed between 2002 and 2020. Patients were screened for a known survival/RP outcome, as well as treatment and clinical parameters. A transformer, pretrained on an open-source data set, was first trained to predict abnormal versus normal pulmonary function based on computed tomography (CT) scans. Transfer learning was then used to apply this model to the RP data set. Three clinical-plus-dosimetric variable models were trained. Finally, a model that combined the CT-based risk score and clinical/dosimetric variables was also trained, to explore if the CT-based risk score improved risk stratification. All models were cross-validated.
Results: A total of 1,023 patients were included in the RP data set, for a total of 2,257 pretreatment scans, with a 15% RP rate. The three clinical-plus-dosimetric models achieved area under the receiver operating characteristic curve (AUC) values of 0.70, 0.70, and 0.71, and the CT-only model achieved 0.66. Combining the CT-based risk score and clinical parameters improved the AUC to 0.74, averaged across all folds. The combined model also had superior sensitivity at a fixed specificity of 60%. Precision-recall metrics were comparable across models. Activation mapping of the CT-only model showed prioritization of the upper lung and right lung.
Conclusion: In a cohort treated with heterogeneous radiotherapy techniques and doses, combining CT-based risk scores with clinical variables enhances the prediction of RP. This suggests that CT scans contain additional information with the potential to enhance RP predictions. Activation score mapping shows a focus on lung structure, the upper lung, and the right lung. Model code is available online.
{"title":"End-to-End Pretreatment Prediction of Radiation Pneumonitis in Patients With Non-Small Cell Lung Cancer Using Computed Tomography: A Vision Transformer Approach.","authors":"Julie Midroni, Felipe S Torres, Jay Hennessy, Tony Tadic, Andrew Hope, Philip Wong, Srinivas Raman","doi":"10.1200/CCI-25-00020","DOIUrl":"https://doi.org/10.1200/CCI-25-00020","url":null,"abstract":"<p><strong>Purpose: </strong>Radiation pneumonitis (RP) is the most common toxicity after thoracic radiotherapy. We develop an artificial intelligence model to predict RP in an institutional cohort of patients undergoing radiotherapy for non-small cell lung cancer.</p><p><strong>Methods: </strong>Data were collected from patients diagnosed between 2002 and 2020. Patients were screened for a known survival/RP outcome, as well as treatment and clinical parameters. A transformer, pretrained on an open-source data set, was first trained to predict abnormal versus normal pulmonary function based on computed tomography (CT) scans. Transfer learning was then used to apply this model to the RP data set. Three clinical-plus-dosimetric variable models were trained. Finally, a model that combined the CT-based risk score and clinical/dosimetric variables was also trained, to explore if the CT-based risk score improved risk stratification. All models were cross-validated.</p><p><strong>Results: </strong>1,023 patients were included in the RP data set, for a total of 2,257 pretreatment scans, with a 15% RP rate. The clinical-plus-dosimetric-only values were 0.70, 0.70, and 0.71, and the CT-only was 0.66. Combining the CT-based risk score and clinical parameters improved the receiver operating characteristic curve to a value of 0.74, averaged across all folds. The combined model also had superior sensitivity for a fixed specificity value of 60%. Precision-recall metrics were comparable across models. 
Activation mapping of the CT-only model showed prioritization of upper lung and right lung.</p><p><strong>Conclusion: </strong>In a cohort treated heterogeneous radiotherapy techniques and doses, combining CT-based risk scores with clinical values enhances the prediction of RP. This suggests that CT scans contain additional information that has the potential to enhance RP predictions. Activation score mapping shows focus on lung structure, upper lung, and right lung. Model code is available online.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"10 ","pages":"e2500020"},"PeriodicalIF":2.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147291908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | Epub Date: 2026-01-16 | DOI: 10.1200/CCI-25-00248
Almaha Alfakhri, Ohoud Almadani, Ibrahim Asiri, Nada Alsuhebany, Ahmed Alanazi, Turki Althunian
Purpose: Real-world data (RWD) are increasingly used in oncology research, regulatory decisions, and clinical practice; however, variability in data quality and lack of standardization remain major limitations. This study assessed the readiness of oncology RWD from Saudi health care centers for standardization and evaluated their completeness and accuracy.
Methods: Deidentified electronic health records for adult patients (18 years and older) diagnosed with breast cancer, thyroid cancer, colorectal cancer, gastric cancer, hepatocellular carcinoma, or renal cell carcinoma were extracted from five health care centers within the Saudi Real-World Evidence Network. Readiness for standardization was evaluated by assessing alignment with data elements in the Minimal Common Oncology Data Elements (mCODE) framework, a standardized and clinically focused oncology data model. Data quality was evaluated using two dimensions: completeness, defined as the proportion of patients with at least one entered value for each element; and accuracy, defined as the proportion of correct entries based on verification checks (including plausibility and consistency). Outcomes were calculated at the element level and weighted to generate domain- and center-level proportions.
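The two quality dimensions defined above (completeness as the proportion of patients with at least one entered value; accuracy as the proportion of entries passing verification checks) can be sketched on toy records; the records and the plausibility rule below are invented for illustration and are not the study's checks.

```python
def completeness(records, element):
    """Share of patients with at least one entered value for an element."""
    filled = sum(1 for r in records if r.get(element) is not None)
    return filled / len(records)

def accuracy(records, element, is_valid):
    """Share of entered values that pass a verification check."""
    entered = [r[element] for r in records if r.get(element) is not None]
    if not entered:
        return None
    return sum(1 for v in entered if is_valid(v)) / len(entered)

records = [
    {"birth_year": 1950},
    {"birth_year": 1985},
    {"birth_year": 3012},  # fails the plausibility check
    {},                    # missing: hurts completeness, not accuracy
]
plausible_year = lambda y: 1900 <= y <= 2008  # hypothetical adults-only rule
c = completeness(records, "birth_year")       # 3/4 = 0.75
a = accuracy(records, "birth_year", plausible_year)  # 2/3
```

The same pattern explains how the study can report high accuracy (95.03%) alongside low-to-moderate completeness (49.02%): missing values are excluded from the accuracy denominator.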
Results: A total of 20,671 oncology patients were included. Overall weighted alignment with mCODE domains was moderate (62.43%). The patient domain showed the highest alignment (71.43%), whereas the outcome domain exhibited significant gaps. Data completeness was low to moderate (49.02%), with higher levels in common cancers (54.33%) than in rare cancers (51.50%). Data accuracy was high overall (95.03%), with rare cancers showing higher accuracy (98.76%) than common cancers (94.62%).
Conclusion: Saudi oncology RWD show moderate alignment with mCODE, with consistently high accuracy across domains. However, gaps in data completeness highlight the need for broader adoption of standardized data frameworks to support interoperability and enable nationwide research and regulatory use.
{"title":"Evaluating the Readiness of Saudi Oncology Real-World Data for Standardization and Quality Enhancement.","authors":"Almaha Alfakhri, Ohoud Almadani, Ibrahim Asiri, Nada Alsuhebany, Ahmed Alanazi, Turki Althunian","doi":"10.1200/CCI-25-00248","DOIUrl":"https://doi.org/10.1200/CCI-25-00248","url":null,"abstract":"<p><strong>Purpose: </strong>Real-world data (RWD) are increasingly used in oncology research, regulatory decisions, and clinical practice; however, variability in data quality and lack of standardization remain major limitations. This study assessed the readiness of oncology RWD from Saudi health care centers for standardization and evaluated their completeness and accuracy.</p><p><strong>Methods: </strong>Deidentified electronic health records for adult patients (18 years and older) diagnosed with breast cancer, thyroid cancer, colorectal cancer, gastric cancer, hepatocellular carcinoma, or renal cell carcinoma were extracted from five health care centers within the Saudi Real-World Evidence Network. Readiness for standardization was evaluated by assessing alignment with data elements in the Minimal Common Oncology Data Elements (mCODE) framework, a standardized and clinically focused oncology data model. Data quality was evaluated using two dimensions: completeness, defined as the proportion of patients with at least one entered value for each element; and accuracy, defined as the proportion of correct entries based on verification checks (including plausibility and consistency). Outcomes were calculated at the element level and weighted to generate domain- and center-level proportions.</p><p><strong>Results: </strong>A total of 20,671 oncology patients were included. Overall weighted alignment with mCODE domains was moderate (62.43%). The patient domain showed the highest alignment (71.43%), whereas the outcome domain exhibited significant gaps. 
Data completeness was low to moderate (49.02%), with higher levels in common cancers (54.33%) than in rare cancers (51.50%). Data accuracy was high overall (95.03%), with rare cancers showing higher accuracy (98.76%) than common cancers (94.62%).</p><p><strong>Conclusion: </strong>Saudi oncology RWD show moderate alignment with mCODE, with consistently high accuracy across domains. However, gaps in data completeness highlight the need for broader adoption of standardized data frameworks to support interoperability and enable nationwide research and regulatory use.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"10 ","pages":"e2500248"},"PeriodicalIF":2.8,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146114921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | Epub Date: 2026-02-12 | DOI: 10.1200/CCI-25-00226
Meenakshi Dubey, Kok Joon Chong, Yuba Raj Pun, Lee Yi Foo, Melissa Ooi, Iain Bee Huat Tan, David Shao Peng Tan, Kee Yuan Ngiam, Hwee Lin Wee
Purpose: Eastern Cooperative Oncology Group (ECOG) performance status is critical for cancer patient management, yet it is often documented only in unstructured clinical notes. This study compares several approaches to extract ECOG status from oncology notes, focusing on advanced prompting techniques for large language models (LLMs).
Methods: We evaluated four ECOG extraction approaches on unstructured clinical notes from patients with non-small cell lung cancer, multiple myeloma, or ovarian cancer (2017-2021). The approaches were a rule-based natural language processing algorithm, simple LLM prompting, and two advanced prompts (chain-of-thought [CoT] and Double Filtering) using a domain-tuned LLM (LLAMAv3.2). Performance was measured on a binary outcome (any ECOG documented v none), on a three-class outcome (ECOG 0-1 v ≥2 v none), and via an adapted QUEST questionnaire for human evaluation.
Results: Both CoT and the Double Filtering technique (DFT) achieved 94% accuracy, outperforming the rule-based method (91%) and simple prompting (86%). DFT had the highest specificity (0.91) and positive predictive value (PPV; 0.93), whereas CoT attained the highest sensitivity (0.98). In the QUEST evaluation, DFT and CoT scored higher than the simple prompt on output quality, reasoning, bias reduction, and user satisfaction; DFT received the top satisfaction rating. In the three-class analysis, DFT and CoT again performed best (accuracy 0.91 v 0.87), and DFT was most sensitive for ECOG ≥2 cases. Estimates for ECOG ≥2 remained imprecise because of the small sample (n = 20). All methods occasionally hallucinated ECOG status.
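The binary-outcome metrics reported above (accuracy, sensitivity, specificity, PPV) all follow from a 2×2 confusion matrix. A minimal sketch, with invented counts rather than the study's data:

```python
# Illustrative only: how sensitivity, specificity, PPV, and accuracy relate
# for a binary "any ECOG documented vs none" outcome. The counts below are
# invented for the sketch, not taken from the study.
def binary_metrics(tp, fp, tn, fn):
    """Compute standard confusion-matrix metrics."""
    sensitivity = tp / (tp + fn)   # recall: documented cases found
    specificity = tn / (tn + fp)   # "none" cases correctly left out
    ppv = tp / (tp + fp)           # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "accuracy": accuracy}

m = binary_metrics(tp=90, fp=7, tn=64, fn=2)
print({k: round(v, 2) for k, v in m.items()})
```

Note that with imbalanced classes (here, few ECOG ≥2 cases), accuracy alone can mask poor sensitivity for the minority class, which is why the study reports all four measures.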
Conclusion: Advanced LLM prompting improved ECOG extraction over basic methods. DFT and CoT each showed specific strengths (DFT had higher PPV and user satisfaction; CoT achieved higher sensitivity). These approaches appear to be generalizable across cancer types. Key implementation considerations include computational cost and human oversight. Overall, advanced prompting can standardize ECOG documentation, accelerate patient cohort identification, and inform personalized treatment planning.
Title: Prompt Engineering for Eastern Cooperative Oncology Group Status Extraction: Comparing Large Language Model Techniques. (JCO Clinical Cancer Informatics, vol 10, e2500226)
Purpose: Our study is motivated by evaluating the role of hematopoietic cell transplantation (HCT) after chimeric antigen receptor T-cell (CAR-T) therapy for acute lymphoblastic leukemia (ALL), a debated topic. Because patients may receive HCT at varying times after CAR-T infusion, or never, post-CAR-T HCT should be treated as a time-varying covariate (TVC).
Methods: Standard Cox models and Kaplan-Meier (KM) curves (the naïve method) assume that TVC status is known and fixed at baseline, which can yield biased estimates. Landmark analysis is a popular alternative but depends on the chosen landmark time. The time-dependent (TD) Cox model is better suited to TVCs, although visualizing survival curves from it is complex. The newly proposed Smith-Zee method generates appropriate survival curves from TD Cox models.
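The key data step behind a TD Cox model is restructuring each patient's follow-up into start-stop (counting-process) intervals so a covariate can change value mid-follow-up. A minimal sketch of that restructuring for the HCT example, with hypothetical field names and records (real analyses would use, e.g., R's survival::tmerge or lifelines' CoxTimeVaryingFitter):

```python
# Sketch of the counting-process ("start-stop") restructuring that a
# time-dependent Cox model requires. Records and field names here are
# hypothetical, for illustration only.
def to_start_stop(followup_time, event, hct_time=None):
    """Split one patient's follow-up at the time of HCT, if any.

    Returns rows of (start, stop, hct_status, event), so HCT status is
    0 before transplant and 1 after -- rather than being (incorrectly)
    fixed at baseline as in the naive KM/Cox approach.
    """
    if hct_time is None or hct_time >= followup_time:
        # Never transplanted during follow-up: one interval, status 0.
        return [(0.0, followup_time, 0, event)]
    return [
        (0.0, hct_time, 0, 0),                # pre-HCT interval, no event
        (hct_time, followup_time, 1, event),  # post-HCT interval
    ]

# Patient transplanted at month 6, relapse at month 14:
print(to_start_stop(14.0, 1, hct_time=6.0))
# Patient never transplanted, censored at month 20:
print(to_start_stop(20.0, 0))
```

A patient transplanted at month 6 who relapses at month 14 contributes a pre-HCT row (status 0, no event) and a post-HCT row (status 1, event), which is exactly the time-varying structure the naïve baseline-fixed approach misses.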
Results: To address these challenges, we developed an open-source R Shiny tool integrating multiple models (naïve Cox, landmark Cox, and TD Cox) and curves (naïve KM, landmark KM, Smith-Zee, and Extended KM) to facilitate TVC analysis. Reanalysis of the effect of post-CAR-T HCT on leukemia-free survival (LFS) showed consistent results between the naïve and TD Cox models, whereas landmark analyses varied by landmark time. A separate analysis of chronic graft-versus-host disease and survival showed substantial differences across statistical methods. Simulations revealed increased bias in naïve methods when the TVC changed late, and minimal bias when TVC changes occurred early relative to event times.
Conclusion: We recommend TD Cox models and Smith-Zee curves for robust TVC analysis. Our R Shiny tool supports standardized analyses without requiring data sharing, thereby promoting collaboration across different institutions and providing a practical tool to advance survival analysis in oncology research.
Title: Novel R Shiny Tool for Survival Analysis With Time-Varying Covariate in Oncology Studies: Overcoming Biases and Enhancing Collaboration.
Authors: Yimei Li, Yang Qiao, Fei Gao, Jordan Gauthier, Qiang Ed Zhang, Jenna Voutsinas, Wendy Leisenring, Ted Gooley, Corinne Summers, Alexandre Hirayama, Cameron J Turtle, Rebecca Gardner, Jarcy Zee, Qian Vicky Wu
Pub Date: 2026-02-01 | DOI: 10.1200/CCI-25-00225 (JCO Clinical Cancer Informatics, vol 10, e2500225; open access via PMC)
Pub Date: 2026-02-01 | Epub Date: 2026-02-12 | DOI: 10.1200/CCI-25-00257
Adam P Yan, Emily Saso, Julia Shannon, Heather Laird, Alyssa Ramdeo, Robin Deliva, Samantha Baron, Bren Cardiff, Daniel Rosenfield, Ashley Graham, Mihir Ramnani, Zahra Syed, Denise Connolly, Allison Starr, Priya Patel, L Lee Dupuis, Lillian Sung
Purpose: Calls to implement routine symptom screening among pediatric oncology patients are increasing. Our objectives were to develop and evaluate the usability of Symptom Screening in Pediatrics (SSPedi), a validated patient-reported outcome tool, when integrated into the Epic electronic health record.
Methods: We built self-report and proxy-report versions of SSPedi in Epic's patient portal, MyChart, and enrolled patients with cancer aged 12-18 years or their parents/guardians, and parents/guardians of patients with cancer aged 2-18 years. Participants were enrolled in three cohorts of 10. A clinical research associate evaluated participants' ability to correctly complete eight tasks, including finding and completing SSPedi on a scheduled day and when unscheduled, locating tips to manage symptoms, and viewing past SSPedi reports. Participants self-reported the ease or difficulty of completing each task. After each cohort of 10 was enrolled, SSPedi in Epic was refined on the basis of participant feedback.
Results: We enrolled 30 participants: 21 parents/guardians and nine patients. Overall, 60% correctly found SSPedi on a scheduled reminder day and 33% found it on an unscheduled day. Once they found it, 70% of participants completed SSPedi correctly. Only 33% could correctly view SSPedi trends over time. By self-report, 20 of 30 participants (67%) found SSPedi easy or very easy to use overall; this increased to 100% in the final cohort of 10.
Conclusion: We integrated SSPedi into Epic. Participants could successfully complete SSPedi when it was scheduled on a reminder day; they found it more challenging to complete SSPedi without a reminder and to view past SSPedi reports. Implementation will require training and support for patients and parents/guardians.
Title: Integrating Symptom Screening in Pediatrics Into the Epic Electronic Health Record: Development and Acceptability for Pediatric Cancer Patients. (JCO Clinical Cancer Informatics, vol 10, e2500257)
Pub Date: 2026-01-01 | Epub Date: 2026-01-09 | DOI: 10.1200/CCI-25-00262
Guannan Gong, Jessica Liu, Sameer Pandya, Cristian Taborda, Nathalie Wiesendanger, Nate Price, Will Byron, Andreas Coppi, Patrick Young, Christina Wiess, Haley Dunning, Courtney Barganier, Rachel Brodeur, Neal Fischbach, Patricia LoRusso, Lajos Pusztai, So Yeon Kim, Mariya Rozenblit, Michael Cecchini, Anne Mongiu, Lourdes Mendez, Edward Kaftan, Charles Torre, Harlan Krumholz, Ian Krop, Wade Schulz, Maryam Lustberg, Pamela L Kunz
Purpose: Cancer clinical trial enrollment remains critically low at 5%-7% of adult patients despite exponential growth in available trials. Manual patient-trial matching represents a fundamental bottleneck, whereas current artificial intelligence (AI) and machine learning patient-trial matching systems lack data standardization and compatibility across health systems. We developed and validated a semiautomated clinical trial patient matching (CTPM) tool to improve recruitment efficiency and scalability.
Methods: We created a hybrid rules-based and natural language processing (NLP)-based pipeline that automatically screens patients using structured and unstructured electronic health record data standardized to the Observational Medical Outcomes Partnership (OMOP) common data model. CTPM performance was first evaluated on one metastatic colorectal cancer (CRC) trial by comparing CTPM accuracy and efficiency to manual chart review. Following the single-trial validation, we then implemented the system across 29 clinical trials spanning multiple cancer specialties and phases.
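The rules-based half of such a pipeline amounts to filtering structured, standardized fields against machine-readable eligibility criteria. A toy sketch with invented field names and thresholds, not the authors' actual CTPM rules:

```python
# Toy illustration of rules-based prescreening: filter patient records
# against structured trial criteria. Field names and criteria are invented
# for this sketch; a real system would query OMOP concept IDs, and NLP
# would handle the unstructured criteria this rule pass cannot.
CRITERIA = {
    "diagnosis": "metastatic colorectal cancer",
    "min_age": 18,
    "max_ecog": 1,
}

def prescreen(patients, criteria):
    """Return patients whose structured fields satisfy every criterion."""
    return [
        p for p in patients
        if p["diagnosis"] == criteria["diagnosis"]
        and p["age"] >= criteria["min_age"]
        and p["ecog"] <= criteria["max_ecog"]
    ]

patients = [
    {"id": 1, "diagnosis": "metastatic colorectal cancer", "age": 62, "ecog": 1},
    {"id": 2, "diagnosis": "multiple myeloma", "age": 70, "ecog": 0},
    {"id": 3, "diagnosis": "metastatic colorectal cancer", "age": 55, "ecog": 3},
]
print([p["id"] for p in prescreen(patients, CRITERIA)])
```

In a semiautomated design like the one described, the output of such a filter is a short candidate list for human chart review, not an enrollment decision.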
Results: For the single CRC trial, CTPM achieved 94% retrospective and 88% prospective accuracy, matching gold standard clinical chart review with 100% sensitivity. Implementation reduced the chart review workload 10-fold and cut screening time by 41% (from 3.1 to 1.8 minutes per chart) for patients who did undergo review. Since September 2022, the system has screened 98,348 patients across 29 trials, identifying 825 eligible candidates and facilitating 117 enrollments, with consent rates of 9%-37%.
Conclusion: This AI and NLP tool demonstrates improved efficiency in clinical trial recruitment by enabling research teams to focus on qualified candidates rather than exhaustive chart reviews. The OMOP-based framework supports scalability across health systems, with potential to address enrollment challenges that limit patient access to clinical trials.
Title: Clinical Trial Patient Matching: A Real-Time, Common Data Model and Artificial Intelligence-Driven System for Semiautomated Patient Prescreening in Cancer Clinical Trials. (JCO Clinical Cancer Informatics, vol 10, e2500262)