Pub Date: 2025-01-01 | Epub Date: 2025-01-10 | DOI: 10.1200/CCI-24-00300
Title: Errata: Waiting to Exhale: The Feasibility and Appropriateness of Home Blood Oxygen Monitoring in Oncology Patients Post-Hospital Discharge. JCO Clinical Cancer Informatics, volume 9, e2400300. DOI: 10.1200/CCI-24-00300
Pub Date: 2025-01-01 | Epub Date: 2025-01-22 | DOI: 10.1200/CCI.24.00042
Raul F Valenzuela, Elvis de Jesus Duran Sierra, Mathew A Canjirathinkal, Behrang Amini, Ken-Pin Hwang, Jingfei Ma, Keila E Torres, R Jason Stafford, Wei-Lien Wang, Robert S Benjamin, Andrew J Bishop, John E Madewell, William A Murphy, Colleen M Costelloe
Purpose: Undifferentiated pleomorphic sarcomas (UPSs) demonstrate therapy-induced hemosiderin deposition, granulation tissue formation, fibrosis, and calcification. We aimed to determine the treatment-assessment value of morphologic tumoral hemorrhage patterns and first- and high-order radiomic features extracted from contrast-enhanced susceptibility-weighted imaging (CE-SWI).
Materials and methods: This retrospective institutional review board-authorized study included 33 patients with extremity UPS with magnetic resonance imaging and resection performed from February 2021 to May 2023. Volumetric tumor segmentation was obtained at baseline, postsystemic chemotherapy (PC), and postradiation therapy (PRT). The pathology-assessed treatment effect (PATE) in surgical specimens separated patients into responders (R; ≥90%, n = 16), partial responders (PR; 89%-31%, n = 10), and nonresponders (NR; ≤30%, n = 7). RECIST, WHO, and volume were assessed for all time points. CE-SWI T2* morphologic patterns and 107 radiomic features were analyzed.
Results: A Complete-Ring (CR) pattern was observed at PRT in 71.4% of R (P = 7.71 × 10⁻⁶), an Incomplete-Ring pattern in 33.3% of PR (P = .2751), and a Globular pattern in 50% of NR (P = .1562). The first-order radiomic analysis of the CE-SWI intensity histogram outlined the values of the 10th and 90th percentiles and its skewness. R showed a 280% increase in 10th percentile voxels (P = .061) and a 241% increase in skewness (P = .0449) at PC. PR/NR showed a 690% increase in the 90th percentile voxels (P = .03) at PC. Multiple high-order radiomic texture features observed at PRT discriminated R from PR/NR better than the first-order features.
Conclusion: CE-SWI morphologic patterns strongly correlate with PATE. The CR morphology pattern was the most frequent in R and had the strongest statistical association with response at PRT; a radiologist can recognize it easily without postprocessing software. It can potentially outperform size-based metrics, such as RECIST. The first- and high-order radiomic analysis found several features separating R from PR/NR.
Title: Novel Use and Value of Contrast-Enhanced Susceptibility-Weighted Imaging Morphologic and Radiomic Features in Predicting Extremity Soft Tissue Undifferentiated Pleomorphic Sarcoma Treatment Response. JCO Clinical Cancer Informatics, volume 9, e2400042. DOI: 10.1200/CCI.24.00042
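The first-order features named in this abstract (10th percentile, 90th percentile, and skewness of the intensity histogram) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the linear-interpolation percentile rule, and the population (biased) skewness definition are assumptions chosen for illustration, and the voxel values in the usage note are made up.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    rank = (q / 100.0) * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


def skewness(values):
    """Population skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # a flat histogram has no asymmetry
    return m3 / m2 ** 1.5


def first_order_features(voxels):
    """Summarize a list of voxel intensities from one segmented tumor volume."""
    return {
        "p10": percentile(voxels, 10),
        "p90": percentile(voxels, 90),
        "skewness": skewness(voxels),
    }


def percent_change(baseline, follow_up):
    """Relative change between two time points, in percent."""
    return 100.0 * (follow_up - baseline) / baseline
```

For example, `first_order_features(list(range(1, 11)))` summarizes ten hypothetical intensities, and `percent_change(1.0, 3.8)` expresses a feature's change between baseline and a later time point in the same "percent increase" form the abstract reports.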
Pub Date: 2024-12-01 | Epub Date: 2024-12-05 | DOI: 10.1200/CCI-24-00200
Shu Jiang, Debbie L Bennett, Bernard A Rosner, Rulla M Tamimi, Graham A Colditz
Purpose: Current image-based long-term risk prediction models do not fully use previous screening mammogram images. Dynamic prediction models have not been investigated for use in routine care.
Methods: We analyzed a prospective WashU clinic-based cohort of 10,099 women who were cancer-free at entry (between November 3, 2008 and February 2012). Follow-up through 2020 identified 478 pathology-confirmed breast cancers (BCs). The cohort included 27% Black women. An external validation cohort (Emory) included 18,360 women screened from 2013, followed through 2020. This included 42% Black women and 332 pathology-confirmed BCs, excluding those diagnosed within 6 months of screening. We trained a dynamic model using repeated screening mammograms at WashU to predict 5-year risk. This opportunistic screening service presented a range of mammogram images for each woman. We applied the model to the external validation data to evaluate discrimination performance (AUC) and calibrated it to US SEER.
Results: Using 3 years of previous mammogram images available at the current screening visit, we obtained a 5-year AUC of 0.80 (95% CI, 0.78 to 0.83) in the external validation. This represents a significant improvement over the AUC of 0.74 (95% CI, 0.71 to 0.77; P < .01) obtained from the current-visit mammogram in the same women. When calibrated, a risk ratio of 21.1 was observed comparing high (>4%) to very low (<0.3%) 5-year risk. The dynamic model classified 16% of the cohort as high risk, a group in which 61% of all BCs were diagnosed. The dynamic model performed comparably in Black and White women.
Conclusion: Adding previous screening mammogram images improves 5-year BC risk prediction beyond static models. It can identify women at high risk who might benefit from supplemental screening or risk-reduction strategies.
Title: Development and Validation of Dynamic 5-Year Breast Cancer Risk Model Using Repeated Mammograms. JCO Clinical Cancer Informatics, volume 8, e2400200. DOI: 10.1200/CCI-24-00200. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11634085/pdf/
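The discrimination metric reported in this abstract, the AUC, can be read as the probability that a randomly chosen woman who developed BC received a higher predicted risk than a randomly chosen woman who did not. The sketch below computes that quantity by direct pairwise comparison. It is an illustration only: it treats the outcome as a plain binary label and ignores censoring, whereas a 5-year AUC in a cohort study is typically estimated with time-dependent methods, and the function name is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: predicted risks; labels: 1 for cases, 0 for non-cases.
    Ties between a case and a non-case count as half a concordant pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.74 and 0.80 should be read.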
Pub Date: 2024-12-01 | Epub Date: 2024-12-20 | DOI: 10.1200/CCI.24.00107
Jaimie J Lee, Andres Zepeda, Gregory Arbour, Kathryn V Isaac, Raymond T Ng, Alan M Nichol
Purpose: Breast cancer relapses are rarely collected by cancer registries because of logistical and financial constraints. Hence, we investigated natural language processing (NLP), enhanced with state-of-the-art deep learning transformer tools and large language models, to automate relapse identification in the text of computed tomography (CT) reports.
Methods: We analyzed follow-up CT reports from patients diagnosed with breast cancer between January 1, 2005, and December 31, 2014. The reports were curated and annotated for the presence or absence of local, regional, and distant breast cancer relapses. We performed 10-fold cross-validation to evaluate models identifying different types of relapses in CT reports. Model performance was assessed with classification metrics, reported with 95% confidence intervals.
Results: In our data set of 1,445 CT reports, 799 (55.3%) described any relapse, 72 (5.0%) local relapses, 97 (6.7%) regional relapses, and 743 (51.4%) distant relapses. The any-relapse model achieved an accuracy of 89.6% (87.8-91.1), with a sensitivity of 93.2% (91.4-94.9) and a specificity of 84.2% (80.9-87.1). The local relapse model achieved an accuracy of 94.6% (93.3-95.7), a sensitivity of 44.4% (32.8-56.3), and a specificity of 97.2% (96.2-98.0). The regional relapse model showed an accuracy of 93.6% (92.3-94.9), a sensitivity of 70.1% (60.0-79.1), and a specificity of 95.3% (94.2-96.5). Finally, the distant relapse model demonstrated an accuracy of 88.1% (86.2-89.7), a sensitivity of 91.8% (89.9-93.8), and a specificity of 83.7% (80.5-86.4).
Conclusion: We developed NLP models to identify local, regional, and distant breast cancer relapses from CT reports. Automating the identification of breast cancer relapses can enhance data collection about patient outcomes.
Title: Automated Identification of Breast Cancer Relapse in Computed Tomography Reports Using Natural Language Processing. JCO Clinical Cancer Informatics, volume 8, e2400107. DOI: 10.1200/CCI.24.00107. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11670918/pdf/
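The classification metrics in this abstract (accuracy, sensitivity, and specificity, each with a 95% confidence interval) all derive from a confusion matrix. The sketch below shows one way to compute them. The abstract does not state how its intervals were obtained, so the Wilson score interval used here is an assumption, and the function names and the counts in the usage note are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95% coverage)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return center - half, center + half


def classification_metrics(tp, fp, tn, fn):
    """Return {metric: (point_estimate, (ci_low, ci_high))} from confusion counts."""
    total = tp + fp + tn + fn
    return {
        # Sensitivity: share of reports with relapse that the model flags.
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        # Specificity: share of reports without relapse that the model clears.
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
        # Accuracy: share of all reports classified correctly.
        "accuracy": ((tp + tn) / total, wilson_interval(tp + tn, total)),
    }
```

Calling `classification_metrics(tp=90, fp=20, tn=80, fn=10)` on made-up counts returns the three estimates with their intervals. With a rare class, such as the 5.0% local relapses here, accuracy can stay high while sensitivity is low, which is why the abstract reports all three.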
Pub Date: 2024-12-01 | Epub Date: 2024-12-20 | DOI: 10.1200/CCI.24.00101
Rachelle Swart, Liesbeth Boersma, Rianne Fijten, Wouter van Elmpt, Paul Cremers, Maria J G Jacobs
Purpose: Artificial intelligence (AI) applications in radiotherapy (RT) are expected to save time and improve quality, but implementation remains limited. Therefore, we used implementation science to develop a format for designing an implementation strategy for AI. This study aimed to (1) apply this format to develop an AI implementation strategy for our center; (2) identify insights gained to enhance AI implementation using this format; and (3) assess the feasibility and acceptability of this format to design a center-specific implementation strategy for departments aiming to implement AI.
Methods: We created an AI-implementation strategy for our own center using implementation science methods. This included a stakeholder analysis, a literature review, and interviews to identify facilitators and barriers, followed by the design of strategies to overcome the barriers. These methods were subsequently used in a workshop with teams from seven Dutch RT centers to develop their own AI-implementation plans. The applicability, appropriateness, and feasibility were evaluated by the workshop participants, and relevant insights for AI implementation were summarized.
Results: The stakeholder analysis identified internal (physicians, physicists, RT technicians, information technology, and education) and external (patients and representatives) stakeholders. Barriers and facilitators included concerns about opacity, privacy, data quality, legal aspects, knowledge, trust, stakeholder involvement, ethics, and multidisciplinary collaboration, all integrated into our implementation strategy. The workshop evaluation showed high acceptability (18 participants [90%]), appropriateness (17 participants [85%]), and feasibility (15 participants [75%]) of the implementation strategy. Sixteen participants fully agreed with the format.
Conclusion: Our study highlights the need for a collaborative approach to implement AI in RT. We designed a strategy to overcome organizational challenges, improve AI integration, and enhance patient care. Workshop feedback indicates the proposed methods are useful for multiple RT centers. Insights gained by applying the methods highlight the importance of multidisciplinary collaboration in the development and implementation of AI.
Title: Implementation Strategy for Artificial Intelligence in Radiotherapy: Can Implementation Science Help? JCO Clinical Cancer Informatics, volume 8, e2400101. DOI: 10.1200/CCI.24.00101. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11670909/pdf/
Pub Date: 2024-12-01 | Epub Date: 2024-12-11 | DOI: 10.1200/CCI.24.00123
Zacharie Hamilton, Aseem Aseem, Zhengjia Chen, Noor Naffakh, Natalie M Reizine, Frank Weinberg, Shikha Jain, Larry G Kessler, Vijayakrishna K Gadi, Christopher Bun, Ryan H Nguyen
Purpose: Precision oncology in non-small cell lung cancer (NSCLC) relies on biomarker testing for clinical decision making. Despite its importance, challenges like the lack of genomic oncology training, nonstandardized biomarker reporting, and a rapidly evolving treatment landscape hinder its practice. Generative artificial intelligence (AI), such as ChatGPT, offers promise for enhancing clinical decision support. Effective performance metrics are crucial to evaluate these models' accuracy and their propensity for producing incorrect or hallucinated information. We assessed various ChatGPT versions' ability to generate accurate next-generation sequencing reports and treatment recommendations for NSCLC, using a novel Generative AI Performance Score (G-PS), which considers accuracy, relevancy, and hallucinations.
Methods: We queried ChatGPT versions for first-line NSCLC treatment recommendations with a Food and Drug Administration-approved targeted therapy, using a zero-shot prompt approach for eight oncogenes. Responses were assessed against National Comprehensive Cancer Network (NCCN) guidelines for accuracy, relevance, and hallucinations, with G-PS calculating scores from -1 (all hallucinations) to 1 (fully NCCN-compliant recommendations). G-PS was designed as a composite measure with a base score for correct recommendations (weighted for preferred treatments) and a penalty for hallucinations.
Results: Analyzing 160 responses, generative pre-trained transformer (GPT)-4 outperformed GPT-3.5, showing higher base score (90% v 60%; P < .01) and fewer hallucinations (34% v 53%; P < .01). GPT-4's overall G-PS was significantly higher (0.34 v -0.15; P < .01), indicating superior performance.
Conclusion: This study highlights the rapid improvement of generative AI in matching treatment recommendations with biomarkers in precision oncology. Although the rate of hallucinations improved in the GPT-4 model, future generative AI use in clinical care requires high levels of accuracy with minimal to no room for hallucinations. The G-PS represents a novel metric quantifying generative AI utility in health care compared with national guidelines, with potential adaptation beyond precision oncology.
Title: Comparative Analysis of Generative Pre-Trained Transformer Models in Oncogene-Driven Non-Small Cell Lung Cancer: Introducing the Generative Artificial Intelligence Performance Score. JCO Clinical Cancer Informatics, volume 8, e2400123. DOI: 10.1200/CCI.24.00123. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11634130/pdf/
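The abstract describes G-PS only in outline: a base score for correct recommendations, weighted toward preferred treatments, minus a penalty for hallucinations, on a scale from -1 to 1. The published formula is not given here, so the sketch below is a purely hypothetical scoring rule that merely reproduces those two endpoints. The label names, the weight of 0.5 for correct but non-preferred treatments, and the averaging step are all assumptions and should not be read as the authors' definition.

```python
# Hypothetical per-recommendation weights; NOT taken from the publication.
HYPOTHETICAL_WEIGHTS = {
    "preferred": 1.0,       # guideline-preferred treatment
    "other_correct": 0.5,   # guideline-concordant but not preferred (assumed weight)
    "hallucination": -1.0,  # treatment not supported by the guideline
}


def illustrative_score(recommendation_labels):
    """Average the weights of all recommendations in one model response.

    Returns a value in [-1, 1]: -1 when every recommendation is a
    hallucination, 1 when every recommendation is a preferred treatment.
    """
    if not recommendation_labels:
        return 0.0  # an empty response earns neither credit nor penalty
    total = sum(HYPOTHETICAL_WEIGHTS[label] for label in recommendation_labels)
    return total / len(recommendation_labels)


def mean_score(responses):
    """Average the per-response scores over a set of responses."""
    return sum(illustrative_score(r) for r in responses) / len(responses)
```

Under this toy rule a response that pairs one preferred treatment with one hallucinated one scores 0, which shows how a model with a high base score can still land near the middle of the scale when hallucinations are frequent.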
Pub Date: 2024-12-01 | Epub Date: 2024-12-11 | DOI: 10.1200/CCI.24.00126
Li-Ching Chen, Travis Zack, Arda Demirci, Madhumita Sushil, Brenda Miao, Corynn Kasap, Atul Butte, Eric A Collisson, Julian C Hong
Purpose: We examined the effectiveness of proprietary and open large language models (LLMs) in detecting disease presence, location, and treatment response in pancreatic cancer from radiology reports.
Methods: We analyzed 203 deidentified radiology reports, manually annotated for disease status, location, and indeterminate nodules needing follow-up. Using generative pre-trained transformer (GPT)-4, GPT-3.5-turbo, and open models such as Gemma-7B and Llama3-8B, we employed strategies such as ablation and prompt engineering to boost accuracy. Discrepancies between human and model interpretations were reviewed by a secondary oncologist.
Results: Among 164 patients with pancreatic tumor, GPT-4 showed the highest accuracy in inferring disease status, achieving 75.5% correctness (F1-micro). Open models Mistral-7B and Llama3-8B performed comparably, with accuracies of 68.6% and 61.4%, respectively. Mistral-7B excelled in deriving correct inferences directly from objective findings. Most tested models demonstrated proficiency in identifying disease-containing anatomic locations from a list of choices, with GPT-4 and Llama3-8B showing near-parity in precision and recall for disease site identification. However, open models struggled to differentiate benign from malignant postsurgical changes, which reduced their precision in identifying findings indeterminate for cancer. A secondary review occasionally favored GPT-3.5's interpretations, indicating the variability in human judgment.
Conclusion: LLMs, especially GPT-4, are proficient in deriving oncologic insights from radiology reports. Their performance is enhanced by effective summarization strategies, demonstrating their potential in clinical support and health care analytics. This study also underscores the possibility of zero-shot open model utility in environments where proprietary models are restricted. Finally, by providing a set of annotated radiology reports, this paper presents a valuable data set for further LLM research in oncology.
Title: Assessing Large Language Models for Oncology Data Inference From Radiology Reports. JCO Clinical Cancer Informatics, volume 8, e2400126. DOI: 10.1200/CCI.24.00126
Purpose: Guideline-based screening after sustained virologic response (SVR) does not address the individual risk of hepatocellular carcinoma (HCC). We aimed to enable tailored screening by using machine learning to predict HCC incidence after SVR.
Methods: Using clinical data from 1,028 SVR patients, we developed an HCC prediction model using a random survival forest (RSF). Model performance was assessed using Harrell's c-index and validated in an independent cohort of 737 SVR patients. Shapley additive explanations (SHAP) were used to quantify feature contributions, whereas optimal cutoffs were determined using maximally selected rank statistics. We used Kaplan-Meier analysis to compare cumulative HCC incidence between risk groups.
Results: We achieved c-index scores and 95% CIs of 0.90 (0.85 to 0.94) and 0.80 (0.74 to 0.85) in the derivation and validation cohorts, respectively, in a model using platelet count, gamma-glutamyl transpeptidase, sex, age, and ALT. Stratification resulted in four risk groups: low, intermediate, high, and very high. The 5-year cumulative HCC incidence rates and 95% CIs for these groups were as follows: derivation: 0% (0 to 0), 3.8% (0.6 to 6.8), 26.2% (17.2 to 34.3), and 54.2% (20.2 to 73.7), respectively, and validation: 0.7% (0 to 1.6), 7.1% (2.7 to 11.3), 5.2% (0 to 10.8), and 28.6% (0 to 55.3), respectively.
Conclusion: The integration of RSF and SHAP enabled accurate HCC risk classification after SVR, which may facilitate individualized HCC screening strategies and more cost-effective care.
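The abstract above evaluates its model with Harrell's c-index. As a rough illustration of what that statistic measures (this is not the authors' code and it does not implement a random survival forest), the following Python sketch computes the c-index from follow-up times, event indicators, and predicted risk scores. The function name, the toy inputs, and the simplification of skipping pairs with tied follow-up times are all assumptions made for this example.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (simplified sketch).

    times: observed follow-up time per patient
    events: 1 if the event (e.g. HCC) was observed, 0 if censored
    risk_scores: model output, where higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Pairs with tied times are skipped (an assumption of this sketch).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration only.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_scores))  # 4 of 5 comparable pairs are concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.90 and 0.80 should be read.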
{"title":"Prediction of Hepatocellular Carcinoma After Hepatitis C Virus Sustained Virologic Response Using a Random Survival Forest Model.","authors":"Hikaru Nakahara, Atsushi Ono, C Nelson Hayes, Yuki Shirane, Ryoichi Miura, Yasutoshi Fujii, Serami Murakami, Kenji Yamaoka, Hauri Bao, Shinsuke Uchikawa, Hatsue Fujino, Eisuke Murakami, Tomokazu Kawaoka, Daiki Miki, Masataka Tsuge, Shiro Oka","doi":"10.1200/CCI.24.00108","DOIUrl":"https://doi.org/10.1200/CCI.24.00108","url":null,"abstract":"<p><strong>Purpose: </strong>Postsustained virologic response (SVR) screening following clinical guidelines does not address individual risk of hepatocellular carcinoma (HCC). Our aim is to provide tailored screening for patients using machine learning to predict HCC incidence after SVR.</p><p><strong>Methods: </strong>Using clinical data from 1,028 SVR patients, we developed an HCC prediction model using a random survival forest (RSF). Model performance was assessed using Harrel's c-index and validated in an independent cohort of 737 SVR patients. Shapley additive explanation (SHAP) facilitated feature quantification, whereas optimal cutoffs were determined using maximally selected rank statistics. We used Kaplan-Meier analysis to compare cumulative HCC incidence between risk groups.</p><p><strong>Results: </strong>We achieved c-index scores and 95% CIs of 0.90 (0.85 to 0.94) and 0.80 (0.74 to 0.85) in the derivation and validation cohorts, respectively, in a model using platelet count, gamma-glutamyl transpeptidase, sex, age, and ALT. Stratification resulted in four risk groups: low, intermediate, high, and very high. 
The 5-year cumulative HCC incidence rates and 95% CIs for these groups were as follows: derivation: 0% (0 to 0), 3.8% (0.6 to 6.8), 26.2% (17.2 to 34.3), and 54.2% (20.2 to 73.7), respectively, and validation: 0.7% (0 to 1.6), 7.1% (2.7 to 11.3), 5.2% (0 to 10.8), and 28.6% (0 to 55.3), respectively.</p><p><strong>Conclusion: </strong>The integration of RSF and SHAP enabled accurate HCC risk classification after SVR, which may facilitate individualized HCC screening strategies and more cost-effective care.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2400108"},"PeriodicalIF":3.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142856659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-01; Epub Date: 2024-12-20; DOI: 10.1200/CCI.24.00132
Gurjyot K Doshi, Andrew J Osterland, Ping Shi, Annette Yim, Viviana Del Tejo, Sarah B Guttenplan, Samantha Eiffert, Xin Yin, Lisa Rosenblatt, Paul R Conkling
Purpose: Nivolumab plus ipilimumab (NIVO + IPI) is a first-in-class combination immunotherapy for the treatment of intermediate- or poor (I/P)-risk advanced or metastatic renal cell carcinoma (mRCC). Currently, there are limited real-world data regarding clinical effectiveness beyond 12-24 months from treatment initiation. In this real-world study, treatment patterns and clinical outcomes were evaluated for NIVO + IPI in a community oncology setting.
Methods: A retrospective analysis using electronic medical record data from The US Oncology Network examined patients with I/P-risk clear cell mRCC who initiated first-line (1L) NIVO + IPI between January 4, 2018, and December 31, 2019, with follow-up until June 30, 2022. Baseline demographics, clinical characteristics, treatment patterns, clinical effectiveness, and safety outcomes were assessed descriptively. Overall survival (OS) and real-world progression-free survival (rwPFS) were analyzed using Kaplan-Meier methods.
Results: Among the 187 patients identified (median follow-up, 22.4 months; median age, 63 years [range, 30-89]), 74 (39.6%) had poor risk and 37 (19.8%) had an Eastern Cooperative Oncology Group performance status score ≥2. Of 86 patients who received second-line therapy, 54.7% received cabozantinib and 10.5% received pazopanib. The median (95% CI) OS and rwPFS were 38.4 (24.7-46.1) months and 11.1 (7.5-15.0) months, respectively. Treatment-related adverse events (TRAEs) were reported in 89 (47.6%) patients, including fatigue (n = 25, 13.4%) and rash (n = 19, 10.2%).
Conclusion: This study provides data to support the understanding of the real-world utilization and long-term effectiveness of 1L NIVO + IPI in patients with I/P-risk mRCC. TRAE rates were low relative to clinical trials.
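The OS and rwPFS medians in this abstract come from Kaplan-Meier estimation. The sketch below shows, under stated assumptions, how a Kaplan-Meier curve and its median can be computed from follow-up times and event indicators. The function names and the toy data are hypothetical and are not taken from the study; a real analysis would normally rely on an established survival library and would also report confidence intervals.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up time per patient (e.g. months)
    events: 1 if the event occurred at that time, 0 if censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Events observed exactly at time t.
        n_events = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - n_events / at_risk
        curve.append((t, survival))
    return curve


def median_survival(curve):
    """First time at which S(t) drops to 0.5 or below, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy data, invented for illustration only.
toy_times = [3, 5, 5, 8, 12, 15]
toy_events = [1, 1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
print(toy_curve)
print(median_survival(toy_curve))  # 8
```

Censored patients contribute to the at-risk count until their last follow-up but never trigger a drop in the curve, which is why the estimate differs from a simple proportion of patients alive.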
{"title":"Real-World Outcomes in Patients With Metastatic Renal Cell Carcinoma Treated With First-Line Nivolumab Plus Ipilimumab in the United States.","authors":"Gurjyot K Doshi, Andrew J Osterland, Ping Shi, Annette Yim, Viviana Del Tejo, Sarah B Guttenplan, Samantha Eiffert, Xin Yin, Lisa Rosenblatt, Paul R Conkling","doi":"10.1200/CCI.24.00132","DOIUrl":"10.1200/CCI.24.00132","url":null,"abstract":"<p><strong>Purpose: </strong>Nivolumab plus ipilimumab (NIVO + IPI) is a first-in-class combination immunotherapy for the treatment of intermediate- or poor (I/P)-risk advanced or metastatic renal cell carcinoma (mRCC). Currently, there are limited real-world data regarding clinical effectiveness beyond 12-24 months from treatment initiation. In this real-world study, treatment patterns and clinical outcomes were evaluated for NIVO + IPI in a community oncology setting.</p><p><strong>Methods: </strong>A retrospective analysis using electronic medical record data from The US Oncology Network examined patients with I/P-risk clear cell mRCC who initiated first-line (1L) NIVO + IPI between January 4, 2018, and December 31, 2019, with follow-up until June 30, 2022. Baseline demographics, clinical characteristics, treatment patterns, clinical effectiveness, and safety outcomes were assessed descriptively. Overall survival (OS) and real-world progression-free survival (rwPFS) were analyzed using Kaplan-Meier methods.</p><p><strong>Results: </strong>Among 187 patients identified (median follow-up, 22.4 months), with median age 63 (range, 30-89) years, 74 (39.6%) patients had poor risk and 37 (19.8%) patients had Eastern Cooperative Oncology Group performance status score ≥2. Of 86 patients who received second-line therapy, 54.7% received cabozantinib and 10.5% received pazopanib. The median (95% CI) OS and rwPFS were 38.4 (24.7-46.1) months and 11.1 (7.5-15.0) months, respectively. 
Treatment-related adverse events (TRAEs) were reported in 89 (47.6%) patients, including fatigue (n = 25, 13.4%) and rash (n = 19, 10.2%).</p><p><strong>Conclusion: </strong>This study provides data to support the understanding of the real-world utilization and long-term effectiveness of 1L NIVO + IPI in patients with I/P-risk mRCC. TRAE rates were low relative to clinical trials.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2400132"},"PeriodicalIF":3.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11670916/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142869775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-01; Epub Date: 2024-12-17; DOI: 10.1200/CCI-24-00196
Bradley D McDowell, Michael A O'Rorke, Mary C Schroeder, Elizabeth A Chrischilles, Christine M Spinka, Lemuel R Waitman, Kelechi Anuforo, Alejandro Araya, Haddyjatou Bah, Jackson Barlocker, Sravani Chandaka, Lindsay G Cowell, Carol R Geary, Snehil Gupta, Benjamin D Horne, Boyd M Knosp, Albert M Lai, Vasanthi Mandhadi, Abu Saleh Mohammad Mosa, Phillip Reeder, Giyung Ryu, Brian Shukwit, Claire Smith, Alexander J Stoddard, Mahanazuddin Syed, Shorabuddin Syed, Bradley W Taylor, Jeffrey J VanWormer
Purpose: Electronic health records (EHRs) comprise a rich source of real-world data for cancer studies, but they often lack critical structured data elements such as diagnosis date and disease stage. Fortunately, such concepts are available from hospital cancer registries. We describe experiences from integrating cancer registry data with EHR and billing data in an interoperable data model across a multisite clinical research network.
Methods: After sites implemented cancer registry data into a tumor table compatible with the PCORnet Common Data Model (CDM), distributed queries were performed to assess quality issues. After remediation of those issues, a second query produced descriptive frequencies of cancer types and demographic characteristics, including linked BMI. We also report two current use cases of the new resource.
Results: Eleven sites implemented the tumor table, yielding a resource with data for 572,902 tumors. Institutional and technical barriers were surmounted to accomplish this. Variations in racial and ethnic distributions across the sites were observed; the percent of tumors among Black patients ranged from <1% to 15% across sites, and the percent of tumors among Hispanic patients ranged from 1% to 46% across sites. Current use cases include a pragmatic prospective cohort study of a rare cancer and a retrospective cohort study leveraging body size and chemotherapy dosing.
Conclusion: Integrating cancer registry data with the PCORnet CDM across multiple institutions creates a powerful resource for cancer studies. It provides a wider array of structured, cancer-relevant concepts, and it allows investigators to examine variability in those concepts across many treatment environments. Having the CDM tumor table in place enhances the network's effectiveness for real-world cancer research.
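The across-site ranges reported in this abstract can be derived from aggregate counts that each site returns to a coordinating center. The sketch below illustrates that kind of summary under loud assumptions: the data layout, site labels, group labels, and function name are hypothetical, and nothing here reflects the actual PCORnet query tooling or the Greater Plains Collaborative's data.

```python
def percent_range(site_counts, group):
    """Return (min_pct, max_pct) of tumors in `group` across sites.

    site_counts: {site: {group_label: tumor_count}}, aggregate counts only,
                 so no patient-level rows leave a site.
    """
    percentages = []
    for counts in site_counts.values():
        total = sum(counts.values())
        if total == 0:
            continue  # a site with no tumors contributes no percentage
        percentages.append(100.0 * counts.get(group, 0) / total)
    if not percentages:
        raise ValueError("no site reported any tumors")
    return min(percentages), max(percentages)


# Toy counts, invented for illustration only.
toy_counts = {
    "site_a": {"group_x": 1, "group_y": 3},
    "site_b": {"group_x": 1, "group_y": 1},
}
print(percent_range(toy_counts, "group_x"))  # (25.0, 50.0)
```

Working from per-site aggregates rather than pooled rows is one common way to compare distributions across institutions without centralizing patient-level data.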
{"title":"Implementing Cancer Registry Data With the PCORnet Common Data Model: The Greater Plains Collaborative Experience.","authors":"Bradley D McDowell, Michael A O'Rorke, Mary C Schroeder, Elizabeth A Chrischilles, Christine M Spinka, Lemuel R Waitman, Kelechi Anuforo, Alejandro Araya, Haddyjatou Bah, Jackson Barlocker, Sravani Chandaka, Lindsay G Cowell, Carol R Geary, Snehil Gupta, Benjamin D Horne, Boyd M Knosp, Albert M Lai, Vasanthi Mandhadi, Abu Saleh Mohammad Mosa, Phillip Reeder, Giyung Ryu, Brian Shukwit, Claire Smith, Alexander J Stoddard, Mahanazuddin Syed, Shorabuddin Syed, Bradley W Taylor, Jeffrey J VanWormer","doi":"10.1200/CCI-24-00196","DOIUrl":"10.1200/CCI-24-00196","url":null,"abstract":"<p><strong>Purpose: </strong>Electronic health records (EHRs) comprise a rich source of real-world data for cancer studies, but they often lack critical structured data elements such as diagnosis date and disease stage. Fortunately, such concepts are available from hospital cancer registries. We describe experiences from integrating cancer registry data with EHR and billing data in an interoperable data model across a multisite clinical research network.</p><p><strong>Methods: </strong>After sites implemented cancer registry data into a tumor table compatible with the PCORnet Common Data Model (CDM), distributed queries were performed to assess quality issues. After remediation of quality issues, another query produced descriptive frequencies of cancer types and demographic characteristics. This included linked BMI. We also report two current use cases of the new resource.</p><p><strong>Results: </strong>Eleven sites implemented the tumor table, yielding a resource with data for 572,902 tumors. Institutional and technical barriers were surmounted to accomplish this. 
Variations in racial and ethnic distributions across the sites were observed; the percent of tumors among Black patients ranged from <1% to 15% across sites, and the percent of tumors among Hispanic patients ranged from 1% to 46% across sites. Current use cases include a pragmatic prospective cohort study of a rare cancer and a retrospective cohort study leveraging body size and chemotherapy dosing.</p><p><strong>Conclusion: </strong>Integrating cancer registry data with the PCORnet CDM across multiple institutions creates a powerful resource for cancer studies. It provides a wider array of structured, cancer-relevant concepts, and it allows investigators to examine variability in those concepts across many treatment environments. Having the CDM tumor table in place enhances the impact of the network's effectiveness for real-world cancer research.</p>","PeriodicalId":51626,"journal":{"name":"JCO Clinical Cancer Informatics","volume":"8 ","pages":"e2400196"},"PeriodicalIF":3.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11658786/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142848405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}