Comparative Analysis of Generative Pre-Trained Transformer Models in Oncogene-Driven Non-Small Cell Lung Cancer: Introducing the Generative Artificial Intelligence Performance Score.

JCO Clinical Cancer Informatics, volume 8, e2400123 (IF 3.3, Q2 Oncology). Pub Date: 2024-12-01. Epub Date: 2024-12-11. DOI: 10.1200/CCI.24.00123
Zacharie Hamilton, Aseem Aseem, Zhengjia Chen, Noor Naffakh, Natalie M Reizine, Frank Weinberg, Shikha Jain, Larry G Kessler, Vijayakrishna K Gadi, Christopher Bun, Ryan H Nguyen
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11634130/pdf/

Abstract

Purpose: Precision oncology in non-small cell lung cancer (NSCLC) relies on biomarker testing for clinical decision making. Despite its importance, challenges such as the lack of genomic oncology training, nonstandardized biomarker reporting, and a rapidly evolving treatment landscape hinder its practice. Generative artificial intelligence (AI), such as ChatGPT, offers promise for enhancing clinical decision support. Effective performance metrics are crucial to evaluate these models' accuracy and their propensity for producing incorrect or hallucinated information. We assessed various ChatGPT versions' ability to generate accurate next-generation sequencing reports and treatment recommendations for NSCLC, using a novel Generative AI Performance Score (G-PS), which considers accuracy, relevance, and hallucinations.

Methods: We queried ChatGPT versions for first-line NSCLC treatment recommendations with a Food and Drug Administration-approved targeted therapy, using a zero-shot prompt approach for eight oncogenes. Responses were assessed against National Comprehensive Cancer Network (NCCN) guidelines for accuracy, relevance, and hallucinations, with G-PS calculating scores from -1 (all hallucinations) to 1 (fully NCCN-compliant recommendations). G-PS was designed as a composite measure with a base score for correct recommendations (weighted for preferred treatments) and a penalty for hallucinations.

Results: Analyzing 160 responses, generative pre-trained transformer (GPT)-4 outperformed GPT-3.5, showing a higher base score (90% v 60%; P < .01) and fewer hallucinations (34% v 53%; P < .01). GPT-4's overall G-PS was significantly higher (0.34 v -0.15; P < .01), indicating superior performance.

Conclusion: This study highlights the rapid improvement of generative AI in matching treatment recommendations with biomarkers in precision oncology. Although the rate of hallucinations improved in the GPT-4 model, future generative AI use in clinical care requires high levels of accuracy with minimal to no room for hallucinations. The G-PS represents a novel metric quantifying generative AI utility in health care compared with national guidelines, with potential adaptation beyond precision oncology.
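
The Methods describe G-PS only in outline: a base score for correct recommendations, weighted toward preferred treatments, minus a penalty for hallucinations, bounded between -1 and 1. The sketch below shows one way a score of that shape could be computed. It is an illustration under stated assumptions, not the authors' formula: the label names, the weights (1.0 for preferred, 0.5 for other guideline-listed options), and the per-response normalization are all hypothetical, and the code is not expected to reproduce the values reported in the Results.

```python
# Hypothetical labels and weights. The abstract does not specify them.
WEIGHTS = {
    "preferred": 1.0,          # guideline-preferred therapy (assumed weight)
    "other_recommended": 0.5,  # guideline-listed but not preferred (assumed weight)
    "irrelevant": 0.0,         # real therapy, not indicated for the biomarker
}
HALLUCINATION = "hallucination"


def gps_score(labels):
    """Score one response given a label for each therapy it recommends.

    Returns base - penalty, where base is the weighted share of correct
    recommendations and penalty is the share of hallucinated ones.
    The result always lies in [-1, 1].
    """
    if not labels:
        return 0.0  # assumption: an empty response is scored as neutral
    for label in labels:
        if label != HALLUCINATION and label not in WEIGHTS:
            raise ValueError(f"unknown label: {label}")
    n = len(labels)
    base = sum(WEIGHTS[l] for l in labels if l != HALLUCINATION) / n
    penalty = labels.count(HALLUCINATION) / n
    return base - penalty


def mean_gps(responses):
    """Average the per-response scores for one model."""
    return sum(gps_score(r) for r in responses) / len(responses)


if __name__ == "__main__":
    print(gps_score(["preferred", "preferred"]))          # fully compliant
    print(gps_score(["hallucination", "hallucination"]))  # all hallucinated
    print(gps_score(["preferred", "other_recommended",
                     "hallucination", "irrelevant"]))     # mixed response
```

Normalizing both terms by the number of recommendations in the response keeps the two endpoints at the values given in the Methods: a response made up only of preferred therapies scores 1, and one made up only of hallucinations scores -1.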
