The Double-Edged Sword of Generative AI: Surpassing an Expert or a Deceptive "False Friend"?

IF 4.9 · Q1 (Medicine) · CLINICAL NEUROLOGY · Spine Journal · Pub Date: 2025-03-04 · DOI: 10.1016/j.spinee.2025.02.010
Franziska C S Altorfer, Michael J Kelly, Fedan Avrumova, Varun Rohatgi, Jiaqi Zhu, Christopher M Bono, Darren R Lebl

Abstract

Background context: Generative artificial intelligence (AI), of which ChatGPT is the most popular example, has been extensively assessed for its capability to respond to medical questions, such as queries about spine treatment approaches or technological advances. However, it often lacks scientific foundation or fabricates inauthentic references, a phenomenon known as AI hallucination.

Purpose: To develop an understanding of the scientific basis of generative AI tools by studying the authenticity of their references and assessing the reliability of their responses through alignment with evidence-based guidelines.

Study design: Comparative study.

Methods: Thirty-three previously published North American Spine Society (NASS) guideline questions were posed as prompts to two freely available generative AI tools (Tools I and II). The responses were scored for correctness against the published NASS guideline responses using a five-point "alignment score." Furthermore, all cited references were evaluated for authenticity, source type, year of publication, and inclusion in the scientific guidelines.

Results: Both tools' responses to guideline questions achieved an overall score of 3.5±1.1, which is considered acceptably equivalent to the guidelines. The two tools generated 254 references to support their responses, of which 76.0% (n = 193) were authentic and 24.0% (n = 61) were fabricated. The authentic references comprised: peer-reviewed scientific research papers (147, 76.2%), guidelines (16, 8.3%), educational websites (9, 4.7%), books (9, 4.7%), a government website (1, 0.5%), insurance websites (6, 3.1%), and newspaper websites (5, 2.6%). Claude referenced significantly more authentic peer-reviewed scientific papers (Claude: n = 111, 91.0%; Gemini: n = 36, 50.7%; p < 0.001). The year of publication among all references ranged from 1988 to 2023, with significantly older references provided by Claude (Claude: 2008±6; Gemini: 2014±6; p < 0.001). Lastly, significantly more of the references provided by Claude were also cited in the published NASS guidelines (Claude: n = 27, 24.3%; Gemini: n = 1, 2.8%; p = 0.04).

Conclusions: Both generative AI tools provided responses with acceptable alignment to NASS evidence-based guideline recommendations and offered references, though nearly a quarter of those references were inauthentic or drawn from non-scientific sources. This deficiency of legitimate scientific references does not meet the standards required for clinical implementation. Given this limitation, caution should be exercised when applying the output of generative AI tools in clinical settings.

Source journal: Spine Journal (Medicine – Clinical Neurology)
CiteScore: 8.20
Self-citation rate: 6.70%
Articles per year: 680
Review time: 13.1 weeks
Journal description: The Spine Journal, the official journal of the North American Spine Society, is an international and multidisciplinary journal that publishes original, peer-reviewed articles on research and treatment related to the spine and spine care, including basic science and clinical investigations. It is a condition of publication that manuscripts submitted to The Spine Journal have not been published, and will not be simultaneously submitted or published elsewhere. The Spine Journal also publishes major reviews of specific topics by acknowledged authorities, technical notes, teaching editorials, and other special features. Letters to the Editor-in-Chief are encouraged.