Artificial intelligence versus human touch: can artificial intelligence accurately generate a literature review on laser technologies?

IF 2.8 | CAS Zone 2 (Medicine) | Q2 UROLOGY & NEPHROLOGY | World Journal of Urology | Pub Date: 2024-10-28 | DOI: 10.1007/s00345-024-05311-8
Frédéric Panthier, Hugh Crawford-Smith, Eduarda Alvarez, Alberto Melchionna, Daniela Velinova, Ikran Mohamed, Siobhan Price, Simon Choong, Vimoshan Arumuham, Sian Allen, Olivier Traxer, Daron Smith
{"title":"人工智能与人情味:人工智能能否准确生成激光技术文献综述?","authors":"Frédéric Panthier, Hugh Crawford-Smith, Eduarda Alvarez, Alberto Melchionna, Daniela Velinova, Ikran Mohamed, Siobhan Price, Simon Choong, Vimoshan Arumuham, Sian Allen, Olivier Traxer, Daron Smith","doi":"10.1007/s00345-024-05311-8","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>To compare the accuracy of open-source Artificial Intelligence (AI) Large Language Models (LLM) against human authors to generate a systematic review (SR) on the new pulsed-Thulium:YAG (p-Tm:YAG) laser.</p><p><strong>Methods: </strong>Five manuscripts were compared. The Human-SR on p-Tm:YAG (considered to be the \"ground truth\") was written by independent certified endourologists with expertise in lasers, accepted in a peer-review pubmed-indexed journal (but not yet available online, and therefore not accessible to the LLMs). The query to the AI LLMs was: \"write a systematic review on pulsed-Thulium:YAG laser for lithotripsy\" which was submitted to four LLMs (ChatGPT3.5/Vercel/Claude/Mistral-7b). The LLM-SR were uniformed and Human-SR reformatted to fit the general output appearance, to ensure blindness. Nine participants with various levels of endourological expertise (three Clinical Nurse Specialist nurses, Urology Trainees and Consultants) objectively assessed the accuracy of the five SRs using a bespoke 10 \"checkpoint\" proforma. A subjective assessment was recorded using a composite score including quality (0-10), clarity (0-10) and overall manuscript rank (1-5).</p><p><strong>Results: </strong>The Human-SR was objectively and subjectively more accurate than LLM-SRs (96 ± 7% and 86.8 ± 8.2% respectively; p < 0.001). The LLM-SRs did not significantly differ but ChatGPT3.5 presented greater subjective and objective accuracy scores (62.4 ± 15% and 29 ± 28% respectively; p > 0.05). Quality and clarity assessments were significantly impacted by SR type but not the expertise level (p < 0.001 and > 0.05, respectively).</p><p><strong>Conclusions: </strong>LLM generated data on highly technical topics present a lower accuracy than Key Opinion Leaders. LLMs, especially ChatGPT3.5, with human supervision could improve our practice.</p>","PeriodicalId":23954,"journal":{"name":"World Journal of Urology","volume":null,"pages":null},"PeriodicalIF":2.8000,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Artificial intelligence versus human touch: can artificial intelligence accurately generate a literature review on laser technologies?\",\"authors\":\"Frédéric Panthier, Hugh Crawford-Smith, Eduarda Alvarez, Alberto Melchionna, Daniela Velinova, Ikran Mohamed, Siobhan Price, Simon Choong, Vimoshan Arumuham, Sian Allen, Olivier Traxer, Daron Smith\",\"doi\":\"10.1007/s00345-024-05311-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>To compare the accuracy of open-source Artificial Intelligence (AI) Large Language Models (LLM) against human authors to generate a systematic review (SR) on the new pulsed-Thulium:YAG (p-Tm:YAG) laser.</p><p><strong>Methods: </strong>Five manuscripts were compared. The Human-SR on p-Tm:YAG (considered to be the \\\"ground truth\\\") was written by independent certified endourologists with expertise in lasers, accepted in a peer-review pubmed-indexed journal (but not yet available online, and therefore not accessible to the LLMs). 
The query to the AI LLMs was: \\\"write a systematic review on pulsed-Thulium:YAG laser for lithotripsy\\\" which was submitted to four LLMs (ChatGPT3.5/Vercel/Claude/Mistral-7b). The LLM-SR were uniformed and Human-SR reformatted to fit the general output appearance, to ensure blindness. Nine participants with various levels of endourological expertise (three Clinical Nurse Specialist nurses, Urology Trainees and Consultants) objectively assessed the accuracy of the five SRs using a bespoke 10 \\\"checkpoint\\\" proforma. A subjective assessment was recorded using a composite score including quality (0-10), clarity (0-10) and overall manuscript rank (1-5).</p><p><strong>Results: </strong>The Human-SR was objectively and subjectively more accurate than LLM-SRs (96 ± 7% and 86.8 ± 8.2% respectively; p < 0.001). The LLM-SRs did not significantly differ but ChatGPT3.5 presented greater subjective and objective accuracy scores (62.4 ± 15% and 29 ± 28% respectively; p > 0.05). Quality and clarity assessments were significantly impacted by SR type but not the expertise level (p < 0.001 and > 0.05, respectively).</p><p><strong>Conclusions: </strong>LLM generated data on highly technical topics present a lower accuracy than Key Opinion Leaders. LLMs, especially ChatGPT3.5, with human supervision could improve our practice.</p>\",\"PeriodicalId\":23954,\"journal\":{\"name\":\"World Journal of Urology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2024-10-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"World Journal of Urology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1007/s00345-024-05311-8\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"UROLOGY & NEPHROLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"World Journal of Urology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s00345-024-05311-8","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"UROLOGY & NEPHROLOGY","Score":null,"Total":0}
Citations: 0

Abstract


Purpose: To compare the accuracy of open-source Artificial Intelligence (AI) Large Language Models (LLMs) with that of human authors in generating a systematic review (SR) on the new pulsed-Thulium:YAG (p-Tm:YAG) laser.

Methods: Five manuscripts were compared. The Human-SR on p-Tm:YAG (considered the "ground truth") was written by independent certified endourologists with expertise in lasers and accepted in a peer-reviewed, PubMed-indexed journal (but not yet available online, and therefore not accessible to the LLMs). The query to the AI LLMs, "write a systematic review on pulsed-Thulium:YAG laser for lithotripsy", was submitted to four LLMs (ChatGPT3.5/Vercel/Claude/Mistral-7b). The LLM-SRs were formatted uniformly and the Human-SR was reformatted to match the general output appearance, to ensure blinding. Nine participants with various levels of endourological expertise (three Clinical Nurse Specialists, Urology Trainees, and Consultants) objectively assessed the accuracy of the five SRs using a bespoke 10-"checkpoint" proforma. A subjective assessment was recorded using a composite score comprising quality (0-10), clarity (0-10), and overall manuscript rank (1-5).
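
For illustration only, the sketch below shows one way the two scores described above could be computed for a single manuscript; the data structure, the rescaling of the composite subjective score, and all numbers are assumptions, not the authors' actual analysis.

```python
from statistics import mean, stdev

# Hypothetical ratings from two reviewers for one manuscript (structure and
# values are illustrative assumptions, not the study's data).
reviews = [
    {"checkpoints": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],  # 10 objective checkpoints, 1 = correct
     "quality": 9, "clarity": 8, "rank": 1},          # subjective items
    {"checkpoints": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
     "quality": 10, "clarity": 9, "rank": 1},
]

def objective_accuracy(review):
    """Percentage of the 10 checkpoints judged correct by one reviewer."""
    return 100 * sum(review["checkpoints"]) / len(review["checkpoints"])

def subjective_score(review):
    """Composite of quality (0-10), clarity (0-10) and rank (1 = best of 5),
    rescaled to a percentage; the weighting is an assumption."""
    rank_points = 5 - review["rank"]  # rank 1 (best) -> 4 points, rank 5 -> 0
    return 100 * (review["quality"] + review["clarity"] + rank_points) / 24

acc = [objective_accuracy(r) for r in reviews]
subj = [subjective_score(r) for r in reviews]
print(f"objective accuracy: {mean(acc):.1f} ± {stdev(acc):.1f} %")
print(f"subjective score:   {mean(subj):.1f} ± {stdev(subj):.1f} %")
```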

Results: The Human-SR was objectively and subjectively more accurate than the LLM-SRs (96 ± 7% and 86.8 ± 8.2%, respectively; p < 0.001). The LLM-SRs did not differ significantly from one another, but ChatGPT3.5 presented the highest subjective and objective accuracy scores (62.4 ± 15% and 29 ± 28%, respectively; p > 0.05). Quality and clarity assessments were significantly affected by SR type but not by expertise level (p < 0.001 and p > 0.05, respectively).
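
The abstract reports p-values but does not state which statistical tests were used; the snippet below shows only one plausible way to compare objective accuracy across manuscripts (a non-parametric omnibus test), using made-up scores.

```python
from scipy import stats

# Made-up per-reviewer objective accuracy scores (%) for three of the five
# manuscripts; the real study had nine reviewers and five SRs.
scores = {
    "Human-SR":   [100, 90, 100, 90, 100, 100, 90, 100, 90],
    "ChatGPT3.5": [40, 60, 20, 10, 50, 30, 0, 20, 30],
    "Mistral-7b": [30, 20, 10, 40, 20, 30, 10, 20, 0],
}

# Kruskal-Wallis test across SR types (an assumed choice of test, since the
# authors' statistical methods are not described in the abstract).
h, p = stats.kruskal(*scores.values())
print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.4f}")
```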

Conclusions: LLM-generated data on highly technical topics are less accurate than content produced by key opinion leaders. LLMs, especially ChatGPT3.5, could improve our practice when used with human supervision.

Source journal: World Journal of Urology (Medicine - Urology & Nephrology)
CiteScore: 6.80
Self-citation rate: 8.80%
Annual articles: 317
Review time: 4-8 weeks
Journal description: The WORLD JOURNAL OF UROLOGY regularly conveys the essential results of urological research and their practical and clinical relevance to a broad audience of urologists in research and clinical practice. In order to guarantee a balanced program, articles are published to reflect developments in all fields of urology at an internationally advanced level. Each issue treats a main topic in review articles by invited international experts. Free papers are articles unrelated to the main topic.