Artificial Intelligence, the ChatGPT Large Language Model: Assessing the Accuracy of Responses to the Gynaecological Endoscopic Surgical Education and Assessment (GESEA) Level 1-2 knowledge tests.

Facts Views and Vision in ObGyn (IF 1.7, Q3, Obstetrics & Gynecology) | Pub Date: 2024-12-01 | DOI: 10.52054/FVVO.16.4.052
M Pavone, L Palmieri, N Bizzarri, A Rosati, F Campolo, C Innocenzi, C Taliento, S Restaino, U Catena, G Vizzielli, C Akladios, M M Ianieri, J Marescaux, R Campo, F Fanfani, G Scambia
{"title":"Artificial Intelligence, the ChatGPT Large Language Model: Assessing the Accuracy of Responses to the Gynaecological Endoscopic Surgical Education and Assessment (GESEA) Level 1-2 knowledge tests.","authors":"M Pavone, L Palmieri, N Bizzarri, A Rosati, F Campolo, C Innocenzi, C Taliento, S Restaino, U Catena, G Vizzielli, C Akladios, M M Ianieri, J Marescaux, R Campo, F Fanfani, G Scambia","doi":"10.52054/FVVO.16.4.052","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>In 2022, OpenAI launched ChatGPT 3.5, which is now widely used in medical education, training, and research. Despite its valuable use for the generation of information, concerns persist about its authenticity and accuracy. Its undisclosed information source and outdated dataset pose risks of misinformation. Although it is widely used, AI-generated text inaccuracies raise doubts about its reliability. The ethical use of such technologies is crucial to uphold scientific accuracy in research.</p><p><strong>Objective: </strong>This study aimed to assess the accuracy of ChatGPT in doing GESEA tests 1 and 2.</p><p><strong>Materials and methods: </strong>The 100 multiple-choice theoretical questions from GESEA certifications 1 and 2 were presented to ChatGPT, requesting the selection of the correct answer along with an explanation. Expert gynaecologists evaluated and graded the explanations for accuracy.</p><p><strong>Main outcome measures: </strong>ChatGPT showed a 59% accuracy in responses, with 64% providing comprehensive explanations. It performed better in GESEA Level 1 (64% accuracy) than in GESEA Level 2 (54% accuracy) questions.</p><p><strong>Conclusions: </strong>ChatGPT is a versatile tool in medicine and research, offering knowledge, information, and promoting evidence-based practice. Despite its widespread use, its accuracy has not been validated yet. This study found a 59% correct response rate, highlighting the need for accuracy validation and ethical use considerations. Future research should investigate ChatGPT's truthfulness in subspecialty fields such as gynaecologic oncology and compare different versions of chatbot for continuous improvement.</p><p><strong>What is new?: </strong>Artificial intelligence (AI) has a great potential in scientific research. However, the validity of outputs remains unverified. This study aims to evaluate the accuracy of responses generated by ChatGPT to enhance the critical use of this tool.</p>","PeriodicalId":46400,"journal":{"name":"Facts Views and Vision in ObGyn","volume":"16 4","pages":"449-456"},"PeriodicalIF":1.7000,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Facts Views and Vision in ObGyn","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.52054/FVVO.16.4.052","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"OBSTETRICS & GYNECOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Background: In 2022, OpenAI launched ChatGPT 3.5, which is now widely used in medical education, training, and research. Despite its value for generating information, concerns persist about the authenticity and accuracy of its output. Its undisclosed information sources and outdated training data pose a risk of misinformation. Although the model is widely used, inaccuracies in AI-generated text cast doubt on its reliability. Ethical use of such technologies is crucial to upholding scientific accuracy in research.

Objective: This study aimed to assess the accuracy of ChatGPT's answers to the GESEA Level 1 and Level 2 knowledge tests.

Materials and methods: The 100 multiple-choice theoretical questions from GESEA certifications 1 and 2 were presented to ChatGPT, which was asked to select the correct answer and to explain its choice. Expert gynaecologists then evaluated and graded the explanations for accuracy.
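The abstract does not describe the submission workflow in detail; the sketch below shows one plausible way to present such multiple-choice questions programmatically, using the OpenAI Python client. The model name, prompt wording, and the sample question are illustrative assumptions, not the authors' protocol.

```python
# Hypothetical sketch: submitting GESEA-style multiple-choice questions to a
# chat model and collecting the chosen answer plus explanation for expert
# grading. Prompt wording, model name, and data layout are assumptions; the
# paper does not specify its exact submission procedure.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    {
        "id": "L1-Q01",  # illustrative item, not from the actual GESEA test
        "stem": "Which intra-abdominal pressure is typically used for "
                "diagnostic laparoscopy?",
        "options": {"A": "5 mmHg", "B": "12 mmHg", "C": "25 mmHg", "D": "40 mmHg"},
    },
    # ... remaining questions
]

def ask(question: dict) -> str:
    """Build a single prompt from the stem and options, return the reply."""
    options = "\n".join(f"{k}. {v}" for k, v in question["options"].items())
    prompt = (
        f"{question['stem']}\n{options}\n\n"
        "Select the single correct option and explain your reasoning."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed; the study used ChatGPT 3.5
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

answers = {q["id"]: ask(q) for q in questions}
```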

Main outcome measures: ChatGPT answered 59% of the questions correctly, and 64% of its responses included a comprehensive explanation. It performed better on GESEA Level 1 questions (64% accuracy) than on GESEA Level 2 questions (54% accuracy).
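As a consistency check, the per-level and overall figures line up if the 100 questions are split evenly between the two levels (an assumption the abstract does not state explicitly): 64% of 50 and 54% of 50 give 32 and 27 correct answers, i.e. 59/100 overall. A minimal sketch with hypothetical graded data:

```python
# Hypothetical grading data: True if ChatGPT chose the correct option.
# Assuming 50 questions per level, the reported 64% and 54% imply
# 32/50 and 27/50 correct, for 59/100 = 59% overall.
graded = {
    "Level 1": [True] * 32 + [False] * 18,
    "Level 2": [True] * 27 + [False] * 23,
}

for level, results in graded.items():
    print(f"{level}: {sum(results) / len(results):.0%} accuracy")

overall = [r for results in graded.values() for r in results]
print(f"Overall: {sum(overall) / len(overall):.0%} accuracy")
```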

Conclusions: ChatGPT is a versatile tool in medicine and research: it offers knowledge and information and promotes evidence-based practice. Despite its widespread use, however, its accuracy has not yet been validated. This study found a 59% correct response rate, highlighting the need for accuracy validation and for consideration of ethical use. Future research should investigate ChatGPT's truthfulness in subspecialty fields such as gynaecologic oncology and compare different versions of the chatbot to support continuous improvement.

What is new?: Artificial intelligence (AI) has great potential in scientific research, but the validity of its outputs remains unverified. This study evaluates the accuracy of responses generated by ChatGPT in order to encourage more critical use of the tool.
