Artificial Intelligence, the ChatGPT Large Language Model: Assessing the Accuracy of Responses to the Gynaecological Endoscopic Surgical Education and Assessment (GESEA) Level 1-2 knowledge tests.

Facts Views and Vision in ObGyn (IF 1.7, Q3, Obstetrics & Gynecology) | Pub Date: 2024-12-01 | DOI: 10.52054/FVVO.16.4.052
M Pavone, L Palmieri, N Bizzarri, A Rosati, F Campolo, C Innocenzi, C Taliento, S Restaino, U Catena, G Vizzielli, C Akladios, M M Ianieri, J Marescaux, R Campo, F Fanfani, G Scambia
{"title":"Artificial Intelligence, the ChatGPT Large Language Model: Assessing the Accuracy of Responses to the Gynaecological Endoscopic Surgical Education and Assessment (GESEA) Level 1-2 knowledge tests.","authors":"M Pavone, L Palmieri, N Bizzarri, A Rosati, F Campolo, C Innocenzi, C Taliento, S Restaino, U Catena, G Vizzielli, C Akladios, M M Ianieri, J Marescaux, R Campo, F Fanfani, G Scambia","doi":"10.52054/FVVO.16.4.052","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>In 2022, OpenAI launched ChatGPT 3.5, which is now widely used in medical education, training, and research. Despite its valuable use for the generation of information, concerns persist about its authenticity and accuracy. Its undisclosed information source and outdated dataset pose risks of misinformation. Although it is widely used, AI-generated text inaccuracies raise doubts about its reliability. The ethical use of such technologies is crucial to uphold scientific accuracy in research.</p><p><strong>Objective: </strong>This study aimed to assess the accuracy of ChatGPT in doing GESEA tests 1 and 2.</p><p><strong>Materials and methods: </strong>The 100 multiple-choice theoretical questions from GESEA certifications 1 and 2 were presented to ChatGPT, requesting the selection of the correct answer along with an explanation. Expert gynaecologists evaluated and graded the explanations for accuracy.</p><p><strong>Main outcome measures: </strong>ChatGPT showed a 59% accuracy in responses, with 64% providing comprehensive explanations. It performed better in GESEA Level 1 (64% accuracy) than in GESEA Level 2 (54% accuracy) questions.</p><p><strong>Conclusions: </strong>ChatGPT is a versatile tool in medicine and research, offering knowledge, information, and promoting evidence-based practice. Despite its widespread use, its accuracy has not been validated yet. This study found a 59% correct response rate, highlighting the need for accuracy validation and ethical use considerations. Future research should investigate ChatGPT's truthfulness in subspecialty fields such as gynaecologic oncology and compare different versions of chatbot for continuous improvement.</p><p><strong>What is new?: </strong>Artificial intelligence (AI) has a great potential in scientific research. However, the validity of outputs remains unverified. This study aims to evaluate the accuracy of responses generated by ChatGPT to enhance the critical use of this tool.</p>","PeriodicalId":46400,"journal":{"name":"Facts Views and Vision in ObGyn","volume":"16 4","pages":"449-456"},"PeriodicalIF":1.7000,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Facts Views and Vision in ObGyn","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.52054/FVVO.16.4.052","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"OBSTETRICS & GYNECOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Background: In 2022, OpenAI launched ChatGPT 3.5, which is now widely used in medical education, training, and research. Despite its value for generating information, concerns persist about the authenticity and accuracy of its output. Its undisclosed information sources and outdated training data pose a risk of misinformation. Although the model is widely used, inaccuracies in AI-generated text cast doubt on its reliability. Ethical use of such technologies is crucial to upholding scientific accuracy in research.

Objective: This study aimed to assess the accuracy of ChatGPT's answers to the GESEA Level 1 and Level 2 knowledge tests.

Materials and methods: The 100 multiple-choice theoretical questions from GESEA certifications 1 and 2 were presented to ChatGPT, which was asked to select the correct answer and to explain its choice. Expert gynaecologists then evaluated and graded the explanations for accuracy.
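The abstract does not describe the submission workflow in detail; the sketch below shows one plausible way to present such multiple-choice questions programmatically, using the OpenAI Python client. The model name, prompt wording, and the sample question are illustrative assumptions, not the authors' protocol.

```python
# Hypothetical sketch: submitting GESEA-style multiple-choice questions to a
# chat model and collecting the chosen answer plus explanation for expert
# grading. Prompt wording, model name, and data layout are assumptions; the
# paper does not specify its exact submission procedure.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    {
        "id": "L1-Q01",  # illustrative item, not from the actual GESEA test
        "stem": "Which intra-abdominal pressure is typically used for "
                "diagnostic laparoscopy?",
        "options": {"A": "5 mmHg", "B": "12 mmHg", "C": "25 mmHg", "D": "40 mmHg"},
    },
    # ... remaining questions
]

def ask(question: dict) -> str:
    """Build a single prompt from the stem and options, return the reply."""
    options = "\n".join(f"{k}. {v}" for k, v in question["options"].items())
    prompt = (
        f"{question['stem']}\n{options}\n\n"
        "Select the single correct option and explain your reasoning."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed; the study used ChatGPT 3.5
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

answers = {q["id"]: ask(q) for q in questions}
```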

Main outcome measures: ChatGPT answered 59% of the questions correctly, and 64% of its responses included a comprehensive explanation. It performed better on GESEA Level 1 questions (64% accuracy) than on GESEA Level 2 questions (54% accuracy).
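As a consistency check, the per-level and overall figures line up if the 100 questions are split evenly between the two levels (an assumption the abstract does not state explicitly): 64% of 50 and 54% of 50 give 32 and 27 correct answers, i.e. 59/100 overall. A minimal sketch with hypothetical graded data:

```python
# Hypothetical grading data: True if ChatGPT chose the correct option.
# Assuming 50 questions per level, the reported 64% and 54% imply
# 32/50 and 27/50 correct, for 59/100 = 59% overall.
graded = {
    "Level 1": [True] * 32 + [False] * 18,
    "Level 2": [True] * 27 + [False] * 23,
}

for level, results in graded.items():
    print(f"{level}: {sum(results) / len(results):.0%} accuracy")

overall = [r for results in graded.values() for r in results]
print(f"Overall: {sum(overall) / len(overall):.0%} accuracy")
```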

Conclusions: ChatGPT is a versatile tool in medicine and research: it offers knowledge and information and promotes evidence-based practice. Despite its widespread use, however, its accuracy has not yet been validated. This study found a 59% correct response rate, highlighting the need for accuracy validation and for consideration of ethical use. Future research should investigate ChatGPT's truthfulness in subspecialty fields such as gynaecologic oncology and compare different versions of the chatbot to support continuous improvement.

What is new?: Artificial intelligence (AI) has great potential in scientific research, but the validity of its outputs remains unverified. This study evaluates the accuracy of responses generated by ChatGPT in order to encourage more critical use of the tool.
