Testing the power of Google DeepMind: Gemini versus ChatGPT 4 facing a European ophthalmology examination

Matteo Mario Carlà, Federico Giannuzzi, Francesco Boselli, Stanislao Rizzo
{"title":"测试谷歌 DeepMind 的能力:双子座与 ChatGPT 4 面对欧洲眼科考试","authors":"Matteo Mario Carlà ,&nbsp;Federico Giannuzzi ,&nbsp;Francesco Boselli ,&nbsp;Stanislao Rizzo","doi":"10.1016/j.ajoint.2024.100063","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><p>The aim of this study was to compare the performances of Google Gemini and ChatGPT-4, facing a triple simulation of the European Board of Ophthalmologists (EBO) multiple choices exam.</p></div><div><h3>Design</h3><p>Observational study.</p></div><div><h3>Methods</h3><p>The EBO multiple choice examination consists of 52 questions followed by 5 statements each, for a total of 260 answers. Statements may be answered with “True”, “False” or “Don't Know”: a correct answer is awarded 1 point; an incorrect is penalized 0.5 points; “don't know” scores 0 points. At least 60 % correct answers are needed to pass the exam. After explaining the rules to the chatbots, he entire question with the 5 statements was input. The rate of correct answers and the final score were collected. The exam simulation was repeated 3 times with randomly generated questions.</p></div><div><h3>Results</h3><p>Google Gemini and ChatGPT-4 succeed in EBO exam simulations in all 3 cases, with an average 85.3 ± 3.1 % and 83.3 ± 2.4 % of correct answers. Gemini had a lower error rate compared to ChatGPT (6.7 ± 1.5 % vs. 13.0 ± 2.6 %, <em>p</em> = 0.03), but answered “Don't know” more frequently (8.0 ± 2.7 % vs. 3.7 ± 1.5 %, <em>p</em> = 0.05). Both chatbots scored at least 70 % of correct answers in each exam subspecialty across the 3 simulations. Converting the percentages into points, Gemini scored 213.5 ± 9.3 points on average, compared to 199.8 ± 7.1 points for ChatGPT (<em>p</em> = 0.21).</p></div><div><h3>Conclusions</h3><p>Google Gemini and ChatGPT-4 can both succeed in a complex ophthalmology examination on widespread topics, with higher accuracy compared to their former versions, highlighting their evolving importance in educational and informative setting.</p></div><div><h3>Precis</h3><p>Google Gemini and ChatGPT-4 were both able to succeed in 3 consecutive exam simulations of the European Board of Ophthalmologists with an average of 85 % and 83 % correct answers, respectively. Google Gemini showed significantly less errors when compared to ChatGPT.</p></div>","PeriodicalId":100071,"journal":{"name":"AJO International","volume":"1 3","pages":"Article 100063"},"PeriodicalIF":0.0000,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2950253524000637/pdfft?md5=4e9c209c1a98a7ea76ea9d21c9040f92&pid=1-s2.0-S2950253524000637-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Testing the power of Google DeepMind: Gemini versus ChatGPT 4 facing a European ophthalmology examination\",\"authors\":\"Matteo Mario Carlà ,&nbsp;Federico Giannuzzi ,&nbsp;Francesco Boselli ,&nbsp;Stanislao Rizzo\",\"doi\":\"10.1016/j.ajoint.2024.100063\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Purpose</h3><p>The aim of this study was to compare the performances of Google Gemini and ChatGPT-4, facing a triple simulation of the European Board of Ophthalmologists (EBO) multiple choices exam.</p></div><div><h3>Design</h3><p>Observational study.</p></div><div><h3>Methods</h3><p>The EBO multiple choice examination consists of 52 questions followed by 5 statements each, for a total of 260 answers. 
Statements may be answered with “True”, “False” or “Don't Know”: a correct answer is awarded 1 point; an incorrect is penalized 0.5 points; “don't know” scores 0 points. At least 60 % correct answers are needed to pass the exam. After explaining the rules to the chatbots, he entire question with the 5 statements was input. The rate of correct answers and the final score were collected. The exam simulation was repeated 3 times with randomly generated questions.</p></div><div><h3>Results</h3><p>Google Gemini and ChatGPT-4 succeed in EBO exam simulations in all 3 cases, with an average 85.3 ± 3.1 % and 83.3 ± 2.4 % of correct answers. Gemini had a lower error rate compared to ChatGPT (6.7 ± 1.5 % vs. 13.0 ± 2.6 %, <em>p</em> = 0.03), but answered “Don't know” more frequently (8.0 ± 2.7 % vs. 3.7 ± 1.5 %, <em>p</em> = 0.05). Both chatbots scored at least 70 % of correct answers in each exam subspecialty across the 3 simulations. Converting the percentages into points, Gemini scored 213.5 ± 9.3 points on average, compared to 199.8 ± 7.1 points for ChatGPT (<em>p</em> = 0.21).</p></div><div><h3>Conclusions</h3><p>Google Gemini and ChatGPT-4 can both succeed in a complex ophthalmology examination on widespread topics, with higher accuracy compared to their former versions, highlighting their evolving importance in educational and informative setting.</p></div><div><h3>Precis</h3><p>Google Gemini and ChatGPT-4 were both able to succeed in 3 consecutive exam simulations of the European Board of Ophthalmologists with an average of 85 % and 83 % correct answers, respectively. Google Gemini showed significantly less errors when compared to ChatGPT.</p></div>\",\"PeriodicalId\":100071,\"journal\":{\"name\":\"AJO International\",\"volume\":\"1 3\",\"pages\":\"Article 100063\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2950253524000637/pdfft?md5=4e9c209c1a98a7ea76ea9d21c9040f92&pid=1-s2.0-S2950253524000637-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"AJO International\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2950253524000637\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"AJO International","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2950253524000637","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0


Purpose

The aim of this study was to compare the performance of Google Gemini and ChatGPT-4 on a triple simulation of the European Board of Ophthalmologists (EBO) multiple-choice exam.

Design

Observational study.

Methods

The EBO multiple-choice examination consists of 52 questions with 5 statements each, for a total of 260 answers. Each statement may be answered with “True”, “False” or “Don't Know”: a correct answer is awarded 1 point, an incorrect answer is penalized 0.5 points, and “Don't Know” scores 0 points. At least 60 % correct answers are needed to pass the exam. After the rules were explained to the chatbots, each entire question with its 5 statements was input. The rate of correct answers and the final score were collected. The exam simulation was repeated 3 times with randomly generated questions.
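
As an illustration of this marking scheme, the minimal sketch below (our own illustration; the function name and example counts are hypothetical, not part of the study protocol) computes the points and pass/fail outcome for a single sitting of 260 statements.

```python
# Minimal sketch of the EBO marking scheme described above.
# Illustrative only: names and example counts are ours, not part of the study protocol.

TOTAL_STATEMENTS = 260   # 52 questions x 5 statements
PASS_THRESHOLD = 0.60    # at least 60 % correct answers needed to pass


def ebo_score(correct: int, incorrect: int, dont_know: int) -> tuple[float, bool]:
    """Return (points, passed) for one simulated exam sitting."""
    assert correct + incorrect + dont_know == TOTAL_STATEMENTS
    points = 1.0 * correct - 0.5 * incorrect      # "Don't Know" contributes 0 points
    passed = correct / TOTAL_STATEMENTS >= PASS_THRESHOLD
    return points, passed


# Hypothetical sitting: 200 correct, 40 incorrect, 20 "Don't Know"
print(ebo_score(200, 40, 20))   # -> (180.0, True)
```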

Results

Google Gemini and ChatGPT-4 succeeded in the EBO exam simulation in all 3 cases, with an average of 85.3 ± 3.1 % and 83.3 ± 2.4 % correct answers, respectively. Gemini had a lower error rate than ChatGPT (6.7 ± 1.5 % vs. 13.0 ± 2.6 %, p = 0.03), but answered “Don't Know” more frequently (8.0 ± 2.7 % vs. 3.7 ± 1.5 %, p = 0.05). Both chatbots scored at least 70 % correct answers in each exam subspecialty across the 3 simulations. Converting the percentages into points, Gemini scored 213.5 ± 9.3 points on average, compared to 199.8 ± 7.1 points for ChatGPT (p = 0.21).
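
As a rough sanity check (our own reconstruction, not reported in the paper), the point totals follow directly from the reported percentages: out of 260 statements, 85.3 % correct and 6.7 % incorrect round to about 222 and 17 statements for Gemini, and 83.3 % and 13.0 % to about 217 and 34 for ChatGPT-4.

```python
# Round-trip from the reported average rates to the reported point totals
# (counts are rounded to whole statements, so small differences are expected).
for name, correct, incorrect in [("Gemini", 222, 17), ("ChatGPT-4", 217, 34)]:
    points = 1.0 * correct - 0.5 * incorrect
    print(f"{name}: {points} points")
# Gemini: 213.5 points (reported 213.5 ± 9.3)
# ChatGPT-4: 200.0 points (reported 199.8 ± 7.1)
```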

Conclusions

Google Gemini and ChatGPT-4 can both succeed in a complex ophthalmology examination covering a wide range of topics, with higher accuracy than their earlier versions, highlighting their evolving importance in educational and informational settings.

Precis

Google Gemini and ChatGPT-4 were both able to succeed in 3 consecutive exam simulations of the European Board of Ophthalmologists, with an average of 85 % and 83 % correct answers, respectively. Google Gemini made significantly fewer errors than ChatGPT.
