Performance of ChatGPT and Bard on the official part 1 FRCOphth practice questions.
Thomas Fowler, Simon Pullen, Liam Birkett
British Journal of Ophthalmology (Q1, Ophthalmology; Impact Factor 3.7). Published 2024-09-20. DOI: 10.1136/bjo-2023-324091
Citations: 0
Abstract
Background: Chat Generative Pre-trained Transformer (ChatGPT), a large language model by OpenAI, and Bard, Google's artificial intelligence (AI) chatbot, have been evaluated in various contexts. This study aims to assess these models' proficiency in the part 1 Fellowship of the Royal College of Ophthalmologists (FRCOphth) Multiple Choice Question (MCQ) examination, highlighting their potential in medical education.
Methods: Both models were tested on a sample question bank for the part 1 FRCOphth MCQ exam. Their performances were compared with historical human performance on the exam, focusing on the ability to comprehend, retain and apply information related to ophthalmology. We also tested the models on the book 'MCQs for FRCOphth Part 1' and assessed their performance across subjects.
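The evaluation described above amounts to comparing each chatbot's letter choices against an official answer key and reporting a percentage score. A minimal sketch of such a scoring harness is shown below; the answer lists and the scoring function are hypothetical illustrations, not the authors' actual pipeline.

```python
# Hypothetical MCQ scoring harness (a sketch, not the study's actual code).

def score_mcq(model_answers, answer_key):
    """Return the fraction of single-best-answer questions answered correctly."""
    if len(model_answers) != len(answer_key):
        raise ValueError("answer lists must be the same length")
    correct = sum(m == k for m, k in zip(model_answers, answer_key))
    return correct / len(answer_key)

# Example: a chatbot's letter choices scored against an (invented) official key.
key = ["A", "C", "B", "D", "A"]
chatbot = ["A", "C", "B", "B", "A"]
print(score_mcq(chatbot, key))  # 4 of 5 correct -> 0.8
```

The resulting fraction can then be compared directly with a historical human pass mark to decide whether the model "passed", as the study does.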
Results: ChatGPT demonstrated a strong performance, surpassing historical human pass marks and examination performance, while Bard underperformed. The comparison indicates the potential of certain AI models to match, and even exceed, human standards in such tasks.
Conclusion: The results demonstrate the potential of AI models, such as ChatGPT, in processing and applying medical knowledge at a postgraduate level. However, performance varied among different models, highlighting the importance of appropriate AI selection. The study underlines the potential for AI applications in medical education and the necessity for further investigation into their strengths and limitations.
Journal description:
The British Journal of Ophthalmology (BJO) is an international peer-reviewed journal for ophthalmologists and visual science specialists. BJO publishes clinical investigations, clinical observations, and clinically relevant laboratory investigations related to ophthalmology. It also publishes major reviews and manuscripts covering regional issues in a global context.