ChatGPT failed Taiwan's Family Medicine Board Exam.

IF 1.9 · Tier 4 (Medicine) · Q2 MEDICINE, GENERAL & INTERNAL · Journal of the Chinese Medical Association · Pub Date: 2023-08-01 · DOI: 10.1097/JCMA.0000000000000946
Tzu-Ling Weng, Ying-Mei Wang, Samuel Chang, Tzeng-Ji Chen, Shinn-Jang Hwang
{"title":"ChatGPT未通过台湾家庭医学委员会考试。","authors":"Tzu-Ling Weng,&nbsp;Ying-Mei Wang,&nbsp;Samuel Chang,&nbsp;Tzeng-Ji Chen,&nbsp;Shinn-Jang Hwang","doi":"10.1097/JCMA.0000000000000946","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Chat Generative Pre-trained Transformer (ChatGPT), OpenAI Limited Partnership, San Francisco, CA, USA is an artificial intelligence language model gaining popularity because of its large database and ability to interpret and respond to various queries. Although it has been tested by researchers in different fields, its performance varies depending on the domain. We aimed to further test its ability in the medical field.</p><p><strong>Methods: </strong>We used questions from Taiwan's 2022 Family Medicine Board Exam, which combined both Chinese and English and covered various question types, including reverse questions and multiple-choice questions, and mainly focused on general medical knowledge. We pasted each question into ChatGPT and recorded its response, comparing it to the correct answer provided by the exam board. We used SAS 9.4 (Cary, North Carolina, USA) and Excel to calculate the accuracy rates for each question type.</p><p><strong>Results: </strong>ChatGPT answered 52 questions out of 125 correctly, with an accuracy rate of 41.6%. The questions' length did not affect the accuracy rates. These were 45.5%, 33.3%, 58.3%, 50.0%, and 43.5% for negative-phrase questions, multiple-choice questions, mutually exclusive options, case scenario questions, and Taiwan's local policy-related questions, with no statistical difference observed.</p><p><strong>Conclusion: </strong>ChatGPT's accuracy rate was not good enough for Taiwan's Family Medicine Board Exam. Possible reasons include the difficulty level of the specialist exam and the relatively weak database of traditional Chinese language resources. However, ChatGPT performed acceptably in negative-phrase questions, mutually exclusive questions, and case scenario questions, and it can be a helpful tool for learning and exam preparation. Future research can explore ways to improve ChatGPT's accuracy rate for specialized exams and other domains.</p>","PeriodicalId":17251,"journal":{"name":"Journal of the Chinese Medical Association","volume":"86 8","pages":"762-766"},"PeriodicalIF":1.9000,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"ChatGPT failed Taiwan's Family Medicine Board Exam.\",\"authors\":\"Tzu-Ling Weng,&nbsp;Ying-Mei Wang,&nbsp;Samuel Chang,&nbsp;Tzeng-Ji Chen,&nbsp;Shinn-Jang Hwang\",\"doi\":\"10.1097/JCMA.0000000000000946\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Chat Generative Pre-trained Transformer (ChatGPT), OpenAI Limited Partnership, San Francisco, CA, USA is an artificial intelligence language model gaining popularity because of its large database and ability to interpret and respond to various queries. Although it has been tested by researchers in different fields, its performance varies depending on the domain. We aimed to further test its ability in the medical field.</p><p><strong>Methods: </strong>We used questions from Taiwan's 2022 Family Medicine Board Exam, which combined both Chinese and English and covered various question types, including reverse questions and multiple-choice questions, and mainly focused on general medical knowledge. 
We pasted each question into ChatGPT and recorded its response, comparing it to the correct answer provided by the exam board. We used SAS 9.4 (Cary, North Carolina, USA) and Excel to calculate the accuracy rates for each question type.</p><p><strong>Results: </strong>ChatGPT answered 52 questions out of 125 correctly, with an accuracy rate of 41.6%. The questions' length did not affect the accuracy rates. These were 45.5%, 33.3%, 58.3%, 50.0%, and 43.5% for negative-phrase questions, multiple-choice questions, mutually exclusive options, case scenario questions, and Taiwan's local policy-related questions, with no statistical difference observed.</p><p><strong>Conclusion: </strong>ChatGPT's accuracy rate was not good enough for Taiwan's Family Medicine Board Exam. Possible reasons include the difficulty level of the specialist exam and the relatively weak database of traditional Chinese language resources. However, ChatGPT performed acceptably in negative-phrase questions, mutually exclusive questions, and case scenario questions, and it can be a helpful tool for learning and exam preparation. Future research can explore ways to improve ChatGPT's accuracy rate for specialized exams and other domains.</p>\",\"PeriodicalId\":17251,\"journal\":{\"name\":\"Journal of the Chinese Medical Association\",\"volume\":\"86 8\",\"pages\":\"762-766\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2023-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the Chinese Medical Association\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1097/JCMA.0000000000000946\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MEDICINE, GENERAL & INTERNAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Chinese Medical Association","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1097/JCMA.0000000000000946","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
Citations: 7

Abstract


Background: Chat Generative Pre-trained Transformer (ChatGPT; OpenAI Limited Partnership, San Francisco, CA, USA) is an artificial intelligence language model that has gained popularity because of its large knowledge base and its ability to interpret and respond to a wide range of queries. Although researchers in different fields have tested it, its performance varies by domain. We aimed to further test its ability in the medical field.

Methods: We used questions from Taiwan's 2022 Family Medicine Board Exam, which mixed Chinese and English, covered various question types, including negative-phrase (reverse) questions and multiple-choice questions, and focused mainly on general medical knowledge. We pasted each question into ChatGPT, recorded its response, and compared it with the correct answer provided by the exam board. We used SAS 9.4 (SAS Institute, Cary, NC, USA) and Excel to calculate the accuracy rate for each question type.
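The authors entered each item into the ChatGPT web interface by hand. For readers who want to reproduce this kind of evaluation at scale, a minimal sketch of an automated loop is shown below; it assumes the `openai` Python package (v1.x) and an `OPENAI_API_KEY` environment variable, and the model name, sample item, and data layout are illustrative rather than taken from the paper.

```python
# Hypothetical sketch of automating the evaluation loop that the authors
# performed by hand in the ChatGPT web interface. Assumes the openai
# Python package (v1.x) and an OPENAI_API_KEY environment variable;
# the model name and the sample item are illustrative, not from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each exam item: question text, question type, and the board's answer key.
exam_items = [
    {"text": "A 65-year-old man presents with... Which of the following is "
             "the most appropriate next step? (A)... (B)... (C)... (D)...",
     "type": "case scenario", "answer": "C"},
    # ... remaining items from the 2022 exam ...
]

def ask_chatgpt(question_text: str) -> str:
    """Send one exam question and return the model's reply verbatim."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption; the study used the web UI
        messages=[{"role": "user", "content": question_text}],
    )
    return response.choices[0].message.content

# Record every reply for later grading against the official answer key.
transcript = [{**item, "reply": ask_chatgpt(item["text"])} for item in exam_items]
```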

Results: ChatGPT answered 52 of the 125 questions correctly, for an overall accuracy rate of 41.6%. Question length did not affect accuracy. The accuracy rates were 45.5%, 33.3%, 58.3%, 50.0%, and 43.5% for negative-phrase questions, multiple-choice questions, questions with mutually exclusive options, case scenario questions, and questions on Taiwan's local policies, respectively, with no statistically significant difference among question types.
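As a worked illustration of the arithmetic (52/125 = 41.6%) and of one plausible way to check for differences across question types, the sketch below computes per-type accuracy and runs a chi-square test of independence with `scipy.stats.chi2_contingency`. The per-type (correct, incorrect) counts are hypothetical, chosen only to match the reported percentages; the paper does not report the denominators here, nor which test it used.

```python
# Illustrative only: per-type accuracy rates and a chi-square test of
# independence. The (correct, incorrect) counts are hypothetical values
# chosen to match the reported percentages; they need not sum to the
# exam's 125 items, and the paper's actual test is not specified here.
from scipy.stats import chi2_contingency

counts = {
    "negative-phrase":            (5, 6),    # 5/11  = 45.5%
    "multiple-choice":            (4, 8),    # 4/12  = 33.3%
    "mutually exclusive options": (7, 5),    # 7/12  = 58.3%
    "case scenario":              (6, 6),    # 6/12  = 50.0%
    "local policy":               (10, 13),  # 10/23 = 43.5%
}

for qtype, (correct, wrong) in counts.items():
    print(f"{qtype}: {correct / (correct + wrong):.1%}")

# Rows = question types, columns = (correct, incorrect).
chi2, p, dof, _ = chi2_contingency([list(pair) for pair in counts.values()])
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")  # a large p means no significant difference
```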

Conclusion: ChatGPT's accuracy rate was not high enough to pass Taiwan's Family Medicine Board Exam. Possible reasons include the difficulty of the specialist exam and the relative scarcity of traditional Chinese language resources in its training data. However, ChatGPT performed acceptably on negative-phrase questions, questions with mutually exclusive options, and case scenario questions, and it can be a helpful tool for learning and exam preparation. Future research could explore ways to improve ChatGPT's accuracy on specialized exams and in other domains.

Source journal: Journal of the Chinese Medical Association (MEDICINE, GENERAL & INTERNAL)
CiteScore: 6.20
Self-citation rate: 13.30%
Annual articles: 320
Review time: 15.5 weeks
Journal description: Journal of the Chinese Medical Association, previously known as the Chinese Medical Journal (Taipei), has a long history of publishing scientific papers and has continuously made substantial contributions to the understanding and progress of a broad range of biomedical sciences. It is published monthly by Wolters Kluwer Health and indexed in Science Citation Index Expanded (SCIE), MEDLINE®, Index Medicus, EMBASE, CAB Abstracts, Sociedad Iberoamericana de Informacion Cientifica (SIIC) Data Bases, ScienceDirect, Scopus, and Global Health. JCMA is the official, open-access journal of the Chinese Medical Association, Taipei, Taiwan, Republic of China, and an international forum for scholarly reports in medicine, surgery, dentistry, and basic research in biomedical science. As a vehicle of communication and education among physicians and scientists, the journal is open to the use of diverse methodological approaches. Reports of professional practice will need to demonstrate academic robustness and scientific rigor. Outstanding scholars are invited to give updated reviews on the perspectives of evidence-based science in their related research fields. Article types accepted include review articles, original articles, case reports, brief communications, and letters to the editor.