Performance of the Large Language Model ChatGPT on the National Nurse Examinations in Japan: Evaluation Study.

JMIR Nursing | Pub Date: 2023-06-27 | DOI: 10.2196/47305
Kazuya Taira, Takahiro Itaya, Ayame Hanada
JMIR Nursing, volume 6, article e47305. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10337249/pdf/
Citations: 0

Abstract

Background: ChatGPT, a large language model, has shown good performance on physician certification examinations and medical consultations. However, its performance has not been examined in languages other than English or on nursing examinations.

Objective: We aimed to evaluate the performance of ChatGPT on the Japanese National Nurse Examinations.

Methods: We evaluated the percentages of correct answers provided by ChatGPT (GPT-3.5) for all questions on the Japanese National Nurse Examinations from 2019 to 2023, excluding inappropriate questions and those containing images. Inappropriate questions are those identified by a third-party organization and announced by the government as excluded from scoring; specifically, these are "questions with inappropriate question difficulty" and "questions with errors in the questions or choices." Each year's examination consists of 240 questions, divided into basic knowledge questions, which test basic issues of particular importance to nurses, and general questions, which test a wide range of specialized knowledge. The questions also come in 2 formats: simple-choice and situation-setup. Simple-choice questions are primarily knowledge-based multiple-choice questions, whereas situation-setup questions require the candidate to read a description of a patient's and family's situation and select the nurse's action or the patient's response. The questions were therefore standardized using 2 types of prompts before answers were requested from ChatGPT. Chi-square tests were conducted to compare the percentages of correct answers across examination formats and question specialty areas for each year. In addition, a Cochran-Armitage trend test was performed on the percentages of correct answers from 2019 to 2023.
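
The two tests named in the Methods can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names `chi_square_2x2` and `cochran_armitage` are hypothetical, the chi-square version assumes a 2x2 table without continuity correction (the abstract does not say which variant was used), and the trend test assumes equally spaced year scores 0, 1, 2, and so on. All counts in the usage lines are made-up placeholders, not data from the study.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for a 2x2 table [[a, b], [c, d]].

    Assumption: no continuity correction. With 1 degree of freedom the
    p-value equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def cochran_armitage(successes, totals, scores=None):
    """Two-sided Cochran-Armitage test for a linear trend in proportions.

    successes[i] / totals[i] is the proportion correct in group i (e.g. a year).
    Assumption: equally spaced scores 0, 1, 2, ... unless scores are given.
    """
    if scores is None:
        scores = list(range(len(totals)))
    n_total = sum(totals)
    p_bar = sum(successes) / n_total  # pooled proportion correct
    # Score-weighted sum of observed minus expected successes
    t_stat = sum(t * (r - n * p_bar) for t, r, n in zip(scores, successes, totals))
    s1 = sum(n * t for t, n in zip(scores, totals))
    s2 = sum(n * t * t for t, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n_total)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value


# HYPOTHETICAL counts for illustration only (not taken from the study).
chi2, p_chi = chi_square_2x2(30, 10, 10, 30)  # correct/incorrect by format
z, p_trend = cochran_armitage([80, 70, 60], [100, 100, 100])  # correct by year
print(f"chi2={chi2:.2f}, p={p_chi:.2g}; trend z={z:.2f}, p={p_trend:.2g}")
```

A negative `z` indicates that the proportion of correct answers falls as the year score increases; a positive `z` indicates a rise.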

Results: The 5-year average percentage of correct answers for ChatGPT was 75.1% (SD 3%) for basic knowledge questions and 64.5% (SD 5%) for general questions. The percentage of correct answers was highest on the 2019 examination: 80% for basic knowledge questions and 71.2% for general questions. ChatGPT met the passing criteria for the 2019 Japanese National Nurse Examination and was close to passing the 2020-2023 examinations, needing only a few more correct answers to pass. ChatGPT had a lower percentage of correct answers in some areas, such as pharmacology, social welfare, related law and regulations, endocrinology/metabolism, and dermatology, and a higher percentage of correct answers in nutrition, pathology, hematology, ophthalmology, otolaryngology, dentistry and dental surgery, and nursing integration and practice.
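
The "5-year average (SD)" figures are summaries of five yearly percentages. A minimal sketch of that arithmetic follows, assuming the sample standard deviation (n - 1 denominator); the abstract does not state which SD formula was used. The helper name `summarize` is hypothetical and the yearly values are placeholders, not the study's per-year results.

```python
import statistics


def summarize(yearly_pct):
    """Return (mean, sample SD) of yearly percentages of correct answers."""
    return statistics.mean(yearly_pct), statistics.stdev(yearly_pct)


# HYPOTHETICAL placeholder percentages for five exam years; NOT the study's data.
placeholder_pct = [70.0, 72.0, 74.0, 76.0, 78.0]
mean_pct, sd_pct = summarize(placeholder_pct)
print(f"mean {mean_pct:.1f}% (SD {sd_pct:.1f}%)")
```

Swapping `statistics.stdev` for `statistics.pstdev` would give the population SD instead.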

Conclusions: Of the examinations from the most recent 5 years, ChatGPT passed only the 2019 Japanese National Nurse Examination. Although it did not pass the examinations from the other years, it performed very close to the passing level, even on those containing questions related to psychology, communication, and nursing.

Source journal
CiteScore: 5.20
Self-citation rate: 0.00%
Articles published: 0
Review time: 16 weeks