Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study

IF 9.3 Q1 EDUCATION, SCIENTIFIC DISCIPLINES Journal of Educational Evaluation for Health Professions Pub Date : 2023-01-01 Epub Date: 2023-10-16 DOI:10.3352/jeehp.2023.20.28
Aleksandra Ignjatović, Lazar Stevanović
{"title":"Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study","authors":"Aleksandra Ignjatović, Lazar Stevanović","doi":"10.3352/jeehp.2023.20.28","DOIUrl":null,"url":null,"abstract":"Purpose This study aimed to assess the performance of ChatGPT (GPT-3.5 and GPT-4) as a study tool in solving biostatistical problems and to identify any potential drawbacks that might arise from using ChatGPT in medical education, particularly in solving practical biostatistical problems. Methods ChatGPT was tested to evaluate its ability to solve biostatistical problems from the Handbook of Medical Statistics by Peacock and Peacock in this descriptive study. Tables from the problems were transformed into textual questions. Ten biostatistical problems were randomly chosen and used as text-based input for conversation with ChatGPT (versions 3.5 and 4). Results GPT-3.5 solved 5 practical problems in the first attempt, related to categorical data, cross-sectional study, measuring reliability, probability properties, and the t-test. GPT-3.5 failed to provide correct answers regarding analysis of variance, the chi-square test, and sample size within 3 attempts. GPT-4 also solved a task related to the confidence interval in the first attempt and solved all questions within 3 attempts, with precise guidance and monitoring. Conclusion The assessment of both versions of ChatGPT performance in 10 biostatistical problems revealed that GPT-3.5 and 4’s performance was below average, with correct response rates of 5 and 6 out of 10 on the first attempt. GPT-4 succeeded in providing all correct answers within 3 attempts. These findings indicate that students must be aware that this tool, even when providing and calculating different statistical analyses, can be wrong, and they should be aware of ChatGPT’s limitations and be careful when incorporating this model into medical education.","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"20 ","pages":"28"},"PeriodicalIF":9.3000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10646144/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Educational Evaluation for Health Professions","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3352/jeehp.2023.20.28","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/10/16 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
引用次数: 0

Abstract

Purpose This study aimed to assess the performance of ChatGPT (GPT-3.5 and GPT-4) as a study tool in solving biostatistical problems and to identify any potential drawbacks that might arise from using ChatGPT in medical education, particularly in solving practical biostatistical problems. Methods ChatGPT was tested to evaluate its ability to solve biostatistical problems from the Handbook of Medical Statistics by Peacock and Peacock in this descriptive study. Tables from the problems were transformed into textual questions. Ten biostatistical problems were randomly chosen and used as text-based input for conversation with ChatGPT (versions 3.5 and 4). Results GPT-3.5 solved 5 practical problems in the first attempt, related to categorical data, cross-sectional study, measuring reliability, probability properties, and the t-test. GPT-3.5 failed to provide correct answers regarding analysis of variance, the chi-square test, and sample size within 3 attempts. GPT-4 also solved a task related to the confidence interval in the first attempt and solved all questions within 3 attempts, with precise guidance and monitoring. Conclusion The assessment of both versions of ChatGPT performance in 10 biostatistical problems revealed that GPT-3.5 and 4’s performance was below average, with correct response rates of 5 and 6 out of 10 on the first attempt. GPT-4 succeeded in providing all correct answers within 3 attempts. These findings indicate that students must be aware that this tool, even when providing and calculating different statistical analyses, can be wrong, and they should be aware of ChatGPT’s limitations and be careful when incorporating this model into medical education.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
ChatGPT作为医学教育中生物统计学问题解决工具的有效性和局限性:一项描述性研究。
目的:本研究旨在评估ChatGPT(GPT-3.5和GPT-4)作为解决生物统计学问题的研究工具的性能,并确定在医学教育中使用ChatGPT可能产生的任何潜在缺陷,特别是在解决实际生物统计学问题时。方法:在这项描述性研究中,对ChatGPT进行测试,以评估其解决Peacock和Peacock的《医学统计手册》中的生物统计学问题的能力。问题的表格被转换为文本问题。随机选择10个生物统计学问题,并将其用作与ChatGPT(3.5版和4版)对话的基于文本的输入。结果:GPT-3.5在第一次尝试中解决了5个实际问题,涉及分类数据、横断面研究、测量可靠性、概率属性和t检验。GPT-3.5未能在3次尝试中提供有关ANOVA、卡方检验和样本量的正确答案。GPT-4还在第一次尝试中解决了一项与置信区间有关的任务,并在3次尝试内解决了所有问题,并进行了精确的指导和监测。结论:对两个版本的ChatGPT在10个生物统计学问题中的表现的评估显示,GPT-3.5和4的表现低于平均水平,第一次尝试的正确回答率分别为5和6(满分10)。GPT-4在3次尝试内成功提供了所有正确答案。这些发现表明,学生们必须意识到,即使在提供和计算不同的统计分析时,这种工具也可能是错误的,他们应该意识到ChatGPT的局限性,并在将这种模式纳入医学教育时要小心。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
9.60
自引率
9.10%
发文量
32
审稿时长
5 weeks
期刊介绍: Journal of Educational Evaluation for Health Professions aims to provide readers the state-of-the art practical information on the educational evaluation for health professions so that to increase the quality of undergraduate, graduate, and continuing education. It is specialized in educational evaluation including adoption of measurement theory to medical health education, promotion of high stakes examination such as national licensing examinations, improvement of nationwide or international programs of education, computer-based testing, computerized adaptive testing, and medical health regulatory bodies. Its field comprises a variety of professions that address public medical health as following but not limited to: Care workers Dental hygienists Dental technicians Dentists Dietitians Emergency medical technicians Health educators Medical record technicians Medical technologists Midwives Nurses Nursing aides Occupational therapists Opticians Oriental medical doctors Oriental medicine dispensers Oriental pharmacists Pharmacists Physical therapists Physicians Prosthetists and Orthotists Radiological technologists Rehabilitation counselor Sanitary technicians Speech-language therapists.
期刊最新文献
The irtQ R package: a user-friendly tool for item response theory-based test data analysis and calibration. Insights into undergraduate medical student selection tools: a systematic review and meta-analysis. Importance, performance frequency, and predicted future importance of dietitians’ jobs by practicing dietitians in Korea: a survey study Presidential address 2024: the expansion of computer-based testing to numerous health professions licensing examinations in Korea, preparation of computer-based practical tests, and adoption of the medical metaverse. Development and validity evidence for the resident-led large group teaching assessment instrument in the United States: a methodological study.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1