{"title":"Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study","authors":"Aleksandra Ignjatović, Lazar Stevanović","doi":"10.3352/jeehp.2023.20.28","DOIUrl":null,"url":null,"abstract":"Purpose This study aimed to assess the performance of ChatGPT (GPT-3.5 and GPT-4) as a study tool in solving biostatistical problems and to identify any potential drawbacks that might arise from using ChatGPT in medical education, particularly in solving practical biostatistical problems. Methods ChatGPT was tested to evaluate its ability to solve biostatistical problems from the Handbook of Medical Statistics by Peacock and Peacock in this descriptive study. Tables from the problems were transformed into textual questions. Ten biostatistical problems were randomly chosen and used as text-based input for conversation with ChatGPT (versions 3.5 and 4). Results GPT-3.5 solved 5 practical problems in the first attempt, related to categorical data, cross-sectional study, measuring reliability, probability properties, and the t-test. GPT-3.5 failed to provide correct answers regarding analysis of variance, the chi-square test, and sample size within 3 attempts. GPT-4 also solved a task related to the confidence interval in the first attempt and solved all questions within 3 attempts, with precise guidance and monitoring. Conclusion The assessment of both versions of ChatGPT performance in 10 biostatistical problems revealed that GPT-3.5 and 4’s performance was below average, with correct response rates of 5 and 6 out of 10 on the first attempt. GPT-4 succeeded in providing all correct answers within 3 attempts. These findings indicate that students must be aware that this tool, even when providing and calculating different statistical analyses, can be wrong, and they should be aware of ChatGPT’s limitations and be careful when incorporating this model into medical education.","PeriodicalId":46098,"journal":{"name":"Journal of Educational Evaluation for Health Professions","volume":"20 ","pages":"28"},"PeriodicalIF":9.3000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10646144/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Educational Evaluation for Health Professions","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3352/jeehp.2023.20.28","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/10/16 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
Citations: 0
Abstract
Purpose: This study aimed to assess the performance of ChatGPT (GPT-3.5 and GPT-4) as a study tool for solving biostatistical problems and to identify potential drawbacks of using ChatGPT in medical education, particularly for solving practical biostatistical problems.

Methods: In this descriptive study, ChatGPT was tested on its ability to solve biostatistical problems from the Handbook of Medical Statistics by Peacock and Peacock. Tables from the problems were transformed into textual questions. Ten biostatistical problems were randomly chosen and used as text-based input for conversation with ChatGPT (versions 3.5 and 4).

Results: GPT-3.5 solved 5 practical problems on the first attempt, related to categorical data, a cross-sectional study, measuring reliability, probability properties, and the t-test. It failed to provide correct answers regarding analysis of variance, the chi-square test, and sample size within 3 attempts. GPT-4 additionally solved the task related to the confidence interval on the first attempt and, with precise guidance and monitoring, answered all questions correctly within 3 attempts.

Conclusion: The assessment of both versions of ChatGPT on 10 biostatistical problems showed below-average performance, with correct response rates of 5 and 6 out of 10 on the first attempt for GPT-3.5 and GPT-4, respectively; GPT-4 provided all correct answers within 3 attempts. These findings indicate that students must recognize that this tool can be wrong even when it performs and calculates different statistical analyses; they should be aware of ChatGPT's limitations and exercise caution when incorporating this model into medical education.
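Because the results show that ChatGPT's answers to analyses such as the chi-square test and analysis of variance can be wrong, one practical safeguard suggested by the conclusion is to cross-check the model's output against standard statistical software. The sketch below is a hypothetical illustration of such a check in Python using SciPy; the data are placeholder values and are not the problems from the Handbook of Medical Statistics used in this study.

# Hypothetical cross-check of two analyses mentioned in the study
# (independent-samples t-test and chi-square test) using SciPy.
# The data below are illustrative placeholders, not the textbook problems.
from scipy import stats

# Independent-samples t-test: compare a continuous outcome between two groups.
group_a = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3]
group_b = [4.2, 4.5, 4.1, 4.7, 4.4, 4.3]
t_stat, t_p = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {t_p:.4f}")

# Chi-square test of independence on a 2x2 contingency table of counts.
table = [[30, 10],
         [20, 25]]
chi2, chi_p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.3f}, df = {dof}, p = {chi_p:.4f}")

Comparing output like this with ChatGPT's stated test statistics and P-values is one way a student could catch the kinds of errors reported for GPT-3.5.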
Journal introduction:
The Journal of Educational Evaluation for Health Professions aims to provide readers with state-of-the-art, practical information on educational evaluation for the health professions in order to increase the quality of undergraduate, graduate, and continuing education. It specializes in educational evaluation, including the application of measurement theory to medical and health education, the promotion of high-stakes examinations such as national licensing examinations, the improvement of nationwide or international education programs, computer-based testing, computerized adaptive testing, and medical and health regulatory bodies. Its field comprises a variety of professions that address public health, including but not limited to: care workers, dental hygienists, dental technicians, dentists, dietitians, emergency medical technicians, health educators, medical record technicians, medical technologists, midwives, nurses, nursing aides, occupational therapists, opticians, Oriental medical doctors, Oriental medicine dispensers, Oriental pharmacists, pharmacists, physical therapists, physicians, prosthetists and orthotists, radiological technologists, rehabilitation counselors, sanitary technicians, and speech-language therapists.