ChatGPT efficacy for answering musculoskeletal anatomy questions: a study evaluating quality and consistency between raters and timepoints

Surgical and Radiologic Anatomy (IF 1.2; Zone 4, Medicine; JCR Q3, Anatomy & Morphology) Pub Date: 2024-09-12 DOI: 10.1007/s00276-024-03477-9

Nikolaos Mantzou, Vasileios Ediaroglou, Elena Drakonaki, Spyros A. Syggelos, Filippos F. Karageorgos, Trifon Totlis


Abstract

Purpose

There is increasing interest in the use of digital platforms such as ChatGPT for anatomy education. This study aims to evaluate the efficacy of ChatGPT in providing accurate and consistent responses to questions focusing on musculoskeletal anatomy across various time points (hours and days).

Methods

Six anatomy-related questions were posed to ChatGPT 3.5 at 4 different timepoints. All answers were rated blindly for quality by 3 expert raters on a 5-point Likert scale. A difference of 0 or 1 point in Likert scores between raters was considered agreement, and the same difference between timepoints was considered consistency, indicating good reproducibility.
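The 0-or-1-point rule can be illustrated with a minimal Python sketch. The abstract does not say how differences are aggregated across three raters or four timepoints, so the sketch assumes that every pair of scores must differ by at most 1 point (equivalently, maximum minus minimum is at most 1). The function name `within_tolerance` and all scores shown are hypothetical and are not data from the study.

```python
def within_tolerance(scores, tolerance=1):
    """Return True if all scores lie within `tolerance` Likert points of each other.

    Assumption (not stated in the abstract): agreement/consistency requires
    every pair of scores to differ by at most `tolerance`, which is the same
    as checking that max(scores) - min(scores) <= tolerance.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    return max(scores) - min(scores) <= tolerance


# Hypothetical ratings of one answer by 3 raters (1-5 Likert scale).
rater_scores = [4, 5, 4]
raters_agree = within_tolerance(rater_scores)

# Hypothetical scores for one question at 4 timepoints.
timepoint_scores = [2, 4, 3, 2]
answers_consistent = within_tolerance(timepoint_scores)

print(f"raters agree: {raters_agree}")
print(f"consistent across timepoints: {answers_consistent}")
```

With these made-up numbers the raters count as agreeing (spread of 1 point), while the timepoint scores do not count as consistent (spread of 2 points).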

Results

There was significant variation in the quality of the answers, ranging from extremely good to very poor. Consistency between timepoints also varied. Answers were rated as good quality (≥ 3 on the Likert scale) in 50% of cases (3/6) and as consistent in 66.6% of cases (4/6). The low-quality answers contained significant mistakes, conflicting data, or missing information.

Conclusion

As of the time of this article, the quality and consistency of ChatGPT v3.5 answers are variable, which limits its utility as an independent and reliable resource for learning musculoskeletal anatomy. Validating information against the anatomical literature is highly recommended.


Source journal

Surgical and Radiologic Anatomy (Anatomy & Morphology; Radiology, Nuclear Medicine & Medical Imaging)

CiteScore: 2.70

Self-citation rate: 14.30%

Articles published: 183

Review time: 4-8 weeks

Journal description: Anatomy is a morphological science which cannot fail to interest the clinician. The practical application of anatomical research to clinical problems necessitates special adaptation and selectivity in choosing from numerous international works. Although there is a tendency to believe that meaningful advances in anatomy are unlikely, constant revision is necessary. Surgical and Radiologic Anatomy, the first international journal of clinical anatomy, was created in this spirit. Its goal is to serve clinicians, regardless of speciality (physicians, surgeons, radiologists or other specialists), as an indispensable aid with which they can improve their knowledge of anatomy. Each issue includes original papers, review articles, articles on the anatomical bases of medical, surgical and radiological techniques, articles on normal radiologic anatomy, and brief reviews of anatomical publications of clinical interest. Particular attention is given to high-quality illustrations, which are indispensable for a better understanding of anatomical problems. Surgical and Radiologic Anatomy is a journal written by anatomists for clinicians with a special interest in anatomy.