Nikolaos Mantzou, Vasileios Ediaroglou, Elena Drakonaki, Spyros A. Syggelos, Filippos F. Karageorgos, Trifon Totlis
ChatGPT efficacy for answering musculoskeletal anatomy questions: a study evaluating quality and consistency between raters and timepoints
Surgical and Radiologic Anatomy, published 2024-09-12
https://doi.org/10.1007/s00276-024-03477-9
Citations: 0
Abstract
Purpose
There is increasing interest in the use of digital platforms such as ChatGPT for anatomy education. This study aims to evaluate the efficacy of ChatGPT in providing accurate and consistent responses to questions focusing on musculoskeletal anatomy across various time points (hours and days).
Methods
Six anatomy-related questions were posed to ChatGPT 3.5 at 4 different timepoints. All answers were rated blindly by 3 expert raters for quality on a 5-point Likert scale. A difference of 0 or 1 points in Likert scores between raters was considered agreement, and the same difference between timepoints was considered consistent, indicating good reproducibility.
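The 0-or-1-point rule can be illustrated with a minimal Python sketch. The abstract does not say how the three raters' scores were combined or how each timepoint was summarized, so the choices below (the spread between highest and lowest score, the median per timepoint) are assumptions, and the function names and ratings are hypothetical rather than taken from the study.

```python
from statistics import median


def within_one_point(scores):
    """Return True when all Likert scores differ by at most 1 point."""
    return max(scores) - min(scores) <= 1


def summarize_question(ratings):
    """Summarize one question.

    ratings[t][r] is the Likert score (1-5) given by rater r at timepoint t.
    """
    # Assumption: raters "agree" at a timepoint when their scores span <= 1 point.
    rater_agreement = [within_one_point(row) for row in ratings]
    # Assumption: each timepoint is summarized by the median of the rater scores.
    per_timepoint = [median(row) for row in ratings]
    # Assumption: answers are "consistent" when the timepoint medians span <= 1 point.
    consistent = within_one_point(per_timepoint)
    return {
        "rater_agreement": rater_agreement,
        "per_timepoint": per_timepoint,
        "consistent": consistent,
    }


# Hypothetical ratings for two questions (4 timepoints x 3 raters), not study data.
stable_question = [[4, 5, 4], [4, 4, 5], [5, 4, 4], [4, 4, 3]]
unstable_question = [[5, 4, 5], [2, 2, 3], [4, 4, 4], [1, 2, 2]]

for name, ratings in [("stable", stable_question), ("unstable", unstable_question)]:
    print(name, summarize_question(ratings))
```

Under these assumptions the first hypothetical question counts as consistent, while the second does not, because its timepoint medians differ by more than 1 point.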
Results
The quality of the answers varied considerably, from extremely good to very poor, and consistency between timepoints also varied. Answers were rated as good quality (≥ 3 on the Likert scale) in 50% of cases (3/6) and as consistent in 66.6% of cases (4/6). The low-quality answers contained significant mistakes, conflicting data or missing information.
Conclusion
As of the time of this article, the quality and consistency of ChatGPT v3.5 answers are variable, which limits its utility as an independent and reliable resource for learning musculoskeletal anatomy. Validating information by reviewing the anatomical literature is highly recommended.
About the journal:
Anatomy is a morphological science which cannot fail to interest the clinician. The practical application of anatomical research to clinical problems necessitates special adaptation and selectivity in choosing from numerous international works. Although there is a tendency to believe that meaningful advances in anatomy are unlikely, constant revision is necessary. Surgical and Radiologic Anatomy, the first international journal of clinical anatomy, has been created in this spirit.
Its goal is to serve clinicians, regardless of speciality (physicians, surgeons, radiologists or other specialists), as an indispensable aid with which they can improve their knowledge of anatomy. Each issue includes original papers, review articles, articles on the anatomical bases of medical, surgical and radiological techniques, articles on normal radiologic anatomy, and brief reviews of anatomical publications of clinical interest.
Particular attention is given to high quality illustrations, which are indispensable for a better understanding of anatomical problems.
Surgical and Radiologic Anatomy is a journal written by anatomists for clinicians with a special interest in anatomy.