Comparing the performance of artificial intelligence learning models to medical students in solving histology and embryology multiple choice questions

Miloš Bajčetić, Aleksandar Mirčić, Jelena Rakočević, Danilo Đoković, Katarina Milutinović, Ivan Zaletel

Annals of Anatomy-Anatomischer Anzeiger, Volume 254, Article 152261 (2024). DOI: 10.1016/j.aanat.2024.152261
Abstract
Introduction
The emergence of artificial intelligence language models (AI LMs) in the form of chatbots has gained considerable popularity worldwide and may affect many aspects of education, including medical education. The present study aims to assess the accuracy and consistency of different AI LMs with respect to the histology and embryology knowledge acquired during the first year of medical studies.
Methods
Five different chatbots (ChatGPT, Bing AI, Bard AI, Perplexity AI, and ChatSonic) were given two sets of multiple-choice questions (MCQs). The AI LMs' test results were compared with the results of first-year medical students on the same tests. The chatbots were instructed to classify the questions into hierarchical cognitive domains using the revised Bloom's taxonomy. In parallel, two histology teachers independently rated the questions using the same criteria, and the chatbots' classifications were then compared with the teachers'. The consistency of the chatbots' answers was assessed by administering the same tests twice, two months apart.
Results
The AI LMs correctly solved the majority of MCQs covering histology and embryology material. All five chatbots scored better than the first-year medical students on both the histology and the embryology test. Compared with the teachers, however, the chatbots performed poorly when classifying the questions according to the revised Bloom's cognitive taxonomy, and question difficulty was inversely correlated with the chatbots' classification accuracy. Retesting the chatbots after two months revealed a lack of consistency in both the MCQ answers and the classification of questions by revised Bloom's taxonomy learning stage.
Conclusion
Although certain chatbots can provide correct answers to the majority of diverse and heterogeneous questions, their lack of consistency in answers over time warrants careful use as a medical education tool.
About the journal:
Annals of Anatomy publishes peer-reviewed original articles as well as brief review articles. The journal is open to original papers covering links between anatomy and areas such as:
•molecular biology
•cell biology
•reproductive biology
•immunobiology
•developmental biology
•neurobiology
•embryology
•neuroanatomy
•neuroimmunology
•clinical anatomy
•comparative anatomy
•modern imaging techniques
•evolution
•aging