Enhancements in artificial intelligence for medical examinations: A leap from ChatGPT 3.5 to ChatGPT 4.0 in the FRCS trauma & orthopaedics examination.
Akib Majed Khan, Khaled Maher Sarraf, Ashley Iain Simpson
{"title":"Enhancements in artificial intelligence for medical examinations: A leap from ChatGPT 3.5 to ChatGPT 4.0 in the FRCS trauma & orthopaedics examination.","authors":"Akib Majed Khan, Khaled Maher Sarraf, Ashley Iain Simpson","doi":"10.1016/j.surge.2024.11.008","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>ChatGPT is a sophisticated AI model capable of generating human-like text based on the input it receives. ChatGPT 3.5 showed an inability to pass the FRCS (Tr&Orth) examination due to a lack of higher-order judgement in previous studies. Enhancements in ChatGPT 4.0 warrant an evaluation of its performance.</p><p><strong>Methodology: </strong>Questions from the UK-based December 2022 In-Training examination were input into ChatGPT 3.5 and 4.0. Methodology from a prior study was replicated to maintain consistency, allowing for a direct comparison between the two model versions. The performance threshold remained at 65.8 %, aligning with the November 2022 sitting of Section 1 of the FRCS (Tr&Orth).</p><p><strong>Results: </strong>ChatGPT 4.0 achieved a passing score (73.9 %), indicating an improvement in its ability to analyse clinical information and make decisions reflective of a competent trauma and orthopaedic consultant. Compared to ChatGPT 4.0, version 3.5 scored 38.1 % lower, which represents a significant difference (p < 0.0001; Chi-square). The breakdown by subspecialty further demonstrated version 4.0's enhanced understanding and application in complex clinical scenarios. ChatGPT 4.0 had a significantly significant improvement in answering image-based questions (p = 0.0069) compared to its predecessor.</p><p><strong>Conclusion: </strong>ChatGPT 4.0's success in passing Section One of the FRCS (Tr&Orth) examination highlights the rapid evolution of AI technologies and their potential applications in healthcare and education.</p>","PeriodicalId":49463,"journal":{"name":"Surgeon-Journal of the Royal Colleges of Surgeons of Edinburgh and Ireland","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Surgeon-Journal of the Royal Colleges of Surgeons of Edinburgh and Ireland","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.surge.2024.11.008","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"SURGERY","Score":null,"Total":0}
Abstract
Introduction: ChatGPT is a sophisticated AI model capable of generating human-like text based on the input it receives. In previous studies, ChatGPT 3.5 was unable to pass the FRCS (Tr&Orth) examination owing to a lack of higher-order judgement. Enhancements in ChatGPT 4.0 warrant an evaluation of its performance.
Methodology: Questions from the UK-based December 2022 In-Training examination were input into ChatGPT 3.5 and 4.0. Methodology from a prior study was replicated to maintain consistency, allowing for a direct comparison between the two model versions. The performance threshold remained at 65.8 %, aligning with the November 2022 sitting of Section 1 of the FRCS (Tr&Orth).
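The paper does not reproduce its prompting workflow here, and the original study may well have used the ChatGPT web interface rather than the API. The following is a minimal illustrative sketch of how multiple-choice question stems could be submitted to both model versions programmatically; the model identifiers, prompt wording, example question, and answer parsing are assumptions, not the authors' protocol.

```python
# Illustrative sketch only: model names, prompt wording, and parsing are
# assumptions for demonstration, not the study's actual method.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Assumed mapping of the two model versions to API identifiers
MODELS = {"ChatGPT 3.5": "gpt-3.5-turbo", "ChatGPT 4.0": "gpt-4"}

def ask(model: str, stem: str, options: dict[str, str]) -> str:
    """Submit a single-best-answer MCQ and return the chosen option letter."""
    prompt = (
        stem
        + "\n"
        + "\n".join(f"{letter}. {text}" for letter, text in options.items())
        + "\nAnswer with the single best option letter only."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep output as deterministic as possible for scoring
    )
    return response.choices[0].message.content.strip()[:1].upper()

# Hypothetical example question (not taken from the UKITE paper)
question = {
    "stem": "A 25-year-old presents with a displaced midshaft clavicle fracture...",
    "options": {
        "A": "Broad arm sling",
        "B": "Figure-of-eight bandage",
        "C": "Open reduction and internal fixation",
        "D": "Intramedullary fixation",
        "E": "External fixation",
    },
}
for label, model_id in MODELS.items():
    print(label, "answered:", ask(model_id, question["stem"], question["options"]))
```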
Results: ChatGPT 4.0 achieved a passing score (73.9 %), indicating an improvement in its ability to analyse clinical information and make decisions reflective of a competent trauma and orthopaedic consultant. ChatGPT 3.5 scored 38.1 % lower than version 4.0, a statistically significant difference (p < 0.0001; Chi-square). The breakdown by subspecialty further demonstrated version 4.0's enhanced understanding and application in complex clinical scenarios. ChatGPT 4.0 also showed a statistically significant improvement in answering image-based questions (p = 0.0069) compared with its predecessor.
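The abstract reports only percentages and p-values. The sketch below shows how a chi-square comparison of the two models' proportions of correct answers could be reproduced with SciPy; the question total and the reading of "38.1 % lower" as percentage points are assumptions chosen for illustration, not figures from the paper.

```python
# Hypothetical worked example of the chi-square comparison of overall scores.
# Only the 73.9 % pass score comes from the abstract; the question total and
# the derived ChatGPT 3.5 score are illustrative assumptions.
from scipy.stats import chi2_contingency

total = 120                          # assumed question count (placeholder)
correct_v4 = round(0.739 * total)    # ChatGPT 4.0: 73.9 % correct
correct_v35 = round(0.358 * total)   # assumes "38.1 % lower" means 38.1 points, i.e. ~35.8 %

table = [
    [correct_v4, total - correct_v4],    # [correct, incorrect] for version 4.0
    [correct_v35, total - correct_v35],  # [correct, incorrect] for version 3.5
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")  # with counts of this size, p is far below 0.05
```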
Conclusion: ChatGPT 4.0's success in passing Section One of the FRCS (Tr&Orth) examination highlights the rapid evolution of AI technologies and their potential applications in healthcare and education.
About the journal:
Since its launch in 2003, The Surgeon has established itself as one of the leading multidisciplinary surgical titles, both in print and online. The Surgeon is published for the worldwide surgical and dental communities. The goal of the Journal is to achieve wider national and international recognition through a commitment to excellence in original research. In addition, both Colleges see the Journal as an important educational service, and consequently there is a particular focus on postgraduate development. Much of this educational role will continue to be achieved through publishing expanded review articles by leaders in their field.
Articles in areas related to surgery and dentistry, such as healthcare management and education, are also welcomed. We aim to educate, give insight into new surgical techniques and technology, and provide a forum for debate and discussion.