Michael N Gritti, Rahil Prajapati, Dolev Yissar, Conall T Morgan
{"title":"Precision of artificial intelligence in paediatric cardiology multimodal image interpretation.","authors":"Michael N Gritti, Rahil Prajapati, Dolev Yissar, Conall T Morgan","doi":"10.1017/S1047951124036035","DOIUrl":null,"url":null,"abstract":"<p><p>Multimodal imaging is crucial for diagnosis and treatment in paediatric cardiology. However, the proficiency of artificial intelligence chatbots, like ChatGPT-4, in interpreting these images has not been assessed. This cross-sectional study evaluates the precision of ChatGPT-4 in interpreting multimodal images for paediatric cardiology knowledge assessment, including echocardiograms, angiograms, X-rays, and electrocardiograms. One hundred multiple-choice questions with accompanying images from the textbook <i>Pediatric Cardiology Board Review</i> were randomly selected. The chatbot was prompted to answer these questions with and without the accompanying images. Statistical analysis was done using <i>X</i><sup>2</sup>, Fisher's exact, and McNemar tests. Results showed that ChatGPT-4 answered 41% of questions with images correctly, performing best on those with electrocardiograms (54%) and worst on those with angiograms (29%). Without the images, ChatGPT-4's performance was similar at 37% (difference = 4%, 95% confidence interval (CI) -9.4% to 17.2%, <i>p</i> = 0.56). The chatbot performed significantly better when provided the image of an electrocardiogram than without (difference = 18, 95% CI 4.0% to 31.9%, <i>p</i> < 0.04). In cases of incorrect answers, ChatGPT-4 was more inconsistent with an image than without (difference = 21%, 95% CI 3.5% to 36.9%, <i>p</i> < 0.02). In conclusion, ChatGPT-4 performed poorly in answering image-based multiple-choice questions in paediatric cardiology. Its accuracy in answering questions with images was similar to without, indicating limited multimodal image interpretation capabilities. Substantial training is required before clinical integration can be considered. Further research is needed to assess the clinical reasoning skills and progression of ChatGPT in paediatric cardiology for clinical and academic utility.</p>","PeriodicalId":9435,"journal":{"name":"Cardiology in the Young","volume":" ","pages":"1-6"},"PeriodicalIF":0.9000,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cardiology in the Young","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1017/S1047951124036035","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"CARDIAC & CARDIOVASCULAR SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Multimodal imaging is crucial for diagnosis and treatment in paediatric cardiology. However, the proficiency of artificial intelligence chatbots, such as ChatGPT-4, in interpreting these images has not been assessed. This cross-sectional study evaluates the precision of ChatGPT-4 in interpreting multimodal images for paediatric cardiology knowledge assessment, including echocardiograms, angiograms, X-rays, and electrocardiograms. One hundred multiple-choice questions with accompanying images from the textbook Pediatric Cardiology Board Review were randomly selected. The chatbot was prompted to answer these questions with and without the accompanying images. Statistical analysis was done using chi-squared, Fisher's exact, and McNemar tests. Results showed that ChatGPT-4 answered 41% of questions with images correctly, performing best on those with electrocardiograms (54%) and worst on those with angiograms (29%). Without the images, ChatGPT-4's performance was similar at 37% (difference = 4%, 95% confidence interval (CI) -9.4% to 17.2%, p = 0.56). The chatbot performed significantly better when provided with the image of an electrocardiogram than without it (difference = 18%, 95% CI 4.0% to 31.9%, p < 0.04). In cases of incorrect answers, ChatGPT-4 was more inconsistent with an image than without (difference = 21%, 95% CI 3.5% to 36.9%, p < 0.02). In conclusion, ChatGPT-4 performed poorly in answering image-based multiple-choice questions in paediatric cardiology. Its accuracy in answering questions with images was similar to that without, indicating limited multimodal image interpretation capabilities. Substantial training is required before clinical integration can be considered. Further research is needed to assess the clinical reasoning skills and progression of ChatGPT in paediatric cardiology for clinical and academic utility.
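Because each question was answered twice by the same model (with and without its image), the with-image versus without-image accuracy comparison is a paired-proportions problem, which is what the McNemar test addresses. The sketch below shows how such a paired comparison might be run in Python with statsmodels; the 2x2 cell counts are hypothetical, chosen only so the margins match the reported accuracies (41% with images, 37% without, n = 100), since the abstract does not report the discordant-pair counts, and statsmodels is an assumed tooling choice not named in the study.

```python
# Minimal sketch of a McNemar test for paired accuracy, assuming hypothetical
# discordant-pair counts consistent with the reported marginals (41 vs 37 of 100).
from statsmodels.stats.contingency_tables import mcnemar

# Rows: with-image answer (correct, incorrect)
# Columns: without-image answer (correct, incorrect)
table = [
    [30, 11],  # hypothetical: 30 correct both ways, 11 correct only with the image
    [7, 52],   # hypothetical: 7 correct only without the image, 52 incorrect both ways
]

# exact=True applies an exact binomial test to the discordant pairs (11 vs 7),
# which is appropriate when discordant counts are small.
result = mcnemar(table, exact=True)
print(f"McNemar statistic = {result.statistic}, p = {result.pvalue:.3f}")
```

With these illustrative counts the test is driven entirely by the 18 discordant questions, which is why a 4% marginal difference over 100 items can easily fail to reach significance, as reported.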
Journal introduction:
Cardiology in the Young is devoted to cardiovascular issues affecting the young, and the older patient suffering the sequelae of congenital heart disease or other cardiac diseases acquired in childhood. The journal serves the interests of all professionals concerned with these topics. By design, the journal is international and multidisciplinary in its approach, and members of the editorial board take an active role in its mission, helping to make it the essential journal in paediatric cardiology. All aspects of paediatric cardiology are covered within the journal. The content includes original articles, brief reports, editorials, reviews, and papers devoted to continuing professional development.