Comparison of radiological interpretation made by veterinary radiologists and state-of-the-art commercial AI software for canine and feline radiographic studies.
Yero S Ndiaye, Peter Cramton, Chavdar Chernev, Axel Ockenfels, Tobias Schwarz
{"title":"Comparison of radiological interpretation made by veterinary radiologists and state-of-the-art commercial AI software for canine and feline radiographic studies.","authors":"Yero S Ndiaye, Peter Cramton, Chavdar Chernev, Axel Ockenfels, Tobias Schwarz","doi":"10.3389/fvets.2025.1502790","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>As human medical diagnostic expertise is scarcely available, especially in veterinary care, artificial intelligence (AI) has been increasingly used as a remedy. AI's promise comes from improving human diagnostics or providing good diagnostics at lower cost, increasing access. This study analyzed the diagnostic performance of a widely used AI radiology software vs. veterinary radiologists in interpreting canine and feline radiographs. We aimed to establish whether the performance of commonly used AI matches the performance of a typical radiologist and thus can be reliably used. Secondly, we try to identify in which cases AI is effective.</p><p><strong>Methods: </strong>Fifty canine and feline radiographic studies in DICOM format were anonymized and reported by 11 board-certified veterinary radiologists (ECVDI or ACVR) and processed with commercial and widely used AI software dedicated to small animal radiography (SignalRAY<sup>®</sup>, SignalPET<sup>®</sup> Dallas, TX, USA). The AI software used a deep-learning algorithm and returned a coded <i>abnormal</i> or <i>normal</i> diagnosis for each finding in the study. The radiologists provided a written report in English. All reports' findings were coded into categories matching the codes from the AI software and classified as <i>normal</i> or <i>abnormal</i>. The sensitivity, specificity, and accuracy of each radiologist and the AI software were calculated. The variance in agreement between each radiologist and the AI software was measured to calculate the ambiguity of each radiological finding.</p><p><strong>Results: </strong>AI matched the best radiologist in accuracy and was more specific but less sensitive than human radiologists. AI did better than the median radiologist overall in low- and high-ambiguity cases. In high-ambiguity cases, AI's accuracy remained high, though it was less effective at detecting abnormalities but better at identifying normal findings. The study confirmed AI's reliability, especially in low-ambiguity scenarios.</p><p><strong>Conclusion: </strong>Our findings suggest that AI performs almost as well as the best veterinary radiologist in all settings of descriptive radiographic findings. However, its strengths lie more in confirming normality than detecting abnormalities, and it does not provide differential diagnoses. Therefore, the broader use of AI could reliably increase diagnostic availability but requires further human input. Given the unique strengths of human experts and AI and the differences in sensitivity vs. specificity and low-ambiguity vs. 
high-ambiguity settings, AI will likely complement rather than replace human experts.</p>","PeriodicalId":12772,"journal":{"name":"Frontiers in Veterinary Science","volume":"12 ","pages":"1502790"},"PeriodicalIF":2.6000,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11886591/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Veterinary Science","FirstCategoryId":"97","ListUrlMain":"https://doi.org/10.3389/fvets.2025.1502790","RegionNum":2,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"VETERINARY SCIENCES","Score":null,"Total":0}
Citations: 0
Abstract
Introduction: Human medical diagnostic expertise is scarce, especially in veterinary care, and artificial intelligence (AI) has increasingly been used as a remedy. AI's promise lies in improving human diagnostics or in providing good diagnostics at lower cost, thereby increasing access. This study compared the diagnostic performance of widely used AI radiology software with that of veterinary radiologists in interpreting canine and feline radiographs. We aimed to establish whether the performance of commonly used AI matches that of a typical radiologist and can therefore be relied upon. Second, we sought to identify the cases in which AI is effective.
Methods: Fifty canine and feline radiographic studies in DICOM format were anonymized, reported by 11 board-certified veterinary radiologists (ECVDI or ACVR), and processed with widely used commercial AI software dedicated to small-animal radiography (SignalRAY®, SignalPET®, Dallas, TX, USA). The AI software used a deep-learning algorithm and returned a coded abnormal or normal diagnosis for each finding in the study. The radiologists provided written reports in English. All report findings were coded into categories matching the codes from the AI software and classified as normal or abnormal. The sensitivity, specificity, and accuracy of each radiologist and of the AI software were calculated. The variance in agreement between each radiologist and the AI software was measured to quantify the ambiguity of each radiological finding.
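To make the metrics concrete, the sketch below shows one plausible way to compute per-reader sensitivity, specificity, and accuracy from binary normal/abnormal codes, and a variance-based ambiguity score per finding. This is an illustrative reconstruction, not the authors' published analysis code: the array names (`truth`, `calls`, `reader_calls`) and the Bernoulli-variance ambiguity measure are assumptions.

```python
# Minimal sketch (assumed, not the study's actual code) of the diagnostic
# metrics and a per-finding ambiguity score based on reader disagreement.
import numpy as np

def diagnostic_metrics(truth, calls):
    """Sensitivity, specificity, and accuracy for binary normal/abnormal calls.

    truth, calls: arrays of 0 (normal) or 1 (abnormal), one entry per finding.
    """
    truth = np.asarray(truth)
    calls = np.asarray(calls)
    tp = np.sum((truth == 1) & (calls == 1))  # abnormal, flagged abnormal
    tn = np.sum((truth == 0) & (calls == 0))  # normal, cleared as normal
    fp = np.sum((truth == 0) & (calls == 1))  # normal, flagged abnormal
    fn = np.sum((truth == 1) & (calls == 0))  # abnormal, missed
    sensitivity = tp / (tp + fn)              # fraction of abnormals detected
    specificity = tn / (tn + fp)              # fraction of normals cleared
    accuracy = (tp + tn) / truth.size
    return sensitivity, specificity, accuracy

def ambiguity(reader_calls):
    """Across-reader variance of binary calls, per finding (an assumed measure).

    reader_calls: (n_readers, n_findings) array of 0/1 calls.
    Returns 0 for a unanimous finding, 0.25 for a maximal 50/50 split.
    """
    p_abnormal = np.asarray(reader_calls).mean(axis=0)  # fraction calling abnormal
    return p_abnormal * (1.0 - p_abnormal)              # Bernoulli variance
```

Under this measure, a finding that 6 of 12 readers call abnormal has ambiguity 0.5 × 0.5 = 0.25, the maximum, while a unanimously coded finding has ambiguity 0, which matches the abstract's low- vs. high-ambiguity split in spirit.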
Results: AI matched the best radiologist in accuracy and was more specific, but less sensitive, than the human radiologists. AI outperformed the median radiologist overall in both low- and high-ambiguity cases. In high-ambiguity cases, AI's accuracy remained high: although it was less effective at detecting abnormalities, it was better at identifying normal findings. The study confirmed AI's reliability, especially in low-ambiguity scenarios.
Conclusion: Our findings suggest that AI performs almost as well as the best veterinary radiologist across all settings of descriptive radiographic findings. However, its strength lies more in confirming normality than in detecting abnormalities, and it does not provide differential diagnoses. The broader use of AI could therefore reliably increase diagnostic availability, but further human input remains necessary. Given the unique strengths of human experts and AI, and the differences in sensitivity vs. specificity and in low- vs. high-ambiguity settings, AI will likely complement rather than replace human experts.
Journal description:
Frontiers in Veterinary Science is a global, peer-reviewed, Open Access journal that bridges animal and human health, brings a comparative approach to medical and surgical challenges, and advances innovative biotechnology and therapy.
Veterinary research today is interdisciplinary, collaborative, and socially relevant, transforming how we understand and investigate animal health and disease. Fundamental research in emerging infectious diseases, predictive genomics, stem cell therapy, and translational modelling is grounded within the integrative social context of public and environmental health, wildlife conservation, novel biomarkers, societal well-being, and cutting-edge clinical practice and specialization. Frontiers in Veterinary Science brings a 21st-century approach—networked, collaborative, and Open Access—to communicate this progress and innovation to both the specialist and to the wider audience of readers in the field.
Frontiers in Veterinary Science publishes articles on outstanding discoveries across a wide spectrum of translational, foundational, and clinical research. The journal's mission is to bring all relevant veterinary sciences together on a single platform with the goal of improving animal and human health.