Automatic diagnosis of abdominal pathologies in untrimmed ultrasound videos.
Authors: Güinther Saibro, Yvonne Keeza, Benoît Sauer, Jacques Marescaux, Michele Diana, Alexandre Hostettler, Toby Collins
Journal: International Journal of Computer Assisted Radiology and Surgery (Impact Factor 2.3; JCR Q3, Engineering, Biomedical)
Publication date: 2025-03-11
Publication type: Journal Article
DOI: 10.1007/s11548-025-03334-z (https://doi.org/10.1007/s11548-025-03334-z)
Citations: 0
Abstract
Purpose: Despite major advances in Computer Assisted Diagnosis (CAD), the need for carefully labeled training data remains an important clinical translation barrier. This work aims to overcome this barrier for ultrasound video-based CAD, using video-level classification labels combined with a novel training strategy to improve the generalization performance of state-of-the-art (SOTA) video classifiers.
Methods: SOTA video classifiers were trained and evaluated on a novel ultrasound video dataset of liver and kidney pathologies, and they all struggled to generalize, especially for kidney pathologies. A new training strategy is presented, wherein a frame relevance assessor is trained to score each frame of a video by diagnostic relevance. These scores are used to automatically generate diagnostically-relevant video clips (DR-Clips), which guide a video classifier during training and inference.
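As an illustration of how per-frame relevance scores could be turned into a clip, the sketch below picks the fixed-length contiguous window with the highest total score. The abstract does not state how DR-Clips are actually derived from the scores, so the selection rule, the function name `select_dr_clip`, and the `clip_len` parameter are all assumptions made for this example.

```python
from typing import List, Tuple


def select_dr_clip(scores: List[float], clip_len: int) -> Tuple[int, int]:
    """Return (start, end) frame indices of the highest-scoring window.

    Hypothetical selection rule (not taken from the paper): choose the
    contiguous window of `clip_len` frames whose relevance scores sum
    to the largest value. `end` is exclusive.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    if not scores:
        raise ValueError("scores must not be empty")

    # A clip cannot be longer than the video itself.
    clip_len = min(clip_len, len(scores))

    # Sliding-window sum: start with the first window, then shift by one frame.
    window_sum = sum(scores[:clip_len])
    best_sum, best_start = window_sum, 0
    for start in range(1, len(scores) - clip_len + 1):
        window_sum += scores[start + clip_len - 1] - scores[start - 1]
        if window_sum > best_sum:
            best_sum, best_start = window_sum, start

    return best_start, best_start + clip_len


# Example: per-frame scores as a frame relevance assessor might output them.
frame_scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1]
start, end = select_dr_clip(frame_scores, clip_len=3)
print(f"Selected frames {start}..{end - 1}")
```

The selected frame range would then be passed to the video classifier in place of a uniformly or randomly sampled clip. Other rules, such as sampling several high-scoring windows, fit the same interface.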
Results: Using DR-Clips with a Video Swin Transformer, we achieved a ROC-AUC of 0.92 for kidney pathology detection in videos, compared to a ROC-AUC of 0.72 with a Swin Transformer and standard video clips. For liver steatosis detection, due to the diffuse nature of the pathology, the Video Swin Transformer and the other video classifiers performed similarly well, generally exceeding a ROC-AUC of 0.92.
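For readers unfamiliar with the metric, ROC-AUC can be read as the probability that a randomly chosen positive video receives a higher classifier score than a randomly chosen negative one, with ties counting as one half. The sketch below computes it directly from that definition. The labels and scores are made-up toy values, not data from the study, and `roc_auc` is a name chosen for this example.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via pairwise comparison of positive and negative scores.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample scores higher; ties contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = pathology present, 0 = absent (illustrative values only).
video_labels = [0, 0, 1, 1]
video_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(video_labels, video_scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

This pairwise form is quadratic in the number of samples, which is fine for small evaluation sets. Library implementations typically use a rank-based computation that gives the same value.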
Conclusion: In theory, video classifiers, such as video transformers, should be able to solve ultrasound CAD tasks with video labels. However, in practice, video labels provide weaker supervision compared to image labels, resulting in worse generalization, as demonstrated. The additional frame guidance provided by DR-Clips enhances performance significantly. The results highlight current limits and opportunities to improve frame guidance.
About the journal:
The International Journal for Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines, and encourages interdisciplinary research and development activities in an international environment.