Rahul Gomes, Tyler Pham, Nichol He, Connor Kamrowski, Joseph Wildenberg
{"title":"Analysis of Swin-UNet vision transformer for Inferior Vena Cava filter segmentation from CT scans","authors":"Rahul Gomes , Tyler Pham , Nichol He , Connor Kamrowski , Joseph Wildenberg","doi":"10.1016/j.ailsci.2023.100084","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><p>The purpose of this study is to develop an accurate deep learning model capable of Inferior Vena Cava (IVC) filter segmentation from CT scans. The study does a comparative assessment of the impact of Residual Networks (ResNets) complemented with reduced convolutional layer depth and also analyzes the impact of using vision transformer architectures without performance degradation.</p></div><div><h3>Materials and Methods</h3><p>This experimental retrospective study on 84 CT scans consisting of 54618 slices involves design, implementation, and evaluation of segmentation algorithm which can be used to generate a clinical report for the presence of IVC filters on abdominal CT scans performed for any reason. Several variants of patch-based 3D-Convolutional Neural Network (CNN) and the Swin UNet Transformer (Swin-UNETR) are used to retrieve the signature of IVC filters. The Dice Score is used as a metric to compare the performance of the segmentation models.</p></div><div><h3>Results</h3><p>Model trained on UNet variant using four ResNet layers showed a higher segmentation performance achieving median Dice = 0.92 [Interquartile range(IQR): 0.85, 0.93] compared to the plain UNet model with four layers having median Dice = 0.89 [IQR: 0.83, 0.92]. Segmentation results from ResNet with two layers achieved a median Dice = 0.93 [IQR: 0.87, 0.94] which was higher than the plain UNet model with two layers at median Dice = 0.87 [IQR: 0.77, 0.90]. Models trained using SWIN-based transformers performed significantly better in both training and validation datasets compared to the four CNN variants. 
The validation median Dice was highest in 4 layer Swin UNETR at 0.88 followed by 2 layer Swin UNETR at 0.85.</p></div><div><h3>Conclusion</h3><p>Utilization of vision based transformer Swin-UNETR results in segmentation output with both low bias and variance thereby solving a real-world problem within healthcare for advanced Artificial Intelligence (AI) image processing and recognition. The Swin UNETR will reduce the time spent manually tracking IVC filters by centralizing within the electronic health record. Link to <span>GitHub</span><svg><path></path></svg> repository.</p></div>","PeriodicalId":72304,"journal":{"name":"Artificial intelligence in the life sciences","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Artificial intelligence in the life sciences","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667318523000284","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Purpose
The purpose of this study is to develop an accurate deep learning model capable of Inferior Vena Cava (IVC) filter segmentation from CT scans. The study provides a comparative assessment of Residual Networks (ResNets) combined with reduced convolutional layer depth, and analyzes whether vision transformer architectures can be used without performance degradation.
Materials and Methods
This experimental retrospective study on 84 CT scans comprising 54,618 slices involves the design, implementation, and evaluation of a segmentation algorithm that can be used to generate a clinical report on the presence of IVC filters in abdominal CT scans performed for any reason. Several variants of a patch-based 3D Convolutional Neural Network (CNN) and the Swin UNet Transformer (Swin-UNETR) are used to retrieve the signature of IVC filters. The Dice score is used as the metric to compare the performance of the segmentation models.
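The Dice score mentioned above measures the overlap between a predicted mask and the ground-truth mask. A minimal sketch of how it can be computed for binary segmentation masks (the flattened-list representation, smoothing term, and toy values are illustrative assumptions, not the paper's implementation):

```python
def dice_score(pred, truth, eps=1e-7):
    """Dice = 2*|P intersect T| / (|P| + |T|) for binary masks given as flat 0/1 lists."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    return (2.0 * intersection + eps) / (sum(pred) + sum(truth) + eps)

# Toy 1-D masks standing in for flattened CT voxel labels (hypothetical data)
pred = [1, 1, 0, 1, 0, 0]
truth = [1, 0, 0, 1, 1, 0]
print(round(dice_score(pred, truth), 2))  # 2*2 / (3+3) -> 0.67
```

A score of 1.0 indicates perfect overlap and 0.0 indicates none; the small `eps` term avoids division by zero when both masks are empty.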
Results
The model trained on the UNet variant with four ResNet layers showed higher segmentation performance, achieving a median Dice = 0.92 [interquartile range (IQR): 0.85, 0.93], compared to the plain UNet model with four layers at a median Dice = 0.89 [IQR: 0.83, 0.92]. The ResNet variant with two layers achieved a median Dice = 0.93 [IQR: 0.87, 0.94], higher than the plain UNet model with two layers at a median Dice = 0.87 [IQR: 0.77, 0.90]. Models trained using Swin-based transformers performed significantly better on both the training and validation datasets than the four CNN variants. The validation median Dice was highest for the four-layer Swin UNETR at 0.88, followed by the two-layer Swin UNETR at 0.85.
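The median [IQR] summaries reported above can be reproduced from per-scan Dice values with the Python standard library; the scores below are hypothetical stand-ins, not the study's data:

```python
import statistics

# Hypothetical per-scan Dice values (not the paper's data)
scores = [0.85, 0.90, 0.92, 0.93, 0.94]

median = statistics.median(scores)
# quantiles with n=4 returns the three quartile cut points [Q1, Q2, Q3]
q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
print(f"median Dice = {median:.2f} [IQR: {q1:.2f}, {q3:.2f}]")
```

Reporting the median with IQR rather than mean with standard deviation is robust to the skewed, bounded distribution typical of per-scan Dice scores.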
Conclusion
Utilization of the vision transformer-based Swin-UNETR yields segmentation output with both low bias and low variance, thereby solving a real-world healthcare problem with advanced Artificial Intelligence (AI) image processing and recognition. The Swin UNETR will reduce the time spent manually tracking IVC filters by centralizing this information within the electronic health record. Link to GitHub repository.
Journal: Artificial intelligence in the life sciences. Subject areas: Pharmacology, Biochemistry, Genetics and Molecular Biology (General); Computer Science Applications; Health Informatics; Drug Discovery; Veterinary Science and Veterinary Medicine (General).