Combining the Variational and Deep Learning Techniques for Classification of Video Capsule Endoscopic Images
Bhavana Singh, Pushpendra Kumar, Shailendra Kumar Jain
Journal of Imaging Informatics in Medicine, published 2025-01-03
DOI: 10.1007/s10278-024-01352-y (https://doi.org/10.1007/s10278-024-01352-y)
Citation count: 0
Abstract
Gastrointestinal tract-related cancers pose a significant health burden, with high mortality rates. Video capsule endoscopy is used to detect anomalies of the gastrointestinal tract that may progress to cancer. The number of video capsule endoscopic (VCE) images produced per examination is enormous and requires hours of analysis by clinicians, so there is a pressing need for automated computer-aided lesion classification techniques. Computer-aided systems use deep learning (DL) techniques, as they can potentially enhance anomaly detection rates. However, most DL techniques in the literature classify static frames, which uses only the spatial information of the image. In addition, they perform only binary classification. Thus, the presented work proposes a framework for multi-class classification of VCE images that uses the dynamic information of the images. The proposed algorithm combines a fractional order variational model with a DL model. The fractional order variational model captures the dynamic information of VCE images by estimating optical flow color maps, which are fed to the DL model for training. The DL model performs the multi-class classification task and localizes the region of interest with the maximum class score. The DL model is inspired by the Faster RCNN approach, and its backbone architecture is EfficientNet B0. The proposed framework achieves an average AUC of 0.98, an mAP of 0.93, and a balanced accuracy of 0.878. Hence, the proposed model is efficient in VCE image classification and detection of the region of interest.
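The abstract does not state how the estimated optical flow is rendered as a color map. As an illustration only, the sketch below uses one common convention, assumed here rather than taken from the paper: hue encodes the flow direction and saturation encodes the flow magnitude, normalized by the largest magnitude in the field. The function name `flow_to_color` and the nested-list input format are hypothetical, and the flow components `u` and `v` would in practice come from the fractional order variational model, which is not reproduced here.

```python
import colorsys
import math


def flow_to_color(u, v):
    """Render a dense flow field (u, v) as an RGB image.

    Assumed convention (not stated in the source): hue = flow direction,
    saturation = magnitude relative to the field's maximum, value = 1.
    u and v are 2D lists of equal shape holding horizontal and vertical
    displacement per pixel.
    """
    rows, cols = len(u), len(u[0])
    mags = [[math.hypot(u[i][j], v[i][j]) for j in range(cols)]
            for i in range(rows)]
    # Fall back to 1.0 for an all-zero field to avoid division by zero.
    max_mag = max(max(row) for row in mags) or 1.0

    image = []
    for i in range(rows):
        row = []
        for j in range(cols):
            angle = math.atan2(v[i][j], u[i][j])           # radians in [-pi, pi]
            hue = (angle % (2 * math.pi)) / (2 * math.pi)  # rescaled to [0, 1]
            sat = mags[i][j] / max_mag                     # in [0, 1]
            r, g, b = colorsys.hsv_to_rgb(hue, sat, 1.0)
            row.append((round(r * 255), round(g * 255), round(b * 255)))
        image.append(row)
    return image


# Toy 2x2 field: no motion, rightward motion, leftward motion, no motion.
u = [[0.0, 1.0], [-1.0, 0.0]]
v = [[0.0, 0.0], [0.0, 0.0]]
color_map = flow_to_color(u, v)
print(color_map[0][0])  # static pixel -> (255, 255, 255), white
print(color_map[0][1])  # rightward motion -> (255, 0, 0), red
print(color_map[1][0])  # leftward motion -> (0, 255, 255), cyan
```

Under this encoding, static regions appear white and moving regions take a color determined by their direction, so an image classifier that sees only the color map receives motion information that a single static frame lacks.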