Numerous transformer-based medical image segmentation methods have been proposed and achieve good segmentation results. However, training transformer networks and deploying them on mobile medical devices remains challenging because of their large number of model parameters. To address both the training and the model parameter problems, we propose MISTKD, a Transformer-based network for Medical Image Segmentation using Knowledge Distillation. MISTKD consists of a teacher network and a student network. By employing the teacher network to train the student network, it achieves performance comparable to state-of-the-art transformer methods with fewer parameters. During training, sequences are extracted from the teacher and student encoder networks, and the losses between these sequences are computed so that the student network can learn from the teacher network. Experimental results on Synapse show that the proposed method achieves competitive performance using only one-eighth of the parameters.
{"title":"Medical Image Segmentation Approach via Transformer Knowledge Distillation","authors":"Tianshu Zhang, Hao Wang, K. Lam, Chi-Yin Chow","doi":"10.1145/3596286.3596292","DOIUrl":"https://doi.org/10.1145/3596286.3596292","PeriodicalId":208318,"journal":{"name":"Proceedings of the 2023 Asia Conference on Computer Vision, Image Processing and Pattern Recognition","volume":"16 1","pages":"0"},"publicationDate":"2023-04-28","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"121873756"}
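The abstract above describes computing losses between sequences taken from the teacher and student encoders. A minimal sketch of that idea follows. It assumes a mean-squared-error loss, identically shaped teacher and student sequences, and a weighting factor `alpha`; none of these choices are stated in the abstract, and the function names are hypothetical.

```python
def sequence_mse(student_seq, teacher_seq):
    """Mean squared error between two token sequences of identical shape.

    Each sequence is a list of tokens; each token is a list of floats.
    MSE is an assumption; the abstract does not name the loss.
    """
    if len(student_seq) != len(teacher_seq):
        raise ValueError("sequences must have the same number of tokens")
    total, count = 0.0, 0
    for s_tok, t_tok in zip(student_seq, teacher_seq):
        if len(s_tok) != len(t_tok):
            raise ValueError("tokens must have the same embedding size")
        for s, t in zip(s_tok, t_tok):
            total += (s - t) ** 2
            count += 1
    return total / count


def distillation_objective(task_loss, student_seqs, teacher_seqs, alpha=0.5):
    """Task loss plus an alpha-weighted mean of per-stage sequence losses.

    `alpha` and the averaging over encoder stages are illustrative choices.
    """
    stage_losses = [sequence_mse(s, t) for s, t in zip(student_seqs, teacher_seqs)]
    return task_loss + alpha * sum(stage_losses) / len(stage_losses)


# Toy example: one encoder stage, two tokens, embedding size 2.
teacher = [[[1.0, 0.0], [0.0, 1.0]]]
student = [[[0.0, 0.0], [0.0, 0.0]]]
loss = distillation_objective(task_loss=0.25, student_seqs=student, teacher_seqs=teacher)
```

In practice a student with a smaller embedding size would need a projection layer before the comparison; that step is left out here because the abstract does not describe it.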
To improve the prediction accuracy of remaining useful life (RUL), a deep learning method coupled with clustering analysis is proposed. The K-means clustering algorithm is employed to analyze the operation settings in the data set and match them to different operating conditions, and a wise operation mechanism is utilized to normalize the sensor data and match the operation history to the corresponding time instances. A deep convolutional neural network (DCNN) architecture is constructed that takes sequences generated by a sliding time window as network input. Moreover, the method does not require expertise in prediction or signal processing. The CMAPSS dataset published by NASA is used as a case study. The proposed approach is validated by comparison with other approaches, and the results indicate its superior RUL prediction performance for aeroengines.
{"title":"Remaining useful life prediction via K-means clustering analysis and deep convolutional neural network","authors":"Yuru Zhang, Chun-Ming Su, Jiajun Wu","doi":"10.1145/3596286.3596297","DOIUrl":"https://doi.org/10.1145/3596286.3596297","PeriodicalId":208318,"journal":{"name":"Proceedings of the 2023 Asia Conference on Computer Vision, Image Processing and Pattern Recognition","volume":"1 1","pages":"0"},"publicationDate":"2023-04-28","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"131004010"}
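The preprocessing pipeline described in the abstract above (cluster the operation settings, normalize sensor data per operating condition, then cut sliding windows for the network input) might be sketched as follows. This is an illustration under assumptions: the plain Lloyd's k-means with a deterministic start, the per-condition z-score, and all function names are not taken from the paper.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; centroids start at the first k distinct points."""
    centroids = []
    for p in points:
        if list(p) not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    if len(centroids) < k:
        raise ValueError("need at least k distinct points")
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def normalize_by_condition(values, labels):
    """Z-score each reading with the statistics of its own operating condition."""
    out = [0.0] * len(values)
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        mean = sum(values[i] for i in idx) / len(idx)
        var = sum((values[i] - mean) ** 2 for i in idx) / len(idx)
        std = math.sqrt(var) or 1.0  # guard against constant readings
        for i in idx:
            out[i] = (values[i] - mean) / std
    return out


def sliding_windows(series, width):
    """All contiguous windows of length `width`, in time order."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]


# Toy data: two operating conditions alternate over six time steps.
settings = [[0.0, 0.0], [10.0, 10.0], [0.1, 0.0], [10.1, 10.0], [0.0, 0.1], [10.0, 10.1]]
sensor = [1.0, 100.0, 2.0, 101.0, 3.0, 102.0]

labels = kmeans(settings, k=2)
normalized = normalize_by_condition(sensor, labels)
windows = sliding_windows(normalized, width=3)
```

After normalization the two conditions share a common scale, so a single sensor trend can be read across condition changes before the windows are fed to a network.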
Robustness against real-world distribution shifts is crucial for the successful deployment of object detection models in practical applications. In this paper, we address the problem of assessing and enhancing the robustness of object detection models against natural perturbations, such as varying lighting conditions, blur, and brightness. We analyze four state-of-the-art deep neural network models, Detr-ResNet-101, Detr-ResNet-50, YOLOv4, and YOLOv4-tiny, using the COCO 2017 dataset and ExDark dataset. By simulating synthetic perturbations with the AugLy package, we systematically explore the optimal level of synthetic perturbation required to improve the models’ robustness through data augmentation techniques. Our comprehensive ablation study meticulously evaluates the impact of synthetic perturbations on object detection models’ performance against real-world distribution shifts, establishing a tangible connection between synthetic augmentation and real-world robustness. Our findings not only substantiate the effectiveness of synthetic perturbations in improving model robustness, but also provide valuable insights for researchers and practitioners in developing more robust and reliable object detection models tailored for real-world applications.
{"title":"Improving Object Detection Robustness against Natural Perturbations through Synthetic Data Augmentation","authors":"N. Premakumara, Brian Jalaian, N. Suri, H. Samani","doi":"10.1145/3596286.3596293","DOIUrl":"https://doi.org/10.1145/3596286.3596293","PeriodicalId":208318,"journal":{"name":"Proceedings of the 2023 Asia Conference on Computer Vision, Image Processing and Pattern Recognition","volume":"14 1","pages":"0"},"publicationDate":"2023-04-28","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"133466973"}
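To make the idea of sweeping synthetic perturbation levels concrete, here is a small sketch that generates brightness and blur variants of an image at several strengths. It deliberately does not call AugLy; the two perturbation functions are simple pure-Python stand-ins on a grayscale image, and the function names and level values are assumptions for illustration only.

```python
def adjust_brightness(image, factor):
    """Scale pixel intensities by `factor` and clip to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


def box_blur(image, radius):
    """Average each pixel over a (2*radius+1)^2 window, clipped at the borders."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(round(sum(window) / len(window)))
        out.append(row)
    return out


def perturbation_sweep(image, brightness_factors, blur_radii):
    """Return one perturbed copy of `image` per (perturbation, level) pair."""
    variants = {}
    for f in brightness_factors:
        variants[("brightness", f)] = adjust_brightness(image, f)
    for r in blur_radii:
        variants[("blur", r)] = box_blur(image, r)
    return variants


# Toy 3x3 grayscale image (list of rows).
image = [[0, 100, 200], [50, 150, 250], [0, 100, 200]]
variants = perturbation_sweep(image, brightness_factors=[0.5, 1.5], blur_radii=[0, 1])
```

Each variant could then be added to the training set, and the detector evaluated on real shifted data, to see which perturbation level helps most.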
Hrishikesh Singh Yadav, Priyanshu Panchal, Divyanshu Manawat, G. S, S. S
Semantic segmentation of SAR images using computer vision techniques has gained much popularity in the research community due to its wide range of applications. Despite the advancements in deep learning for image analysis, these models still struggle to segment SAR images because of speckle noise and poor feature extraction. Moreover, deep learning models are challenging to train on small datasets, and their performance is significantly affected by the quality of the data. This calls for an effective network that can extract critical information from low resolution SAR images. In this regard, the present work proposes a unique self attention module in U-Net for the semantic segmentation of low resolution SAR images. The self attention module uses a Laplacian kernel to highlight the sharp discontinuities in the features that define the boundaries of the objects. The proposed model employs dilated convolution layers at the initial layers, enabling it to capture larger contextual information more effectively. With an accuracy of 0.84 and an F1-score of 0.83, the proposed model outperforms the state-of-the-art techniques in semantic segmentation of low resolution SAR images. The results demonstrate the importance of the self attention module and of dilated convolution in the initial layers for semantic segmentation of low resolution SAR images.
{"title":"Self Attention in U-Net for Semantic Segmentation of Low Resolution SAR Images","authors":"Hrishikesh Singh Yadav, Priyanshu Panchal, Divyanshu Manawat, G. S, S. S","doi":"10.1145/3596286.3596291","DOIUrl":"https://doi.org/10.1145/3596286.3596291","PeriodicalId":208318,"journal":{"name":"Proceedings of the 2023 Asia Conference on Computer Vision, Image Processing and Pattern Recognition","volume":"50 1","pages":"0"},"publicationDate":"2023-04-28","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"124950825"}
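Two primitives named in the abstract above, the Laplacian kernel and dilated convolution, can be illustrated with a small sketch. It shows only how a Laplacian kernel responds to a boundary and how dilation widens the receptive field; it is not the paper's self attention module, whose design is not given in the abstract. The 4-neighbour Laplacian and the function name `conv2d` are assumptions.

```python
# 4-neighbour discrete Laplacian (one common choice; the paper's kernel is not specified).
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def conv2d(image, kernel, dilation=1):
    """'Valid' 2D cross-correlation; `dilation` spaces the kernel taps apart.

    For a symmetric kernel such as the Laplacian this equals convolution.
    """
    kh, kw = len(kernel), len(kernel[0])
    span_h = (kh - 1) * dilation + 1  # effective receptive field height
    span_w = (kw - 1) * dilation + 1  # effective receptive field width
    h, w = len(image), len(image[0])
    out = []
    for y in range(h - span_h + 1):
        row = []
        for x in range(w - span_w + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += kernel[i][j] * image[y + i * dilation][x + j * dilation]
            row.append(acc)
        out.append(row)
    return out


# Toy 5x5 image with a vertical step edge between columns 1 and 2.
image = [[0, 0, 10, 10, 10] for _ in range(5)]

edges = conv2d(image, LAPLACIAN)             # 3x3 output, nonzero only next to the edge
wide = conv2d(image, LAPLACIAN, dilation=2)  # same 9 taps spread over a 5x5 field
```

The undilated response is nonzero only in the two columns adjacent to the step, which is the boundary-highlighting behaviour the abstract refers to, while the dilated call covers a 5x5 neighbourhood with the same number of weights.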
{"title":"Proceedings of the 2023 Asia Conference on Computer Vision, Image Processing and Pattern Recognition","authors":"","doi":"10.1145/3596286","DOIUrl":"https://doi.org/10.1145/3596286","url":null,"abstract":"","PeriodicalId":208318,"journal":{"name":"Proceedings of the 2023 Asia Conference on Computer Vision, Image Processing and Pattern Recognition","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127051015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}