Bottleneck Transformer model with Channel Self-Attention for skin lesion classification

Masato Tada, X. Han
{"title":"具有通道自关注的瓶颈变压器模型用于皮肤病变分类","authors":"Masato Tada, X. Han","doi":"10.23919/MVA57639.2023.10215720","DOIUrl":null,"url":null,"abstract":"Early diagnosis of skin diseases is an important and challenge task for proper treatment, and even the deadliest skin cancer: the malignant melanoma can be cured for increasing the survival rate with less than 5-year life expectancy. The manual diagnosis of skin lesions by specialists not only is time-consuming but also usually causes great variation of the diagnosis results. Recently, deep learning networks with the main convolution operations have been widely employed for vision recognition including medical image analysis and classification, and demonstrated the great effectiveness. However, the convolution operation extracts the feature in the limited receptive field, and cannot capture long-range dependence for modeling global contexts. Therefore, transformer as an alternative for global feature modeling with self-attention module has become the prevalent network architecture for lifting performance in various vision tasks. This study aims to construct a hybrid skin lesion recognition model by incorporating the convolution operations and self-attention structures. Specifically, we firstly employ a backbone CNN to extract the high-level feature maps, and then leverage a transformer block to capture the global correlation. Due to the diverse contexts in channel domain and the reduced information in spatial domain of the high-level features, we alternatively incorporate a self-attention to model long-range dependencies in the channel direction instead of spatial self-attention in the conventional transformer block, and then follow spatial relation modeling with the depth-wise convolution block in the feature feed-forward module. 
To demonstrate the effectiveness of the proposed method, we conduct experiments on the HAM10000 and ISIC2019 skin lesion datasets, and verify the superior performance over the baseline model and the state-of-the-art methods.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Bottleneck Transformer model with Channel Self-Attention for skin lesion classification\",\"authors\":\"Masato Tada, X. Han\",\"doi\":\"10.23919/MVA57639.2023.10215720\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Early diagnosis of skin diseases is an important and challenge task for proper treatment, and even the deadliest skin cancer: the malignant melanoma can be cured for increasing the survival rate with less than 5-year life expectancy. The manual diagnosis of skin lesions by specialists not only is time-consuming but also usually causes great variation of the diagnosis results. Recently, deep learning networks with the main convolution operations have been widely employed for vision recognition including medical image analysis and classification, and demonstrated the great effectiveness. However, the convolution operation extracts the feature in the limited receptive field, and cannot capture long-range dependence for modeling global contexts. Therefore, transformer as an alternative for global feature modeling with self-attention module has become the prevalent network architecture for lifting performance in various vision tasks. This study aims to construct a hybrid skin lesion recognition model by incorporating the convolution operations and self-attention structures. 
Specifically, we firstly employ a backbone CNN to extract the high-level feature maps, and then leverage a transformer block to capture the global correlation. Due to the diverse contexts in channel domain and the reduced information in spatial domain of the high-level features, we alternatively incorporate a self-attention to model long-range dependencies in the channel direction instead of spatial self-attention in the conventional transformer block, and then follow spatial relation modeling with the depth-wise convolution block in the feature feed-forward module. To demonstrate the effectiveness of the proposed method, we conduct experiments on the HAM10000 and ISIC2019 skin lesion datasets, and verify the superior performance over the baseline model and the state-of-the-art methods.\",\"PeriodicalId\":338734,\"journal\":{\"name\":\"2023 18th International Conference on Machine Vision and Applications (MVA)\",\"volume\":\"15 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 18th International Conference on Machine Vision and Applications (MVA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/MVA57639.2023.10215720\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 18th International Conference on Machine Vision and Applications 
(MVA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/MVA57639.2023.10215720","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

Early diagnosis of skin diseases is an important and challenging task for proper treatment: even malignant melanoma, the deadliest skin cancer with a life expectancy of less than five years when advanced, can be cured if detected early, greatly increasing the survival rate. Manual diagnosis of skin lesions by specialists is not only time-consuming but also prone to large variation in the results. Recently, deep learning networks built mainly on convolution operations have been widely employed for visual recognition, including medical image analysis and classification, and have demonstrated great effectiveness. However, the convolution operation extracts features within a limited receptive field and cannot capture the long-range dependencies needed to model global context. Transformers, which model global features with self-attention modules, have therefore become a prevalent network architecture for lifting performance in various vision tasks. This study constructs a hybrid skin lesion recognition model that combines convolution operations with self-attention structures. Specifically, we first employ a backbone CNN to extract high-level feature maps and then leverage a transformer block to capture global correlations. Because the high-level features carry diverse contexts in the channel domain but reduced information in the spatial domain, we replace the spatial self-attention of the conventional transformer block with self-attention that models long-range dependencies along the channel direction, and follow it with spatial relation modeling via a depth-wise convolution block in the feature feed-forward module. To demonstrate the effectiveness of the proposed method, we conduct experiments on the HAM10000 and ISIC2019 skin lesion datasets and verify its superior performance over the baseline model and state-of-the-art methods.
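The two operations the abstract highlights, self-attention along the channel direction and a depth-wise convolution for spatial modeling in the feed-forward module, can be sketched as below. This is a minimal numpy illustration under assumed shapes, not the authors' implementation: the projection matrices `wq`, `wk`, `wv`, the 3x3 kernel size, and the scaling term are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_self_attention(x, wq, wk, wv):
    """Self-attention between channels of a flattened feature map.

    x : (C, N) feature map with C channels and N = H*W spatial positions.
    The attention matrix is (C, C), so cost scales with the number of
    channels rather than with N**2 as in spatial self-attention.
    """
    q, k, v = wq @ x, wk @ x, wv @ x          # each (C, N)
    attn = softmax(q @ k.T / np.sqrt(x.shape[1]), axis=-1)  # (C, C)
    return attn @ v                           # (C, N)

def depthwise_conv3x3(x, kernels):
    """Depth-wise 3x3 convolution: each channel gets its own kernel.

    x : (C, H, W); kernels : (C, 3, 3); 'same' padding, stride 1.
    """
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + 3, j:j + 3] * kernels[c])
    return out

# toy example: 8 channels over a 4x4 spatial map
rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
feat = rng.standard_normal((C, H * W))
wq, wk, wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
attended = channel_self_attention(feat, wq, wk, wv)      # (8, 16)
mixed = depthwise_conv3x3(attended.reshape(C, H, W),
                          rng.standard_normal((C, 3, 3)))  # (8, 4, 4)
```

In a full model these two pieces would sit inside a residual transformer block on top of the backbone CNN's feature maps; the loops in `depthwise_conv3x3` are written for clarity, where a framework's grouped convolution would be used in practice.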