Bottleneck Transformer model with Channel Self-Attention for skin lesion classification

Masato Tada, X. Han
{"title":"具有通道自关注的瓶颈变压器模型用于皮肤病变分类","authors":"Masato Tada, X. Han","doi":"10.23919/MVA57639.2023.10215720","DOIUrl":null,"url":null,"abstract":"Early diagnosis of skin diseases is an important and challenge task for proper treatment, and even the deadliest skin cancer: the malignant melanoma can be cured for increasing the survival rate with less than 5-year life expectancy. The manual diagnosis of skin lesions by specialists not only is time-consuming but also usually causes great variation of the diagnosis results. Recently, deep learning networks with the main convolution operations have been widely employed for vision recognition including medical image analysis and classification, and demonstrated the great effectiveness. However, the convolution operation extracts the feature in the limited receptive field, and cannot capture long-range dependence for modeling global contexts. Therefore, transformer as an alternative for global feature modeling with self-attention module has become the prevalent network architecture for lifting performance in various vision tasks. This study aims to construct a hybrid skin lesion recognition model by incorporating the convolution operations and self-attention structures. Specifically, we firstly employ a backbone CNN to extract the high-level feature maps, and then leverage a transformer block to capture the global correlation. Due to the diverse contexts in channel domain and the reduced information in spatial domain of the high-level features, we alternatively incorporate a self-attention to model long-range dependencies in the channel direction instead of spatial self-attention in the conventional transformer block, and then follow spatial relation modeling with the depth-wise convolution block in the feature feed-forward module. 
To demonstrate the effectiveness of the proposed method, we conduct experiments on the HAM10000 and ISIC2019 skin lesion datasets, and verify the superior performance over the baseline model and the state-of-the-art methods.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Bottleneck Transformer model with Channel Self-Attention for skin lesion classification\",\"authors\":\"Masato Tada, X. Han\",\"doi\":\"10.23919/MVA57639.2023.10215720\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Early diagnosis of skin diseases is an important and challenge task for proper treatment, and even the deadliest skin cancer: the malignant melanoma can be cured for increasing the survival rate with less than 5-year life expectancy. The manual diagnosis of skin lesions by specialists not only is time-consuming but also usually causes great variation of the diagnosis results. Recently, deep learning networks with the main convolution operations have been widely employed for vision recognition including medical image analysis and classification, and demonstrated the great effectiveness. However, the convolution operation extracts the feature in the limited receptive field, and cannot capture long-range dependence for modeling global contexts. Therefore, transformer as an alternative for global feature modeling with self-attention module has become the prevalent network architecture for lifting performance in various vision tasks. This study aims to construct a hybrid skin lesion recognition model by incorporating the convolution operations and self-attention structures. 
Specifically, we firstly employ a backbone CNN to extract the high-level feature maps, and then leverage a transformer block to capture the global correlation. Due to the diverse contexts in channel domain and the reduced information in spatial domain of the high-level features, we alternatively incorporate a self-attention to model long-range dependencies in the channel direction instead of spatial self-attention in the conventional transformer block, and then follow spatial relation modeling with the depth-wise convolution block in the feature feed-forward module. To demonstrate the effectiveness of the proposed method, we conduct experiments on the HAM10000 and ISIC2019 skin lesion datasets, and verify the superior performance over the baseline model and the state-of-the-art methods.\",\"PeriodicalId\":338734,\"journal\":{\"name\":\"2023 18th International Conference on Machine Vision and Applications (MVA)\",\"volume\":\"15 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 18th International Conference on Machine Vision and Applications (MVA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/MVA57639.2023.10215720\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 18th International Conference on Machine Vision and Applications 
(MVA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/MVA57639.2023.10215720","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

Early diagnosis of skin diseases is an important and challenging task for proper treatment: even malignant melanoma, the deadliest skin cancer with a life expectancy of less than five years when advanced, can be cured if detected early, greatly increasing the survival rate. Manual diagnosis of skin lesions by specialists is not only time-consuming but also prone to large variation in the results. Recently, deep learning networks built mainly on convolution operations have been widely employed for visual recognition, including medical image analysis and classification, and have demonstrated great effectiveness. However, the convolution operation extracts features within a limited receptive field and cannot capture the long-range dependencies needed to model global context. Transformers, which model global features with self-attention modules, have therefore become a prevalent network architecture for lifting performance in various vision tasks. This study constructs a hybrid skin lesion recognition model that combines convolution operations with self-attention structures. Specifically, we first employ a backbone CNN to extract high-level feature maps and then leverage a transformer block to capture global correlations. Because the high-level features carry diverse contexts in the channel domain but reduced information in the spatial domain, we replace the spatial self-attention of the conventional transformer block with self-attention that models long-range dependencies along the channel direction, and follow it with spatial relation modeling via a depth-wise convolution block in the feature feed-forward module. To demonstrate the effectiveness of the proposed method, we conduct experiments on the HAM10000 and ISIC2019 skin lesion datasets and verify its superior performance over the baseline model and state-of-the-art methods.
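The two operations the abstract highlights, self-attention along the channel direction and a depth-wise convolution for spatial modeling in the feed-forward module, can be sketched as below. This is a minimal numpy illustration under assumed shapes, not the authors' implementation: the projection matrices `wq`, `wk`, `wv`, the 3x3 kernel size, and the scaling term are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_self_attention(x, wq, wk, wv):
    """Self-attention between channels of a flattened feature map.

    x : (C, N) feature map with C channels and N = H*W spatial positions.
    The attention matrix is (C, C), so cost scales with the number of
    channels rather than with N**2 as in spatial self-attention.
    """
    q, k, v = wq @ x, wk @ x, wv @ x          # each (C, N)
    attn = softmax(q @ k.T / np.sqrt(x.shape[1]), axis=-1)  # (C, C)
    return attn @ v                           # (C, N)

def depthwise_conv3x3(x, kernels):
    """Depth-wise 3x3 convolution: each channel gets its own kernel.

    x : (C, H, W); kernels : (C, 3, 3); 'same' padding, stride 1.
    """
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + 3, j:j + 3] * kernels[c])
    return out

# toy example: 8 channels over a 4x4 spatial map
rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
feat = rng.standard_normal((C, H * W))
wq, wk, wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
attended = channel_self_attention(feat, wq, wk, wv)      # (8, 16)
mixed = depthwise_conv3x3(attended.reshape(C, H, W),
                          rng.standard_normal((C, 3, 3)))  # (8, 4, 4)
```

In a full model these two pieces would sit inside a residual transformer block on top of the backbone CNN's feature maps; the loops in `depthwise_conv3x3` are written for clarity, where a framework's grouped convolution would be used in practice.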