A Cross-layer Self-attention Learning Network for Fine-grained Classification
Jianhua Chen, Songsen Yu, Junle Liang
2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE), published 2023-01-06
DOI: 10.1109/ICCECE58074.2023.10135230 (https://doi.org/10.1109/ICCECE58074.2023.10135230)
Citations: 0
Abstract
Fine-grained image classification divides established basic categories into finer sub-categories. It is a challenging research task because the data exhibit small inter-class differences and large intra-class differences. This paper proposes a cross-layer self-attention (CS) network that learns refined, discriminative image features across layers. The network consists of a backbone and a cross-layer self-attention module with three submodules: cross-layer channel attention, cross-layer spatial attention, and feature fusion. The cross-layer channel attention submodule derives channel self-attention from the interaction between low-layer and high-layer information in the convolutional network, then applies it to the low-level layer to obtain finer low-level features. The cross-layer spatial attention submodule works similarly and yields finer low-level features in the spatial dimension. The feature fusion submodule fuses high-level features with low-level features, where the latter are obtained by combining the channel-refined and spatial-refined features. Experiments on three benchmark datasets show that the network, built on a ResNet101 backbone, outperforms most mainstream models in classification accuracy.
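
The abstract does not say how the attention weights are computed, so the following is only a minimal NumPy sketch of the data flow it describes, not the authors' implementation. It assumes squeeze-and-excitation-style gating: pooled descriptors from both layers produce a sigmoid gate that is applied to the low-level features. The function names, the weight matrix `w`, the mixing weights `mix`, the additive combination of the two refined features, and all tensor shapes are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cross_layer_channel_attention(low, high, w):
    """Reweight low-level channels using descriptors pooled from both layers.

    low: (C_l, H_l, W_l), high: (C_h, H_h, W_h), w: (C_l, C_l + C_h).
    The linear map w is an assumed form of cross-layer interaction.
    """
    # Global average pooling gives one descriptor value per channel.
    desc = np.concatenate([low.mean(axis=(1, 2)), high.mean(axis=(1, 2))])
    gate = sigmoid(w @ desc)  # shape (C_l,), each value in (0, 1)
    return low * gate[:, None, None]


def cross_layer_spatial_attention(low, high, mix):
    """Reweight low-level positions using channel-averaged maps of both layers.

    mix: (2,) assumed weights for the low-level and high-level maps.
    """
    # Nearest-neighbour upsampling of the high-level map to the low-level grid.
    # Assumes the low-level size is an integer multiple of the high-level size.
    ry = low.shape[1] // high.shape[1]
    rx = low.shape[2] // high.shape[2]
    high_map = high.mean(axis=0).repeat(ry, axis=0).repeat(rx, axis=1)
    low_map = low.mean(axis=0)
    gate = sigmoid(mix[0] * low_map + mix[1] * high_map)  # shape (H_l, W_l)
    return low * gate[None, :, :]


def fuse(low, high, w, mix):
    """Combine both refined low-level features, then fuse with high-level ones."""
    refined = (cross_layer_channel_attention(low, high, w)
               + cross_layer_spatial_attention(low, high, mix))
    # Pool each branch to a vector and concatenate for a downstream classifier.
    return np.concatenate([refined.mean(axis=(1, 2)), high.mean(axis=(1, 2))])


rng = np.random.default_rng(0)
low = rng.standard_normal((8, 16, 16))      # hypothetical low-level features
high = rng.standard_normal((32, 4, 4))      # hypothetical high-level features
w = rng.standard_normal((8, 8 + 32)) * 0.1  # hypothetical interaction weights
mix = np.array([0.5, 0.5])
fused = fuse(low, high, w, mix)
print(fused.shape)  # (40,)
```

In a trained network, `w` and `mix` would be learned parameters and the fused vector would feed a classification head. Here they are random placeholders so that the shapes can be checked end to end.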