Multi-Granularity Part Sampling Attention for Fine-Grained Visual Classification

Jiahui Wang; Qin Xu; Bo Jiang; Bin Luo; Jinhui Tang
Journal: IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society)
Publication type: Journal Article
Publication date: 2024-08-16
DOI: 10.1109/TIP.2024.3441813
URL: https://ieeexplore.ieee.org/document/10638479/
Citations: 0

Abstract

Fine-grained visual classification aims to distinguish similar sub-categories, and is challenging because variation within a sub-category is large while visual similarity between sub-categories is high. Recently, methods that extract semantic parts from discriminative regions have attracted increasing attention. However, most existing methods extract part features through rectangular bounding boxes produced by an object detection module or an attention mechanism, which makes it difficult to capture the rich shape information of objects. In this paper, we propose a novel Multi-Granularity Part Sampling Attention (MPSA) network for fine-grained visual classification. First, a novel multi-granularity part retrospect block is designed to extract part information at different scales and to enhance the high-level feature representation with discriminative part features of different granularities. Then, to extract part features of various shapes at each granularity, we propose part sampling attention, which comprehensively samples the implicit semantic parts on the feature maps. The proposed part sampling attention not only considers the importance of the sampled parts but also adopts part dropout to reduce overfitting. In addition, we propose a novel multi-granularity fusion method that highlights foreground features and suppresses background noise with the assistance of the gradient class activation map. Experimental results demonstrate that the proposed MPSA achieves state-of-the-art performance on four commonly used fine-grained visual classification benchmarks. The source code is publicly available at https://github.com/mobulan/MPSA.
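
The idea of sampling non-rectangular parts by importance and regularizing them with part dropout can be illustrated with a small sketch. This is not the authors' implementation (their code is at the repository linked above); it is a minimal pure-Python illustration under stated assumptions: the function name `sample_parts`, the top-k selection rule, the softmax weighting, and the toy inputs are all hypothetical and are not taken from the paper.

```python
import math
import random


def sample_parts(features, scores, k, drop_prob=0.0, rng=None):
    """Pick the k highest-scoring locations, optionally drop some, and pool them.

    features:  list of N feature vectors, one per spatial location (assumed layout).
    scores:    list of N importance scores, e.g. attention weights (assumption).
    drop_prob: probability of discarding each sampled part (part dropout).
    Returns (kept_indices, pooled_feature).
    """
    if k < 1 or not scores:
        raise ValueError("need k >= 1 and at least one location")
    rng = rng or random.Random()
    # Top-k selection: the chosen locations need not form a rectangle.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Part dropout: randomly discard sampled parts (used during training only).
    kept = [i for i in top if rng.random() >= drop_prob]
    if not kept:
        kept = top[:1]  # always keep the strongest part
    # Softmax over the kept scores gives each part an importance weight.
    peak = max(scores[i] for i in kept)
    exps = [math.exp(scores[i] - peak) for i in kept]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * features[i][d] for w, i in zip(weights, kept))
              for d in range(dim)]
    return kept, pooled


# Toy feature map with four locations and two channels (made-up values).
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.5, 0.5]]
scores = [0.1, 2.0, 2.0, -1.0]
kept, pooled = sample_parts(features, scores, k=2)
print(kept, pooled)
```

Because the selection works on individual locations of the flattened feature map, the sampled set can take any shape, which is the contrast with bounding-box part extraction that the abstract draws. Setting `drop_prob` above zero mimics the regularizing effect of dropping parts at training time.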