SIM-OFE: Structure Information Mining and Object-Aware Feature Enhancement for Fine-Grained Visual Categorization
Hongbo Sun; Xiangteng He; Jinglin Xu; Yuxin Peng
IEEE Transactions on Image Processing, vol. 33, pp. 5312-5326. Published 2024-09-18.
DOI: 10.1109/TIP.2024.3459788
https://ieeexplore.ieee.org/document/10684043/
Citation count: 0
Abstract
Fine-grained visual categorization (FGVC) aims to distinguish visual objects belonging to multiple subcategories of a coarse-grained category. Subtle inter-class differences among subcategories make the task challenging. Existing methods primarily focus on learning salient visual patterns and ignore the object’s internal structure, which makes it difficult to obtain complete discriminative regions within the object and limits FGVC performance. To address this issue, we propose a Structure Information Mining and Object-aware Feature Enhancement (SIM-OFE) method for fine-grained visual categorization, which explores the visual object’s internal structure composition and appearance traits. Concretely, we first propose a simple yet effective hybrid perception attention module that locates visual objects based on global-scope and local-scope significance analyses. Then, a structure information mining module models the distribution and context relations of critical regions within the object, highlighting the whole object and its discriminative regions so that subtle differences can be distinguished. Finally, an object-aware feature enhancement module combines global-scope and local-scope discriminative features through attentive coupling to produce powerful visual representations for fine-grained recognition. Extensive experiments on three FGVC benchmark datasets demonstrate that the proposed SIM-OFE method achieves state-of-the-art performance.
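
The abstract gives no formulas or implementation details, so the following is only an illustrative sketch of two ideas it names: locating an object from feature significance, and coupling global-scope and local-scope features with attention weights. Everything in it is an assumption for illustration. The function names `locate_object` and `attentive_fuse`, the mean-activation threshold, and the softmax gating are hypothetical stand-ins, not the SIM-OFE modules themselves.

```python
import numpy as np


def locate_object(feature_map, threshold_ratio=1.0):
    """Return an inclusive (top, left, bottom, right) box around salient cells.

    ASSUMPTION: significance is the channel-summed activation, thresholded at
    its mean. This is a generic localization heuristic, not the paper's
    hybrid perception attention module.
    """
    activation = feature_map.sum(axis=0)                 # (H, W)
    mask = activation > threshold_ratio * activation.mean()
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:                                   # nothing salient: keep full frame
        h, w = activation.shape
        return 0, 0, h - 1, w - 1
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])


def attentive_fuse(global_feat, local_feat):
    """Fuse two (D,) feature vectors with softmax attention weights.

    ASSUMPTION: the global feature acts as the query and scores itself and
    the local feature by scaled dot product. The paper's attentive coupling
    may be defined differently.
    """
    scale = np.sqrt(global_feat.size)
    scores = np.array([global_feat @ global_feat, global_feat @ local_feat]) / scale
    weights = np.exp(scores - scores.max())              # numerically stable softmax
    weights /= weights.sum()
    fused = weights[0] * global_feat + weights[1] * local_feat
    return fused, weights


# Toy usage: a 4-channel 8x8 feature map with one active 3x3 block.
toy_map = np.zeros((4, 8, 8))
toy_map[:, 2:5, 3:6] = 1.0
box = locate_object(toy_map)

global_vec = np.array([1.0, 0.0, 2.0])
fused_vec, fuse_weights = attentive_fuse(global_vec, global_vec.copy())
```

In this toy setup the box covers the active block, and fusing two identical vectors yields equal weights, which is a quick sanity check that the gating behaves like an attention-weighted average.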