Deconfounded hierarchical multi-granularity classification

Computer Vision and Image Understanding (IF 4.3; Zone 3, Computer Science; JCR Q2, Computer Science, Artificial Intelligence) Pub Date: 2024-08-20 DOI: 10.1016/j.cviu.2024.104108

Ziyu Zhao, Leilei Gan, Tao Shen, Kun Kuang, Fei Wu

Computer Vision and Image Understanding, volume 248, Article 104108. Journal Article. https://www.sciencedirect.com/science/article/pii/S1077314224001899

Citations: 0

Abstract

Hierarchical multi-granularity classification (HMC) assigns labels at varying levels of detail to images using a structured hierarchy that categorizes labels from coarse to fine, such as [“Suliformes”, “Fregatidae”, “Frigatebird”]. Traditional HMC methods typically integrate hierarchical label information into either the model’s architecture or its loss function. However, these approaches often overlook the spurious correlations between coarse-level semantic information and fine-grained labels, which can lead models to rely on these non-causal relationships for making predictions. In this paper, we adopt a causal perspective to address the challenges in HMC, demonstrating how coarse-grained semantics can serve as confounders in fine-grained classification. To comprehensively mitigate confounding bias in HMC, we introduce a novel framework, Deconf-HMC, which consists of three main components: (1) a causal-inspired label prediction module that combines fine-level features with coarse-level prediction outcomes to determine the appropriate labels at each hierarchical level; (2) a representation disentanglement module that minimizes the mutual information between representations of different granularities; and (3) an adversarial training module that restricts the predictive influence of coarse-level representations on fine-level labels, thereby aiming to eliminate confounding bias. Extensive experiments on three widely used datasets demonstrate the superiority of our approach over existing state-of-the-art HMC methods.
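As a rough illustration of the coarse-to-fine label structure and of the idea in component (1), where fine-level evidence is combined with coarse-level prediction outcomes, consider the minimal sketch below. It is not the Deconf-HMC implementation: the taxonomy entries other than the Frigatebird path, the function names, and the multiplicative combination rule are all assumptions chosen for illustration.

```python
from math import exp

# Hypothetical taxonomy as a child -> parent map.
# Only the Suliformes / Fregatidae / Frigatebird path comes from the abstract;
# the Sulidae / Brown Booby entries are added here purely for illustration.
PARENT = {
    "Frigatebird": "Fregatidae",
    "Fregatidae": "Suliformes",
    "Brown Booby": "Sulidae",
    "Sulidae": "Suliformes",
}


def label_path(fine_label, parent=PARENT):
    """Return the labels for one image from coarsest to finest."""
    path = [fine_label]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


def softmax(scores):
    """Numerically stable softmax over a dict of label -> score."""
    top = max(scores.values())
    exps = {label: exp(value - top) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def combine_with_coarse(fine_scores, coarse_probs, parent=PARENT):
    """Assumed combination rule: weight each fine-level probability by the
    predicted probability of its parent class, then renormalize."""
    fine_probs = softmax(fine_scores)
    joint = {
        label: prob * coarse_probs[parent[label]]
        for label, prob in fine_probs.items()
    }
    total = sum(joint.values())
    return {label: value / total for label, value in joint.items()}


if __name__ == "__main__":
    print(label_path("Frigatebird"))
    # Fine-level scores are tied, so the coarse prediction decides.
    fine_scores = {"Frigatebird": 0.0, "Brown Booby": 0.0}
    coarse_probs = {"Fregatidae": 0.8, "Sulidae": 0.2}
    print(combine_with_coarse(fine_scores, coarse_probs))
```

The sketch covers only the prediction side. The mutual-information minimization and adversarial training described in components (2) and (3) act on learned representations and are not modeled here.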

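Source journal

Computer Vision and Image Understanding (Engineering and Technology: Engineering, Electrical and Electronic)

CiteScore: 7.80
Self-citation rate: 4.40%
Articles published: 112
Review time: 79 days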

Journal description: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.

Research areas include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems

Latest articles from this journal

• Editorial Board
• Incremental few-shot instance segmentation without fine-tuning on novel classes
• Navigating social contexts: A transformer approach to relationship recognition
• View-to-label: Multi-view consistency for self-supervised monocular 3D object detection
• Joint image-instance spatial–temporal attention for few-shot action recognition