Prior-Structure Driven Weakly-Supervised Learning for Fine-Grained Human Parsing
Authors: Huaqing Hao; Weibin Liu; Weiwei Xing
Journal: IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 1, pp. 461-476 (JCR Q1, Engineering, Electrical & Electronic; impact factor 11.1)
Publication date: 2024-09-03
Publication type: Journal Article
DOI: 10.1109/TCSVT.2024.3454171
URL: https://ieeexplore.ieee.org/document/10663741/
Citation count: 0
Abstract
Weakly-supervised fine-grained human parsing decomposes the human body into several body parts and various fashion items using only easier-to-obtain labels. It is a challenging visual task that general weakly-supervised approaches do not solve well. In this work, we explore the feasibility of using point-level labels to address this task and propose prior-structure driven weakly-supervised learning for fine-grained human parsing. Following previous practice, we design a pseudo-label initialization mechanism that produces high-quality pixel-level pseudo labels using the Segment Anything Model (SAM), a powerful image segmentation model. We then propose the Feature Propagation based on Prior-Structure (FPPS) module, which formalizes prior-structure knowledge as an adjacency matrix constructed from superpixels and employs a learnable Graph Neural Network (GNN) as the feature propagator. FPPS optimizes the features of unlabeled pixels to strengthen weakly-supervised learning. The framework further includes the Refinement Pseudo Label (RPL) strategy, which generates denser supervision from past sub-optimal models. To the best of our knowledge, this work is the first attempt to perform fine-grained human parsing in a weakly-supervised manner. We conduct extensive experiments on two challenging fine-grained datasets, ATR and LIP. Experimental results show that the proposed weakly-supervised method yields results comparable to strongly-supervised methods and even outperforms other state-of-the-art approaches in semi-supervised human parsing tasks.
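The abstract describes FPPS only at a high level: an adjacency matrix built from superpixels and a learnable GNN that propagates features. The sketch below illustrates that general idea and is not the authors' implementation. It assumes a GCN-style normalized propagation step, one feature vector per superpixel, and a precomputed superpixel label map; the function names `superpixel_adjacency` and `propagate` are hypothetical.

```python
import numpy as np


def superpixel_adjacency(labels: np.ndarray) -> np.ndarray:
    """Build a symmetric 0/1 adjacency matrix over superpixels.

    Two superpixels are neighbours when they share a horizontal or
    vertical pixel border in the label map (an assumed definition).
    """
    n = int(labels.max()) + 1
    adj = np.zeros((n, n), dtype=np.float64)
    # Compare every pixel with its right neighbour, then its bottom neighbour.
    pairs = (
        (labels[:, :-1], labels[:, 1:]),
        (labels[:-1, :], labels[1:, :]),
    )
    for a, b in pairs:
        border = a != b
        adj[a[border], b[border]] = 1.0
        adj[b[border], a[border]] = 1.0
    return adj


def propagate(features: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One GCN-style step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    The normalization is an assumption; the abstract does not specify
    which GNN variant FPPS uses.
    """
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ features @ weight, 0.0)


# Toy label map with three superpixels arranged left to right.
labels = np.array([
    [0, 0, 1, 1, 2, 2],
    [0, 0, 1, 1, 2, 2],
])
adj = superpixel_adjacency(labels)

rng = np.random.default_rng(0)
features = rng.normal(size=(3, 4))   # one 4-d feature per superpixel
weight = rng.normal(size=(4, 2))     # stands in for the learnable GNN weight
out = propagate(features, adj, weight)
```

In this toy map, superpixels 0 and 2 do not touch, so features flow between them only through superpixel 1. In a training setup the weight matrix would be learned rather than sampled at random.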
About the journal:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.