Towards imbalanced motion: part-decoupling network for video portrait segmentation

IF 7.3 2区 计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Science China Information Sciences Pub Date : 2024-06-25 DOI:10.1007/s11432-023-4030-y
Tianshu Yu, Changqun Xia, Jia Li
{"title":"Towards imbalanced motion: part-decoupling network for video portrait segmentation","authors":"Tianshu Yu, Changqun Xia, Jia Li","doi":"10.1007/s11432-023-4030-y","DOIUrl":null,"url":null,"abstract":"<p>Video portrait segmentation (VPS), aiming at segmenting prominent foreground portraits from video frames, has received much attention in recent years. However, the simplicity of existing VPS datasets leads to a limitation on extensive research of the task. In this work, we propose a new intricate large-scale multi-scene video portrait segmentation dataset MVPS consisting of 101 video clips in 7 scenario categories, in which 10843 sampled frames are finely annotated at the pixel level. The dataset has diverse scenes and complicated background environments, which is the most complex dataset in VPS to our best knowledge. Through the observation of a large number of videos with portraits during dataset construction, we find that due to the joint structure of the human body, the motion of portraits is part-associated, which leads to the different parts being relatively independent in motion. That is, the motion of different parts of the portraits is imbalanced. Towards this imbalance, an intuitive and reasonable idea is that different motion states in portraits can be better exploited by decoupling the portraits into parts. To achieve this, we propose a part-decoupling network (PDNet) for VPS. Specifically, an inter-frame part-discriminated attention (IPDA) module is proposed which unsupervisedly segments portrait into parts and utilizes different attentiveness on discriminative features specified to each different part. In this way, appropriate attention can be imposed on portrait parts with imbalanced motion to extract part-discriminated correlations, so that the portraits can be segmented more accurately. Experimental results demonstrate that our method achieves leading performance with the comparison to state-of-the-art methods.</p>","PeriodicalId":21618,"journal":{"name":"Science China Information Sciences","volume":null,"pages":null},"PeriodicalIF":7.3000,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Science China Information Sciences","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11432-023-4030-y","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Video portrait segmentation (VPS), aiming at segmenting prominent foreground portraits from video frames, has received much attention in recent years. However, the simplicity of existing VPS datasets leads to a limitation on extensive research of the task. In this work, we propose a new intricate large-scale multi-scene video portrait segmentation dataset MVPS consisting of 101 video clips in 7 scenario categories, in which 10843 sampled frames are finely annotated at the pixel level. The dataset has diverse scenes and complicated background environments, which is the most complex dataset in VPS to our best knowledge. Through the observation of a large number of videos with portraits during dataset construction, we find that due to the joint structure of the human body, the motion of portraits is part-associated, which leads to the different parts being relatively independent in motion. That is, the motion of different parts of the portraits is imbalanced. Towards this imbalance, an intuitive and reasonable idea is that different motion states in portraits can be better exploited by decoupling the portraits into parts. To achieve this, we propose a part-decoupling network (PDNet) for VPS. Specifically, an inter-frame part-discriminated attention (IPDA) module is proposed which unsupervisedly segments portrait into parts and utilizes different attentiveness on discriminative features specified to each different part. In this way, appropriate attention can be imposed on portrait parts with imbalanced motion to extract part-discriminated correlations, so that the portraits can be segmented more accurately. Experimental results demonstrate that our method achieves leading performance with the comparison to state-of-the-art methods.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
实现不平衡运动:用于视频肖像分割的部分解耦网络
视频肖像分割(VPS)旨在从视频帧中分割出突出的前景肖像,近年来受到广泛关注。然而,现有 VPS 数据集的简单性限制了对该任务的广泛研究。在这项工作中,我们提出了一个新的复杂大规模多场景视频肖像分割数据集 MVPS,该数据集由 7 个场景类别的 101 个视频片段组成,其中 10843 个采样帧在像素级别上进行了精细注释。该数据集场景多样,背景环境复杂,是目前所知 VPS 中最复杂的数据集。在数据集构建过程中,通过观察大量的人像视频,我们发现由于人体的关节结构,人像的运动是部分关联的,这导致不同部分的运动相对独立。也就是说,人像不同部位的运动是不平衡的。针对这种不平衡现象,一个直观合理的想法是,通过将肖像解耦为不同的部分,可以更好地利用肖像的不同运动状态。为此,我们提出了一种用于 VPS 的部分解耦网络(PDNet)。具体来说,我们提出了一个帧间部分区分注意力(IPDA)模块,该模块在无监督的情况下将肖像分割成不同部分,并利用对每个不同部分指定的区分特征的不同注意力。通过这种方法,可以对运动不平衡的人像部分施加适当的关注,以提取部分区分相关性,从而更准确地分割人像。实验结果表明,与最先进的方法相比,我们的方法取得了领先的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Science China Information Sciences
Science China Information Sciences COMPUTER SCIENCE, INFORMATION SYSTEMS-
CiteScore
12.60
自引率
5.70%
发文量
224
审稿时长
8.3 months
期刊介绍: Science China Information Sciences is a dedicated journal that showcases high-quality, original research across various domains of information sciences. It encompasses Computer Science & Technologies, Control Science & Engineering, Information & Communication Engineering, Microelectronics & Solid-State Electronics, and Quantum Information, providing a platform for the dissemination of significant contributions in these fields.
期刊最新文献
Weighted sum power maximization for STAR-RIS-aided SWIPT systems with nonlinear energy harvesting TSCompiler: efficient compilation framework for dynamic-shape models NeurDB: an AI-powered autonomous data system State and parameter identification of linearized water wave equation via adjoint method An STP look at logical blocking of finite state machines: formulation, detection, and search
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1