Part2Pose: Inferring Human Pose From Parts in Complex Scenes

IF 3.2 | Zone 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | IEEE Signal Processing Letters | Pub Date: 2024-12-13 | DOI: 10.1109/LSP.2024.3517418

Rong Zhang;Junneng Feng;Cun Feng;Yirui Wang;Lijun Guo

IEEE Signal Processing Letters, vol. 32, pp. 441-445. https://ieeexplore.ieee.org/document/10798470/

Citations: 0

Abstract


Part2Pose: Inferring Human Pose From Parts in Complex Scenes

Most existing Human Pose Estimation (HPE) methods struggle to handle challenges such as varying poses, complex backgrounds, and occlusion in complex scenes. To address these problems, this paper proposes a novel HPE network called Part2Pose. Instead of focusing on small keypoints as existing HPE methods do, Part2Pose first extracts image features based on human body parts, which expands the detection scope. This strategy makes the extracted features more robust to variations and distractions in complex scenes. Then, a Transformer-based Global Part Relation Module (GPRM) and a graph convolutional network-based Local Part Relation Module (LPRM) capture global and local relationships among different body parts to help infer the positions of keypoints. Extensive experiments on challenging datasets, including COCO, CrowdPose and OCHuman, show that the proposed Part2Pose can surpass existing popular state-of-the-art HPE methods. Combining Part2Pose with lightweight networks confirms its robustness and generalizability.
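The abstract does not give implementation details, so the following is only a rough sketch of the two kinds of part relations it names, not the authors' method. It assumes each body part is already represented by a feature vector, models the global relation as single-head self-attention with identity projections (no learned weights), and models the local relation as one graph-convolution step with symmetric normalization over a hypothetical part adjacency. The function names, the toy part graph, and the fusion by summation are all illustrative assumptions.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def global_part_relation(parts):
    """Self-attention over all parts (assumption: identity Q/K/V projections)."""
    d = len(parts[0])
    transposed = [list(col) for col in zip(*parts)]
    scores = matmul(parts, transposed)  # P x P similarity between parts
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, parts)  # every part attends to every other part


def local_part_relation(parts, edges):
    """One graph-convolution step: D^-1/2 (A + I) D^-1/2 X, without weights."""
    p = len(parts)
    adj = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0  # undirected edge between adjacent parts
    deg = [sum(row) for row in adj]
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(p)]
            for i in range(p)]
    return matmul(norm, parts)  # each part mixes only with its neighbours


# Toy input: three hypothetical parts (head, torso, left arm) with 2-D features.
parts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]  # hypothetical adjacency: head-torso, torso-arm

global_out = global_part_relation(parts)
local_out = local_part_relation(parts, edges)
# Assumption: fuse the two views by element-wise summation.
fused = [[g + l for g, l in zip(g_row, l_row)]
         for g_row, l_row in zip(global_out, local_out)]
```

In this sketch the attention step lets a part borrow evidence from any other part, for example when it is occluded, while the graph step restricts mixing to physically connected parts. A real model would add learned projections and a head that regresses keypoint positions from the fused features.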


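Source journal

IEEE Signal Processing Letters (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.40

Self-citation rate: 12.80%

Articles published: 339

Review time: 2.8 months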

About the journal: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.