Pose-Promote: Progressive Visual Perception for Activities of Daily Living

IF 3.2; Zone 2 (Engineering & Technology); Q2 ENGINEERING, ELECTRICAL & ELECTRONIC; IEEE Signal Processing Letters; Pub Date: 2024-10-14; DOI: 10.1109/LSP.2024.3480046

Qilang Ye; Zitong Yu

Published in IEEE Signal Processing Letters, vol. 31, pp. 2950-2954 (Journal Article). Source: https://ieeexplore.ieee.org/document/10716484/

Citations: 0

Abstract



Pose-Promote: Progressive Visual Perception for Activities of Daily Living

Poses are effective in interpreting fine-grained human activities, especially when the visual information is complex. Unimodal methods for action recognition perform unsatisfactorily on daily activities because they lack a more comprehensive perspective. Multimodal methods that combine pose and visual information still do not fully mine the complementary information between the two. Therefore, we propose a Pose-promote (Ppromo) framework that uses a priori knowledge of pose joints to perceive visual information progressively. We first introduce a temporal promote module that activates each video segment using temporally synchronized joint weights. A spatial promote module then captures the key regions in the visuals using the learned pose attentions. To further refine the bimodal associations, a global inter-promote module aligns global pose-visual semantics at the feature granularity. Finally, a learnable late fusion strategy between the visual and pose branches is applied for accurate inference. Ppromo achieves state-of-the-art performance on three publicly available datasets.
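The abstract does not give the modules' equations, so the following is only a minimal sketch of two of the ideas it names: reweighting video segments with pose-derived temporal weights, and a late fusion whose mixing weight could be learned. The function names, the softmax normalization, the per-segment `joint_scores` input, and the sigmoid gate `alpha` are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_promote(segment_features, joint_scores):
    """Scale each segment's feature vector by a pose-derived weight.

    segment_features: list of T feature vectors, one per video segment.
    joint_scores: list of T scalars (hypothetical per-segment pose scores).
    """
    weights = softmax(joint_scores)
    return [[w * x for x in feat] for w, feat in zip(weights, segment_features)]


def late_fusion(visual_logits, pose_logits, alpha):
    """Blend class logits with gate = sigmoid(alpha).

    alpha stands in for a scalar that would be learned during training
    (an assumption; the paper's fusion may be parameterized differently).
    """
    gate = 1.0 / (1.0 + math.exp(-alpha))
    return [gate * v + (1.0 - gate) * p for v, p in zip(visual_logits, pose_logits)]


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    promoted = temporal_promote(feats, joint_scores=[0.1, 2.0, 0.3])
    fused = late_fusion([2.0, 0.5], [0.0, 1.5], alpha=0.0)
    print(promoted)
    print(fused)
```

With `alpha=0.0` the gate is 0.5, so the two branches contribute equally; a trained model would move the gate toward whichever modality is more reliable on the target data.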


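Source journal

IEEE Signal Processing Letters (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.40

Self-citation rate: 12.80%

Articles published: 339

Review time: 2.8 months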

Journal description: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.