基于骨架的动作识别的上下文感知图卷积

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2020-06-01 DOI:10.1109/cvpr42600.2020.01434

Xikun Zhang, Chang Xu, D. Tao

{"title":"基于骨架的动作识别的上下文感知图卷积","authors":"Xikun Zhang, Chang Xu, D. Tao","doi":"10.1109/cvpr42600.2020.01434","DOIUrl":null,"url":null,"abstract":"Graph convolutional models have gained impressive successes on skeleton based human action recognition task. As graph convolution is a local operation, it cannot fully investigate non-local joints that could be vital to recognizing the action. For example, actions like typing and clapping request the cooperation of two hands, which are distant from each other in a human skeleton graph. Multiple graph convolutional layers thus tend to be stacked together to increase receptive field, which brings in computational inefficiency and optimization difficulty. But there is still no guarantee that distant joints (e.g. two hands) can be well integrated. In this paper, we propose a context aware graph convolutional network (CA-GCN). Besides the computation of localized graph convolution, CA-GCN considers a context term for each vertex by integrating information of all other vertices. Long range dependencies among joints are thus naturally integrated in context information, which then eliminates the need of stacking multiple layers to enlarge receptive field and greatly simplifies the network. Moreover, we further propose an advanced CA-GCN, in which asymmetric relevance measurement and higher level representation are utilized to compute context information for more flexibility and better performance. Besides the joint features, our CA-GCN could also be extended to handle graphs with edge (limb) features. Extensive experiments on two real-world datasets demonstrate the importance of context information and the effectiveness of the proposed CA-GCN in skeleton based action recognition.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"6 1","pages":"14321-14330"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"103","resultStr":"{\"title\":\"Context Aware Graph Convolution for Skeleton-Based Action Recognition\",\"authors\":\"Xikun Zhang, Chang Xu, D. Tao\",\"doi\":\"10.1109/cvpr42600.2020.01434\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Graph convolutional models have gained impressive successes on skeleton based human action recognition task. As graph convolution is a local operation, it cannot fully investigate non-local joints that could be vital to recognizing the action. For example, actions like typing and clapping request the cooperation of two hands, which are distant from each other in a human skeleton graph. Multiple graph convolutional layers thus tend to be stacked together to increase receptive field, which brings in computational inefficiency and optimization difficulty. But there is still no guarantee that distant joints (e.g. two hands) can be well integrated. In this paper, we propose a context aware graph convolutional network (CA-GCN). Besides the computation of localized graph convolution, CA-GCN considers a context term for each vertex by integrating information of all other vertices. Long range dependencies among joints are thus naturally integrated in context information, which then eliminates the need of stacking multiple layers to enlarge receptive field and greatly simplifies the network. Moreover, we further propose an advanced CA-GCN, in which asymmetric relevance measurement and higher level representation are utilized to compute context information for more flexibility and better performance. Besides the joint features, our CA-GCN could also be extended to handle graphs with edge (limb) features. Extensive experiments on two real-world datasets demonstrate the importance of context information and the effectiveness of the proposed CA-GCN in skeleton based action recognition.\",\"PeriodicalId\":6715,\"journal\":{\"name\":\"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)\",\"volume\":\"6 1\",\"pages\":\"14321-14330\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"103\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/cvpr42600.2020.01434\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/cvpr42600.2020.01434","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 103

摘要

图卷积模型在基于骨骼的人体动作识别任务中取得了令人瞩目的成功。由于图卷积是一种局部操作，它不能完全研究对动作识别至关重要的非局部关节。例如，打字和拍手等动作需要两只手的配合，而在人体骨架图中，这两只手相距很远。因此，多个图卷积层倾向于堆叠在一起以增加接受场，这带来了计算效率低下和优化难度。但是仍然不能保证远处的关节(例如两只手)可以很好地结合在一起。本文提出了一种上下文感知图卷积网络(CA-GCN)。CA-GCN除了计算局部图卷积外，还通过整合所有其他顶点的信息来考虑每个顶点的上下文项。因此，节点之间的长距离依赖关系自然地集成在上下文信息中，从而消除了堆叠多层以扩大接受域的需要，大大简化了网络。此外，我们进一步提出了一种改进的CA-GCN，该算法利用非对称关联度量和更高层次的表示来计算上下文信息，以获得更大的灵活性和更好的性能。除了联合特征外，我们的CA-GCN还可以扩展到处理具有边(肢)特征的图。在两个真实数据集上的大量实验证明了上下文信息的重要性以及所提出的CA-GCN在基于骨架的动作识别中的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Context Aware Graph Convolution for Skeleton-Based Action Recognition

Graph convolutional models have gained impressive successes on skeleton based human action recognition task. As graph convolution is a local operation, it cannot fully investigate non-local joints that could be vital to recognizing the action. For example, actions like typing and clapping request the cooperation of two hands, which are distant from each other in a human skeleton graph. Multiple graph convolutional layers thus tend to be stacked together to increase receptive field, which brings in computational inefficiency and optimization difficulty. But there is still no guarantee that distant joints (e.g. two hands) can be well integrated. In this paper, we propose a context aware graph convolutional network (CA-GCN). Besides the computation of localized graph convolution, CA-GCN considers a context term for each vertex by integrating information of all other vertices. Long range dependencies among joints are thus naturally integrated in context information, which then eliminates the need of stacking multiple layers to enlarge receptive field and greatly simplifies the network. Moreover, we further propose an advanced CA-GCN, in which asymmetric relevance measurement and higher level representation are utilized to compute context information for more flexibility and better performance. Besides the joint features, our CA-GCN could also be extended to handle graphs with edge (limb) features. Extensive experiments on two real-world datasets demonstrate the importance of context information and the effectiveness of the proposed CA-GCN in skeleton based action recognition.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

自引率

0.00%

发文量

期刊最新文献

Geometric Structure Based and Regularized Depth Estimation From 360 Indoor Imagery 3D Part Guided Image Editing for Fine-Grained Object Understanding SDC-Depth: Semantic Divide-and-Conquer Network for Monocular Depth Estimation Approximating shapes in images with low-complexity polygons PFRL: Pose-Free Reinforcement Learning for 6D Pose Estimation