Authors: Hui Zheng; Ye-Sheng Zhao; Bo Zhang; Guo-Qiang Shang
IEEE Sensors Letters, vol. 8, no. 11, pp. 1-4, published 2024-10-07
DOI: 10.1109/LSENS.2024.3475515
URL: https://ieeexplore.ieee.org/document/10706715/
A Separable Spatial–Temporal Graph Learning Approach for Skeleton-Based Action Recognition
With the popularization of sensors and the development of pose estimation algorithms, skeleton-based action recognition has gradually become a mainstream approach to human action recognition. The key to skeleton-based action recognition is to extract, from sensor data, feature representations that accurately capture the characteristics of human actions. In this letter, we propose a separable spatial-temporal graph learning approach composed of independent spatial and temporal graph networks. In the spatial graph network, a spectral-based graph convolutional network extracts the spatial features of each frame. In the temporal graph network, a global-local attention mechanism is embedded to capture interdependencies between different time steps. Extensive experiments on the NTU-RGB+D and NTU-RGB+D 120 datasets show that the proposed method outperforms several baselines.
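
The separable design described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not specify the layers, so the sketch assumes a first-order approximation of spectral graph convolution (symmetrically normalized adjacency with self-loops) for the spatial stage, plain dot-product self-attention for the temporal stage, and a simple average to fuse the global and local (windowed) attention outputs. All function names, the toy three-joint skeleton, and the weights are hypothetical.

```python
import math


def normalized_adjacency(edges, num_joints):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def spatial_gcn(frame, a_hat, weight):
    """Assumed spatial layer for one frame: ReLU(A_hat @ X @ W)."""
    h = matmul(matmul(a_hat, frame), weight)
    return [[max(0.0, v) for v in row] for row in h]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(seq, window=None):
    """Dot-product self-attention over time.

    window=None attends to every frame (global); an integer window
    restricts frame t to frames s with |t - s| <= window (local).
    """
    dim = len(seq[0])
    out = []
    for t, query in enumerate(seq):
        idx = [s for s in range(len(seq))
               if window is None or abs(s - t) <= window]
        scores = [sum(a * b for a, b in zip(query, seq[s])) / math.sqrt(dim)
                  for s in idx]
        weights = softmax(scores)
        out.append([sum(w * seq[s][d] for w, s in zip(weights, idx))
                    for d in range(dim)])
    return out


def separable_forward(clip, edges, weight, window=1):
    """Spatial stage per frame, then global and local temporal attention."""
    num_joints = len(clip[0])
    a_hat = normalized_adjacency(edges, num_joints)
    per_frame = []
    for frame in clip:
        h = spatial_gcn(frame, a_hat, weight)
        # Mean-pool over joints to get one feature vector per frame.
        per_frame.append([sum(col) / num_joints for col in zip(*h)])
    global_out = temporal_attention(per_frame)
    local_out = temporal_attention(per_frame, window)
    # Assumed fusion: average of the global and local branches.
    return [[(g + l) / 2 for g, l in zip(g_row, l_row)]
            for g_row, l_row in zip(global_out, local_out)]


# Toy clip: 4 frames, 3 joints in a chain, 2-D coordinates per joint.
edges = [(0, 1), (1, 2)]
weight = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.1]]  # maps 2 inputs to 3 features
clip = [[[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]],
        [[0.1, 0.0], [0.1, 1.0], [0.2, 2.0]],
        [[0.2, 0.0], [0.3, 1.0], [0.5, 2.0]],
        [[0.3, 0.0], [0.5, 1.0], [0.9, 2.0]]]
features = separable_forward(clip, edges, weight, window=1)
print(len(features), len(features[0]))
```

The two stages share no parameters in this sketch: the spatial layer sees one frame at a time, and the temporal stage only sees the pooled per-frame vectors, which is one way to read "independent spatial and temporal graph networks". A real model would stack several such layers and add learned query, key, and value projections.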