To enable intuitive and reliable human–robot collaboration, robots must understand human actions at a structural level, making skeleton-based gesture recognition (SGR) a crucial source of precise and robust intention cues. Graph convolutional networks (GCNs) have become a key technology in SGR because they process non-Euclidean skeletal data efficiently. However, existing methods typically choose between a fixed anatomical prior graph and a fully adaptive dynamic graph, which limits a model's ability to capture the structural invariance and the dynamic variability of hand motion simultaneously. To address this challenge, we propose the Structural-Adaptive Spatio-Temporal GCN (SA-STGCN), built on a spatiotemporal feature extraction mechanism that synergistically fuses structural priors with a motion-adaptive topology. Spatially, our Spatio-Temporal Attunement (STA) Block integrates two key components in parallel. The first, Relational Semantics Graph Convolution (RS-GC), constructs a rich structured representation by modeling multiple priors, such as physical connectivity, symmetry relationships, and functional groupings, while aggregating features at both the joint and component levels. The second, Motion Signature Graph Convolution (MS-GC), learns a dynamic, instance-specific topology from the data to capture instantaneous motion patterns. Temporally, the Temporal Multi-Scale Aggregation (TMA) Module captures fine-grained motion at varying rates through multi-way dilated convolutions, and the Temporal Saliency Modulator (TSM) further increases the feature weights of keyframes. Together, these components significantly improve the accuracy and efficiency of SGR. Experimental results demonstrate that our model achieves an accuracy of 97.62% on the 14-class task and 95.36% on the 28-class task of the SHREC'17 Track dataset, as well as 93.22% on the FPHA dataset.
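To make the parallel spatial fusion and multi-scale temporal aggregation concrete, the following is a minimal PyTorch sketch, not the authors' implementation. All class and parameter names (PriorGraphConv, AdaptiveGraphConv, MultiDilationTemporal, STABlockSketch, the embedding width, the dilation set, and the 22-joint hand layout) are hypothetical stand-ins: PriorGraphConv approximates the role of RS-GC with a single fixed adjacency (the actual RS-GC models several priors and aggregates at joint and component levels), AdaptiveGraphConv approximates the instance-specific topology of MS-GC, and MultiDilationTemporal approximates TMA; the TSM keyframe weighting is omitted.

```python
# Illustrative sketch of the parallel prior/adaptive spatial branches and
# multi-dilation temporal aggregation described in the abstract. Assumed,
# not taken from the paper: all module names, the similarity-based adaptive
# graph, and the dilation set. Input layout: (batch, channels, frames, joints).
import torch
import torch.nn as nn
import torch.nn.functional as F


class PriorGraphConv(nn.Module):
    """Graph convolution over a fixed (anatomical-prior) adjacency."""
    def __init__(self, in_c, out_c, A):
        super().__init__()
        self.register_buffer("A", A)           # (V, V) normalized prior graph
        self.proj = nn.Conv2d(in_c, out_c, 1)  # 1x1 conv = per-joint linear map

    def forward(self, x):                      # x: (N, C, T, V)
        x = self.proj(x)
        return torch.einsum("nctv,vw->nctw", x, self.A)


class AdaptiveGraphConv(nn.Module):
    """Instance-specific topology inferred from the input features."""
    def __init__(self, in_c, out_c, embed=16):
        super().__init__()
        self.theta = nn.Conv2d(in_c, embed, 1)
        self.phi = nn.Conv2d(in_c, embed, 1)
        self.proj = nn.Conv2d(in_c, out_c, 1)

    def forward(self, x):                      # x: (N, C, T, V)
        # Pool over time, embed joints, use pairwise similarity as a graph.
        q = self.theta(x).mean(2)              # (N, E, V)
        k = self.phi(x).mean(2)                # (N, E, V)
        A = torch.softmax(torch.einsum("nev,new->nvw", q, k), dim=-1)
        x = self.proj(x)
        return torch.einsum("nctv,nvw->nctw", x, A)


class MultiDilationTemporal(nn.Module):
    """Parallel dilated temporal convolutions capture motion at several rates."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, (3, 1),
                      padding=(d, 0), dilation=(d, 1))
            for d in dilations)

    def forward(self, x):
        # Average the branches so the output scale is dilation-count invariant.
        return sum(b(x) for b in self.branches) / len(self.branches)


class STABlockSketch(nn.Module):
    """Parallel spatial branches fused, then multi-scale temporal aggregation."""
    def __init__(self, in_c, out_c, A):
        super().__init__()
        self.prior = PriorGraphConv(in_c, out_c, A)
        self.adaptive = AdaptiveGraphConv(in_c, out_c)
        self.temporal = MultiDilationTemporal(out_c)

    def forward(self, x):
        s = F.relu(self.prior(x) + self.adaptive(x))  # fuse structure + motion
        return F.relu(self.temporal(s))


# Toy usage: 22 hand joints (as in SHREC'17), placeholder identity prior graph.
if __name__ == "__main__":
    V = 22
    A = torch.eye(V)                           # stand-in for the real adjacency
    block = STABlockSketch(3, 64, A)
    out = block(torch.randn(2, 3, 32, V))      # (batch, xyz, frames, joints)
    print(out.shape)                           # torch.Size([2, 64, 32, 22])
```

The sketch fuses the two spatial branches by simple addition before the temporal stage; the paper's actual fusion, normalization, and TSM weighting may differ.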
