Title: Fall detection method based on spatio-temporal coordinate attention for high-resolution networks
Authors: Xiaorui Zhang, Qijian Xie, Wei Sun, Ting Wang
Journal: Complex & Intelligent Systems (Impact Factor 5.0; JCR Q1, Computer Science, Artificial Intelligence)
Published: 2024-11-01
DOI: https://doi.org/10.1007/s40747-024-01660-4
Citations: 0
Abstract
Falls are closely linked to high mortality among the elderly, which has made fall detection an important and urgent research area in human behavior recognition. However, existing fall detection methods lose detailed action information during feature extraction because of downsampling, which leads to subpar performance when distinguishing falls from similar behaviors such as lying and sitting. To address this challenge, this study proposes a high-resolution spatio-temporal feature extraction method based on a spatio-temporal coordinate attention mechanism. The method employs 3D convolutions to extract spatio-temporal features and uses gradual downsampling to generate a multi-resolution sub-network, thereby achieving multi-scale fusion and enhanced perception of details. In particular, the study designs a pseudo-3D basic block that simulates the capability of 3D convolution, preserving the running speed of the network while controlling the number of parameters. Further, a spatio-temporal coordinate attention mechanism is designed to accurately extract the spatio-temporal positional changes of key skeletal points and the interrelationships among them. Long-term dependencies in the horizontal, vertical, and temporal directions are captured through three one-dimensional global pooling operations. The long-range relationships and channel correlations among features are then captured by cascading and slicing operations. Finally, the key information is highlighted by performing dot-multiplication operations between the feature maps from the horizontal, vertical, and temporal directions and the input feature maps. Experimental results on three typical public datasets show that the proposed method extracts motion features more effectively and improves the accuracy of fall detection.
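The attention steps described in the abstract (three one-dimensional global poolings, cascading, slicing, and multiplication with the input) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that "cascading" means concatenation, that "dot-multiplication" means element-wise multiplication with broadcasting, that average pooling is used, and that channel mixing is a plain matrix product standing in for a 1x1 convolution. The function name, the weight matrices `w_shared`, `w_h`, `w_w`, `w_t`, the ReLU and sigmoid activations, and the reduced channel count `C_r` are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def st_coordinate_attention(x, w_shared, w_h, w_w, w_t):
    """Illustrative spatio-temporal coordinate attention (assumed design).

    x:        feature map of shape (C, T, H, W)
    w_shared: (C_r, C) channel-mixing matrix shared by all three directions
    w_h/w_w/w_t: (C, C_r) per-direction matrices restoring C channels
    """
    c, t, h, w = x.shape

    # Three 1D global poolings: each keeps one axis and averages out the other two.
    pool_h = x.mean(axis=(1, 3))  # (C, H), vertical direction
    pool_w = x.mean(axis=(1, 2))  # (C, W), horizontal direction
    pool_t = x.mean(axis=(2, 3))  # (C, T), temporal direction

    # Cascade (concatenate) the three descriptors and mix channels jointly.
    y = np.concatenate([pool_h, pool_w, pool_t], axis=1)  # (C, H + W + T)
    y = np.maximum(w_shared @ y, 0.0)                     # (C_r, H + W + T), ReLU

    # Slice the joint descriptor back into its three directions.
    y_h, y_w, y_t = np.split(y, [h, h + w], axis=1)

    # Per-direction attention weights in (0, 1).
    a_h = sigmoid(w_h @ y_h)  # (C, H)
    a_w = sigmoid(w_w @ y_w)  # (C, W)
    a_t = sigmoid(w_t @ y_t)  # (C, T)

    # Element-wise multiplication with the input, broadcasting each weight
    # vector along the two axes it was pooled over.
    return (x
            * a_h[:, None, :, None]
            * a_w[:, None, None, :]
            * a_t[:, :, None, None])


# Usage with random data and random (untrained) weights.
rng = np.random.default_rng(0)
C, T, H, W, C_r = 8, 4, 6, 5, 2
x = rng.standard_normal((C, T, H, W))
w_shared = rng.standard_normal((C_r, C))
w_h, w_w, w_t = (rng.standard_normal((C, C_r)) for _ in range(3))
out = st_coordinate_attention(x, w_shared, w_h, w_w, w_t)
print(out.shape)  # (8, 4, 6, 5)
```

Because every attention weight lies between 0 and 1, the output has the same shape as the input and each activation is scaled down according to how strongly its row, column, and frame are weighted. A trained network would learn the mixing matrices rather than draw them at random.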
About the journal:
Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.