Mohammad Iman Junaid, Allam Jaya Prakash, Samit Ari
{"title":"利用深度卷积神经网络的时空联合调制识别人类步态","authors":"Mohammad Iman Junaid , Allam Jaya Prakash , Samit Ari","doi":"10.1016/j.jvcir.2024.104322","DOIUrl":null,"url":null,"abstract":"<div><div>Gait, a person’s distinctive walking pattern, offers a promising biometric modality for surveillance applications. Unlike fingerprints or iris scans, gait can be captured from a distance without the subject’s direct cooperation or awareness. This makes it ideal for surveillance and security applications. Traditional convolutional neural networks (CNNs) often struggle with the inherent variations within video data, limiting their effectiveness in gait recognition. The proposed technique in this work introduces a unique joint spatial–temporal modulation network designed to overcome this limitation. By extracting discriminative feature representations across varying frame levels, the network effectively leverages both spatial and temporal variations within video sequences. The proposed architecture integrates attention-based CNNs for spatial feature extraction and a Bidirectional Long Short-Term Memory (Bi-LSTM) network with a temporal attention module to analyse temporal dynamics. The use of attention in spatial and temporal blocks enhances the network’s capability of focusing on the most relevant segments of the video data. This can improve efficiency since the combined approach enhances learning capabilities when processing complex gait videos. We evaluated the effectiveness of the proposed network using two major datasets, namely CASIA-B and OUMVLP. Experimental analysis on CASIA B demonstrates that the proposed network achieves an average rank-1 accuracy of 98.20% for normal walking, 94.50% for walking with a bag and 80.40% for clothing scenarios. The proposed network also achieved an accuracy of 89.10% for OU-MVLP. These results show the proposed method‘s ability to generalize to large-scale data and consistently outperform current state-of-the-art gait recognition techniques.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104322"},"PeriodicalIF":2.6000,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Human gait recognition using joint spatiotemporal modulation in deep convolutional neural networks\",\"authors\":\"Mohammad Iman Junaid , Allam Jaya Prakash , Samit Ari\",\"doi\":\"10.1016/j.jvcir.2024.104322\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Gait, a person’s distinctive walking pattern, offers a promising biometric modality for surveillance applications. Unlike fingerprints or iris scans, gait can be captured from a distance without the subject’s direct cooperation or awareness. This makes it ideal for surveillance and security applications. Traditional convolutional neural networks (CNNs) often struggle with the inherent variations within video data, limiting their effectiveness in gait recognition. The proposed technique in this work introduces a unique joint spatial–temporal modulation network designed to overcome this limitation. By extracting discriminative feature representations across varying frame levels, the network effectively leverages both spatial and temporal variations within video sequences. 
The proposed architecture integrates attention-based CNNs for spatial feature extraction and a Bidirectional Long Short-Term Memory (Bi-LSTM) network with a temporal attention module to analyse temporal dynamics. The use of attention in spatial and temporal blocks enhances the network’s capability of focusing on the most relevant segments of the video data. This can improve efficiency since the combined approach enhances learning capabilities when processing complex gait videos. We evaluated the effectiveness of the proposed network using two major datasets, namely CASIA-B and OUMVLP. Experimental analysis on CASIA B demonstrates that the proposed network achieves an average rank-1 accuracy of 98.20% for normal walking, 94.50% for walking with a bag and 80.40% for clothing scenarios. The proposed network also achieved an accuracy of 89.10% for OU-MVLP. These results show the proposed method‘s ability to generalize to large-scale data and consistently outperform current state-of-the-art gait recognition techniques.</div></div>\",\"PeriodicalId\":54755,\"journal\":{\"name\":\"Journal of Visual Communication and Image Representation\",\"volume\":\"105 \",\"pages\":\"Article 104322\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2024-10-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Visual Communication and Image Representation\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1047320324002785\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Visual Communication and Image Representation","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1047320324002785","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Human gait recognition using joint spatiotemporal modulation in deep convolutional neural networks
Gait, a person's distinctive walking pattern, offers a promising biometric modality for surveillance applications. Unlike fingerprints or iris scans, gait can be captured from a distance without the subject's direct cooperation or awareness, which makes it well suited to surveillance and security applications. Traditional convolutional neural networks (CNNs) often struggle with the inherent variations within video data, limiting their effectiveness in gait recognition. This work introduces a joint spatial-temporal modulation network designed to overcome this limitation. By extracting discriminative feature representations at varying frame levels, the network effectively leverages both spatial and temporal variations within video sequences. The proposed architecture integrates attention-based CNNs for spatial feature extraction with a Bidirectional Long Short-Term Memory (Bi-LSTM) network and a temporal attention module that analyses temporal dynamics. Applying attention in both the spatial and temporal blocks helps the network focus on the most relevant segments of the video data and improves learning when processing complex gait videos. We evaluated the proposed network on two major datasets, CASIA-B and OU-MVLP. Experimental analysis on CASIA-B demonstrates that the proposed network achieves an average rank-1 accuracy of 98.20% for normal walking, 94.50% for walking with a bag, and 80.40% for the clothing scenario. The network also achieves 89.10% accuracy on OU-MVLP. These results show that the proposed method generalizes to large-scale data and consistently outperforms current state-of-the-art gait recognition techniques.
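To make the described architecture concrete, below is a minimal PyTorch sketch of a joint spatial-temporal gait model in the spirit of the abstract: an attention-based CNN extracts per-frame spatial features from silhouette frames, and a Bi-LSTM with a temporal attention module aggregates them across the sequence. All layer sizes, module names, and the specific attention formulations are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAttentionCNN(nn.Module):
    """Per-frame CNN with a simple spatial attention map (assumed design)."""

    def __init__(self, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.attn = nn.Conv2d(64, 1, 1)      # 1x1 conv -> spatial attention logits
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, x):                    # x: (B*T, 1, H, W) silhouette frames
        f = self.backbone(x)                 # (B*T, 64, H/4, W/4)
        a = torch.sigmoid(self.attn(f))      # spatial attention weights in [0, 1]
        f = (f * a).mean(dim=(2, 3))         # attention-weighted global pooling
        return self.proj(f)                  # (B*T, feat_dim)


class TemporalAttentionBiLSTM(nn.Module):
    """Bi-LSTM over frame features with additive temporal attention (assumed design)."""

    def __init__(self, feat_dim=128, hidden=128, num_classes=74):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)       # per-frame attention score
        self.cls = nn.Linear(2 * hidden, num_classes)

    def forward(self, seq):                  # seq: (B, T, feat_dim)
        h, _ = self.bilstm(seq)              # (B, T, 2*hidden)
        w = F.softmax(self.score(h), dim=1)  # temporal attention over frames
        pooled = (w * h).sum(dim=1)          # attention-weighted sequence summary
        return self.cls(pooled)


class JointSpatioTemporalNet(nn.Module):
    """Spatial attention CNN followed by a temporally attentive Bi-LSTM."""

    def __init__(self, num_classes=74):
        super().__init__()
        self.spatial = SpatialAttentionCNN()
        self.temporal = TemporalAttentionBiLSTM(num_classes=num_classes)

    def forward(self, frames):               # frames: (B, T, 1, H, W)
        B, T = frames.shape[:2]
        f = self.spatial(frames.flatten(0, 1))   # per-frame features: (B*T, feat_dim)
        return self.temporal(f.view(B, T, -1))   # sequence-level logits: (B, num_classes)


if __name__ == "__main__":
    model = JointSpatioTemporalNet(num_classes=74)  # number of identity classes is illustrative
    clip = torch.randn(2, 30, 1, 64, 44)            # 2 sequences of 30 silhouette frames
    print(model(clip).shape)                        # torch.Size([2, 74])

In this sketch the spatial attention map re-weights CNN feature locations before pooling, while the temporal attention weights re-weight frames before the sequence summary is classified; the actual paper's modulation scheme, loss, and evaluation protocol may differ.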
Journal description:
The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.