{"title":"A multi-modal framework for continuous and isolated hand gesture recognition utilizing movement epenthesis detection","authors":"Navneet Nayan, Debashis Ghosh, Pyari Mohan Pradhan","doi":"10.1007/s00138-024-01565-9","DOIUrl":null,"url":null,"abstract":"<p>Gesture recognition, having multitudinous applications in the real world, is one of the core areas of research in the field of human-computer interaction. In this paper, we propose a novel method for isolated and continuous hand gesture recognition utilizing the movement epenthesis detection and removal. For this purpose, the present work detects and removes the movement epenthesis frames from the isolated and continuous hand gesture videos. In this paper, we have also proposed a novel modality based on the temporal difference that extracts hand regions, removes gesture irrelevant factors and provides temporal information contained in the hand gesture videos. Using the proposed modality and other modalities such as the RGB modality, depth modality and segmented hand modality, features are extracted using Googlenet Caffe Model. Next, we derive a set of discriminative features by fusing the acquired features that form a feature vector representing the sign gesture in question. We have designed and used a Bidirectional Long Short-Term Memory Network (Bi-LSTM) for classification purpose. To test the efficacy of our proposed work, we applied our method on various publicly available continuous and isolated hand gesture datasets like ChaLearn LAP IsoGD, ChaLearn LAP ConGD, IPN Hand, and NVGesture. We observe in our experiments that our proposed method performs exceptionally well with several individual modalities as well as combination of modalities of these datasets. The combined effect of the proposed modality and movement epenthesis frames removal led to significant improvement in gesture recognition accuracy and considerable reduction in computational burden. Thus the obtained results advocate our proposed approach to be at par with the existing state-of-the-art methods.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"12 1","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Vision and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00138-024-01565-9","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Gesture recognition, having multitudinous applications in the real world, is one of the core areas of research in the field of human-computer interaction. In this paper, we propose a novel method for isolated and continuous hand gesture recognition utilizing the movement epenthesis detection and removal. For this purpose, the present work detects and removes the movement epenthesis frames from the isolated and continuous hand gesture videos. In this paper, we have also proposed a novel modality based on the temporal difference that extracts hand regions, removes gesture irrelevant factors and provides temporal information contained in the hand gesture videos. Using the proposed modality and other modalities such as the RGB modality, depth modality and segmented hand modality, features are extracted using Googlenet Caffe Model. Next, we derive a set of discriminative features by fusing the acquired features that form a feature vector representing the sign gesture in question. We have designed and used a Bidirectional Long Short-Term Memory Network (Bi-LSTM) for classification purpose. To test the efficacy of our proposed work, we applied our method on various publicly available continuous and isolated hand gesture datasets like ChaLearn LAP IsoGD, ChaLearn LAP ConGD, IPN Hand, and NVGesture. We observe in our experiments that our proposed method performs exceptionally well with several individual modalities as well as combination of modalities of these datasets. The combined effect of the proposed modality and movement epenthesis frames removal led to significant improvement in gesture recognition accuracy and considerable reduction in computational burden. Thus the obtained results advocate our proposed approach to be at par with the existing state-of-the-art methods.
手势识别在现实世界中应用广泛,是人机交互领域的核心研究领域之一。在本文中,我们提出了一种利用运动外显检测和移除进行孤立和连续手势识别的新方法。为此,本文从孤立和连续手势视频中检测并移除运动外显帧。在本文中,我们还提出了一种基于时间差的新型模态,该模态可提取手部区域、去除手势无关因素并提供手势视频中包含的时间信息。利用提出的模态和其他模态(如 RGB 模态、深度模态和分割手部模态),我们使用 Googlenet Caffe 模型提取了特征。接下来,我们通过融合所获得的特征,形成代表相关手势的特征向量,从而得出一组判别特征。我们设计并使用了双向长短期记忆网络(Bi-LSTM)进行分类。为了测试我们提出的方法的有效性,我们在各种公开的连续和孤立手势数据集上应用了我们的方法,如 ChaLearn LAP IsoGD、ChaLearn LAP ConGD、IPN Hand 和 NVGesture。我们在实验中观察到,我们提出的方法在这些数据集的几种单独模态和模态组合中都表现出色。所提议的模式和运动外显帧移除的综合效应显著提高了手势识别的准确性,并大大减轻了计算负担。因此,所获得的结果表明,我们提出的方法与现有的最先进方法不相上下。
期刊介绍:
Machine Vision and Applications publishes high-quality technical contributions in machine vision research and development. Specifically, the editors encourage submittals in all applications and engineering aspects of image-related computing. In particular, original contributions dealing with scientific, commercial, industrial, military, and biomedical applications of machine vision, are all within the scope of the journal.
Particular emphasis is placed on engineering and technology aspects of image processing and computer vision.
The following aspects of machine vision applications are of interest: algorithms, architectures, VLSI implementations, AI techniques and expert systems for machine vision, front-end sensing, multidimensional and multisensor machine vision, real-time techniques, image databases, virtual reality and visualization. Papers must include a significant experimental validation component.