{"title":"Modality-convolutions: Multi-modal gesture recognition based on convolutional neural network","authors":"Da Huo, Yufeng Chen, Fengxia Li, Zhengchao Lei","doi":"10.1109/ICCSE.2017.8085515","DOIUrl":null,"url":null,"abstract":"We proposed a novel method of feature extraction for multi-modal images called modality-convolution. It extracts both the intra- and inter-modality information. Whats more, it completes the data fusion at pixel-level so that the complementarity of information contained in multi-modal data is fully utilized. Based on the modality-convolution, we describe a modality-CNN for multi-modal gesture recognition. For extracting the features in RGB-D images, the modality-CNN is adopted in the gesture recognition framework. The framework use DBN to present the skeleton data. Then, the probability obtained by the two networks are fused and put into the HMM to carry out dynamic gestures classification. We use the Jaccar Index to calculate the accuracy of gesture recognition. A comparative experiment on ChaLearn LAP 2014 gesture datasets shows that the modality-convolution is able to extract the inter- and intra-modality information effectively, which is helpful to improve the accuracy.","PeriodicalId":256055,"journal":{"name":"2017 12th International Conference on Computer Science and Education (ICCSE)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 12th International Conference on Computer Science and Education (ICCSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCSE.2017.8085515","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
We propose a novel feature-extraction method for multi-modal images called modality-convolution. It extracts both intra- and inter-modality information. Moreover, it performs data fusion at the pixel level, so that the complementary information contained in multi-modal data is fully exploited. Based on the modality-convolution, we describe a modality-CNN for multi-modal gesture recognition. Within the gesture recognition framework, the modality-CNN extracts features from RGB-D images, while a DBN represents the skeleton data. The probabilities produced by the two networks are then fused and fed into an HMM to classify dynamic gestures. We use the Jaccard index to measure gesture recognition accuracy. A comparative experiment on the ChaLearn LAP 2014 gesture dataset shows that the modality-convolution extracts inter- and intra-modality information effectively, which helps improve accuracy.
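The abstract does not spell out the layer's exact formulation, but one plausible reading is a convolution with two paths: a per-modality (intra) path that keeps RGB and depth separate, and a joint (inter) path that mixes all modality channels at each pixel. Below is a minimal PyTorch sketch of that reading; the class name `ModalityConv2d`, the channel sizes, and the additive intra/inter decomposition are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a "modality-convolution" layer: intra-modality features via a
# grouped convolution plus inter-modality pixel-level fusion via a full
# convolution over all modalities. Hypothetical reconstruction, not the
# paper's exact layer.
import torch
import torch.nn as nn

class ModalityConv2d(nn.Module):
    def __init__(self, channels_per_modality, num_modalities, out_channels, kernel_size=3):
        super().__init__()
        in_channels = channels_per_modality * num_modalities
        padding = kernel_size // 2
        # Intra-modality path: groups=num_modalities means each group of
        # kernels sees only one modality's channels.
        self.intra = nn.Conv2d(in_channels, out_channels, kernel_size,
                               padding=padding, groups=num_modalities)
        # Inter-modality path: kernels span all modality channels at once,
        # fusing the modalities at pixel level.
        self.inter = nn.Conv2d(in_channels, out_channels, kernel_size,
                               padding=padding)

    def forward(self, x):
        # x: (batch, channels_per_modality * num_modalities, H, W)
        return torch.relu(self.intra(x) + self.inter(x))

# RGB-D example: depth replicated to 3 channels so both modalities have equal
# width, then stacked along the channel dimension -> 6 channels, 2 modalities.
rgbd = torch.randn(8, 6, 64, 64)
layer = ModalityConv2d(channels_per_modality=3, num_modalities=2, out_channels=32)
features = layer(rgbd)  # shape: (8, 32, 64, 64)
```

Requiring `out_channels` to be divisible by `num_modalities` (as grouped convolution demands) is a consequence of this particular sketch, not a constraint stated in the paper.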
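The evaluation metric mentioned above is the Jaccard index, which the ChaLearn LAP 2014 challenge computes over sets of video frames: for each gesture, the ground-truth and predicted frame intervals are compared as sets. A minimal sketch of that computation, with illustrative frame ranges:

```python
# Jaccard index for temporal gesture segmentation: the score for one gesture
# is |intersection| / |union| of ground-truth and predicted frame sets.
def jaccard_index(gt_frames: set, pred_frames: set) -> float:
    union = gt_frames | pred_frames
    if not union:
        return 0.0
    return len(gt_frames & pred_frames) / len(union)

# Example: a gesture annotated on frames 10..29, predicted on frames 15..34.
gt = set(range(10, 30))
pred = set(range(15, 35))
print(jaccard_index(gt, pred))  # 15 shared frames / 25 total = 0.6
```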