Zero-Shot Learning for Gesture Recognition

Proceedings of the 2020 International Conference on Multimodal Interaction
Published: 2020-10-21
DOI: 10.1145/3382507.3421161

Naveen Madapana

Citations: 5

Abstract

Zero-Shot Learning (ZSL) is a new paradigm in machine learning that aims to recognize classes that are absent from the training data, so a ZSL model can identify categories it has never seen before. While deep learning has pushed the limits of unseen object recognition, ZSL for temporal problems such as the recognition of unfamiliar gestures (referred to here as ZSGL) remains unexplored. ZSGL could enable efficient human-machine interfaces that recognize and understand spontaneous, conversational human gestures. The objective of this work is therefore to conceptualize, model and develop a framework for ZSGL problems. The first step in the pipeline is to build a database of gesture attributes that is representative of a range of gesture categories. Next, a deep architecture consisting of convolutional and recurrent layers is proposed to jointly optimize the semantic and classification losses. Lastly, rigorous experiments compare the proposed model against existing ZSL models on the CGD 2013 and MSRC-12 datasets. In preliminary work, we identified a list of 64 discriminative attributes describing the morphological characteristics of gestures. Our approach yields an unseen-class accuracy of 41%, outperforming state-of-the-art approaches by a considerable margin. Future work involves: (1) modifying the existing architecture to improve ZSL accuracy, (2) augmenting the attribute database to incorporate semantic properties, (3) addressing the data imbalance inherent to ZSL problems, and (4) expanding this research to other domains such as surgeme and action recognition.
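The abstract combines two ingredients: a per-class table of gesture attributes and a network trained jointly on a semantic (attribute) loss and a classification loss. The Python sketch below illustrates how attribute-based zero-shot recognition of this kind commonly works; it is not the author's implementation. It assumes binary attribute vectors, a binary cross-entropy semantic loss, a softmax cross-entropy classification loss over the seen classes weighted by a hypothetical factor `lam`, and nearest-signature matching against unseen classes at test time. The convolutional and recurrent layers are replaced by precomputed output lists, and all function names, attribute names, class names and numbers in the usage example are hypothetical.

```python
import math

# Minimal sketch of attribute-based zero-shot recognition (assumed formulation,
# not the paper's code). Model outputs are passed in as plain lists; in the
# described system they would come from convolutional + recurrent layers.


def semantic_loss(attr_probs, attr_target, eps=1e-7):
    """Mean binary cross-entropy between predicted attribute probabilities
    and the ground-truth binary attribute vector of the true class."""
    total = 0.0
    for p, t in zip(attr_probs, attr_target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(attr_target)


def classification_loss(class_logits, label):
    """Softmax cross-entropy over the seen classes (log-sum-exp for stability)."""
    m = max(class_logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in class_logits))
    return log_sum_exp - class_logits[label]


def joint_loss(attr_probs, attr_target, class_logits, label, lam=1.0):
    """Semantic loss plus a weighted classification loss; `lam` is a
    hypothetical balancing weight."""
    return semantic_loss(attr_probs, attr_target) + lam * classification_loss(class_logits, label)


def zero_shot_predict(attr_probs, unseen_signatures):
    """Return the unseen class whose attribute signature is nearest
    (squared Euclidean distance) to the predicted attribute probabilities."""
    def dist(sig):
        return sum((p - s) ** 2 for p, s in zip(attr_probs, sig))
    return min(unseen_signatures, key=lambda name: dist(unseen_signatures[name]))


if __name__ == "__main__":
    # Hypothetical 4-attribute toy example (the paper reports 64 attributes):
    # [one_handed, circular_motion, repeated, open_palm]
    unseen = {
        "wave": [1, 0, 1, 1],
        "draw_circle": [1, 1, 0, 0],
    }
    predicted = [0.9, 0.2, 0.8, 0.7]
    print(zero_shot_predict(predicted, unseen))  # wave
    print(round(joint_loss([0.5], [1], [0.0, 0.0], 0), 4))  # 1.3863
```

Because unseen classes are described only by their attribute signatures, no training example of "wave" or "draw_circle" is needed at test time; the classification loss only constrains the seen classes during training, while the semantic loss teaches the shared attribute space that transfers to unseen ones.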