Sign Language Recognition using Deep Learning

M. Mahyoub, F. Natalia, S. Sudirman, J. Mustafina
DOI: 10.1109/DeSE58274.2023.10100055
Journal: 2023 15th International Conference on Developments in eSystems Engineering (DeSE)
Published: 2023-01-09
Citations: 0

Abstract

Sign Language Recognition is a form of action recognition problem. The purpose of such a system is to automatically translate sign words from one language to another. While much work has been done in the SLR domain, it is a broad area of study and numerous areas still need research attention. The work that we present in this paper aims to investigate the suitability of deep learning approaches in recognizing and classifying words from video frames in different sign languages. We consider three sign languages, namely Indian Sign Language, American Sign Language, and Turkish Sign Language. Our methodology employs five different deep learning models with increasing complexities. They are a shallow four-layer Convolutional Neural Network, a basic VGG16 model, a VGG16 model with Attention Mechanism, a VGG16 model with Transformer Encoder and Gated Recurrent Units-based Decoder, and an Inflated 3D model with the same. We trained and tested the models to recognize and classify words from videos in three different sign language datasets. From our experiment, we found that the performance of the models relates quite closely to the model's complexity with the Inflated 3D model performing the best. Furthermore, we also found that all models find it more difficult to recognize words in the American Sign Language dataset than the others.
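The models described in the abstract classify sign words from video frames. As a minimal, hypothetical sketch (not the authors' code), a common preprocessing step for such frame-based models is to uniformly sample a fixed-length frame sequence from a variable-length clip; the function name and the frame count of 16 below are illustrative assumptions:

```python
import numpy as np

def sample_frames(video, num_frames=16):
    """Uniformly sample a fixed number of frames from a video array
    shaped (T, H, W, C), padding by repeating the last frame when the
    clip is shorter than num_frames."""
    t = video.shape[0]
    if t >= num_frames:
        # evenly spaced indices across the whole clip
        idx = np.linspace(0, t - 1, num_frames).round().astype(int)
    else:
        # keep all frames, then repeat the final frame as padding
        idx = np.concatenate([np.arange(t),
                              np.full(num_frames - t, t - 1)])
    return video[idx]

# toy clip: 40 frames of 64x64 RGB
clip = np.zeros((40, 64, 64, 3), dtype=np.uint8)
batch = sample_frames(clip, num_frames=16)
print(batch.shape)  # (16, 64, 64, 3)
```

A fixed-length output like this is what lets per-frame backbones (e.g. the VGG16 variants) and the Inflated 3D model receive uniformly shaped batches regardless of clip duration.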
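One of the five models pairs a Transformer encoder with a Gated Recurrent Unit-based decoder. As an illustrative sketch only, with weight names and shapes that are assumptions rather than the paper's implementation, a single GRU cell update can be written in plain NumPy as:

```python
import numpy as np

def gru_step(x, h, W, U):
    """One GRU update. x: input vector (d_in,), h: hidden state (d_hid,).
    W maps the input and U maps the hidden state; each dict holds the
    update ("z"), reset ("r"), and candidate ("h") projections."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(x @ W["z"] + h @ U["z"])             # update gate
    r = sigmoid(x @ W["r"] + h @ U["r"])             # reset gate
    h_cand = np.tanh(x @ W["h"] + (r * h) @ U["h"])  # candidate state
    return (1 - z) * h + z * h_cand                  # blend old and new

rng = np.random.default_rng(0)
d_in, d_hid = 8, 4
W = {k: rng.standard_normal((d_in, d_hid)) * 0.1 for k in "zrh"}
U = {k: rng.standard_normal((d_hid, d_hid)) * 0.1 for k in "zrh"}

h = np.zeros(d_hid)
for t in range(5):  # run the cell over a 5-step feature sequence
    h = gru_step(rng.standard_normal(d_in), h, W, U)
print(h.shape)  # (4,)
```

In a decoder of this kind, the per-step input would come from the encoder's features rather than random noise; the gating keeps the hidden state bounded and lets the model decide how much of each new frame's information to absorb.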