Sign Language Recognition using Deep Learning

M. Mahyoub, F. Natalia, S. Sudirman, J. Mustafina
DOI: 10.1109/DeSE58274.2023.10100055
Journal: 2023 15th International Conference on Developments in eSystems Engineering (DeSE)
Published: 2023-01-09
Citations: 0

Abstract

Sign Language Recognition is a form of action recognition problem. The purpose of such a system is to automatically translate sign words from one language to another. While much work has been done in the SLR domain, it is a broad area of study and numerous areas still need research attention. The work that we present in this paper aims to investigate the suitability of deep learning approaches in recognizing and classifying words from video frames in different sign languages. We consider three sign languages, namely Indian Sign Language, American Sign Language, and Turkish Sign Language. Our methodology employs five different deep learning models with increasing complexities. They are a shallow four-layer Convolutional Neural Network, a basic VGG16 model, a VGG16 model with Attention Mechanism, a VGG16 model with Transformer Encoder and Gated Recurrent Units-based Decoder, and an Inflated 3D model with the same. We trained and tested the models to recognize and classify words from videos in three different sign language datasets. From our experiment, we found that the performance of the models relates quite closely to the model's complexity with the Inflated 3D model performing the best. Furthermore, we also found that all models find it more difficult to recognize words in the American Sign Language dataset than the others.
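The models described in the abstract classify sign words from video frames. As a minimal, hypothetical sketch (not the authors' code), a common preprocessing step for such frame-based models is to uniformly sample a fixed-length frame sequence from a variable-length clip; the function name and the frame count of 16 below are illustrative assumptions:

```python
import numpy as np

def sample_frames(video, num_frames=16):
    """Uniformly sample a fixed number of frames from a video array
    shaped (T, H, W, C), padding by repeating the last frame when the
    clip is shorter than num_frames."""
    t = video.shape[0]
    if t >= num_frames:
        # evenly spaced indices across the whole clip
        idx = np.linspace(0, t - 1, num_frames).round().astype(int)
    else:
        # keep all frames, then repeat the final frame as padding
        idx = np.concatenate([np.arange(t),
                              np.full(num_frames - t, t - 1)])
    return video[idx]

# toy clip: 40 frames of 64x64 RGB
clip = np.zeros((40, 64, 64, 3), dtype=np.uint8)
batch = sample_frames(clip, num_frames=16)
print(batch.shape)  # (16, 64, 64, 3)
```

A fixed-length output like this is what lets per-frame backbones (e.g. the VGG16 variants) and the Inflated 3D model receive uniformly shaped batches regardless of clip duration.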
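One of the five models pairs a Transformer encoder with a Gated Recurrent Unit-based decoder. As an illustrative sketch only, with weight names and shapes that are assumptions rather than the paper's implementation, a single GRU cell update can be written in plain NumPy as:

```python
import numpy as np

def gru_step(x, h, W, U):
    """One GRU update. x: input vector (d_in,), h: hidden state (d_hid,).
    W maps the input and U maps the hidden state; each dict holds the
    update ("z"), reset ("r"), and candidate ("h") projections."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(x @ W["z"] + h @ U["z"])             # update gate
    r = sigmoid(x @ W["r"] + h @ U["r"])             # reset gate
    h_cand = np.tanh(x @ W["h"] + (r * h) @ U["h"])  # candidate state
    return (1 - z) * h + z * h_cand                  # blend old and new

rng = np.random.default_rng(0)
d_in, d_hid = 8, 4
W = {k: rng.standard_normal((d_in, d_hid)) * 0.1 for k in "zrh"}
U = {k: rng.standard_normal((d_hid, d_hid)) * 0.1 for k in "zrh"}

h = np.zeros(d_hid)
for t in range(5):  # run the cell over a 5-step feature sequence
    h = gru_step(rng.standard_normal(d_in), h, W, U)
print(h.shape)  # (4,)
```

In a decoder of this kind, the per-step input would come from the encoder's features rather than random noise; the gating keeps the hidden state bounded and lets the model decide how much of each new frame's information to absorb.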