{"title":"American Sign Language Recognition using YOLOv4 Method","authors":"Ali Al-Shaheen, Mesut Cevik, Alzubair Alqaraghuli","doi":"10.36287/ijmsit.6.1.61","DOIUrl":null,"url":null,"abstract":"– Sign language is one of the ways of communication that is used by people who are unable to speak or hear (deaf and mute), so not all people are able to understand this language. Therefore, to facilitate communication between normal people and deaf and mute people, many systems have been invented that translate gestures and signs within sign language into words to be understandable. The aim behind this research is to train a model to be able to detect and recognize hand gestures and signs and then translate them into letters, numbers and words using You Only Look One (YOLO) method through pictures or videos, even in real time. YOLO is one of the methods used in detecting and recognizing things that depend in their work on convolutional neural networks (CNN), which are characterized by accuracy and speed in work. In this research, we have created a data set consisting of 8000 images divided into 40 classes, for each class, 200 images were taken with different backgrounds and under lighting conditions, which allows the model to be able to differentiate the signal regardless of the intensity of the lighting or the clarity of the image. And after training the model on the dataset many times, in the experiment using image data we got a very good results in terms of MAP = 98.01% as an accuracy and current average loss=1.3 and recall=0.96 and F1=0.96, and for video results it has the same accuracy and 28.9 frame per second (fps).","PeriodicalId":166049,"journal":{"name":"International Journal of Multidisciplinary Studies and Innovative Technologies","volume":"99 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Multidisciplinary Studies and Innovative Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.36287/ijmsit.6.1.61","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Sign language is a means of communication used by people who are unable to speak or hear (deaf and mute), so not everyone can understand it. To facilitate communication between hearing people and deaf and mute people, many systems have been developed that translate the gestures and signs of sign language into words so they can be understood. The aim of this research is to train a model to detect and recognize hand gestures and signs and then translate them into letters, numbers and words using the You Only Look Once (YOLO) method, from pictures or videos and even in real time. YOLO is a family of object detection and recognition methods built on convolutional neural networks (CNNs) and known for their accuracy and speed. In this research, we created a dataset of 8000 images divided into 40 classes; for each class, 200 images were taken with different backgrounds and under different lighting conditions, which allows the model to distinguish the signs regardless of the lighting intensity or the clarity of the image. After training the model on this dataset many times, the experiment on image data gave very good results: mAP = 98.01%, average loss = 1.3, recall = 0.96 and F1 = 0.96; on video the model reaches the same accuracy at 28.9 frames per second (fps).
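The abstract describes detection on both still images and real-time video with a trained YOLOv4 model. As a minimal sketch of how such a Darknet-format model could be run for inference, the snippet below uses OpenCV's DNN module; the file names `yolov4-asl.cfg` / `yolov4-asl.weights`, the 416x416 input size and the placeholder class list are assumptions, since the paper does not publish its configuration, weights or class labels.

```python
import cv2

# Minimal YOLOv4 inference sketch using OpenCV's DNN module.
# "yolov4-asl.cfg" / "yolov4-asl.weights" and the class list are hypothetical
# placeholders; the paper does not release its trained files.
CLASS_NAMES = [f"sign_{i}" for i in range(40)]  # 40 ASL classes (letters, numbers, words)

net = cv2.dnn.readNetFromDarknet("yolov4-asl.cfg", "yolov4-asl.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)  # assumed input size

image = cv2.imread("hand_sign.jpg")
class_ids, scores, boxes = model.detect(image, confThreshold=0.5, nmsThreshold=0.4)

# Draw each detected sign with its class label and confidence score.
for class_id, score, box in zip(class_ids, scores, boxes):
    x, y, w, h = box
    label = f"{CLASS_NAMES[int(class_id)]}: {float(score):.2f}"
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(image, label, (x, max(y - 5, 10)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

cv2.imwrite("hand_sign_detected.jpg", image)
```

For real-time use, the same `model.detect` call can be applied to frames read from `cv2.VideoCapture`, which is in line with the 28.9 fps video throughput reported above.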