{"title":"Real-Time Hand Gesture Recognition Using YOLO and (Darknet-19) Convolution Neural Networks","authors":"Raad Ahmed Mohamed, Karim Q Hussein","doi":"10.11113/ijic.v13n1-2.422","DOIUrl":null,"url":null,"abstract":"There are at least three hundred and fifty million people in the world that cannot hear or speak. These are what are called deaf and dumb. Often this segment of society is partially isolated from the rest of society due to the difficulty of dealing, communicating and understanding between this segment and the rest of the healthy society. As a result of this problem, a number of solutions have been proposed that attempt to bridge this gap between this segment and the rest of society. The main reason for this is to simplify the understanding of sign language. The basic idea is building program to recognize the hand movement of the interlocutor and convert it from images to symbols or letters found in the dictionary of the deaf and dumb. This process itself follows mainly the applications of artificial intelligence, where it is important to distinguish, identify, and extract the palm of the hand from the regular images received by the camera device, and then convert this image of the movement of the paws or hands into understandable symbols. In this paper, the method of image processing and artificial intelligence, represented by the use of artificial neural networks after synthesizing the problem under research was used. Scanning the image to determine the areas of the right and left palm. Non-traditional methods that use artificial intelligence like Convolutional Neural Networks are used to fulfill this part. YOLO V-2 specifically was used in the current research with excellent results. Part Two: Building a pictorial dictionary of the letters used in teaching the deaf and dumb, after generating the image database for the dictionary, neural network Dark NET-19 were used to identify (classification) the images of characters extracted from the first part of the program. The results obtained from the research show that the use of neural networks, especially convolution neural networks, is very suitable in terms of accuracy, speed of performance, and generality in processing the previously unused input data. Many of the limitations associated with using such a program without specifying specific shapes (general shape) and templates, hand shape, hand speed, hand color and other physical expressions and without using any other physical aids were overcome through the optimal use of artificial convolution neural networks.","PeriodicalId":50314,"journal":{"name":"International Journal of Innovative Computing Information and Control","volume":"67 1","pages":"0"},"PeriodicalIF":1.3000,"publicationDate":"2023-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Innovative Computing Information and Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.11113/ijic.v13n1-2.422","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
There are at least three hundred and fifty million people in the world who cannot hear or speak, commonly referred to as the deaf and mute. This segment of society is often partially isolated from the rest of society because of the difficulty of interacting, communicating, and reaching mutual understanding with the hearing population. A number of solutions have been proposed to bridge this gap, the main goal being to simplify the understanding of sign language. The basic idea is to build a program that recognizes the hand movements of the interlocutor and converts them from images into the symbols or letters found in the sign-language dictionary of the deaf and mute. This process relies mainly on applications of artificial intelligence: the palm of the hand must be distinguished, identified, and extracted from the ordinary images captured by the camera, and the image of the palm or hand movement must then be converted into understandable symbols. In this paper, image processing and artificial intelligence, represented by artificial neural networks, were applied after formulating the problem under study. The work consists of two parts. In the first part, the image is scanned to locate the regions of the right and left palms; non-traditional methods based on artificial intelligence, namely convolutional neural networks, are used for this task, and YOLOv2 in particular was employed in the current research with excellent results. In the second part, a pictorial dictionary of the letters used in teaching the deaf and mute was built; after generating the image database for the dictionary, the Darknet-19 neural network was used to identify (classify) the letter images extracted by the first part of the program. The results obtained show that neural networks, especially convolutional neural networks, are very suitable in terms of accuracy, speed of performance, and generalization to previously unseen input data. Through the optimal use of convolutional neural networks, many of the limitations of such a program were overcome without restricting it to particular shapes or templates, hand shapes, hand speeds, hand colors, or other physical characteristics, and without requiring any additional physical aids.
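The two-stage design described above, a YOLOv2 hand detector feeding a Darknet-19 letter classifier, can be sketched with OpenCV's DNN module, which loads Darknet cfg/weights files directly. The model file names, the letter list, the input sizes, and the confidence threshold below are hypothetical placeholders rather than the paper's actual artifacts; this is a minimal sketch of the pipeline under those assumptions, not the authors' implementation.

```python
# Minimal sketch of the two-stage pipeline: YOLOv2 detects hands, Darknet-19
# classifies each cropped hand as a sign-language letter. All file names and
# the LETTERS list are hypothetical placeholders.
import cv2
import numpy as np

detector = cv2.dnn.readNetFromDarknet("yolov2-hand.cfg", "yolov2-hand.weights")
classifier = cv2.dnn.readNetFromDarknet("darknet19-signs.cfg", "darknet19-signs.weights")
LETTERS = ["A", "B", "C"]  # placeholder; the paper's pictorial dictionary defines the full set

def detect_hands(frame, conf_thresh=0.5):
    """Return (x, y, w, h) boxes for hands found by the YOLOv2 detector."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    detector.setInput(blob)
    boxes = []
    for out in detector.forward(detector.getUnconnectedOutLayersNames()):
        for det in out:                    # det = [cx, cy, bw, bh, objectness, class scores...]
            if float(det[5:].max()) > conf_thresh:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append((int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)))
    return boxes

def classify_letter(hand_crop):
    """Map a cropped hand image to a dictionary letter with the Darknet-19 classifier."""
    blob = cv2.dnn.blobFromImage(hand_crop, 1 / 255.0, (224, 224), swapRB=True, crop=False)
    classifier.setInput(blob)
    probs = classifier.forward().flatten()
    return LETTERS[int(np.argmax(probs))]

cap = cv2.VideoCapture(0)                  # live camera feed for real-time recognition
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    for x, y, bw, bh in detect_hands(frame):
        crop = frame[max(y, 0):y + bh, max(x, 0):x + bw]
        if crop.size:
            print(classify_letter(crop))   # recognized letter for this hand
    if cv2.waitKey(1) == 27:               # Esc to stop
        break
cap.release()
```

Running both networks on every frame keeps the loop simple; in practice the detector and classifier input sizes (416 and 224 here) and the letter list must match whatever the trained models actually expect.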
Journal Introduction:
The primary aim of the International Journal of Innovative Computing, Information and Control (IJICIC) is to publish high-quality papers on new developments and trends, novel techniques and approaches, and innovative methodologies and technologies in the theory and applications of intelligent systems, information, and control. The IJICIC is a peer-reviewed English-language journal and is published bimonthly.