Van-Dinh Do, Van-Hung Le, Huu-Son Do, Van-Nam Phan, Trung-Hieu Te
Indonesian Journal of Electrical Engineering and Computer Science
DOI: 10.11591/ijeecs.v34.i3.pp1603-1617
Published: 2024-06-01 (Journal Article)
TQU-HG dataset and comparative study for hand gesture recognition of RGB-based images using deep learning
Hand gesture recognition has important applications in human-computer interaction (HCI), human-robot interaction (HRI), and support for deaf and mute people. To build a hand gesture recognition model with high accuracy using deep learning (DL), the model needs to be trained on large amounts of data collected under many different conditions and contexts. In this paper, we publish TQU-HG, a large dataset of RGB images with low resolution (640×480 pixels), low-light conditions, and fast capture speed (16 fps). The TQU-HG dataset includes 60,000 images collected from 20 people (10 male, 10 female) performing 15 gestures with both the left and right hands. We present a comparative study with two branches: i) based on MediaPipe TML and ii) based on convolutional neural networks (CNNs), including you only look once (YOLO) variants (YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLO-NAS), single shot multibox detector (SSD) with VGG16, residual networks (ResNet18, ResNet152), ResNeXt50, MobileNet V3 small, and MobileNet V3 large; the architecture and operation of the CNN models are also introduced in detail. In particular, we fine-tune the models and evaluate them on the TQU-HG and HaGRID datasets. The quantitative results of training and testing are presented (the F1-scores of YOLOv8, YOLO-NAS, MobileNet V3 small, and ResNet50 are 98.99%, 98.98%, 99.27%, and 99.36%, respectively, on the TQU-HG dataset, and 99.21%, 99.37%, 99.36%, 86.4%, 98.3%, respectively, on the HaGRID dataset). The inference speed of YOLOv8 is 6.19 fps on the CPU and 18.28 fps on the GPU.
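The paper reports per-model F1-scores on the two datasets. The F1-score for a multi-class gesture classifier is typically computed per class from precision and recall, then macro-averaged; a minimal stdlib-only sketch (the gesture labels and toy predictions below are illustrative, not drawn from TQU-HG or HaGRID):

```python
# Hedged sketch of macro-averaged F1 for multi-class gesture labels.
# Labels and data are illustrative placeholders, not the paper's data.

def f1_per_class(y_true, y_pred, label):
    """F1 for one class, treated as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all ground-truth classes."""
    labels = sorted(set(y_true))
    return sum(f1_per_class(y_true, y_pred, lab) for lab in labels) / len(labels)

# Toy example: three hypothetical gesture classes.
y_true = ["fist", "palm", "palm", "ok", "fist", "ok"]
y_pred = ["fist", "palm", "ok",   "ok", "fist", "ok"]
print(round(macro_f1(y_true, y_pred), 3))  # → 0.822
```

The same quantity is available as `sklearn.metrics.f1_score(..., average="macro")`; the sketch just makes the precision/recall bookkeeping explicit.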
Journal description:
The aim of the Indonesian Journal of Electrical Engineering and Computer Science (formerly TELKOMNIKA Indonesian Journal of Electrical Engineering) is to publish high-quality articles dedicated to all aspects of the latest outstanding developments in the field of electrical engineering. Its scope encompasses the applications of Telecommunication and Information Technology, Applied Computing and Computer, Instrumentation and Control, Electrical (Power), Electronics Engineering, and Informatics, which cover, but are not limited to, the following areas: Signal Processing[...] Electronics[...] Electrical[...] Telecommunication[...] Instrumentation & Control[...] Computing and Informatics[...]