Hand Segmentation With Dense Dilated U-Net and Structurally Incoherent Nonnegative Matrix Factorization-Based Gesture Recognition
Kankana Roy; Rajiv R. Sahay
IEEE Transactions on Human-Machine Systems (journal article), published 2024-03-07
DOI: 10.1109/THMS.2024.3390415
https://ieeexplore.ieee.org/document/10522620/
Citations: 0
Abstract
Robust segmentation of hands in cluttered environments remains a challenge in computer vision, particularly for hand gesture recognition. In this work, we propose a two-stage gesture recognition framework. In the first stage, we segment hands with a new deep learning architecture, the densely dilated U-Net, which incorporates dense blocks and dilated convolution layers. In the second stage, we classify gestures from the segmented hands using a novel structurally incoherent nonnegative matrix factorization (NMF) algorithm applied to CNN features extracted from the segmented images. To cope with the scarcity of labeled data, we extend the densely dilated U-Net to semisupervised hand segmentation, using hand bounding boxes as cues. We provide quantitative and qualitative evaluations of the proposed segmentation model on several public hand segmentation datasets, including EgoHands, GTEA, EYTH, EDSH, and HOF. We also report semisupervised segmentation results on two hand detection datasets, VIVA and CVRR. As an extension, we present semisupervised segmentation results, and gesture recognition results using the segmented hands, on the cluttered NUS-II hand gesture dataset. To validate the effectiveness of our semisupervised algorithm, we evaluate it on the OUHands dataset, which provides real ground-truth labels. Experimental results on the NUS-II and OUHands datasets demonstrate that our two-stage approach yields superior gesture recognition results.
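The abstract names dense blocks and dilated convolutions as the ingredients of the densely dilated U-Net but does not give the layer configuration. The pure-Python sketch below only illustrates the two generic mechanisms: how dilation widens a layer's receptive field without adding weights, and how a DenseNet-style block grows its channel count by concatenation. The helper names and the configuration (3-tap kernels, dilations 1, 2, 4, 64 input channels, growth rate 32) are hypothetical and not taken from the paper.

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated convolution, written as cross-correlation as in CNN layers.

    Kernel tap i reads the input at offset i * dilation, so a k-tap kernel
    spans (k - 1) * dilation + 1 samples without adding any weights.
    """
    span = (len(kernel) - 1) * dilation + 1
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        # Each layer widens the field by (k - 1) * d input pixels.
        rf += (kernel_size - 1) * d
    return rf


def dense_block_channels(in_channels, growth_rate, num_layers):
    """Channel counts in a DenseNet-style block.

    channels[l] is the number of feature maps entering layer l + 1 (1-indexed);
    the last entry is the block's output width. Every layer concatenates
    `growth_rate` new maps onto all earlier ones.
    """
    channels = [in_channels]
    for _ in range(num_layers):
        channels.append(channels[-1] + growth_rate)
    return channels


# Hypothetical configuration, not taken from the paper.
print(dilated_conv1d([0, 1, 2, 3, 4, 5, 6], [1, 1, 1], dilation=2))  # [6, 9, 12]
print(receptive_field(3, [1, 1, 1]))  # 7
print(receptive_field(3, [1, 2, 4]))  # 15
print(dense_block_channels(64, 32, 4))  # [64, 96, 128, 160, 192]
```

With the same three 3-tap layers, raising the dilations from 1, 1, 1 to 1, 2, 4 more than doubles the receptive field (7 to 15 pixels) at no extra parameter cost, which is the usual motivation for dilated layers in segmentation networks.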
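The abstract does not state the objective of the structurally incoherent NMF, so the following is only a minimal sketch under explicit assumptions: each gesture class c gets its own nonnegative basis W_c fitted to that class's feature matrix X_c, a penalty λ Σ_{i≠j} ‖W_iᵀW_j‖_F² (borrowed from incoherent dictionary learning) discourages the bases of different classes from overlapping, the factors are fitted with heuristic multiplicative updates, and a test vector is assigned to the class with the smallest nonnegative reconstruction error. The function names, λ = 0.1, the rank, and the toy 4-D vectors standing in for CNN features are all hypothetical.

```python
import random

# Hypothetical sketch of an incoherence-penalised, per-class NMF classifier;
# not the authors' algorithm.
EPS = 1e-9  # guards the multiplicative updates against division by zero


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


def scale(M, num, den):
    """Elementwise M * num / den: the multiplicative-update step."""
    return [[m * n / (d + EPS) for m, n, d in zip(mr, nr, dr)]
            for mr, nr, dr in zip(M, num, den)]


def train_si_nmf(class_data, rank, lam=0.1, iters=200, seed=0):
    """Fit X_c ~= W_c H_c for every class c (X_c: features x samples, nonnegative),
    minimising sum_c ||X_c - W_c H_c||_F^2 + lam * sum_{i != j} ||W_i^T W_j||_F^2."""
    rng = random.Random(seed)
    W, H = {}, {}
    for c, X in class_data.items():
        W[c] = [[rng.random() + EPS for _ in range(rank)] for _ in X]
        H[c] = [[rng.random() + EPS for _ in X[0]] for _ in range(rank)]
    for _ in range(iters):
        for c, X in class_data.items():
            Wc, Hc = W[c], H[c]
            # Standard Lee-Seung update for the coefficients.
            Wt = transpose(Wc)
            Hc = scale(Hc, matmul(Wt, X), matmul(matmul(Wt, Wc), Hc))
            # Basis update: the incoherence gradient 4*lam*sum_{j != c} W_j W_j^T W_c
            # is nonnegative, so (halved, like the data term) it joins the denominator.
            num = matmul(X, transpose(Hc))
            den = matmul(Wc, matmul(Hc, transpose(Hc)))
            for j, Wj in W.items():
                if j != c:
                    pen = matmul(Wj, matmul(transpose(Wj), Wc))
                    den = [[d + 2 * lam * p for d, p in zip(dr, pr)]
                           for dr, pr in zip(den, pen)]
            W[c], H[c] = scale(Wc, num, den), Hc
    return W


def classify(x, W, iters=200):
    """Assign x to the class whose basis reconstructs it with the smallest error."""
    X = [[v] for v in x]  # column vector
    best, best_err = None, float("inf")
    for c, Wc in W.items():
        Wt = transpose(Wc)
        WtX, WtW = matmul(Wt, X), matmul(Wt, Wc)
        h = [[1.0] for _ in Wt]
        for _ in range(iters):  # nonnegative least squares via multiplicative updates
            h = scale(h, WtX, matmul(WtW, h))
        err = sum((a[0] - b[0]) ** 2 for a, b in zip(X, matmul(Wc, h)))
        if err < best_err:
            best, best_err = c, err
    return best


# Toy 4-D "features" (rows) x 3 samples (columns) per gesture class.
data = {
    "A": [[1.0, 0.9, 0.8], [0.5, 0.6, 0.7], [0.0, 0.05, 0.0], [0.0, 0.0, 0.05]],
    "B": [[0.0, 0.05, 0.0], [0.05, 0.0, 0.0], [1.0, 0.8, 0.9], [0.6, 0.5, 0.7]],
}
bases = train_si_nmf(data, rank=1)
print(classify([0.9, 0.6, 0.0, 0.02], bases))   # A
print(classify([0.0, 0.03, 0.95, 0.6], bases))  # B
```

Because every term of the incoherence gradient is nonnegative, it can be added to the denominator of the usual Lee-Seung update, which keeps W_c nonnegative without any projection step. Realistic CNN feature matrices would call for a numerical library rather than nested lists.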
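The abstract says only that hand bounding boxes serve as cues for semisupervised segmentation. One common way such cues can be used, shown here as an assumption rather than the authors' procedure, is to turn a network's hand-probability map into pseudo-labels: pixels outside the box are background, confident pixels inside the box become hand or background, and uncertain ones are marked to be ignored by the loss. The function name, the single-box input, and the thresholds hi = 0.7 and lo = 0.3 are hypothetical.

```python
HAND, BACKGROUND, IGNORE = 1, 0, -1


def box_pseudo_labels(prob, box, hi=0.7, lo=0.3):
    """Turn a per-pixel hand-probability map into pseudo-labels using a box cue.

    prob: list of rows of probabilities in [0, 1].
    box:  (top, left, bottom, right), with bottom and right exclusive.
    Pixels outside the box are labelled background; inside it, confident
    predictions become labels and uncertain ones are ignored in the loss.
    """
    top, left, bottom, right = box
    labels = []
    for r, row in enumerate(prob):
        out = []
        for c, p in enumerate(row):
            if not (top <= r < bottom and left <= c < right):
                out.append(BACKGROUND)  # assumes every hand in the image is boxed
            elif p >= hi:
                out.append(HAND)
            elif p <= lo:
                out.append(BACKGROUND)
            else:
                out.append(IGNORE)
        labels.append(out)
    return labels


prob = [[0.9, 0.8, 0.2, 0.9],
        [0.1, 0.5, 0.95, 0.4],
        [0.9, 0.9, 0.9, 0.9]]
for row in box_pseudo_labels(prob, box=(0, 1, 2, 3)):
    print(row)
# [0, 1, 0, 0]
# [0, -1, 1, 0]
# [0, 0, 0, 0]
```

The IGNORE value plays the role of an ignore index in a pixel-wise cross-entropy loss, so uncertain pixels inside the box contribute no gradient during self-training.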
Journal description:
The scope of IEEE Transactions on Human-Machine Systems covers the field of human-machine systems. It includes human systems and human-organizational interactions, such as cognitive ergonomics, system test and evaluation, and human information-processing concerns in systems and organizations.