Robust Hand Detection and Classification in Vehicles and in the Wild

2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Pub Date: 2017-07-01. DOI: 10.1109/CVPRW.2017.159

T. Le, Kha Gia Quach, Chenchen Zhu, C. Duong, Khoa Luu, M. Savvides

Pages: 1203-1210

Citations: 54

Abstract

Robust hand detection and classification is one of the most crucial pre-processing steps for human-computer interaction, driver behavior monitoring, virtual reality, and related applications. The problem is challenging, however, because hand images vary widely in real-world scenarios. This work presents a novel approach named Multiple Scale Region-based Fully Convolutional Networks (MS-RFCN) to robustly detect and classify human hand regions under challenging conditions such as occlusion, varying illumination, and low resolution. In this approach, the whole image is passed through the proposed fully convolutional network to compute score maps. Because these score maps are position-sensitive, they help to efficiently address the dilemma between translation invariance in classification and translation variance in detection. The method is evaluated on two challenging hand databases, the Vision for Intelligent Vehicles and Applications (VIVA) Challenge and the Oxford hand dataset, and compared against various recent hand detection methods. The experimental results show that the proposed MS-RFCN approach consistently achieves state-of-the-art hand detection results on the VIVA Challenge: Average Precision (AP) / Average Recall (AR) of 95.1% / 94.5% at level 1 and 86.0% / 83.4% at level 2. In addition, the proposed method achieves state-of-the-art results for left/right hand and driver/passenger classification on the VIVA database, improving AP/AR by roughly 7% and 13% on the two tasks, respectively. On the Oxford database, MS-RFCN reaches a hand detection performance of 75.1% AP and 77.8% AR.
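To make the role of position-sensitive score maps more concrete, the following is a minimal single-scale sketch of position-sensitive RoI pooling in the general style of region-based fully convolutional networks. It is not the authors' MS-RFCN implementation: the function name `ps_roi_pool`, the map layout (class `c`, bin row `i`, bin column `j` stored at index `c*k*k + i*k + j`), the average voting, and the toy 4x4 maps are all assumptions made for illustration, and the multi-scale part of the method is not modeled.

```python
from typing import List, Tuple

ScoreMap = List[List[float]]  # one H x W map, indexed [row][col]


def ps_roi_pool(
    score_maps: List[ScoreMap],
    roi: Tuple[int, int, int, int],
    k: int,
    num_classes: int,
) -> List[float]:
    """Position-sensitive RoI pooling followed by average voting.

    Assumed layout: the map for class c and bin (row i, col j) sits at
    index c*k*k + i*k + j. roi is (x0, y0, x1, y1) in map coordinates,
    with x1 and y1 exclusive, and is assumed to lie inside the maps.
    """
    x0, y0, x1, y1 = roi
    scores = []
    for c in range(num_classes):
        total = 0.0
        for i in range(k):
            for j in range(k):
                # Integer pixel range of bin (i, j); at least one pixel each way.
                ys = y0 + (i * (y1 - y0)) // k
                ye = max(ys + 1, y0 + ((i + 1) * (y1 - y0)) // k)
                xs = x0 + (j * (x1 - x0)) // k
                xe = max(xs + 1, x0 + ((j + 1) * (x1 - x0)) // k)
                # Each bin reads only from its own dedicated score map.
                m = score_maps[c * k * k + i * k + j]
                vals = [m[y][x] for y in range(ys, ye) for x in range(xs, xe)]
                total += sum(vals) / len(vals)
        scores.append(total / (k * k))  # average vote over the k*k bins
    return scores


def quadrant_map(i: int, j: int, size: int = 4) -> ScoreMap:
    """Toy map that fires only in quadrant (i, j) of a size x size grid."""
    half = size // 2
    return [
        [1.0 if (y // half == i and x // half == j) else 0.0 for x in range(size)]
        for y in range(size)
    ]


# Toy data: class 0 ("hand") has one map per bin, class 1 is all zeros.
hand_maps = [quadrant_map(i, j) for i in range(2) for j in range(2)]
background_maps = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
maps = hand_maps + background_maps

aligned = ps_roi_pool(maps, (0, 0, 4, 4), k=2, num_classes=2)
misaligned = ps_roi_pool(maps, (0, 0, 2, 2), k=2, num_classes=2)
print(aligned, misaligned)
```

In this toy setup, a box that covers the whole object gets a full vote because every bin lands on the region where its own map responds, while a box covering only the top-left quadrant gets a vote from just one of its four bins. That sensitivity to where the box sits is what lets a fully convolutional network, which is otherwise translation-invariant, still score localization quality.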



