Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision

2019 IEEE Winter Conference on Applications of Computer Vision (WACV) Pub Date : 2019-01-01 DOI:10.1109/WACV.2019.00142

Vishal Kaushal, Rishabh K. Iyer, S. Kothawade, Rohan Mahadev, Khoshrav Doctor, Ganesh Ramakrishnan


Citations: 61

Abstract

State-of-the-art computer vision techniques based on supervised machine learning are, in general, data hungry. Curating their training data poses the challenges of expensive human labeling, inadequate computing resources and longer experiment turnaround times. Training data subset selection and active learning techniques have been proposed as possible solutions to these challenges. A special class of subset selection functions naturally models notions of diversity, coverage and representation and can be used to eliminate redundancy, which makes these functions well suited to training data subset selection. They can also improve the efficiency of active learning, further reducing human labeling effort, by selecting a subset of the examples obtained using conventional uncertainty sampling based techniques. In this work, we empirically demonstrate the effectiveness of two diversity models, namely the Facility-Location and Dispersion models, for training data subset selection and for reducing labeling effort. We demonstrate this across a variety of computer vision tasks including Gender Recognition, Face Recognition, Scene Recognition, Object Detection and Object Recognition. Our results show that diversity based subset selection done in the right way can increase accuracy by up to 5-10% over existing baselines, particularly in settings in which less training data is available. This allows complex machine learning models like Convolutional Neural Networks to be trained with much less training data and lower labeling costs while incurring minimal performance loss.
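The abstract names the two diversity models but does not give their definitions. As a rough illustration only, the sketch below uses the forms commonly found in the subset selection literature: Facility-Location as f(S) = sum over all points i of the maximum similarity between i and any selected point, and Dispersion as the minimum pairwise distance within S, approximated by a farthest-point heuristic. The function names, the entropy-based uncertainty filter, and the conversion of distance to similarity are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def facility_location_greedy(sim, k):
    """Greedily pick k indices to maximize sum_i max_{j in S} sim[i, j].

    Assumes sim is a non-negative (n, n) similarity matrix.
    """
    n = sim.shape[0]
    selected = []
    taken = np.zeros(n, dtype=bool)
    best = np.zeros(n)  # best[i] = similarity of point i to its closest selected point
    for _ in range(k):
        # Objective value if column j were added, minus the current value.
        gains = np.maximum(sim, best[:, None]).sum(axis=0) - best.sum()
        gains[taken] = -np.inf
        j = int(np.argmax(gains))
        selected.append(j)
        taken[j] = True
        best = np.maximum(best, sim[:, j])
    return selected

def dispersion_greedy(dist, k, start=0):
    """Farthest-point heuristic: repeatedly add the point whose minimum
    distance to the already selected points is largest."""
    selected = [start]
    min_dist = dist[start].astype(float)
    min_dist[start] = -np.inf  # never re-select a chosen point
    for _ in range(k - 1):
        j = int(np.argmax(min_dist))
        selected.append(j)
        min_dist = np.minimum(min_dist, dist[j])
        min_dist[j] = -np.inf
    return selected

def pairwise_distances(features):
    """Euclidean distance between every pair of rows."""
    diff = features[:, None, :] - features[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def uncertain_then_diverse(probs, features, pool_size, k):
    """Hypothetical two-stage batch selection: keep the pool_size most
    uncertain points (by predictive entropy), then pick k diverse ones."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    candidates = np.argsort(-entropy)[:pool_size]
    dist = pairwise_distances(features[candidates])
    sim = dist.max() - dist  # one simple way to turn distance into similarity
    picked = facility_location_greedy(sim, k)
    return [int(candidates[p]) for p in picked]

# Toy data: three tight clusters on a line, two points each.
features = np.array([[0.0], [0.1], [5.0], [5.1], [10.0], [10.1]])
dist = pairwise_distances(features)
sim = dist.max() - dist

fl_subset = facility_location_greedy(sim, 3)
disp_subset = dispersion_greedy(dist, 3)
```

On this toy data both selectors return one point from each cluster, which is the redundancy-removing behavior the abstract describes. In the two-stage function, the uncertainty filter decides which examples are worth labeling and the diversity step removes near-duplicates among them.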
