iLab-20M: A Large-Scale Controlled Object Dataset to Investigate Deep Learning

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Pub Date: 2016-07-01. DOI: 10.1109/CVPR.2016.244

A. Borji, S. Izadi, L. Itti

Pages: 2221-2230

Citation count: 66

Abstract

Tolerance to image variations (e.g., translation, scale, pose, illumination, background) is an important desired property of any object recognition system, be it human or machine. The move towards ever larger datasets has been a trend in computer vision, especially with the emergence of highly popular deep learning models. While very useful for learning invariance to inter- and intra-class shape variability, these large-scale wild datasets are not very useful for learning invariance to other parameters, which pushes researchers to resort to other tricks for training models. In this work, we introduce a large-scale synthetic dataset, which is freely and publicly available, and use it to answer several fundamental questions regarding the selectivity and invariance properties of convolutional neural networks (CNNs). Our dataset contains two parts: a) objects shot on a turntable: 15 categories, 8 rotation angles, 11 cameras on a semi-circular arch, 5 lighting conditions, 3 focus levels, and a variety of backgrounds (23.4 per instance), generating 1320 images per instance (about 22 million images in total), and b) scenes, in which a robotic arm takes pictures of objects on a 1:160 scale scene. We study: 1) invariance and selectivity of different CNN layers, 2) knowledge transfer from one object category to another, 3) systematic or random sampling of images to build a training set, 4) domain adaptation from synthetic to natural scenes, and 5) the order of knowledge delivery to CNNs. We also discuss how our analyses can lead the field to develop more efficient deep learning methods.
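The image counts quoted in the abstract can be checked with a little arithmetic. The sketch below multiplies the four listed acquisition factors, which gives the stated 1320. It then assumes (the abstract does not say this explicitly) that each background repeats that full 1320-image grid, and uses the approximate total of 22 million to back out a rough number of object instances. The variable names are illustrative only, and the instance count is an implied estimate, not a figure from the abstract.

```python
# Factor counts quoted in the abstract for the turntable part of the dataset.
rotations = 8
cameras = 11
lightings = 5
focus_levels = 3

# Product of the four listed acquisition factors.
images_per_setup = rotations * cameras * lightings * focus_levels

# Assumption (not stated in the abstract): each background repeats the full
# 1320-image grid, so the per-instance total scales with the background count.
backgrounds_per_instance = 23.4
total_images = 22_000_000  # "about 22 million", so the result is approximate

# Rough number of object instances implied by the quoted totals.
implied_instances = total_images / (images_per_setup * backgrounds_per_instance)

print(images_per_setup)          # 1320
print(round(implied_instances))  # 712
```

Under that assumption the quoted figures are mutually consistent with roughly 700 object instances spread over the 15 categories.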


