How Small is Big Enough? Open Labeled Datasets and the Development of Deep Learning
Daniel Souza, Aldo Geuna, Jeff Rodríguez
arXiv - ECON - General Economics, 2024-08-19
DOI: arxiv-2408.10359 (https://doi.org/arxiv-2408.10359)
Abstract
We investigate the emergence of Deep Learning as a technoscientific field,
emphasizing the role of open labeled datasets. Through qualitative and
quantitative analyses, we evaluate the role of datasets like CIFAR-10 in
advancing computer vision and object recognition, which are central to the Deep
Learning revolution. Our findings highlight CIFAR-10's crucial role and
enduring influence on the field, as well as its importance in teaching ML
techniques. Results also indicate that dataset characteristics, such as size,
number of instances, and number of categories, were key factors. Econometric
analysis confirms that CIFAR-10, a small-but-sufficiently-large open dataset,
played a significant and lasting role in technological advancement and a
major part in the development of the early scientific literature, as shown
by citation metrics.