How Does the Data set Affect CNN-based Image Classification Performance?

2018 5th International Conference on Systems and Informatics (ICSAI). Publication date: 2018-11-01. DOI: 10.1109/ICSAI.2018.8599448

Chao Luo, Xiaojie Li, Lutao Wang, Jia He, Denggao Li, Jiliu Zhou

Citations: 34

Abstract

Convolutional neural networks (ConvNets or CNNs) have proven very effective in areas such as image recognition and classification. In image classification in particular, CNN-based methods have achieved excellent performance. Because performance is an important indicator of whether a CNN-based classification method is effective, it is important to study which factors affect it. Image classification performance is known to depend on both the network structure and the size of the data set, and data set size in particular has a significant impact. For most practitioners, however, large data sets are difficult to obtain. We therefore ask: how does the size of the data set affect performance? To answer this question, we performed 35 groups of experiments with 5 runs per group (175 experiments in total). For each k-class classification task, we ran 5 groups with increasing training-set sizes and observed the changes in accuracy to analyze the effect of data set size on performance. For the same CNN-based network, the average accuracy shows that the larger the training set, the higher the test accuracy. However, when the training data set is insufficient, better results can be obtained. Furthermore, within each group of experiments, the more categories being classified, the more pronounced the change in performance. The results of this paper can not only guide experiments on image classification but also guide other experiments based on deep learning.
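The experimental design above can be read as a grid: since each k-class task gets 5 groups of increasing training-set size, the 35 groups imply 7 values of k, and 5 runs per group give 35 × 5 = 175 experiments. The sketch below lays out that grid and the averaging step. It is a minimal illustration under stated assumptions, not the authors' code: the concrete values in `K_VALUES` and `TRAIN_SIZES`, the "images per class" unit, and all function names are hypothetical, and the CNN training that would produce each accuracy is not shown.

```python
from itertools import product
from statistics import mean

# Hypothetical grid: the abstract fixes only the counts (7 k-values x 5 sizes x 5 runs),
# not the actual class counts or training-set sizes used in the paper.
K_VALUES = [2, 3, 4, 5, 6, 7, 8]
TRAIN_SIZES = [100, 200, 400, 800, 1600]  # assumed unit: training images per class
RUNS_PER_GROUP = 5


def build_schedule(k_values, train_sizes, runs):
    """Return one (k, train_size, run) tuple per individual experiment."""
    return [
        (k, n, r)
        for k, n in product(k_values, train_sizes)
        for r in range(runs)
    ]


def average_accuracy(results):
    """Mean test accuracy of each (k, train_size) group.

    `results` maps (k, train_size, run) -> test accuracy in [0, 1].
    """
    groups = {}
    for (k, n, _run), acc in results.items():
        groups.setdefault((k, n), []).append(acc)
    return {key: mean(accs) for key, accs in groups.items()}


def accuracy_gain(avg, k, train_sizes):
    """Change in mean accuracy from the smallest to the largest training set for one k."""
    sizes = sorted(train_sizes)
    return avg[(k, sizes[-1])] - avg[(k, sizes[0])]


def is_non_decreasing(avg, k, train_sizes):
    """True if mean accuracy never drops as the training set grows for this k."""
    accs = [avg[(k, n)] for n in sorted(train_sizes)]
    return all(a <= b for a, b in zip(accs, accs[1:]))


if __name__ == "__main__":
    schedule = build_schedule(K_VALUES, TRAIN_SIZES, RUNS_PER_GROUP)
    groups = {(k, n) for k, n, _run in schedule}
    print(len(groups), "groups,", len(schedule), "experiments")  # 35 groups, 175 experiments
```

In use, each schedule entry would be passed to a (hypothetical) training routine that returns a test accuracy. `is_non_decreasing` then checks the reported "larger training set, higher accuracy" trend for each k, and comparing `accuracy_gain` across k checks whether the change is more pronounced when more categories are classified.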
