{"title":"Coarse-to-fine trained multi-scale Convolutional Neural Networks for image classification","authors":"Haobin Dou, Xihong Wu","doi":"10.1109/IJCNN.2015.7280542","DOIUrl":null,"url":null,"abstract":"Convolutional Neural Networks (CNNs) have become forceful models in feature learning and image classification. They achieve translation invariance by spatial convolution and pooling mechanisms, while their ability in scale invariance is limited. To tackle the problem of scale variation in image classification, this work proposed a multi-scale CNN model with depth-decreasing multi-column structure. Input images were decomposed into multiple scales and at each scale image, a CNN column was instantiated with its depth decreasing from fine to coarse scale for model simplification. Scale-invariant features were learned by weights shared across all scales and pooled among adjacent scales. Particularly, a coarse-to-fine pre-training method imitating the human's development of spatial frequency perception was proposed to train this multi-scale CNN, which accelerated the training process and reduced the classification error. In addition, model averaging technique was used to combine models obtained during pre-training and further improve the performance. With these methods, our model achieved classification errors of 15.38% on CIFAR-10 dataset and 41.29% on CIFAR-100 dataset, i.e. 1.05% and 2.97% reduction compared with single-scale CNN model.","PeriodicalId":6539,"journal":{"name":"2015 International Joint Conference on Neural Networks (IJCNN)","volume":"7 1","pages":"1-7"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN.2015.7280542","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 12
Abstract
Convolutional Neural Networks (CNNs) have become powerful models for feature learning and image classification. They achieve translation invariance through spatial convolution and pooling, but their capacity for scale invariance is limited. To address scale variation in image classification, this work proposed a multi-scale CNN with a depth-decreasing multi-column structure. Input images were decomposed into multiple scales, and a CNN column was instantiated at each scale, with column depth decreasing from the fine to the coarse scale to simplify the model. Scale-invariant features were learned with weights shared across all scales and pooled over adjacent scales. In particular, a coarse-to-fine pre-training method, inspired by the development of spatial frequency perception in humans, was proposed to train this multi-scale CNN; it accelerated training and reduced the classification error. In addition, a model averaging technique combined the models obtained during pre-training to further improve performance. With these methods, the model achieved classification errors of 15.38% on the CIFAR-10 dataset and 41.29% on the CIFAR-100 dataset, i.e. reductions of 1.05% and 2.97% compared with a single-scale CNN model.
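The abstract describes the architecture only at a high level. The sketch below is a minimal, hypothetical PyTorch rendering of the two central ideas, a shared convolutional trunk applied to every level of an image pyramid (weight sharing across scales) and a deeper column on the finest scale (depth decreasing toward coarse scales), with features pooled across scales before classification. The layer sizes, the number of scales, the use of global average pooling, and the element-wise max over scale columns are all illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch (not the authors' released code): a multi-scale CNN that shares
# convolutional weights across scale columns and pools features over scales.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleCNN(nn.Module):
    def __init__(self, num_classes=10, num_scales=3):
        super().__init__()
        self.num_scales = num_scales
        # Shared convolutional trunk: the same weights are applied to every
        # level of the image pyramid (weight sharing across scales).
        self.shared_conv = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Extra layers applied only to the finest scale, so coarser columns are
        # shallower (the "depth-decreasing multi-column" idea).
        self.fine_extra = nn.Sequential(
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        feats = []
        for s in range(self.num_scales):
            # Build the image pyramid by repeated 2x downsampling.
            xs = F.avg_pool2d(x, kernel_size=2 ** s) if s > 0 else x
            f = self.shared_conv(xs)
            if s == 0:  # deepest column on the finest scale
                f = self.fine_extra(f)
            # Global average pooling gives a fixed-length descriptor per scale.
            feats.append(F.adaptive_avg_pool2d(f, 1).flatten(1))
        # Pool features across scale columns (here: element-wise max).
        pooled = torch.stack(feats, dim=0).max(dim=0).values
        return self.classifier(pooled)


if __name__ == "__main__":
    model = MultiScaleCNN(num_classes=10, num_scales=3)
    logits = model(torch.randn(4, 3, 32, 32))  # CIFAR-sized input
    print(logits.shape)  # torch.Size([4, 10])
```

Under the same assumptions, the coarse-to-fine pre-training described in the abstract could be approximated by first optimizing the shared trunk using only the coarsest scale, then progressively enabling the finer columns (reusing the already-trained shared weights) before fine-tuning the full model; the abstract's model averaging would then combine the checkpoints produced at these pre-training stages.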