No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects

Raja Sunkara, Tie Luo
{"title":"No More Strided Convolutions or Pooling: A New CNN Building Block for Low-Resolution Images and Small Objects","authors":"Raja Sunkara, Tie Luo","doi":"10.48550/arXiv.2208.03641","DOIUrl":null,"url":null,"abstract":"Convolutional neural networks (CNNs) have made resounding success in many computer vision tasks such as image classification and object detection. However, their performance degrades rapidly on tougher tasks where images are of low resolution or objects are small. In this paper, we point out that this roots in a defective yet common design in existing CNN architectures, namely the use of strided convolution and/or pooling layers, which results in a loss of fine-grained information and learning of less effective feature representations. To this end, we propose a new CNN building block called SPD-Conv in place of each strided convolution layer and each pooling layer (thus eliminates them altogether). SPD-Conv is comprised of a space-to-depth (SPD) layer followed by a non-strided convolution (Conv) layer, and can be applied in most if not all CNN architectures. We explain this new design under two most representative computer vision tasks: object detection and image classification. We then create new CNN architectures by applying SPD-Conv to YOLOv5 and ResNet, and empirically show that our approach significantly outperforms state-of-the-art deep learning models, especially on tougher tasks with low-resolution images and small objects. We have open-sourced our code at https://github.com/LabSAINT/SPD-Conv.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"40","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2208.03641","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 40

Abstract

Convolutional neural networks (CNNs) have achieved resounding success in many computer vision tasks such as image classification and object detection. However, their performance degrades rapidly on tougher tasks where images are of low resolution or objects are small. In this paper, we point out that this stems from a flawed yet common design in existing CNN architectures, namely the use of strided convolution and/or pooling layers, which results in a loss of fine-grained information and the learning of less effective feature representations. To this end, we propose a new CNN building block called SPD-Conv to replace each strided convolution layer and each pooling layer (thus eliminating them altogether). SPD-Conv consists of a space-to-depth (SPD) layer followed by a non-strided convolution (Conv) layer, and can be applied in most if not all CNN architectures. We explain this new design on two of the most representative computer vision tasks: object detection and image classification. We then create new CNN architectures by applying SPD-Conv to YOLOv5 and ResNet, and empirically show that our approach significantly outperforms state-of-the-art deep learning models, especially on tougher tasks with low-resolution images and small objects. We have open-sourced our code at https://github.com/LabSAINT/SPD-Conv.
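
As a concrete illustration of the building block described in the abstract, below is a minimal PyTorch sketch of an SPD-Conv block with a downscale factor of 2. This is not the authors' implementation (see the linked repository for that); the class name SPDConv, the 3x3 kernel size, and the BatchNorm/SiLU choices are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SPDConv(nn.Module):
    """Space-to-depth followed by a non-strided convolution (illustrative sketch).

    A space-to-depth step with scale 2 rearranges each 2x2 spatial neighborhood
    into the channel dimension (C -> 4C, H -> H/2, W -> W/2), so spatial
    downsampling discards no pixels; a stride-1 convolution then mixes the
    stacked channels into the desired number of output feature maps.
    """

    def __init__(self, in_channels: int, out_channels: int, scale: int = 2):
        super().__init__()
        self.scale = scale
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels * scale * scale, out_channels,
                      kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.SiLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.scale
        # Space-to-depth: gather the s*s sub-sampled grids and stack them on channels.
        patches = [x[..., i::s, j::s] for i in range(s) for j in range(s)]
        x = torch.cat(patches, dim=1)  # (N, C*s*s, H/s, W/s)
        return self.conv(x)


if __name__ == "__main__":
    block = SPDConv(in_channels=64, out_channels=128)
    y = block(torch.randn(1, 64, 56, 56))
    print(y.shape)  # torch.Size([1, 128, 28, 28]) -- same downsampling as a stride-2 conv
```

In an architecture such as YOLOv5 or ResNet, each stride-2 convolution or pooling layer would be replaced by a block of this form, so spatial downsampling happens through the lossless space-to-depth rearrangement rather than by discarding fine-grained information.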