Learning invariant features through topographic filter maps

2009 IEEE Conference on Computer Vision and Pattern Recognition | Pub Date: 2009-06-20 | DOI: 10.1109/CVPR.2009.5206545

K. Kavukcuoglu, Marc'Aurelio Ranzato, R. Fergus, Yann LeCun


Citations: 351

Abstract




Several recently proposed architectures for high-performance object recognition are composed of two main stages: a feature extraction stage that extracts locally invariant feature vectors from regularly spaced image patches, and a somewhat generic supervised classifier. The first stage is often composed of three main modules: (1) a bank of filters (often oriented edge detectors); (2) a non-linear transform, such as a point-wise squashing function, quantization, or normalization; (3) a spatial pooling operation that combines the outputs of similar filters over neighboring regions. We propose a method that automatically learns such feature extractors in an unsupervised fashion by simultaneously learning the filters and the pooling units that combine multiple filter outputs. The method automatically generates topographic maps of similar filters that extract features of orientations, scales, and positions. These similar filters are pooled together, producing locally invariant outputs. The learned feature descriptors give results comparable to SIFT on image recognition tasks for which SIFT is well suited, and better results than SIFT on tasks for which SIFT is less well suited.
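The three-module feature extraction stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the filters here are random rather than learned, and the function name `extract_features`, the `tanh` squashing function, the 4x4 map, and the wrap-around 3x3 L2 pooling neighborhood are all assumptions chosen for illustration.

```python
import numpy as np

def extract_features(patch, filters, map_shape, pool_size=3):
    """Filter bank -> point-wise nonlinearity -> pooling over map neighbors.

    Hypothetical helper for illustration only; pool_size is assumed odd.
    """
    rows, cols = map_shape
    # (1) Filter bank: one response per filter (dot product with the patch).
    responses = filters @ patch.ravel()
    # (2) Point-wise squashing nonlinearity (tanh is an assumed choice).
    squashed = np.tanh(responses)
    # Lay the responses out on a 2D grid so "similar" filters are neighbors.
    grid = squashed.reshape(rows, cols)
    # (3) Pooling: L2 norm over each neighborhood, wrapping at the borders.
    half = pool_size // 2
    energy = np.zeros_like(grid)
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            energy += np.roll(grid, shift=(dr, dc), axis=(0, 1)) ** 2
    return np.sqrt(energy)

rng = np.random.default_rng(0)
filters = rng.standard_normal((16, 64))  # 16 random (not learned) 8x8 filters
patch = rng.standard_normal((8, 8))
pooled = extract_features(patch, filters, map_shape=(4, 4))
print(pooled.shape)  # (4, 4)
```

Because each pooled unit sums squared responses over a neighborhood, its output is unchanged when the input patch flips sign, and it varies only slowly when activity shifts between neighboring filters. That is the sense in which pooling similar filters can yield locally invariant outputs; in the proposed method the filters and their grouping are learned rather than fixed as they are in this sketch.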
