How deep convolutional neural networks lose spatial information with training

IF 6.3 · CAS Zone 2 (Physics and Astronomy) · JCR Q1 (Computer Science, Artificial Intelligence) · Machine Learning: Science and Technology · Pub Date: 2023-11-09 · DOI: 10.1088/2632-2153/ad092c
Umberto Maria Tomasini, Leonardo Petrini, Francesco Cagnetta, Matthieu Wyart
{"title":"How deep convolutional neural networks lose spatial information with training","authors":"Umberto Maria Tomasini, Leonardo Petrini, Francesco Cagnetta, Matthieu Wyart","doi":"10.1088/2632-2153/ad092c","DOIUrl":null,"url":null,"abstract":"Abstract A central question of machine learning is how deep nets manage to learn tasks in high dimensions. An appealing hypothesis is that they achieve this feat by building a representation of the data where information irrelevant to the task is lost. For image datasets, this view is supported by the observation that after (and not before) training, the neural representation becomes less and less sensitive to diffeomorphisms acting on images as the signal propagates through the network. This loss of sensitivity correlates with performance and surprisingly correlates with a gain of sensitivity to white noise acquired during training. Which are the mechanisms learned by convolutional neural networks (CNNs) responsible for the these phenomena? In particular, why is the sensitivity to noise heightened with training? Our approach consists of two steps. (1) Analyzing the layer-wise representations of trained CNNs, we disentangle the role of spatial pooling in contrast to channel pooling in decreasing their sensitivity to image diffeomorphisms while increasing their sensitivity to noise. (2) We introduce model scale-detection tasks, which qualitatively reproduce the phenomena reported in our empirical analysis. In these models we can assess quantitatively how spatial pooling affects these sensitivities. We find that the increased sensitivity to noise observed in deep ReLU networks is a mechanistic consequence of the perturbing noise piling up during spatial pooling, after being rectified by ReLU units. Using odd activation functions like tanh drastically reduces the CNNs’ sensitivity to noise.","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":" 7","pages":"0"},"PeriodicalIF":6.3000,"publicationDate":"2023-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Learning Science and Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1088/2632-2153/ad092c","RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 3

Abstract

A central question of machine learning is how deep nets manage to learn tasks in high dimensions. An appealing hypothesis is that they achieve this feat by building a representation of the data where information irrelevant to the task is lost. For image datasets, this view is supported by the observation that after (and not before) training, the neural representation becomes less and less sensitive to diffeomorphisms acting on images as the signal propagates through the network. This loss of sensitivity correlates with performance and, surprisingly, with a gain of sensitivity to white noise acquired during training. Which mechanisms learned by convolutional neural networks (CNNs) are responsible for these phenomena? In particular, why is the sensitivity to noise heightened with training? Our approach consists of two steps. (1) Analyzing the layer-wise representations of trained CNNs, we disentangle the role of spatial pooling, in contrast to channel pooling, in decreasing their sensitivity to image diffeomorphisms while increasing their sensitivity to noise. (2) We introduce model scale-detection tasks, which qualitatively reproduce the phenomena reported in our empirical analysis. In these models we can assess quantitatively how spatial pooling affects these sensitivities. We find that the increased sensitivity to noise observed in deep ReLU networks is a mechanistic consequence of the perturbing noise piling up during spatial pooling, after being rectified by ReLU units. Using odd activation functions like tanh drastically reduces the CNNs' sensitivity to noise.
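The last two sentences of the abstract describe a concrete mechanism that a small numerical experiment can illustrate: ReLU rectification gives white noise a positive mean that survives spatial (average) pooling, while an odd activation such as tanh keeps the noise zero-mean, so pooling averages it away. Below is a minimal sketch of this effect (assuming NumPy; this is not the authors' code, and the patch size and trial count are arbitrary illustrative choices):

```python
# Minimal sketch (illustrative, not the paper's code): white noise passed
# through ReLU acquires a positive mean that survives average pooling;
# passed through an odd activation (tanh), it stays zero-mean and cancels.
import numpy as np

rng = np.random.default_rng(0)
# 10,000 independent trials, each a patch of 64 "pixels" of unit white noise.
noise = rng.normal(0.0, 1.0, size=(10_000, 64))

# ReLU rectifies the noise: E[ReLU(z)] = 1/sqrt(2*pi) ~ 0.40 for z ~ N(0, 1),
# so the bias survives average pooling over the patch -- the noise "piles up".
relu_pooled = np.maximum(noise, 0.0).mean(axis=1)

# tanh is odd, so tanh(noise) stays zero-mean and pooling shrinks it ~ 1/sqrt(N).
tanh_pooled = np.tanh(noise).mean(axis=1)

print(f"|pooled ReLU(noise)|: {np.abs(relu_pooled).mean():.3f}")  # ~0.40
print(f"|pooled tanh(noise)|: {np.abs(tanh_pooled).mean():.3f}")  # ~0.06
```

Running this shows the pooled ReLU noise retaining a magnitude near 1/√(2π) ≈ 0.40 regardless of how large the pooling window is, whereas the pooled tanh noise shrinks like 1/√N with the window size N.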
Source journal
Machine Learning: Science and Technology (Computer Science – Artificial Intelligence)
CiteScore: 9.10
Self-citation rate: 4.40%
Articles per year: 86
Review turnaround: 5 weeks
Journal description: Machine Learning: Science and Technology is a multidisciplinary open-access journal that bridges the application of machine learning across the sciences with advances in machine learning methods and theory as motivated by physical insights. Specifically, articles must fall into one of the following categories: advance the state of machine-learning-driven applications in the sciences, or make conceptual, methodological, or theoretical advances in machine learning with applications to, inspiration from, or motivation from scientific problems.
Latest articles from this journal
- Quality assurance for online adaptive radiotherapy: a secondary dose verification model with geometry-encoded U-Net
- Optimizing ZX-diagrams with deep reinforcement learning
- DiffLense: a conditional diffusion model for super-resolution of gravitational lensing data
- Equivariant tensor network potentials
- Masked particle modeling on sets: towards self-supervised high energy physics foundation models