{"title":"A note about why deep learning is deep: A discontinuous approximation perspective","authors":"Yongxin Li, Haobo Qi, Hansheng Wang","doi":"10.1002/sta4.654","DOIUrl":null,"url":null,"abstract":"Deep learning has achieved unprecedented success in recent years. This approach essentially uses the composition of nonlinear functions to model the complex relationship between input features and output labels. However, a comprehensive theoretical understanding of why the hierarchical layered structure can exhibit superior expressive power is still lacking. In this paper, we provide an explanation for this phenomenon by measuring the approximation efficiency of neural networks with respect to discontinuous target functions. We focus on deep neural networks with rectified linear unit (ReLU) activation functions. We find that to achieve the same degree of approximation accuracy, the number of neurons required by a single‐hidden‐layer (SHL) network is exponentially greater than that required by a multi‐hidden‐layer (MHL) network. In practice, discontinuous points tend to contain highly valuable information (i.e., edges in image classification). We argue that this may be a very important reason accounting for the impressive performance of deep neural networks. We validate our theory in extensive experiments.","PeriodicalId":56159,"journal":{"name":"Stat","volume":"30 1","pages":""},"PeriodicalIF":0.7000,"publicationDate":"2024-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Stat","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1002/sta4.654","RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
Citations: 0
Abstract
Deep learning has achieved unprecedented success in recent years. This approach essentially uses the composition of nonlinear functions to model the complex relationship between input features and output labels. However, a comprehensive theoretical understanding of why the hierarchical layered structure exhibits superior expressive power is still lacking. In this paper, we provide an explanation for this phenomenon by measuring the approximation efficiency of neural networks with respect to discontinuous target functions. We focus on deep neural networks with rectified linear unit (ReLU) activation functions. We find that, to achieve the same degree of approximation accuracy, the number of neurons required by a single-hidden-layer (SHL) network is exponentially greater than that required by a multi-hidden-layer (MHL) network. In practice, discontinuity points tend to carry highly valuable information (e.g., edges in image classification). We argue that this may be an important reason for the impressive performance of deep neural networks. We validate our theory with extensive experiments.
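The depth-versus-width gap the abstract describes can be illustrated with a standard construction that is not taken from the paper itself: a multi-hidden-layer ReLU network can represent the indicator of a d-dimensional box with a number of units linear in d, by composing one-dimensional indicators through pairwise minima (min(u, v) = v - ReLU(v - u) is exact with a single ReLU unit), whereas a single-hidden-layer network needs a number of units growing exponentially in d to match the same uniform accuracy away from the discontinuity set. The sketch below is a minimal NumPy illustration under these assumptions; the box [0.2, 0.8]^d, the band width delta, and all helper names are illustrative choices, not from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def soft_indicator_1d(x, a, b, delta):
    """Piecewise-linear ReLU approximation of 1[a <= x <= b]:
    rises from 0 to 1 over [a - delta, a], falls back to 0 over
    [b, b + delta]. Uses 4 ReLU units."""
    rise = (relu(x - (a - delta)) - relu(x - a)) / delta
    fall = (relu(x - b) - relu(x - (b + delta))) / delta
    return rise - fall

def relu_min(u, v):
    """Exact pairwise minimum with one ReLU unit: min(u, v) = v - ReLU(v - u)."""
    return v - relu(v - u)

def deep_box_indicator(x, a=0.2, b=0.8, delta=0.05):
    """Multi-hidden-layer construction: approximate the indicator of the
    box [a, b]^d by chaining pairwise minima of d one-dimensional
    indicators. Total cost is about 4d + (d - 1) ReLU units, linear in d."""
    h = soft_indicator_1d(x[:, 0], a, b, delta)
    for j in range(1, x.shape[1]):
        h = relu_min(h, soft_indicator_1d(x[:, j], a, b, delta))
    return h

d, delta = 5, 0.05
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(100_000, d))
truth = np.all((x >= 0.2) & (x <= 0.8), axis=1).astype(float)
approx = deep_box_indicator(x, delta=delta)

# Away from the width-delta transition band around the discontinuity,
# the deep construction reproduces the indicator exactly.
off_band = ~np.any((np.abs(x - 0.2) < delta) | (np.abs(x - 0.8) < delta), axis=1)
print("max error off the transition band:", np.abs(approx - truth)[off_band].max())
```

A shallow counterpart would have to tile the box boundary with one-layer bumps, and the number of such bumps needed for comparable accuracy grows exponentially in d, which is the flavor of the SHL-versus-MHL gap the paper quantifies.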
Stat (Decision Sciences: Statistics, Probability and Uncertainty)
CiteScore: 1.10
Self-citation rate: 0.00%
Annual articles published: 85
Journal overview:
Stat is an innovative electronic journal for the rapid publication of novel and topical research results, publishing compact articles of the highest quality in all areas of statistical endeavour. Its purpose is to provide a means of rapid sharing of important new theoretical, methodological and applied research. Stat is a joint venture between the International Statistical Institute and Wiley-Blackwell.
Stat is characterised by:
• Speed - a high-quality review process that aims to reach a decision within 20 days of submission.
• Concision - a maximum article length of 10 pages of text, not including references.
• Supporting materials - inclusion of electronic supporting materials including graphs, video, software, data and images.
• Scope - coverage of all areas of statistics and allied interdisciplinary areas.
Stat is a scientific journal for the international community of statisticians, as well as researchers and practitioners in allied quantitative disciplines.