{"title":"神经网络的LU分解和Toeplitz分解","authors":"Yucong Liu , Simiao Jiao , Lek-Heng Lim","doi":"10.1016/j.acha.2023.101601","DOIUrl":null,"url":null,"abstract":"<div><p>Any matrix <em>A</em> has an LU decomposition up to a row or column permutation. Less well-known is the fact that it has a ‘Toeplitz decomposition’ <span><math><mi>A</mi><mo>=</mo><msub><mrow><mi>T</mi></mrow><mrow><mn>1</mn></mrow></msub><msub><mrow><mi>T</mi></mrow><mrow><mn>2</mn></mrow></msub><mo>⋯</mo><msub><mrow><mi>T</mi></mrow><mrow><mi>r</mi></mrow></msub></math></span> where <span><math><msub><mrow><mi>T</mi></mrow><mrow><mi>i</mi></mrow></msub></math></span>'s are Toeplitz matrices. We will prove that any continuous function <span><math><mi>f</mi><mo>:</mo><msup><mrow><mi>R</mi></mrow><mrow><mi>n</mi></mrow></msup><mo>→</mo><msup><mrow><mi>R</mi></mrow><mrow><mi>m</mi></mrow></msup></math></span> has an approximation to arbitrary accuracy by a neural network that maps <span><math><mi>x</mi><mo>∈</mo><msup><mrow><mi>R</mi></mrow><mrow><mi>n</mi></mrow></msup></math></span> to <span><math><msub><mrow><mi>L</mi></mrow><mrow><mn>1</mn></mrow></msub><msub><mrow><mi>σ</mi></mrow><mrow><mn>1</mn></mrow></msub><msub><mrow><mi>U</mi></mrow><mrow><mn>1</mn></mrow></msub><msub><mrow><mi>σ</mi></mrow><mrow><mn>2</mn></mrow></msub><msub><mrow><mi>L</mi></mrow><mrow><mn>2</mn></mrow></msub><msub><mrow><mi>σ</mi></mrow><mrow><mn>3</mn></mrow></msub><msub><mrow><mi>U</mi></mrow><mrow><mn>2</mn></mrow></msub><mo>⋯</mo><msub><mrow><mi>L</mi></mrow><mrow><mi>r</mi></mrow></msub><msub><mrow><mi>σ</mi></mrow><mrow><mn>2</mn><mi>r</mi><mo>−</mo><mn>1</mn></mrow></msub><msub><mrow><mi>U</mi></mrow><mrow><mi>r</mi></mrow></msub><mi>x</mi><mo>∈</mo><msup><mrow><mi>R</mi></mrow><mrow><mi>m</mi></mrow></msup></math></span>, i.e., where the weight matrices alternate between lower and upper triangular matrices, <span><math><msub><mrow><mi>σ</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>(</mo><mi>x</mi><mo>)</mo><mo>≔</mo><mi>σ</mi><mo>(</mo><mi>x</mi><mo>−</mo><msub><mrow><mi>b</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>)</mo></math></span> for some bias vector <span><math><msub><mrow><mi>b</mi></mrow><mrow><mi>i</mi></mrow></msub></math></span>, and the activation <em>σ</em> may be chosen to be essentially any uniformly continuous nonpolynomial function. The same result also holds with Toeplitz matrices, i.e., <span><math><mi>f</mi><mo>≈</mo><msub><mrow><mi>T</mi></mrow><mrow><mn>1</mn></mrow></msub><msub><mrow><mi>σ</mi></mrow><mrow><mn>1</mn></mrow></msub><msub><mrow><mi>T</mi></mrow><mrow><mn>2</mn></mrow></msub><msub><mrow><mi>σ</mi></mrow><mrow><mn>2</mn></mrow></msub><mo>⋯</mo><msub><mrow><mi>σ</mi></mrow><mrow><mi>r</mi><mo>−</mo><mn>1</mn></mrow></msub><msub><mrow><mi>T</mi></mrow><mrow><mi>r</mi></mrow></msub></math></span> to arbitrary accuracy, and likewise for Hankel matrices. A consequence of our Toeplitz result is a fixed-width universal approximation theorem for convolutional neural networks, which so far have only arbitrary width versions. Since our results apply in particular to the case when <em>f</em> is a general neural network, we may regard them as LU and Toeplitz decompositions of a neural network. The practical implication of our results is that one may vastly reduce the number of weight parameters in a neural network without sacrificing its power of universal approximation. 
We will present several experiments on real data sets to show that imposing such structures on the weight matrices dramatically reduces the number of training parameters with almost no noticeable effect on test accuracy.</p></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"68 ","pages":"Article 101601"},"PeriodicalIF":2.6000,"publicationDate":"2023-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LU decomposition and Toeplitz decomposition of a neural network\",\"authors\":\"Yucong Liu , Simiao Jiao , Lek-Heng Lim\",\"doi\":\"10.1016/j.acha.2023.101601\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Any matrix <em>A</em> has an LU decomposition up to a row or column permutation. Less well-known is the fact that it has a ‘Toeplitz decomposition’ <span><math><mi>A</mi><mo>=</mo><msub><mrow><mi>T</mi></mrow><mrow><mn>1</mn></mrow></msub><msub><mrow><mi>T</mi></mrow><mrow><mn>2</mn></mrow></msub><mo>⋯</mo><msub><mrow><mi>T</mi></mrow><mrow><mi>r</mi></mrow></msub></math></span> where <span><math><msub><mrow><mi>T</mi></mrow><mrow><mi>i</mi></mrow></msub></math></span>'s are Toeplitz matrices. We will prove that any continuous function <span><math><mi>f</mi><mo>:</mo><msup><mrow><mi>R</mi></mrow><mrow><mi>n</mi></mrow></msup><mo>→</mo><msup><mrow><mi>R</mi></mrow><mrow><mi>m</mi></mrow></msup></math></span> has an approximation to arbitrary accuracy by a neural network that maps <span><math><mi>x</mi><mo>∈</mo><msup><mrow><mi>R</mi></mrow><mrow><mi>n</mi></mrow></msup></math></span> to <span><math><msub><mrow><mi>L</mi></mrow><mrow><mn>1</mn></mrow></msub><msub><mrow><mi>σ</mi></mrow><mrow><mn>1</mn></mrow></msub><msub><mrow><mi>U</mi></mrow><mrow><mn>1</mn></mrow></msub><msub><mrow><mi>σ</mi></mrow><mrow><mn>2</mn></mrow></msub><msub><mrow><mi>L</mi></mrow><mrow><mn>2</mn></mrow></msub><msub><mrow><mi>σ</mi></mrow><mrow><mn>3</mn></mrow></msub><msub><mrow><mi>U</mi></mrow><mrow><mn>2</mn></mrow></msub><mo>⋯</mo><msub><mrow><mi>L</mi></mrow><mrow><mi>r</mi></mrow></msub><msub><mrow><mi>σ</mi></mrow><mrow><mn>2</mn><mi>r</mi><mo>−</mo><mn>1</mn></mrow></msub><msub><mrow><mi>U</mi></mrow><mrow><mi>r</mi></mrow></msub><mi>x</mi><mo>∈</mo><msup><mrow><mi>R</mi></mrow><mrow><mi>m</mi></mrow></msup></math></span>, i.e., where the weight matrices alternate between lower and upper triangular matrices, <span><math><msub><mrow><mi>σ</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>(</mo><mi>x</mi><mo>)</mo><mo>≔</mo><mi>σ</mi><mo>(</mo><mi>x</mi><mo>−</mo><msub><mrow><mi>b</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>)</mo></math></span> for some bias vector <span><math><msub><mrow><mi>b</mi></mrow><mrow><mi>i</mi></mrow></msub></math></span>, and the activation <em>σ</em> may be chosen to be essentially any uniformly continuous nonpolynomial function. The same result also holds with Toeplitz matrices, i.e., <span><math><mi>f</mi><mo>≈</mo><msub><mrow><mi>T</mi></mrow><mrow><mn>1</mn></mrow></msub><msub><mrow><mi>σ</mi></mrow><mrow><mn>1</mn></mrow></msub><msub><mrow><mi>T</mi></mrow><mrow><mn>2</mn></mrow></msub><msub><mrow><mi>σ</mi></mrow><mrow><mn>2</mn></mrow></msub><mo>⋯</mo><msub><mrow><mi>σ</mi></mrow><mrow><mi>r</mi><mo>−</mo><mn>1</mn></mrow></msub><msub><mrow><mi>T</mi></mrow><mrow><mi>r</mi></mrow></msub></math></span> to arbitrary accuracy, and likewise for Hankel matrices. 
A consequence of our Toeplitz result is a fixed-width universal approximation theorem for convolutional neural networks, which so far have only arbitrary width versions. Since our results apply in particular to the case when <em>f</em> is a general neural network, we may regard them as LU and Toeplitz decompositions of a neural network. The practical implication of our results is that one may vastly reduce the number of weight parameters in a neural network without sacrificing its power of universal approximation. We will present several experiments on real data sets to show that imposing such structures on the weight matrices dramatically reduces the number of training parameters with almost no noticeable effect on test accuracy.</p></div>\",\"PeriodicalId\":55504,\"journal\":{\"name\":\"Applied and Computational Harmonic Analysis\",\"volume\":\"68 \",\"pages\":\"Article 101601\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2023-10-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied and Computational Harmonic Analysis\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S106352032300088X\",\"RegionNum\":2,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MATHEMATICS, APPLIED\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied and Computational Harmonic Analysis","FirstCategoryId":"100","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S106352032300088X","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICS, APPLIED","Score":null,"Total":0}
LU decomposition and Toeplitz decomposition of a neural network
Abstract: Any matrix $A$ has an LU decomposition up to a row or column permutation. Less well-known is the fact that it has a 'Toeplitz decomposition' $A = T_1 T_2 \cdots T_r$, where the $T_i$'s are Toeplitz matrices. We will prove that any continuous function $f\colon \mathbb{R}^n \to \mathbb{R}^m$ has an approximation to arbitrary accuracy by a neural network that maps $x \in \mathbb{R}^n$ to $L_1 \sigma_1 U_1 \sigma_2 L_2 \sigma_3 U_2 \cdots L_r \sigma_{2r-1} U_r x \in \mathbb{R}^m$, i.e., where the weight matrices alternate between lower and upper triangular matrices, $\sigma_i(x) := \sigma(x - b_i)$ for some bias vector $b_i$, and the activation $\sigma$ may be chosen to be essentially any uniformly continuous nonpolynomial function. The same result also holds with Toeplitz matrices, i.e., $f \approx T_1 \sigma_1 T_2 \sigma_2 \cdots \sigma_{r-1} T_r$ to arbitrary accuracy, and likewise for Hankel matrices. A consequence of our Toeplitz result is a fixed-width universal approximation theorem for convolutional neural networks, which so far have only arbitrary-width versions. Since our results apply in particular to the case when $f$ is a general neural network, we may regard them as LU and Toeplitz decompositions of a neural network. The practical implication of our results is that one may vastly reduce the number of weight parameters in a neural network without sacrificing its power of universal approximation. We will present several experiments on real data sets to show that imposing such structures on the weight matrices dramatically reduces the number of training parameters with almost no noticeable effect on test accuracy.
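To make the two weight structures described in the abstract concrete, here is a minimal sketch, assuming PyTorch (the paper does not prescribe a framework here); the names ShiftedActivation, TriangularLinear, ToeplitzLinear, and lu_style_net, the ReLU activation, and the use of square layer widths are illustrative assumptions, not the authors' implementation. The sketch assembles the map $L_1 \sigma_1 U_1 \sigma_2 \cdots L_r \sigma_{2r-1} U_r x$ from masked triangular weights, and separately shows a Toeplitz layer that stores only $2n-1$ coefficients per $n \times n$ weight.

import torch
import torch.nn as nn


class ShiftedActivation(nn.Module):
    # sigma_i(x) = sigma(x - b_i) for a learnable bias vector b_i.
    def __init__(self, dim, sigma=torch.relu):
        super().__init__()
        self.bias = nn.Parameter(torch.zeros(dim))
        self.sigma = sigma

    def forward(self, x):
        return self.sigma(x - self.bias)


class TriangularLinear(nn.Module):
    # Square linear map whose weight is masked to stay lower or upper triangular.
    def __init__(self, dim, lower=True):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(dim, dim) / dim ** 0.5)
        self.lower = lower

    def forward(self, x):
        w = torch.tril(self.weight) if self.lower else torch.triu(self.weight)
        return x @ w.T


class ToeplitzLinear(nn.Module):
    # Square linear map with a Toeplitz weight: 2*dim - 1 parameters instead of dim**2.
    def __init__(self, dim):
        super().__init__()
        self.coeffs = nn.Parameter(torch.randn(2 * dim - 1) / dim ** 0.5)
        i = torch.arange(dim)
        # T[i, j] = coeffs[i - j + dim - 1], so every diagonal of T is constant.
        self.register_buffer("index", i.unsqueeze(1) - i.unsqueeze(0) + dim - 1)

    def forward(self, x):
        return x @ self.coeffs[self.index].T


def lu_style_net(dim, r, sigma=torch.relu):
    # x -> L_1 s_1 U_1 s_2 ... L_r s_{2r-1} U_r x, assembled right-to-left:
    # the first module applied is U_r and the last is L_1 (no final activation).
    layers = []
    for k in range(r):
        layers.append(TriangularLinear(dim, lower=False))  # upper triangular U
        layers.append(ShiftedActivation(dim, sigma))
        layers.append(TriangularLinear(dim, lower=True))   # lower triangular L
        if k < r - 1:
            layers.append(ShiftedActivation(dim, sigma))
    return nn.Sequential(*layers)


# Quick shape check on random data.
net = lu_style_net(dim=64, r=3)
print(net(torch.randn(8, 64)).shape)  # torch.Size([8, 64])

Swapping TriangularLinear for ToeplitzLinear in the builder gives the $f \approx T_1 \sigma_1 T_2 \sigma_2 \cdots \sigma_{r-1} T_r$ variant. The parameter savings the abstract refers to can be read off directly: an $n \times n$ dense weight trains $n^2$ parameters, a triangular one roughly $n(n+1)/2$, and a Toeplitz (or Hankel) one only $2n - 1$.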
About the journal:
Applied and Computational Harmonic Analysis (ACHA) is an interdisciplinary journal that publishes high-quality papers in all areas of mathematical sciences related to the applied and computational aspects of harmonic analysis, with special emphasis on innovative theoretical development, methods, and algorithms for information processing, manipulation, understanding, and so forth. The objectives of the journal are to chronicle the important publications in the rapidly growing field of data representation and analysis, to stimulate research in relevant interdisciplinary areas, and to provide a common link among mathematical, physical, and life scientists, as well as engineers.