Minibatch and local SGD: Algorithmic stability and linear speedup in generalization
Pub Date: 2025-07-16 | DOI: 10.1016/j.acha.2025.101795
Yunwen Lei, Tao Sun, Mingrui Liu
The increasing scale of data has popularized the use of parallelism to speed up optimization. Minibatch stochastic gradient descent (minibatch SGD) and local SGD are two popular methods for parallel optimization. Existing theoretical studies show a linear speedup of these methods with respect to the number of machines, but the speedup is measured by optimization errors in a multi-pass setting. By comparison, the stability and generalization of these methods are much less studied. In this paper, we analyze the stability and generalization of minibatch and local SGD to understand their learnability, introducing an expectation-variance decomposition. We incorporate training errors into the stability analysis, which shows how small training errors help generalization for overparameterized models. We show that minibatch and local SGD achieve a linear speedup in attaining optimal risk bounds.
{"title":"Minibatch and local SGD: Algorithmic stability and linear speedup in generalization","authors":"Yunwen Lei , Tao Sun , Mingrui Liu","doi":"10.1016/j.acha.2025.101795","DOIUrl":"10.1016/j.acha.2025.101795","url":null,"abstract":"<div><div>The increasing scale of data propels the popularity of leveraging parallelism to speed up the optimization. Minibatch stochastic gradient descent (minibatch SGD) and local SGD are two popular methods for parallel optimization. The existing theoretical studies show a linear speedup of these methods with respect to the number of machines, which, however, is measured by optimization errors in a multi-pass setting. As a comparison, the stability and generalization of these methods are much less studied. In this paper, we study the stability and generalization analysis of minibatch and local SGD to understand their learnability by introducing an expectation-variance decomposition. We incorporate training errors into the stability analysis, which shows how small training errors help generalization for overparameterized models. We show minibatch and local SGD achieve a linear speedup to attain the optimal risk bounds.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"79 ","pages":"Article 101795"},"PeriodicalIF":2.6,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144653251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-dimensional unlimited sampling and robust reconstruction
Pub Date: 2025-07-16 | DOI: 10.1016/j.acha.2025.101796
Dorian Florescu, Ayush Bhandari
In this paper, we introduce a new sampling and reconstruction approach for multi-dimensional analog signals. Building on the Unlimited Sensing Framework (USF), we present a new folded sampling operator called the multi-dimensional modulo-hysteresis, which is backwards compatible with the existing one-dimensional modulo operator. Unlike previous approaches, the proposed model is specifically tailored to multi-dimensional signals. In particular, the model introduces redundancy in dimensions 2 and above, which is exploited for robust input recovery. We prove that the new operator is well-defined and that its outputs have a bounded dynamic range. For the noiseless case, we derive a theoretically guaranteed input reconstruction approach. When the input is corrupted by Gaussian noise, we exploit redundancy in higher dimensions to provide a bound on the error probability and show that it drops to zero for sufficiently high sampling rates, yielding new theoretical guarantees for the noisy case. Our numerical examples corroborate the theoretical results and show that the proposed approach can handle a significantly larger amount of noise compared to USF.
{"title":"Multi-dimensional unlimited sampling and robust reconstruction","authors":"Dorian Florescu, Ayush Bhandari","doi":"10.1016/j.acha.2025.101796","DOIUrl":"10.1016/j.acha.2025.101796","url":null,"abstract":"<div><div>In this paper we introduce a new sampling and reconstruction approach for multi-dimensional analog signals. Building on top of the Unlimited Sensing Framework (USF), we present a new folded sampling operator called the multi-dimensional modulo-hysteresis that is also backwards compatible with the existing one-dimensional modulo operator. Unlike previous approaches, the proposed model is specifically tailored to multi-dimensional signals. In particular, the model uses certain redundancy in dimensions 2 and above, which is exploited for input recovery with robustness. We prove that the new operator is well-defined and its outputs have a bounded dynamic range. For the noiseless case, we derive a theoretically guaranteed input reconstruction approach. When the input is corrupted by Gaussian noise, we exploit redundancy in higher dimensions to provide a bound on the error probability and show this drops to 0 for high enough sampling rates leading to new theoretical guarantees for the noisy case. Our numerical examples corroborate the theoretical results and show that the proposed approach can handle a significantly larger amount of noise compared to USF.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"79 ","pages":"Article 101796"},"PeriodicalIF":2.6,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144665025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks
Pub Date: 2025-07-16 | DOI: 10.1016/j.acha.2025.101797
Yunfei Yang
This paper studies the problem of how efficiently functions in the Sobolev spaces $W^{s,q}([0,1]^d)$ and Besov spaces $B^s_{q,r}([0,1]^d)$ can be approximated by deep ReLU neural networks with width $W$ and depth $L$, when the error is measured in the $L^p([0,1]^d)$ norm. This problem has been studied by several recent works, which obtained the approximation rate $O((WL)^{-2s/d})$ up to logarithmic factors when $p=q=\infty$, and the rate $O(L^{-2s/d})$ for networks with fixed width when the Sobolev embedding condition $1/q-1/p<s/d$ holds. We generalize these results by showing that the rate $O((WL)^{-2s/d})$ indeed holds under the Sobolev embedding condition. It is known that this rate is optimal up to logarithmic factors. The key tool in our proof is a novel encoding of sparse vectors by using deep ReLU neural networks with varied width and depth, which may be of independent interest.
{"title":"On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks","authors":"Yunfei Yang","doi":"10.1016/j.acha.2025.101797","DOIUrl":"10.1016/j.acha.2025.101797","url":null,"abstract":"<div><div>This paper studies the problem of how efficiently functions in the Sobolev spaces <span><math><msup><mrow><mi>W</mi></mrow><mrow><mi>s</mi><mo>,</mo><mi>q</mi></mrow></msup><mo>(</mo><msup><mrow><mo>[</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>]</mo></mrow><mrow><mi>d</mi></mrow></msup><mo>)</mo></math></span> and Besov spaces <span><math><msubsup><mrow><mi>B</mi></mrow><mrow><mi>q</mi><mo>,</mo><mi>r</mi></mrow><mrow><mi>s</mi></mrow></msubsup><mo>(</mo><msup><mrow><mo>[</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>]</mo></mrow><mrow><mi>d</mi></mrow></msup><mo>)</mo></math></span> can be approximated by deep ReLU neural networks with width <em>W</em> and depth <em>L</em>, when the error is measured in the <span><math><msup><mrow><mi>L</mi></mrow><mrow><mi>p</mi></mrow></msup><mo>(</mo><msup><mrow><mo>[</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>]</mo></mrow><mrow><mi>d</mi></mrow></msup><mo>)</mo></math></span> norm. This problem has been studied by several recent works, which obtained the approximation rate <span><math><mi>O</mi><mo>(</mo><msup><mrow><mo>(</mo><mi>W</mi><mi>L</mi><mo>)</mo></mrow><mrow><mo>−</mo><mn>2</mn><mi>s</mi><mo>/</mo><mi>d</mi></mrow></msup><mo>)</mo></math></span> up to logarithmic factors when <span><math><mi>p</mi><mo>=</mo><mi>q</mi><mo>=</mo><mo>∞</mo></math></span>, and the rate <span><math><mi>O</mi><mo>(</mo><msup><mrow><mi>L</mi></mrow><mrow><mo>−</mo><mn>2</mn><mi>s</mi><mo>/</mo><mi>d</mi></mrow></msup><mo>)</mo></math></span> for networks with fixed width when the Sobolev embedding condition <span><math><mn>1</mn><mo>/</mo><mi>q</mi><mo>−</mo><mn>1</mn><mo>/</mo><mi>p</mi><mo><</mo><mi>s</mi><mo>/</mo><mi>d</mi></math></span> holds. We generalize these results by showing that the rate <span><math><mi>O</mi><mo>(</mo><msup><mrow><mo>(</mo><mi>W</mi><mi>L</mi><mo>)</mo></mrow><mrow><mo>−</mo><mn>2</mn><mi>s</mi><mo>/</mo><mi>d</mi></mrow></msup><mo>)</mo></math></span> indeed holds under the Sobolev embedding condition. It is known that this rate is optimal up to logarithmic factors. The key tool in our proof is a novel encoding of sparse vectors by using deep ReLU neural networks with varied width and depth, which may be of independent interest.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"79 ","pages":"Article 101797"},"PeriodicalIF":2.6,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144653252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nonharmonic multivariate Fourier transforms and matrices: Condition numbers and hyperplane geometry
Pub Date: 2025-07-09 | DOI: 10.1016/j.acha.2025.101791
Weilin Li
Consider an operator that takes the Fourier transform of a discrete measure supported in $X\subseteq[-\frac{1}{2},\frac{1}{2})^d$ and restricts it to a compact $\Omega\subseteq\mathbb{R}^d$. We provide lower bounds for its smallest singular value when $\Omega$ is either a closed ball of radius $m$ or a closed cube of side length $2m$, and under different types of geometric assumptions on $X$. We first show that if distances between points in $X$ are lower bounded by a $\delta$ that is allowed to be arbitrarily small, then the smallest singular value is at least $Cm^{d/2}(m\delta)^{\lambda-1}$, where $\lambda$ is the maximum number of elements in $X$ contained within any ball or cube of an explicitly given radius. This estimate communicates a localization effect of the Fourier transform. While it is sharp, the smallest singular value behaves better than expected for many $X$, including when we dilate a generic set by parameter $\delta$. We next show that if there is an $\eta$ such that, for each $x\in X$, the set $X\setminus\{x\}$ locally consists of at most $r$ hyperplanes whose distances to $x$ are at least $\eta$, then the smallest singular value is at least $Cm^{d/2}(m\eta)^r$. For dilations of a generic set by $\delta$, the lower bound becomes $Cm^{d/2}(m\delta)^{\lceil(\lambda-1)/d\rceil}$. The appearance of a $1/d$ factor in the exponent indicates that, compared to worst case scenarios, the condition number of nonharmonic Fourier transforms is better than expected for typical sets and improves with higher dimensionality.
{"title":"Nonharmonic multivariate Fourier transforms and matrices: Condition numbers and hyperplane geometry","authors":"Weilin Li","doi":"10.1016/j.acha.2025.101791","DOIUrl":"10.1016/j.acha.2025.101791","url":null,"abstract":"<div><div>Consider an operator that takes the Fourier transform of a discrete measure supported in <span><math><mi>X</mi><mo>⊆</mo><msup><mrow><mo>[</mo><mo>−</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac><mo>,</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac><mo>)</mo></mrow><mrow><mi>d</mi></mrow></msup></math></span> and restricts it to a compact <span><math><mi>Ω</mi><mo>⊆</mo><msup><mrow><mi>R</mi></mrow><mrow><mi>d</mi></mrow></msup></math></span>. We provide lower bounds for its smallest singular value when Ω is either a closed ball of radius <em>m</em> or closed cube of side length 2<em>m</em>, and under different types of geometric assumptions on <span><math><mi>X</mi></math></span>. We first show that if distances between points in <span><math><mi>X</mi></math></span> are lower bounded by a <em>δ</em> that is allowed to be arbitrarily small, then the smallest singular value is at least <span><math><mi>C</mi><msup><mrow><mi>m</mi></mrow><mrow><mi>d</mi><mo>/</mo><mn>2</mn></mrow></msup><msup><mrow><mo>(</mo><mi>m</mi><mi>δ</mi><mo>)</mo></mrow><mrow><mi>λ</mi><mo>−</mo><mn>1</mn></mrow></msup></math></span>, where <em>λ</em> is the maximum number of elements in <span><math><mi>X</mi></math></span> contained within any ball or cube of an explicitly given radius. This estimate communicates a localization effect of the Fourier transform. While it is sharp, the smallest singular value behaves better than expected for many <span><math><mi>X</mi></math></span>, including when we dilate a generic set by parameter <em>δ</em>. We next show that if there is a <em>η</em> such that, for each <span><math><mi>x</mi><mo>∈</mo><mi>X</mi></math></span>, the set <span><math><mi>X</mi><mo>∖</mo><mo>{</mo><mi>x</mi><mo>}</mo></math></span> locally consists of at most <em>r</em> hyperplanes whose distances to <em>x</em> are at least <em>η</em>, then the smallest singular value is at least <span><math><mi>C</mi><msup><mrow><mi>m</mi></mrow><mrow><mi>d</mi><mo>/</mo><mn>2</mn></mrow></msup><msup><mrow><mo>(</mo><mi>m</mi><mi>η</mi><mo>)</mo></mrow><mrow><mi>r</mi></mrow></msup></math></span>. For dilations of a generic set by <em>δ</em>, the lower bound becomes <span><math><mi>C</mi><msup><mrow><mi>m</mi></mrow><mrow><mi>d</mi><mo>/</mo><mn>2</mn></mrow></msup><msup><mrow><mo>(</mo><mi>m</mi><mi>δ</mi><mo>)</mo></mrow><mrow><mo>⌈</mo><mo>(</mo><mi>λ</mi><mo>−</mo><mn>1</mn><mo>)</mo><mo>/</mo><mi>d</mi><mo>⌉</mo></mrow></msup></math></span>. 
The appearance of a <span><math><mn>1</mn><mo>/</mo><mi>d</mi></math></span> factor in the exponent indicates that compared to worst case scenarios, the condition number of nonharmonic Fourier transforms is better than expected for typical sets and improve with higher dimensionality.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"79 ","pages":"Article 101791"},"PeriodicalIF":2.6,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144595712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
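The quantity being bounded can be probed numerically: discretize $\Omega$ by a quadrature grid and compute the smallest singular value of the resulting nonharmonic Fourier matrix. The grid resolution, the values of m, and the point sets X_generic/X_dilated below are our own illustrative choices; the theorems concern the continuous restriction operator.

```python
import numpy as np
from itertools import product

def smallest_singular_value(X, m, grid=64):
    """Discretize Omega = [-m, m]^d and return the smallest singular value of the
    quadrature-weighted nonharmonic Fourier matrix A[k, j] = exp(-2 pi i w_k . x_j)."""
    d = X.shape[1]
    w1 = np.linspace(-m, m, grid)
    W = np.array(list(product(w1, repeat=d)))    # quadrature nodes on Omega
    h = (2 * m / (grid - 1)) ** d                # cell volume as quadrature weight
    A = np.exp(-2j * np.pi * (W @ X.T)) * np.sqrt(h)
    return np.linalg.svd(A, compute_uv=False)[-1]

rng = np.random.default_rng(1)
d, n, delta = 2, 6, 1e-2
X_generic = rng.uniform(-0.5, 0.5, size=(n, d))  # well-separated generic points
X_dilated = delta * X_generic                    # clustered: pairwise distances ~ delta
for m in (2, 4, 8):
    print(m, smallest_singular_value(X_generic, m),
             smallest_singular_value(X_dilated, m))
```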
On exact systems $\{t^\alpha\cdot e^{2\pi int}\}_{n\in\mathbb{Z}\setminus A}$ in $L^2(0,1)$ which are weighted lower semi frames but not Schauder bases, and their generalizations
Pub Date: 2025-07-09 | DOI: 10.1016/j.acha.2025.101794
Elias Zikkos
Let $\{e^{i\lambda_n t}\}_{n\in\mathbb{Z}}$ be an exponential Schauder basis for $L^2(0,1)$, where $\lambda_n\in\mathbb{R}$, and let $\{r_n(t)\}_{n\in\mathbb{Z}}$ be its dual Schauder basis. Let $A$ be a non-empty subset of the integers containing exactly $M$ elements. We prove that for $\alpha>0$ the weighted system $\{t^\alpha\cdot r_n(t)\}_{n\in\mathbb{Z}\setminus A}$ is exact in the space $L^2(0,1)$, that is, it is complete and minimal in $L^2(0,1)$, if and only if $\alpha\in[M-\frac{1}{2},M+\frac{1}{2})$. We also show that such a system is not a Riesz basis for $L^2(0,1)$.
In particular, the weighted trigonometric system $\{t^\alpha\cdot e^{2\pi int}\}_{n\in\mathbb{Z}\setminus A}$ is exact in $L^2(0,1)$ if and only if $\alpha\in[M-\frac{1}{2},M+\frac{1}{2})$, but this system is not even a Schauder basis for $L^2(0,1)$. This extends a result of Heil and Yoon (2012), who considered the analogous problem when $\alpha$ is a positive integer. Combining the non-basis property of $\{t^\alpha\cdot e^{2\pi int}\}_{n\in\mathbb{Z}\setminus A}$ with a result of Heil et al. (2023), it follows that for any $\alpha\ge 1/2$ the overcomplete system $\{t^\alpha\cdot e^{2\pi int}\}_{n\in\mathbb{Z}}$ has no reproducing partner for $L^2(0,1)$. Nevertheless, this overcomplete system is a weighted lower semi frame for $L^2(0,1)$. This follows from a recent result of ours, where we proved that any exact system in a Hilbert space $H$ is a weighted lower semi frame for $H$; for completeness, we reprove that result here. We point out that the invertibility of Vandermonde matrices plays a crucial role in the exactness and non-basis properties of the above systems.
{"title":"On exact systems {tα⋅e2πint}n∈Z∖A in L2(0,1) which are weighted lower semi frames but not Schauder bases, and their generalizations","authors":"Elias Zikkos","doi":"10.1016/j.acha.2025.101794","DOIUrl":"10.1016/j.acha.2025.101794","url":null,"abstract":"<div><div>Let <span><math><msub><mrow><mo>{</mo><msup><mrow><mi>e</mi></mrow><mrow><mi>i</mi><msub><mrow><mi>λ</mi></mrow><mrow><mi>n</mi></mrow></msub><mi>t</mi></mrow></msup><mo>}</mo></mrow><mrow><mi>n</mi><mo>∈</mo><mi>Z</mi></mrow></msub></math></span> be an exponential Schauder basis for <span><math><msup><mrow><mi>L</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>(</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>)</mo></math></span>, where <span><math><msub><mrow><mi>λ</mi></mrow><mrow><mi>n</mi></mrow></msub><mo>∈</mo><mi>R</mi></math></span>, and let <span><math><msub><mrow><mo>{</mo><msub><mrow><mi>r</mi></mrow><mrow><mi>n</mi></mrow></msub><mo>(</mo><mi>t</mi><mo>)</mo><mo>}</mo></mrow><mrow><mi>n</mi><mo>∈</mo><mi>Z</mi></mrow></msub></math></span> be its dual Schauder basis. Let <em>A</em> be a non-empty subset of the integers containing exactly <em>M</em> elements. We prove that for <span><math><mi>α</mi><mo>></mo><mn>0</mn></math></span> the weighted system <span><math><msub><mrow><mo>{</mo><msup><mrow><mi>t</mi></mrow><mrow><mi>α</mi></mrow></msup><mo>⋅</mo><msub><mrow><mi>r</mi></mrow><mrow><mi>n</mi></mrow></msub><mo>(</mo><mi>t</mi><mo>)</mo><mo>}</mo></mrow><mrow><mi>n</mi><mo>∈</mo><mi>Z</mi><mo>∖</mo><mi>A</mi></mrow></msub></math></span> is exact in the space <span><math><msup><mrow><mi>L</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>(</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>)</mo></math></span>, that is, it is complete and minimal in <span><math><msup><mrow><mi>L</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>(</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>)</mo></math></span>, if and only if <span><math><mi>α</mi><mo>∈</mo><mo>[</mo><mi>M</mi><mo>−</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac><mo>,</mo><mi>M</mi><mo>+</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac><mo>)</mo></math></span>. 
We also show that such a system is not a Riesz basis for <span><math><msup><mrow><mi>L</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>(</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>)</mo></math></span>.</div><div>In particular, the weighted trigonometric system <span><math><msub><mrow><mo>{</mo><msup><mrow><mi>t</mi></mrow><mrow><mi>α</mi></mrow></msup><mo>⋅</mo><msup><mrow><mi>e</mi></mrow><mrow><mn>2</mn><mi>π</mi><mi>i</mi><mi>n</mi><mi>t</mi></mrow></msup><mo>}</mo></mrow><mrow><mi>n</mi><mo>∈</mo><mi>Z</mi><mo>∖</mo><mi>A</mi></mrow></msub></math></span> is exact in <span><math><msup><mrow><mi>L</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>(</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>)</mo></math></span>, if and only if <span><math><mi>α</mi><mo>∈</mo><mo>[</mo><mi>M</mi><mo>−</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac><mo>,</mo><mi>M</mi><mo>+</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac><mo>)</mo></math></span>, but this system is not even a Schauder basis for <span><math><msup><mrow><mi>L</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>(</mo><mn>0</mn><mo>,</mo><mn>1</mn><mo>)</mo></math><","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"79 ","pages":"Article 101794"},"PeriodicalIF":2.6,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144596066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the limits of neural network explainability via descrambling
Pub Date: 2025-07-07 | DOI: 10.1016/j.acha.2025.101793
Shashank Sule, Richard G. Spencer, Wojciech Czaja
We characterize the exact solutions to neural network descrambling, a mathematical model for explaining the fully connected layers of trained neural networks (NNs). By reformulating the problem as the minimization of the Brockett function arising in graph matching and complexity theory, we show that the principal components of the hidden-layer preactivations can be characterized as the optimal “explainers” or descramblers for the layer weights, leading to descrambled weight matrices. We show that in typical deep learning contexts these descramblers take diverse and interesting forms, including (1) matching the largest principal components with the lowest frequency modes of the Fourier basis for isotropic hidden data, (2) discovering the semantic development in two-layer linear NNs for signal recovery problems, and (3) explaining CNNs by optimally permuting the neurons. Our numerical experiments indicate that the eigendecompositions of the hidden-layer data, now understood as the descramblers, can also reveal the layer's underlying transformation. These results illustrate that the SVD is more directly related to the explainability of NNs than previously thought, and they offer a promising avenue for discovering interpretable motifs for the hidden action of NNs, especially in contexts of operator learning or physics-informed NNs, where the input/output data has limited human readability.
{"title":"On the limits of neural network explainability via descrambling","authors":"Shashank Sule , Richard G. Spencer , Wojciech Czaja","doi":"10.1016/j.acha.2025.101793","DOIUrl":"10.1016/j.acha.2025.101793","url":null,"abstract":"<div><div>We characterize the exact solutions to <em>neural network descrambling</em>–a mathematical model for explaining the fully connected layers of trained neural networks (NNs). By reformulating the problem to the minimization of the Brockett function arising in graph matching and complexity theory we show that the principal components of the hidden layer preactivations can be characterized as the optimal “explainers” or <em>descramblers</em> for the layer weights, leading to <em>descrambled</em> weight matrices. We show that in typical deep learning contexts these descramblers take diverse and interesting forms including (1) matching largest principal components with the lowest frequency modes of the Fourier basis for isotropic hidden data, (2) discovering the semantic development in two-layer linear NNs for signal recovery problems, and (3) explaining CNNs by optimally permuting the neurons. Our numerical experiments indicate that the eigendecompositions of the hidden layer data–now understood as the descramblers–can also reveal the layer's underlying transformation. These results illustrate that the SVD is more directly related to the explainability of NNs than previously thought and offers a promising avenue for discovering interpretable motifs for the hidden action of NNs, especially in contexts of operator learning or physics-informed NNs, where the input/output data has limited human readability.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"79 ","pages":"Article 101793"},"PeriodicalIF":2.6,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144572145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gaussian process regression with log-linear scaling for common non-stationary kernels
Pub Date: 2025-07-05 | DOI: 10.1016/j.acha.2025.101792
P. Michael Kielstra, Michael Lindsey
We introduce a fast algorithm for Gaussian process regression in low dimensions, applicable to a widely-used family of non-stationary kernels. The non-stationarity of these kernels is induced by arbitrary spatially-varying vertical and horizontal scales. In particular, any stationary kernel can be accommodated as a special case, and we focus especially on the generalization of the standard Matérn kernel. Our subroutine for kernel matrix-vector multiplications scales almost optimally as $O(N\log N)$, where $N$ is the number of regression points. Like the recently developed equispaced Fourier Gaussian process (EFGP) methodology, which is applicable only to stationary kernels, our approach exploits non-uniform fast Fourier transforms (NUFFTs). We offer a complete analysis controlling the approximation error of our method, and we validate the method's practical performance with numerical experiments. In particular we demonstrate improved scalability compared to state-of-the-art rank-structured approaches in spatial dimension $d>1$.
{"title":"Gaussian process regression with log-linear scaling for common non-stationary kernels","authors":"P. Michael Kielstra , Michael Lindsey","doi":"10.1016/j.acha.2025.101792","DOIUrl":"10.1016/j.acha.2025.101792","url":null,"abstract":"<div><div>We introduce a fast algorithm for Gaussian process regression in low dimensions, applicable to a widely-used family of non-stationary kernels. The non-stationarity of these kernels is induced by arbitrary spatially-varying vertical and horizontal scales. In particular, any stationary kernel can be accommodated as a special case, and we focus especially on the generalization of the standard Matérn kernel. Our subroutine for kernel matrix-vector multiplications scales almost optimally as <span><math><mi>O</mi><mo>(</mo><mi>N</mi><mi>log</mi><mo></mo><mi>N</mi><mo>)</mo></math></span>, where <em>N</em> is the number of regression points. Like the recently developed equispaced Fourier Gaussian process (EFGP) methodology, which is applicable only to stationary kernels, our approach exploits non-uniform fast Fourier transforms (NUFFTs). We offer a complete analysis controlling the approximation error of our method, and we validate the method's practical performance with numerical experiments. In particular we demonstrate improved scalability compared to state-of-the-art rank-structured approaches in spatial dimension <span><math><mi>d</mi><mo>></mo><mn>1</mn></math></span>.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"79 ","pages":"Article 101792"},"PeriodicalIF":2.6,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144596067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy propagation in scattering convolution networks can be arbitrarily slow
Pub Date: 2025-06-20 | DOI: 10.1016/j.acha.2025.101790
Hartmut Führ, Max Getter
We analyze energy decay for deep convolutional neural networks employed as feature extractors, including Mallat's wavelet scattering transform. For time-frequency scattering transforms based on Gabor filters, previous work has established that energy decay is exponential for arbitrary square-integrable input signals. In contrast, our main results allow proving that this is false for wavelet scattering in any dimension. Specifically, we show that the energy decay of wavelet and wavelet-like scattering transforms acting on generic square-integrable signals can be arbitrarily slow. Importantly, this slow decay behavior holds for dense subsets of $L^2(\mathbb{R}^d)$, indicating that rapid energy decay is generally an unstable property of signals. We complement these findings with positive results that allow us to infer fast (up to exponential) energy decay for generalized Sobolev spaces tailored to the frequency localization of the underlying filter bank. Both negative and positive results highlight that energy decay in scattering networks critically depends on the interplay between the respective frequency localizations of both the signal and the filters used.
{"title":"Energy propagation in scattering convolution networks can be arbitrarily slow","authors":"Hartmut Führ, Max Getter","doi":"10.1016/j.acha.2025.101790","DOIUrl":"10.1016/j.acha.2025.101790","url":null,"abstract":"<div><div>We analyze energy decay for deep convolutional neural networks employed as feature extractors, including Mallat's wavelet scattering transform. For time-frequency scattering transforms based on Gabor filters, previous work has established that energy decay is exponential for arbitrary square-integrable input signals. In contrast, our main results allow proving that this is false for wavelet scattering in any dimension. Specifically, we show that the energy decay of wavelet and wavelet-like scattering transforms acting on generic square-integrable signals can be arbitrarily slow. Importantly, this slow decay behavior holds for dense subsets of <span><math><msup><mrow><mi>L</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>(</mo><msup><mrow><mi>R</mi></mrow><mrow><mi>d</mi></mrow></msup><mo>)</mo></math></span>, indicating that rapid energy decay is generally an unstable property of signals. We complement these findings with positive results that allow us to infer fast (up to exponential) energy decay for generalized Sobolev spaces tailored to the frequency localization of the underlying filter bank. Both negative and positive results highlight that energy decay in scattering networks critically depends on the interplay between the respective frequency localizations of both the signal and the filters used.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"79 ","pages":"Article 101790"},"PeriodicalIF":2.6,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144337762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ANOVA-boosting for random Fourier features
Pub Date: 2025-06-18 | DOI: 10.1016/j.acha.2025.101789
Daniel Potts, Laura Weidensager
We propose two algorithms for boosting random Fourier feature models for approximating high-dimensional functions. These methods utilize the classical and generalized analysis of variance (ANOVA) decomposition to learn low-order functions, in which there are few interactions between the variables. Our algorithms reliably identify an index set of important input variables and variable interactions.
Furthermore, we generalize existing random Fourier feature models to an ANOVA setting, in which terms of different order can be used. Our algorithms have the advantage of being interpretable: the influence of every input variable in the learned model is known, even for dependent input variables. We provide theoretical as well as numerical results showing that our algorithms perform well for sensitivity analysis. The ANOVA-boosting step significantly reduces the approximation error of existing methods.
{"title":"ANOVA-boosting for random Fourier features","authors":"Daniel Potts, Laura Weidensager","doi":"10.1016/j.acha.2025.101789","DOIUrl":"10.1016/j.acha.2025.101789","url":null,"abstract":"<div><div>We propose two algorithms for boosting random Fourier feature models for approximating high-dimensional functions. These methods utilize the classical and generalized analysis of variance (ANOVA) decomposition to learn low-order functions, where there are few interactions between the variables. Our algorithms are able to find an index set of important input variables and variable interactions reliably.</div><div>Furthermore, we generalize already existing random Fourier feature models to an ANOVA setting, where terms of different order can be used. Our algorithms have the advantage of being interpretable, meaning that the influence of every input variable is known in the learned model, even for dependent input variables. We provide theoretical as well as numerical results that our algorithms perform well for sensitivity analysis. The ANOVA-boosting step reduces the approximation error of existing methods significantly.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"79 ","pages":"Article 101789"},"PeriodicalIF":2.6,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144313768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
New results on sparse representations in unions of orthonormal bases
Pub Date: 2025-06-11 | DOI: 10.1016/j.acha.2025.101786
Tao Zhang, Gennian Ge
The problem of sparse representation has significant applications in signal processing. The spark of a dictionary plays a crucial role in the study of sparse representation. Donoho and Elad initially explored the spark, and they provided a general lower bound. When the dictionary is a union of several orthonormal bases, Gribonval and Nielsen presented an improved lower bound for the spark. In this paper, we introduce a new construction of dictionaries achieving the spark bound given by Gribonval and Nielsen. More precisely, let $q$ be a power of 2; we show that for any positive integer $t$, there exists a dictionary in $\mathbb{R}^{q^{2t}}$, which is a union of $q+1$ orthonormal bases, such that the spark of the dictionary attains Gribonval and Nielsen's bound. Our result extends the previously best known result from $t=1,2$ to arbitrary positive integer $t$, and our construction is technically different from previous ones: their method is more combinatorial, while ours is algebraic, and hence more general.
{"title":"New results on sparse representations in unions of orthonormal bases","authors":"Tao Zhang , Gennian Ge","doi":"10.1016/j.acha.2025.101786","DOIUrl":"10.1016/j.acha.2025.101786","url":null,"abstract":"<div><div>The problem of sparse representation has significant applications in signal processing. The spark of a dictionary plays a crucial role in the study of sparse representation. Donoho and Elad initially explored the spark, and they provided a general lower bound. When the dictionary is a union of several orthonormal bases, Gribonval and Nielsen presented an improved lower bound for spark. In this paper, we introduce a new construction of dictionary, achieving the spark bound given by Gribonval and Nielsen. More precisely, let <em>q</em> be a power of 2, we show that for any positive integer <em>t</em>, there exists a dictionary in <span><math><msup><mrow><mi>R</mi></mrow><mrow><msup><mrow><mi>q</mi></mrow><mrow><mn>2</mn><mi>t</mi></mrow></msup></mrow></msup></math></span>, which is a union of <span><math><mi>q</mi><mo>+</mo><mn>1</mn></math></span> orthonormal bases, such that the spark of the dictionary attains Gribonval-Nielsen's bound. Our result extends previously best known result from <span><math><mi>t</mi><mo>=</mo><mn>1</mn><mo>,</mo><mn>2</mn></math></span> to arbitrarily positive integer <em>t</em>, and our construction is technically different from previous ones. Their method is more combinatorial, while ours is algebraic, which is more general.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"79 ","pages":"Article 101786"},"PeriodicalIF":2.6,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144262223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}