Unified stochastic framework for neural network quantization and pruning
Pub Date : 2025-06-02 DOI: 10.1016/j.acha.2025.101778
Haoyu Zhang, Rayan Saab
Quantization and pruning are two essential techniques for compressing neural networks, yet they are often treated independently, with limited theoretical analysis connecting them. This paper introduces a unified framework for post-training quantization and pruning using stochastic path-following algorithms. Our approach builds on the Stochastic Path Following Quantization (SPFQ) method, extending its applicability to pruning and low-bit quantization, including challenging 1-bit regimes. By incorporating a scaling parameter and generalizing the stochastic operator, the proposed method achieves robust error correction and yields rigorous theoretical error bounds for both quantization and pruning as well as their combination.
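The path-following mechanism the abstract refers to can be pictured as sequential stochastic rounding with error feedback: each weight is rounded onto a fixed alphabet after being corrected by the error accumulated on calibration data. The sketch below is our own minimal illustration of that mechanism, assuming a uniform alphabet and a least-squares correction; it is not the authors' SPFQ algorithm (in particular, the scaling parameter and the generalized stochastic operator from the paper are omitted).

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(v, levels):
    """Unbiased stochastic rounding of v onto the sorted grid `levels`
    (values outside the grid are clamped to the nearest end level)."""
    v = np.atleast_1d(v)
    idx = np.clip(np.searchsorted(levels, v), 1, len(levels) - 1)
    lo, hi = levels[idx - 1], levels[idx]
    p = np.clip((v - lo) / (hi - lo), 0.0, 1.0)   # prob. of rounding up
    return np.where(rng.random(v.shape) < p, hi, lo)

def spfq_like(w, X, levels):
    """Quantize weights w coordinate by coordinate, feeding the running
    output error on calibration data X back into each rounding step."""
    q = np.zeros_like(w)
    u = np.zeros(X.shape[0])                # residual of X @ (w - q) so far
    for t in range(len(w)):
        x_t = X[:, t]
        c = w[t] + (u @ x_t) / (x_t @ x_t)  # error-corrected target
        q[t] = stochastic_round(c, levels)[0]
        u += (w[t] - q[t]) * x_t
    return q

X = rng.normal(size=(64, 16))               # calibration inputs
w = 0.3 * rng.normal(size=16)
q = spfq_like(w, X, levels=np.linspace(-1.0, 1.0, 4))   # 2-bit alphabet
print(np.linalg.norm(X @ (w - q)))          # error feedback keeps this small
```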
{"title":"Unified stochastic framework for neural network quantization and pruning","authors":"Haoyu Zhang , Rayan Saab","doi":"10.1016/j.acha.2025.101778","DOIUrl":"10.1016/j.acha.2025.101778","url":null,"abstract":"<div><div>Quantization and pruning are two essential techniques for compressing neural networks, yet they are often treated independently, with limited theoretical analysis connecting them. This paper introduces a unified framework for post-training quantization and pruning using stochastic path-following algorithms. Our approach builds on the Stochastic Path Following Quantization (SPFQ) method, extending its applicability to pruning and low-bit quantization, including challenging 1-bit regimes. By incorporating a scaling parameter and generalizing the stochastic operator, the proposed method achieves robust error correction and yields rigorous theoretical error bounds for both quantization and pruning as well as their combination.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"79 ","pages":"Article 101778"},"PeriodicalIF":2.6,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144230135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A tighter generalization error bound for wide GCN based on loss landscape
Pub Date : 2025-05-21 DOI: 10.1016/j.acha.2025.101777
Xianchen Zhou, Kun Hu, Hongxia Wang
The generalization capability of Graph Convolutional Networks (GCNs) has been studied recently. Generalization error bounds based on algorithmic stability have been obtained for various GCN architectures. However, the generalization error bound computed by this method increases rapidly over the iterations, since the algorithmic stability depends exponentially on the number of iterations, which is inconsistent with the performance of GCNs in practice. Based on the fact that properties of the loss landscape, such as convexity, exp-concavity, or the Polyak-Lojasiewicz* (PL*) condition, lead to tighter stability and better generalization error bounds, this paper focuses on the semi-supervised loss landscape of wide GCNs. It shows that a wide GCN has a Hessian matrix with a small norm, which leads to a positive definite training tangent kernel. The GCN's loss then satisfies the PL* condition, yielding a tighter uniform stability that is independent of the number of iterations, in contrast with previous work. The generalization error bound in this paper therefore depends on the norm of the graph filter and the number of layers, which is consistent with experimental results.
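For reference, the PL* condition invoked here can be stated in one line; the constant below follows from the chain rule for a squared loss and is the standard route from a positive definite tangent kernel to PL* (our notation, not the paper's).

```latex
% PL* condition for a nonnegative loss L(w): there exists mu > 0 with
\[
  \|\nabla L(w)\|^{2} \;\ge\; \mu \, L(w)
\]
% on the region traversed by training.  For the squared loss
% L(w) = (1/2) ||f(w) - y||^2 the chain rule gives
% ||grad L(w)||^2 = (f(w) - y)^T K(w) (f(w) - y), with tangent kernel
% K(w) = Df(w) Df(w)^T; hence lambda_min(K(w)) >= mu/2 > 0 yields PL*.
```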
{"title":"A tighter generalization error bound for wide GCN based on loss landscape","authors":"Xianchen Zhou , Kun Hu , Hongxia Wang","doi":"10.1016/j.acha.2025.101777","DOIUrl":"10.1016/j.acha.2025.101777","url":null,"abstract":"<div><div>The generalization capability of Graph Convolutional Networks (GCNs) has been researched recently. The generalization error bound based on algorithmic stability is obtained for various structures of GCN. However, the generalization error bound computed by this method increases rapidly during the iteration since the algorithmic stability exponential depends on the number of iterations, which is not consistent with the performance of GCNs in practice. Based on the fact that the property of loss landscape, such as convex, exp-concave, or Polyak-Lojasiewicz* (PL*) leads to tighter stability and better generalization error bound, this paper focuses on the semi-supervised loss landscape of wide GCN. It shows that a wide GCN has a Hessian matrix with a small norm, which can lead to a positive definite training tangent kernel. Then GCN's loss can satisfy the PL* condition and lead to a tighter uniform stability independent of the iteration compared with previous work. Therefore, the generalization error bound in this paper depends on the graph filter's norm and layers, which is consistent with the experiments' results.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"78 ","pages":"Article 101777"},"PeriodicalIF":2.6,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144108039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An eigenfunction approach to conversion of the Laplace transform of point masses on the real line to the Fourier domain
Pub Date : 2025-05-21 DOI: 10.1016/j.acha.2025.101776
Michael E. Mckenna, Hrushikesh N. Mhaskar, Richard G. Spencer
Motivated by applications in magnetic resonance relaxometry, we consider the following problem: given samples of a function $t \mapsto \sum_{k=1}^{K} A_k \exp(-t\lambda_k)$, where $K \ge 2$ is an integer, $A_k \in \mathbb{R}$, $\lambda_k > 0$ for $k = 1, \dots, K$, determine $K$, the $A_k$'s and the $\lambda_k$'s. Unlike the case in which the $\lambda_k$'s are purely imaginary, this problem is notoriously ill-posed. Our goal is to show that this problem can be transformed into an equivalent one in which the $\lambda_k$'s are replaced by $i\lambda_k$. We show that this may be accomplished by approximation in terms of Hermite functions, using the fact that these functions are eigenfunctions of the Fourier transform. We present a preliminary numerical exploration of parameter extraction from this formalism, including the effect of noise. The inherent ill-posedness of the original problem persists in the new domain, as reflected in the numerical results.
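A toy version of the transformation is easy to set up numerically: expand the sampled function in orthonormal Hermite functions, then use that the $n$-th Hermite function is an eigenfunction of the unitary Fourier transform with eigenvalue $(-i)^n$, so passing to the Fourier domain costs one multiplication per coefficient. The sketch below is our own illustration, with arbitrary toy values of $A_k$, $\lambda_k$ and truncation level; the paper's formalism and noise analysis are not reproduced, and convergence of this toy expansion is slow because of the kink at $t = 0$.

```python
import numpy as np

def hermite_functions(t, N):
    """Orthonormal Hermite functions psi_0,...,psi_{N-1} on the grid t,
    via the stable three-term recurrence."""
    psi = np.zeros((N, len(t)))
    psi[0] = np.pi ** -0.25 * np.exp(-t ** 2 / 2)
    if N > 1:
        psi[1] = np.sqrt(2.0) * t * psi[0]
    for n in range(1, N - 1):
        psi[n + 1] = (np.sqrt(2.0 / (n + 1)) * t * psi[n]
                      - np.sqrt(n / (n + 1.0)) * psi[n - 1])
    return psi

# Toy two-term decaying signal, extended by zero to t < 0.
t = np.linspace(-20, 20, 4001)
f = np.where(t >= 0, 1.0 * np.exp(-0.5 * t) + 0.7 * np.exp(-2.0 * t), 0.0)

N = 120
psi = hermite_functions(t, N)
c = psi @ f * (t[1] - t[0])               # coefficients <f, psi_n>

# F psi_n = (-i)^n psi_n for the unitary Fourier transform, so the
# Fourier-domain function needs only one phase factor per coefficient.
f_hat = (((-1j) ** np.arange(N))[:, None] * psi * c[:, None]).sum(axis=0)
```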
{"title":"An eigenfunction approach to conversion of the Laplace transform of point masses on the real line to the Fourier domain","authors":"Michael E. Mckenna , Hrushikesh N. Mhaskar , Richard G. Spencer","doi":"10.1016/j.acha.2025.101776","DOIUrl":"10.1016/j.acha.2025.101776","url":null,"abstract":"<div><div>Motivated by applications in magnetic resonance relaxometry, we consider the following problem: given samples of a function <span><math><mi>t</mi><mo>↦</mo><msubsup><mrow><mo>∑</mo></mrow><mrow><mi>k</mi><mo>=</mo><mn>1</mn></mrow><mrow><mi>K</mi></mrow></msubsup><msub><mrow><mi>A</mi></mrow><mrow><mi>k</mi></mrow></msub><mi>exp</mi><mo></mo><mo>(</mo><mo>−</mo><mi>t</mi><msub><mrow><mi>λ</mi></mrow><mrow><mi>k</mi></mrow></msub><mo>)</mo></math></span>, where <span><math><mi>K</mi><mo>≥</mo><mn>2</mn></math></span> is an integer, <span><math><msub><mrow><mi>A</mi></mrow><mrow><mi>k</mi></mrow></msub><mo>∈</mo><mi>R</mi></math></span>, <span><math><msub><mrow><mi>λ</mi></mrow><mrow><mi>k</mi></mrow></msub><mo>></mo><mn>0</mn></math></span> for <span><math><mi>k</mi><mo>=</mo><mn>1</mn><mo>,</mo><mo>⋯</mo><mo>,</mo><mi>K</mi></math></span>, determine <em>K</em>, <span><math><msub><mrow><mi>A</mi></mrow><mrow><mi>k</mi></mrow></msub></math></span>'s and <span><math><msub><mrow><mi>λ</mi></mrow><mrow><mi>k</mi></mrow></msub></math></span>'s. Unlike the case in which the <span><math><msub><mrow><mi>λ</mi></mrow><mrow><mi>k</mi></mrow></msub></math></span>'s are purely imaginary, this problem is notoriously ill-posed. Our goal is to show that this problem can be transformed into an equivalent one in which the <span><math><msub><mrow><mi>λ</mi></mrow><mrow><mi>k</mi></mrow></msub></math></span>'s are replaced by <span><math><mi>i</mi><msub><mrow><mi>λ</mi></mrow><mrow><mi>k</mi></mrow></msub></math></span>. We show that this may be accomplished by approximation in terms of Hermite functions, and using the fact that these functions are eigenfunctions of the Fourier transform. We present a preliminary numerical exploration of parameter extraction from this formalism, including the effect of noise. The inherent ill-posedness of the original problem persists in the new domain, as reflected in the numerical results.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"79 ","pages":"Article 101776"},"PeriodicalIF":2.6,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144222723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Framelet message passing
Pub Date : 2025-05-12 DOI: 10.1016/j.acha.2025.101773
Xinliang Liu, Bingxin Zhou, Chutian Zhang, Yu Guang Wang
Graph neural networks have achieved remarkable success in a wide range of applications. Neural message passing, which aggregates features from neighboring nodes, is a key module for feature propagation. In this work, we propose a new message passing scheme based on multiscale framelet transforms, called Framelet Message Passing. Unlike traditional spatial methods, it integrates framelet representations of neighboring nodes from multiple hops away when updating node messages. We also propose a continuous message passing scheme using neural ODE solvers. Both the discrete and continuous versions provably mitigate oversmoothing and achieve superior performance. Numerical experiments on real graph datasets show that the continuous version of framelet message passing significantly outperforms existing methods when learning heterogeneous graphs and achieves state-of-the-art performance on classic node classification tasks with low computational costs.
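As a rough picture of what a framelet decomposition does on a graph, the sketch below splits node features into one low-pass and one high-pass spectral band of the normalized Laplacian using Haar-type symbols $\cos$ and $\sin$, rescales each band, and recombines; since $\cos^2 + \sin^2 = 1$, the two filters form a tight frame. This is a one-level toy of our own devising, not the paper's multiscale, multi-hop construction.

```python
import numpy as np

def framelet_messages(A, X, theta0=1.0, theta1=0.5):
    """One-level Haar-type framelet filtering of node features X on a
    graph with adjacency A: a low-pass band W0 and a high-pass band W1
    built from the normalized Laplacian, rescaled and recombined.
    With theta0 = theta1 = 1 the tight-frame identity returns X."""
    d = A.sum(axis=1)
    dinv = np.where(d > 0, d ** -0.5, 0.0)
    L = np.eye(len(A)) - dinv[:, None] * A * dinv[None, :]
    lam, U = np.linalg.eigh(L)              # eigenvalues lie in [0, 2]
    xi = np.pi / 4 * np.clip(lam, 0, 2)     # rescale spectrum to [0, pi/2]
    W0 = (U * np.cos(xi)) @ U.T             # low-pass framelet operator
    W1 = (U * np.sin(xi)) @ U.T             # high-pass framelet operator
    return theta0 * (W0.T @ W0 @ X) + theta1 * (W1.T @ W1 @ X)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.eye(3)
print(framelet_messages(A, X, 1.0, 1.0))    # ~ X: cos^2 + sin^2 = 1
```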
{"title":"Framelet message passing","authors":"Xinliang Liu , Bingxin Zhou , Chutian Zhang , Yu Guang Wang","doi":"10.1016/j.acha.2025.101773","DOIUrl":"10.1016/j.acha.2025.101773","url":null,"abstract":"<div><div>Graph neural networks have achieved champions in wide applications. Neural message passing is a typical key module for feature propagation by aggregating neighboring features. In this work, we propose a new message passing based on multiscale framelet transforms, called Framelet Message Passing. Different from traditional spatial methods, it integrates framelet representation of neighbor nodes from multiple hops away in node message update. We also propose a continuous message passing using neural ODE solvers. Both discrete and continuous cases can provably mitigate oversmoothing and achieve superior performance. Numerical experiments on real graph datasets show that the continuous version of the framelet message passing significantly outperforms existing methods when learning heterogeneous graphs and achieves state-of-the-art performance on classic node classification tasks with low computational costs.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"78 ","pages":"Article 101773"},"PeriodicalIF":2.6,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144125322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An oracle gradient regularized Newton method for quadratic measurements regression
Pub Date : 2025-05-08 DOI: 10.1016/j.acha.2025.101775
Jun Fan, Jie Sun, Ailing Yan, Shenglong Zhou
Recovering an unknown signal from quadratic measurements has gained popularity due to its wide range of applications, including phase retrieval, fusion frame phase retrieval, and positive operator-valued measures. In this paper, we employ a least squares approach to reconstruct the signal and establish its non-asymptotic statistical properties. Our analysis shows that the estimator perfectly recovers the true signal in the noiseless case, while the error between the estimator and the true signal is bounded by $O(\sqrt{p\log(1+2n)/n})$ in the noisy case, where $n$ is the number of measurements and $p$ is the dimension of the signal. We then develop a two-phase algorithm, the gradient regularized Newton method (GRNM), to solve the least squares problem. It is proven that the first phase terminates within finitely many steps, and that the sequence generated in the second phase converges to a unique local minimum at a superlinear rate under certain mild conditions. Beyond these deterministic results, GRNM exactly reconstructs the true signal in the noiseless case and achieves the stated error rate with high probability in the noisy case. Numerical experiments demonstrate that GRNM offers a high level of recovery capability and accuracy as well as fast computational speed.
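The least squares objective, its gradient, and its Hessian are straightforward to write down, and a gradient-regularized Newton step damps the Hessian by a multiple of the gradient norm. The sketch below shows that generic form under our own assumptions (symmetrized measurement matrices, a fixed damping constant, a warm start); the paper's two-phase logic, termination criterion, and constants are not reproduced.

```python
import numpy as np

def loss_grad_hess(x, As, y):
    """f(x) = (1/4n) sum_i (x^T A_i x - y_i)^2, with gradient and Hessian."""
    n = len(As)
    S = [(A + A.T) @ x for A in As]              # s_i = (A_i + A_i^T) x
    r = np.array([x @ A @ x for A in As]) - y    # residuals
    f = 0.25 * np.mean(r ** 2)
    g = sum(ri * si for ri, si in zip(r, S)) / (2 * n)
    H = sum(np.outer(si, si) + ri * (A + A.T)
            for ri, si, A in zip(r, S, As)) / (2 * n)
    return f, g, H

def grn_step(x, As, y, lam=1.0):
    """One gradient-regularized Newton step: Hessian damped by a
    multiple of the gradient norm, then a plain Newton solve."""
    _, g, H = loss_grad_hess(x, As, y)
    M = H + lam * np.linalg.norm(g) * np.eye(len(x))
    return x - np.linalg.solve(M, g)

# Noiseless toy run; as in phase retrieval, x is identifiable only up to
# sign, and convergence from an arbitrary start is not guaranteed.
rng = np.random.default_rng(0)
As = [rng.normal(size=(5, 5)) for _ in range(60)]
x_true = rng.normal(size=5)
y = np.array([x_true @ A @ x_true for A in As])
x = x_true + 0.1 * rng.normal(size=5)            # warm start near the truth
for _ in range(15):
    x = grn_step(x, As, y)
print(min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true)))
```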
{"title":"An oracle gradient regularized Newton method for quadratic measurements regression","authors":"Jun Fan , Jie Sun , Ailing Yan , Shenglong Zhou","doi":"10.1016/j.acha.2025.101775","DOIUrl":"10.1016/j.acha.2025.101775","url":null,"abstract":"<div><div>Recovering an unknown signal from quadratic measurements has gained popularity due to its wide range of applications, including phase retrieval, fusion frame phase retrieval, and positive operator-valued measures. In this paper, we employ a least squares approach to reconstruct the signal and establish its non-asymptotic statistical properties. Our analysis shows that the estimator perfectly recovers the true signal in the noiseless case, while the error between the estimator and the true signal is bounded by <span><math><mi>O</mi><mo>(</mo><msqrt><mrow><mi>p</mi><mi>log</mi><mo></mo><mo>(</mo><mn>1</mn><mo>+</mo><mn>2</mn><mi>n</mi><mo>)</mo><mo>/</mo><mi>n</mi></mrow></msqrt><mo>)</mo></math></span> in the noisy case, where <em>n</em> is the number of measurements and <em>p</em> is the dimension of the signal. We then develop a two-phase algorithm, gradient regularized Newton method (GRNM), to solve the least squares problem. It is proven that the first phase terminates within finitely many steps, and the sequence generated in the second phase converges to a unique local minimum at a superlinear rate under certain mild conditions. Beyond these deterministic results, GRNM is capable of exactly reconstructing the true signal in the noiseless case and achieving the stated error rate with a high probability in the noisy case. Numerical experiments demonstrate that GRNM offers a high level of recovery capability and accuracy as well as fast computational speed.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"78 ","pages":"Article 101775"},"PeriodicalIF":2.6,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143935916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A parameter-free two-bit covariance estimator with improved operator norm error rate
Pub Date : 2025-05-02 DOI: 10.1016/j.acha.2025.101774
Junren Chen, Michael K. Ng
A covariance matrix estimator using two bits per entry was recently developed by Dirksen et al. (2022) [11]. The estimator achieves a near-minimax operator norm rate for general sub-Gaussian distributions, but also suffers from two downsides: theoretically, there is an essential gap in operator norm error between their estimator and the sample covariance when the diagonal of the covariance matrix is dominated by only a few entries; practically, its performance heavily relies on the dithering scale, which needs to be tuned according to some unknown parameters. In this work, we propose a new 2-bit covariance matrix estimator that simultaneously addresses both issues. Unlike the sign quantizer associated with uniform dither in Dirksen et al., we adopt a triangular dither prior to a 2-bit quantizer inspired by the multi-bit uniform quantizer. By employing dithering scales varying across entries, our estimator enjoys an improved operator norm error rate that depends on the effective rank of the underlying covariance matrix rather than the ambient dimension, which is optimal up to logarithmic factors. Moreover, our proposed method eliminates the need for any tuning parameter, as the dithering scales are entirely determined by the data. While our estimator requires a pass over all unquantized samples to determine the dithering scales, it can be adapted to the online setting where the samples arrive sequentially. Experimental results are provided to demonstrate the advantages of our estimators over the existing ones.
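The key primitive is a 2-bit uniform quantizer preceded by a triangular dither, i.e. a sum of two independent uniforms, which keeps the quantizer conditionally unbiased on its working range. The sketch below illustrates only that primitive, with an arbitrary scale `delta`; the paper's full estimator (data-driven per-entry scales, products of independently dithered quantizations) is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)

def two_bit_quantize(x, delta):
    """2-bit uniform quantizer with triangular dither: the dither is a
    sum of two independent U[-delta/2, delta/2] draws; outputs lie on
    the four levels {-3, -1, 1, 3} * delta / 2."""
    tau = (rng.uniform(-delta / 2, delta / 2, x.shape)
           + rng.uniform(-delta / 2, delta / 2, x.shape))
    q = delta * (np.floor((x + tau) / delta) + 0.5)
    return np.clip(q, -1.5 * delta, 1.5 * delta)

# Conditional unbiasedness on the working range: E[Q(x + tau)] = x as
# long as x + tau rarely hits the clipping levels.
x = rng.normal(0.0, 0.25, size=200_000)
print(two_bit_quantize(x, delta=1.0).mean(), x.mean())   # nearly equal
```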
{"title":"A parameter-free two-bit covariance estimator with improved operator norm error rate","authors":"Junren Chen , Michael K. Ng","doi":"10.1016/j.acha.2025.101774","DOIUrl":"10.1016/j.acha.2025.101774","url":null,"abstract":"<div><div>A covariance matrix estimator using two bits per entry was recently developed by Dirksen et al. (2022) <span><span>[11]</span></span>. The estimator achieves near minimax operator norm rate for general sub-Gaussian distributions, but also suffers from two downsides: theoretically, there is an essential gap on operator norm error between their estimator and sample covariance when the diagonal of the covariance matrix is dominated by only a few entries; practically, its performance heavily relies on the dithering scale, which needs to be tuned according to some unknown parameters. In this work, we propose a new 2-bit covariance matrix estimator that simultaneously addresses both issues. Unlike the sign quantizer associated with uniform dither in Dirksen et al., we adopt a triangular dither prior to a 2-bit quantizer inspired by the multi-bit uniform quantizer. By employing dithering scales varying across entries, our estimator enjoys an improved operator norm error rate that depends on the effective rank of the underlying covariance matrix rather than the ambient dimension, which is optimal up to logarithmic factors. Moreover, our proposed method eliminates the need of <em>any</em> tuning parameter, as the dithering scales are entirely determined by the data. While our estimator requires a pass of all unquantized samples to determine the dithering scales, it can be adapted to the online setting where the samples arise sequentially. Experimental results are provided to demonstrate the advantages of our estimators over the existing ones.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"78 ","pages":"Article 101774"},"PeriodicalIF":2.6,"publicationDate":"2025-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143903641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sparsification of the regularized magnetic Laplacian with multi-type spanning forests
Pub Date : 2025-04-28 DOI: 10.1016/j.acha.2025.101766
M. Fanuel, R. Bardenet
In this paper, we consider a $U(1)$-connection graph, that is, a graph where each oriented edge is endowed with a unit modulus complex number that is conjugated under orientation flip. A natural replacement for the combinatorial Laplacian is then the magnetic Laplacian, an Hermitian matrix that includes information about the graph's connection. Magnetic Laplacians appear, e.g., in the problem of angular synchronization. In the context of large and dense graphs, we study here sparsifiers of the magnetic Laplacian $\Delta$, i.e., spectral approximations based on subgraphs with few edges. Our approach relies on sampling multi-type spanning forests (MTSFs) using a custom determinantal point process, a probability distribution over edges that favors diversity. In a word, an MTSF is a spanning subgraph whose connected components are either trees or cycle-rooted trees. The latter partially capture the angular inconsistencies of the connection graph, and thus provide a way to compress the information contained in the connection. Interestingly, when the connection graph has weakly inconsistent cycles, samples from the determinantal point process under consideration can be obtained à la Wilson, using a random walk with cycle popping. We provide statistical guarantees for a choice of natural estimators of the connection Laplacian, and investigate two practical applications of our sparsifiers: ranking with angular synchronization and graph-based semi-supervised learning. From a statistical perspective, a side result of this paper of independent interest is a matrix Chernoff bound with intrinsic dimension, which allows considering the influence of a regularization – of the form $\Delta + qI$ with $q > 0$ – on sparsification guarantees.
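The object being sparsified is easy to construct explicitly: with a phase on each oriented edge, conjugated under flip, the magnetic Laplacian is the Hermitian matrix $\Delta = D - A$ built from the phased adjacency matrix, and the regularized form is $\Delta + qI$. The sketch below is a direct construction on a toy 3-cycle; the holonomy remark in the comment is our own illustration, not the paper's example.

```python
import numpy as np

def magnetic_laplacian(edges, thetas, n, q=0.0):
    """Magnetic Laplacian of a U(1)-connection graph: each oriented edge
    (u, v) carries e^{i theta}, conjugated when the edge is flipped.
    Returns Delta + q * I; q = 0 gives the plain magnetic Laplacian."""
    A = np.zeros((n, n), dtype=complex)
    for (u, v), th in zip(edges, thetas):
        A[u, v] = np.exp(1j * th)
        A[v, u] = np.exp(-1j * th)     # conjugate under orientation flip
    D = np.diag(np.abs(A).sum(axis=1))
    return D - A + q * np.eye(n)

# A 3-cycle with a small angular inconsistency: a nonzero holonomy
# theta_01 + theta_12 + theta_20 lifts the smallest eigenvalue above 0.
Delta = magnetic_laplacian([(0, 1), (1, 2), (2, 0)], [0.1, 0.2, 0.3], n=3)
print(np.linalg.eigvalsh(Delta).round(3))
```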
{"title":"Sparsification of the regularized magnetic Laplacian with multi-type spanning forests","authors":"M. Fanuel, R. Bardenet","doi":"10.1016/j.acha.2025.101766","DOIUrl":"10.1016/j.acha.2025.101766","url":null,"abstract":"<div><div>In this paper, we consider a <span><math><mi>U</mi><mo>(</mo><mn>1</mn><mo>)</mo></math></span>-connection graph, that is, a graph where each oriented edge is endowed with a unit modulus complex number that is conjugated under orientation flip. A natural replacement for the combinatorial Laplacian is then the <em>magnetic</em> Laplacian, an Hermitian matrix that includes information about the graph's connection. Magnetic Laplacians appear, e.g., in the problem of angular synchronization. In the context of large and dense graphs, we study here sparsifiers of the magnetic Laplacian Δ, i.e., spectral approximations based on subgraphs with few edges. Our approach relies on sampling multi-type spanning forests (MTSFs) using a custom determinantal point process, a probability distribution over edges that favors diversity. In a word, an MTSF is a spanning subgraph whose connected components are either trees or cycle-rooted trees. The latter partially capture the angular inconsistencies of the connection graph, and thus provide a way to compress the information contained in the connection. Interestingly, when the connection graph has weakly inconsistent cycles, samples from the determinantal point process under consideration can be obtained <em>à la Wilson</em>, using a random walk with cycle popping. We provide statistical guarantees for a choice of natural estimators of the connection Laplacian, and investigate two practical applications of our sparsifiers: ranking with angular synchronization and graph-based semi-supervised learning. From a statistical perspective, a side result of this paper of independent interest is a matrix Chernoff bound with intrinsic dimension, which allows considering the influence of a regularization – of the form <span><math><mi>Δ</mi><mo>+</mo><mi>q</mi><mi>I</mi></math></span> with <span><math><mi>q</mi><mo>></mo><mn>0</mn></math></span> – on sparsification guarantees.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"78 ","pages":"Article 101766"},"PeriodicalIF":2.6,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143891370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Duality for neural networks through Reproducing Kernel Banach Spaces
Pub Date : 2025-03-27 DOI: 10.1016/j.acha.2025.101765
Len Spek, Tjeerd Jan Heeringa, Felix Schwenninger, Christoph Brune
Reproducing Kernel Hilbert spaces (RKHS) have been a very successful tool in various areas of machine learning. Recently, Barron spaces have been used to prove bounds on the generalisation error for neural networks. Unfortunately, Barron spaces cannot be understood in terms of RKHS due to the strong nonlinear coupling of the weights. This can be solved by using the more general Reproducing Kernel Banach spaces (RKBS). We show that these Barron spaces belong to a class of integral RKBS. This class can also be understood as an infinite union of RKHSs. Furthermore, we show that the dual space of such an RKBS is again an RKBS in which the roles of the data and parameters are interchanged, forming an adjoint pair of RKBSs including a reproducing kernel. This allows us to construct the saddle point problem for neural networks, which can be used in the whole field of primal-dual optimisation.
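For context, the integral representation behind Barron spaces, in one common normalization (ReLU-type ridge units; our notation, not necessarily the paper's), reads:

```latex
% A Barron function as an expectation of ridge units over a parameter
% measure mu on a domain Omega (sigma is, e.g., the ReLU):
\[
  f(x) \;=\; \int_{\Omega} a \,\sigma\!\left(w^{\top} x + b\right) \mathrm{d}\mu(a, w, b),
\]
% with Barron norm the smallest mass over all representing measures:
\[
  \|f\|_{\mathcal{B}} \;=\; \inf_{\mu} \int_{\Omega} |a| \left(\|w\|_{1} + |b|\right) \mathrm{d}\mu(a, w, b).
\]
```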
{"title":"Duality for neural networks through Reproducing Kernel Banach Spaces","authors":"Len Spek , Tjeerd Jan Heeringa , Felix Schwenninger , Christoph Brune","doi":"10.1016/j.acha.2025.101765","DOIUrl":"10.1016/j.acha.2025.101765","url":null,"abstract":"<div><div>Reproducing Kernel Hilbert spaces (RKHS) have been a very successful tool in various areas of machine learning. Recently, Barron spaces have been used to prove bounds on the generalisation error for neural networks. Unfortunately, Barron spaces cannot be understood in terms of RKHS due to the strong nonlinear coupling of the weights. This can be solved by using the more general Reproducing Kernel Banach spaces (RKBS). We show that these Barron spaces belong to a class of integral RKBS. This class can also be understood as an infinite union of RKHS spaces. Furthermore, we show that the dual space of such RKBSs, is again an RKBS where the roles of the data and parameters are interchanged, forming an adjoint pair of RKBSs including a reproducing kernel. This allows us to construct the saddle point problem for neural networks, which can be used in the whole field of primal-dual optimisation.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"78 ","pages":"Article 101765"},"PeriodicalIF":2.6,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143869805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Controlled learning of pointwise nonlinearities in neural-network-like architectures
Pub Date : 2025-03-25 DOI: 10.1016/j.acha.2025.101764
Michael Unser, Alexis Goujon, Stanislas Ducotterd
We present a general variational framework for the training of freeform nonlinearities in layered computational architectures subject to some slope constraints. The regularization that we add to the traditional training loss penalizes the second-order total variation of each trainable activation. The slope constraints allow us to impose properties such as 1-Lipschitz stability, firm non-expansiveness, and monotonicity/invertibility. These properties are crucial to ensure the proper functioning of certain classes of signal-processing algorithms (e.g., plug-and-play schemes, unrolled proximal gradient, invertible flows). We prove that the global optimum of the stated constrained-optimization problem is achieved with nonlinearities that are adaptive nonuniform linear splines. We then show how to solve the resulting function-optimization problem numerically by representing the nonlinearities in a suitable (nonuniform) B-spline basis. Finally, we illustrate the use of our framework with the data-driven design of (weakly) convex regularizers for the denoising of images and the resolution of inverse problems.
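The regularizer in question has a simple closed form for linear splines: the second-order total variation of a piecewise-linear activation is the $\ell_1$ norm of the jumps of its slope. The sketch below evaluates that quantity for a spline given by its values at nonuniform knots; the knots, values, and constants are our toy choices, not the paper's parameterization.

```python
import numpy as np

def linear_spline(c, knots, x):
    """Evaluate the linear spline with values c at the (sorted) knots;
    np.interp extends by constants beyond the end knots."""
    return np.interp(x, knots, c)

def tv2(c, knots):
    """Second-order total variation of the spline: the l1 norm of the
    slope jumps, i.e. the penalty placed on each trainable activation."""
    slopes = np.diff(c) / np.diff(knots)
    return np.abs(np.diff(slopes)).sum()

knots = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
c = np.array([0.0, 0.0, 0.0, 1.0, 2.0])   # a ReLU-like profile
print(tv2(c, knots))                       # one slope jump of size 1
```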
{"title":"Controlled learning of pointwise nonlinearities in neural-network-like architectures","authors":"Michael Unser, Alexis Goujon, Stanislas Ducotterd","doi":"10.1016/j.acha.2025.101764","DOIUrl":"10.1016/j.acha.2025.101764","url":null,"abstract":"<div><div>We present a general variational framework for the training of freeform nonlinearities in layered computational architectures subject to some slope constraints. The regularization that we add to the traditional training loss penalizes the second-order total variation of each trainable activation. The slope constraints allow us to impose properties such as 1-Lipschitz stability, firm non-expansiveness, and monotonicity/invertibility. These properties are crucial to ensure the proper functioning of certain classes of signal-processing algorithms (e.g., plug-and-play schemes, unrolled proximal gradient, invertible flows). We prove that the global optimum of the stated constrained-optimization problem is achieved with nonlinearities that are adaptive nonuniform linear splines. We then show how to solve the resulting function-optimization problem numerically by representing the nonlinearities in a suitable (nonuniform) B-spline basis. Finally, we illustrate the use of our framework with the data-driven design of (weakly) convex regularizers for the denoising of images and the resolution of inverse problems.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"77 ","pages":"Article 101764"},"PeriodicalIF":2.6,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143738438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mathematical algorithm design for deep learning under societal and judicial constraints: The algorithmic transparency requirement
Pub Date : 2025-03-24 DOI: 10.1016/j.acha.2025.101763
Holger Boche, Adalbert Fono, Gitta Kutyniok
Deep learning still has drawbacks regarding trustworthiness, which encompasses comprehensibility, fairness, safety, and reliability. To mitigate the potential risk of AI, clear obligations associated with trustworthiness have been proposed via regulatory guidelines, e.g., in the European AI Act. Therefore, a central question is to what extent trustworthy deep learning can be realized. Establishing the described properties constituting trustworthiness requires that the factors influencing an algorithmic computation can be retraced, i.e., the algorithmic implementation is transparent. Motivated by the observation that the current evolution of deep learning models necessitates a change in computing technology, we derive a mathematical framework that enables us to analyze whether a transparent implementation in a computing model is feasible. The core idea is to formalize and subsequently relate the properties of a transparent algorithmic implementation to the mathematical model of the computing platform, thereby establishing verifiable criteria.
We exemplarily apply our trustworthiness framework to analyze deep learning approaches for inverse problems in digital and analog computing models represented by Turing and Blum-Shub-Smale machines, respectively. Based on previous results, we find that Blum-Shub-Smale machines have the potential to establish trustworthy solvers for inverse problems under fairly general conditions, whereas Turing machines cannot guarantee trustworthiness to the same degree.
{"title":"Mathematical algorithm design for deep learning under societal and judicial constraints: The algorithmic transparency requirement","authors":"Holger Boche , Adalbert Fono , Gitta Kutyniok","doi":"10.1016/j.acha.2025.101763","DOIUrl":"10.1016/j.acha.2025.101763","url":null,"abstract":"<div><div>Deep learning still has drawbacks regarding trustworthiness, which describes a comprehensible, fair, safe, and reliable method. To mitigate the potential risk of AI, clear obligations associated with trustworthiness have been proposed via regulatory guidelines, e.g., in the European AI Act. Therefore, a central question is to what extent trustworthy deep learning can be realized. Establishing the described properties constituting trustworthiness requires that the factors influencing an algorithmic computation can be retraced, i.e., the algorithmic implementation is transparent. Motivated by the observation that the current evolution of deep learning models necessitates a change in computing technology, we derive a mathematical framework that enables us to analyze whether a transparent implementation in a computing model is feasible. The core idea is to formalize and subsequently relate the properties of a transparent algorithmic implementation to the mathematical model of the computing platform, thereby establishing verifiable criteria.</div><div>We exemplarily apply our trustworthiness framework to analyze deep learning approaches for inverse problems in digital and analog computing models represented by Turing and Blum-Shub-Smale machines, respectively. Based on previous results, we find that Blum-Shub-Smale machines have the potential to establish trustworthy solvers for inverse problems under fairly general conditions, whereas Turing machines cannot guarantee trustworthiness to the same degree.</div></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"77 ","pages":"Article 101763"},"PeriodicalIF":2.6,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143696911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}