It is well-known that machine learning models are vulnerable to small but cleverly-designed adversarial perturbations that can cause misclassification. While there has been major progress in designing attacks and defenses for various adversarial settings, many fundamental and theoretical problems are yet to be resolved. In this paper, we consider classification in the presence of $\ell_0$-bounded adversarial perturbations, a.k.a. sparse attacks. This setting is significantly different from other $\ell_p$-adversarial settings with $p \geq 1$, as the $\ell_0$-ball is non-convex and highly non-smooth. Under the assumption that data is distributed according to the Gaussian mixture model, our goal is to characterize the optimal robust classifier and the corresponding robust classification error as well as a variety of trade-offs between robustness, accuracy, and the adversary's budget. To this end, we develop a novel classification algorithm called FilTrun that has two main modules: Filtration and Truncation. The key idea of our method is to first filter out the non-robust coordinates of the input and then apply a carefully-designed truncated inner product for classification. By analyzing the performance of FilTrun, we derive an upper bound on the optimal robust classification error. We also find a lower bound by designing a specific adversarial strategy that enables us to derive the corresponding robust classifier and its achieved error. For the case that the covariance matrix of the Gaussian mixtures is diagonal, we show that as the input's dimension gets large, the upper and lower bounds converge; i.e., we characterize the asymptotically-optimal robust classifier. Throughout, we discuss several examples that illustrate interesting behaviors, such as the existence of a phase transition in the adversary's budget that determines whether the effect of adversarial perturbation can be fully neutralized.
{"title":"Robust Classification Under 𝓁0 Attack for the Gaussian Mixture Model","authors":"Payam Delgosha, Hamed Hassani, Ramtin Pedarsani","doi":"10.1137/21m1426286","DOIUrl":"https://doi.org/10.1137/21m1426286","url":null,"abstract":"It is well-known that machine learning models are vulnerable to small but cleverly-designed adversarial perturbations that can cause misclassification. While there has been major progress in designing attacks and defenses for various adversarial settings, many fundamental and theoretical problems are yet to be resolved. In this paper, we consider classification in the presence of $ell_0$-bounded adversarial perturbations, a.k.a. sparse attacks. This setting is significantly different from other $ell_p$-adversarial settings, with $pgeq 1$, as the $ell_0$-ball is non-convex and highly non-smooth. Under the assumption that data is distributed according to the Gaussian mixture model, our goal is to characterize the optimal robust classifier and the corresponding robust classification error as well as a variety of trade-offs between robustness, accuracy, and the adversary's budget. To this end, we develop a novel classification algorithm called FilTrun that has two main modules: Filtration and Truncation. The key idea of our method is to first filter out the non-robust coordinates of the input and then apply a carefully-designed truncated inner product for classification. By analyzing the performance of FilTrun, we derive an upper bound on the optimal robust classification error. We also find a lower bound by designing a specific adversarial strategy that enables us to derive the corresponding robust classifier and its achieved error. For the case that the covariance matrix of the Gaussian mixtures is diagonal, we show that as the input's dimension gets large, the upper and lower bounds converge; i.e. we characterize the asymptotically-optimal robust classifier. Throughout, we discuss several examples that illustrate interesting behaviors such as the existence of a phase transition for adversary's budget determining whether the effect of adversarial perturbation can be fully neutralized.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"10 3 1","pages":"362-385"},"PeriodicalIF":0.0,"publicationDate":"2021-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80147874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In an attempt to better understand the structural benefits and generalization power of deep neural networks, we first present a novel graph-theoretical formulation of neural network models, including fully connected networks, residual networks (ResNets), and densely connected networks (DenseNets). Second, we extend the error analysis of the population risk for two-layer networks \cite{ew2019prioriTwo} and ResNets \cite{e2019prioriRes} to DenseNets, and further show that similar estimates can be obtained for neural networks satisfying certain mild conditions. These estimates are a priori in nature since they depend solely on information available prior to the training process; in particular, the bounds on the estimation errors are independent of the input dimension.
{"title":"Nonlinear Weighted Directed Acyclic Graph and A Priori Estimates for Neural Networks","authors":"Yuqing Li, Tao Luo, Chao Ma","doi":"10.1137/21M140955","DOIUrl":"https://doi.org/10.1137/21M140955","url":null,"abstract":"In an attempt to better understand structural benefits and generalization power of deep neural networks, we firstly present a novel graph theoretical formulation of neural network models, including fully connected, residual network (ResNet) and densely connected networks (DenseNet). Secondly, we extend the error analysis of the population risk for two layer network cite{ew2019prioriTwo} and ResNet cite{e2019prioriRes} to DenseNet, and show further that for neural networks satisfying certain mild conditions, similar estimates can be obtained. These estimates are a priori in nature since they depend sorely on the information prior to the training process, in particular, the bounds for the estimation errors are independent of the input dimension.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"2 1","pages":"694-720"},"PeriodicalIF":0.0,"publicationDate":"2021-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72963610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cluster detection plays a fundamental role in the analysis of data. In this paper, we focus on the use of s-defective clique models for network-based cluster detection and propose a nonlinear optimization approach that efficiently handles those models in practice. In particular, we introduce an equivalent continuous formulation for the problem under analysis, and we analyze some tailored variants of the Frank-Wolfe algorithm that enable us to quickly find maximal s-defective cliques. The good practical behavior of those algorithmic tools, which is closely connected to their support identification properties, makes them very appealing in applications. The reported numerical results clearly show the effectiveness of the proposed approach.
{"title":"Fast Cluster Detection in Networks by First Order Optimization","authors":"I. Bomze, F. Rinaldi, Damiano Zeffiro","doi":"10.1137/21m1408658","DOIUrl":"https://doi.org/10.1137/21m1408658","url":null,"abstract":"Cluster detection plays a fundamental role in the analysis of data. In this paper, we focus on the use of s-defective clique models for network-based cluster detection and propose a nonlinear optimization approach that efficiently handles those models in practice. In particular, we introduce an equivalent continuous formulation for the problem under analysis, and we analyze some tailored variants of the Frank-Wolfe algorithm that enable us to quickly find maximal s-defective cliques. The good practical behavior of those algorithmic tools, which is closely connected to their support identification properties, makes them very appealing in practical applications. The reported numerical results clearly show the effectiveness of the proposed approach.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"12 1","pages":"285-305"},"PeriodicalIF":0.0,"publicationDate":"2021-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76391350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Caragea, D. Lee, J. Maly, G. Pfander, F. Voigtlaender
Until recently, applications of neural networks in machine learning have almost exclusively relied on real-valued networks. It was recently observed, however, that complex-valued neural networks (CVNNs) exhibit superior performance in applications in which the input is naturally complex-valued, such as MRI fingerprinting. While the mathematical theory of real-valued networks has, by now, reached some level of maturity, this is far from true for complex-valued networks. In this paper, we analyze the expressivity of complex-valued networks by providing explicit quantitative error bounds for approximating $C^n$ functions on compact subsets of $\mathbb{C}^d$ by complex-valued neural networks that employ the modReLU activation function, given by $\sigma(z) = \mathrm{ReLU}(|z| - 1) \, \mathrm{sgn}(z)$, which is one of the most popular complex activation functions used in practice. We show that the derived approximation rates are optimal (up to log factors) in the class of modReLU networks with weights of moderate growth.
{"title":"Quantitative approximation results for complex-valued neural networks","authors":"A. Caragea, D. Lee, J. Maly, G. Pfander, F. Voigtlaender","doi":"10.1137/21m1429540","DOIUrl":"https://doi.org/10.1137/21m1429540","url":null,"abstract":"Until recently, applications of neural networks in machine learning have almost exclusively relied on real-valued networks. It was recently observed, however, that complex-valued neural networks (CVNNs) exhibit superior performance in applications in which the input is naturally complex-valued, such as MRI fingerprinting. While the mathematical theory of real-valued networks has, by now, reached some level of maturity, this is far from true for complex-valued networks. In this paper, we analyze the expressivity of complex-valued networks by providing explicit quantitative error bounds for approximating $C^n$ functions on compact subsets of $mathbb{C}^d$ by complex-valued neural networks that employ the modReLU activation function, given by $sigma(z) = mathrm{ReLU}(|z| - 1) , mathrm{sgn} (z)$, which is one of the most popular complex activation functions used in practice. We show that the derived approximation rates are optimal (up to log factors) in the class of modReLU networks with weights of moderate growth.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"118 1","pages":"553-580"},"PeriodicalIF":0.0,"publicationDate":"2021-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89797195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We show that sparsity constrained optimization problems over low dimensional spaces tend to have a small duality gap. We use the Shapley-Folkman theorem to derive both data-driven bounds on the duality gap, and an efficient primalization procedure to recover feasible points satisfying these bounds. These error bounds are proportional to the rate of growth of the objective with the target cardinality, which means in particular that the relaxation is nearly tight as soon as the target cardinality is large enough so that only uninformative features are added.
{"title":"Approximation Bounds for Sparse Programs","authors":"Armin Askari, A. d’Aspremont, L. Ghaoui","doi":"10.1137/21m1398677","DOIUrl":"https://doi.org/10.1137/21m1398677","url":null,"abstract":"We show that sparsity constrained optimization problems over low dimensional spaces tend to have a small duality gap. We use the Shapley-Folkman theorem to derive both data-driven bounds on the duality gap, and an efficient primalization procedure to recover feasible points satisfying these bounds. These error bounds are proportional to the rate of growth of the objective with the target cardinality, which means in particular that the relaxation is nearly tight as soon as the target cardinality is large enough so that only uninformative features are added.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"88 1","pages":"514-530"},"PeriodicalIF":0.0,"publicationDate":"2021-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80282769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It was shown recently by Su et al. (2016) that Nesterov's accelerated gradient method for minimizing a smooth convex function $f$ can be thought of as the time discretization of a second-order ODE, and that $f(x(t))$ converges to its optimal value at a rate of $\mathcal{O}(1/t^2)$ along any trajectory $x(t)$ of this ODE. A variational formulation was introduced in Wibisono et al. (2016) which allowed for accelerated convergence at a rate of $\mathcal{O}(1/t^p)$, for arbitrary $p>0$, in normed vector spaces. This framework was exploited in Duruisseaux et al. (2021) to design efficient explicit algorithms for symplectic accelerated optimization. In Alimisis et al. (2020), a second-order ODE was proposed as the continuous-time limit of a Riemannian accelerated algorithm, and it was shown that the objective function $f(x(t))$ converges to its optimal value at a rate of $\mathcal{O}(1/t^2)$ along solutions of this ODE. In this paper, we show that on Riemannian manifolds, the convergence rate of $f(x(t))$ to its optimal value can also be accelerated to an arbitrary convergence rate $\mathcal{O}(1/t^p)$, by considering a family of time-dependent Bregman Lagrangian and Hamiltonian systems on Riemannian manifolds. This generalizes the results of Wibisono et al. (2016) to Riemannian manifolds and also provides a variational framework for accelerated optimization on Riemannian manifolds. An approach based on the time-invariance property of the family of Bregman Lagrangians and Hamiltonians was used to construct very efficient optimization algorithms in Duruisseaux et al. (2021), and we establish a similar time-invariance property in the Riemannian setting. One expects that a geometric numerical integrator that is time-adaptive, symplectic, and Riemannian manifold preserving will yield a class of promising optimization algorithms on manifolds.
{"title":"A Variational Formulation of Accelerated Optimization on Riemannian Manifolds","authors":"Valentin Duruisseaux, M. Leok","doi":"10.1137/21m1395648","DOIUrl":"https://doi.org/10.1137/21m1395648","url":null,"abstract":"It was shown recently by Su et al. (2016) that Nesterov's accelerated gradient method for minimizing a smooth convex function $f$ can be thought of as the time discretization of a second-order ODE, and that $f(x(t))$ converges to its optimal value at a rate of $mathcal{O}(1/t^2)$ along any trajectory $x(t)$ of this ODE. A variational formulation was introduced in Wibisono et al. (2016) which allowed for accelerated convergence at a rate of $mathcal{O}(1/t^p)$, for arbitrary $p>0$, in normed vector spaces. This framework was exploited in Duruisseaux et al. (2021) to design efficient explicit algorithms for symplectic accelerated optimization. In Alimisis et al. (2020), a second-order ODE was proposed as the continuous-time limit of a Riemannian accelerated algorithm, and it was shown that the objective function $f(x(t))$ converges to its optimal value at a rate of $mathcal{O}(1/t^2)$ along solutions of this ODE. In this paper, we show that on Riemannian manifolds, the convergence rate of $f(x(t))$ to its optimal value can also be accelerated to an arbitrary convergence rate $mathcal{O}(1/t^p)$, by considering a family of time-dependent Bregman Lagrangian and Hamiltonian systems on Riemannian manifolds. This generalizes the results of Wibisono et al. (2016) to Riemannian manifolds and also provides a variational framework for accelerated optimization on Riemannian manifolds. An approach based on the time-invariance property of the family of Bregman Lagrangians and Hamiltonians was used to construct very efficient optimization algorithms in Duruisseaux et al. (2021), and we establish a similar time-invariance property in the Riemannian setting. One expects that a geometric numerical integrator that is time-adaptive, symplectic, and Riemannian manifold preserving will yield a class of promising optimization algorithms on manifolds.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"1 1","pages":"649-674"},"PeriodicalIF":0.0,"publicationDate":"2021-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88860357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Belyaeva, Kaie Kubjas, Lawrence Sun, Caroline Uhler
The spatial organization of the DNA in the cell nucleus plays an important role in gene regulation, DNA replication, and genomic integrity. Through the development of chromosome conformation capture experiments (such as 3C, 4C, and Hi-C) it is now possible to obtain the contact frequencies of the DNA at the whole-genome level. In this paper, we study the problem of reconstructing the 3D organization of the genome from such whole-genome contact frequencies. A standard approach is to transform the contact frequencies into noisy distance measurements and then apply semidefinite programming (SDP) formulations to obtain the 3D configuration. However, neglected in such reconstructions is the fact that most eukaryotes, including humans, are diploid and therefore contain two copies of each genomic locus. We prove that the 3D organization of the DNA is not identifiable from distance measurements derived from contact frequencies in diploid organisms. In fact, there are infinitely many solutions even in the noise-free setting. We then discuss various additional biologically relevant and experimentally measurable constraints (including distances between neighboring genomic loci and higher-order interactions) and prove identifiability under these conditions. Furthermore, we provide SDP formulations for computing the 3D embedding of the DNA with these additional constraints and show that we can recover the true 3D embedding with high accuracy from both noiseless and noisy measurements. Finally, we apply our algorithm to real pairwise and higher-order contact frequency data and show that we can recover known genome organization patterns.
{"title":"Identifying 3D Genome Organization in Diploid Organisms via Euclidean Distance Geometry","authors":"A. Belyaeva, Kaie Kubjas, Lawrence Sun, Caroline Uhler","doi":"10.1137/21m1390372","DOIUrl":"https://doi.org/10.1137/21m1390372","url":null,"abstract":"The spatial organization of the DNA in the cell nucleus plays an important role for gene regulation, DNA replication, and genomic integrity. Through the development of chromosome conformation capture experiments (such as 3C, 4C, Hi-C) it is now possible to obtain the contact frequencies of the DNA at the whole-genome level. In this paper, we study the problem of reconstructing the 3D organization of the genome from such whole-genome contact frequencies. A standard approach is to transform the contact frequencies into noisy distance measurements and then apply semidefinite programming (SDP) formulations to obtain the 3D configuration. However, neglected in such reconstructions is the fact that most eukaryotes including humans are diploid and therefore contain two copies of each genomic locus. We prove that the 3D organization of the DNA is not identifiable from distance measurements derived from contact frequencies in diploid organisms. In fact, there are infinitely many solutions even in the noise-free setting. We then discuss various additional biologically relevant and experimentally measurable constraints (including distances between neighboring genomic loci and higher-order interactions) and prove identifiability under these conditions. Furthermore, we provide SDP formulations for computing the 3D embedding of the DNA with these additional constraints and show that we can recover the true 3D embedding with high accuracy from both noiseless and noisy measurements. Finally, we apply our algorithm to real pairwise and higher-order contact frequency data and show that we can recover known genome organization patterns.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"33 1","pages":"204-228"},"PeriodicalIF":0.0,"publicationDate":"2021-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85464754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a framework for learning Granger causality networks for multivariate categorical time series based on the mixture transition distribution (MTD) model. Traditionally, MTD is plagued by a nonconvex objective, non-identifiability, and presence of local optima. To circumvent these problems, we recast inference in the MTD as a convex problem. The new formulation facilitates the application of MTD to high-dimensional multivariate time series. As a baseline, we also formulate a multi-output logistic autoregressive model (mLTD), which, while a straightforward extension of autoregressive Bernoulli generalized linear models, has not been previously applied to the analysis of multivariate categorical time series. We establish identifiability conditions of the MTD model and compare them to those for mLTD. We further devise novel and efficient optimization algorithms for MTD based on our proposed convex formulation, and compare the MTD and mLTD in both simulated and real data experiments. Finally, we establish consistency of the convex MTD in high dimensions. Our approach simultaneously provides a comparison of methods for network inference in categorical time series and opens the door to modern, regularized inference with the MTD model.
{"title":"The Convex Mixture Distribution: Granger Causality for Categorical Time Series.","authors":"Alex Tank, Xiudi Li, Emily B Fox, Ali Shojaie","doi":"10.1137/20m133097x","DOIUrl":"10.1137/20m133097x","url":null,"abstract":"<p><p>We present a framework for learning Granger causality networks for multivariate categorical time series based on the mixture transition distribution (MTD) model. Traditionally, MTD is plagued by a nonconvex objective, non-identifiability, and presence of local optima. To circumvent these problems, we recast inference in the MTD as a convex problem. The new formulation facilitates the application of MTD to high-dimensional multivariate time series. As a baseline, we also formulate a multi-output logistic autoregressive model (mLTD), which while a straightforward extension of autoregressive Bernoulli generalized linear models, has not been previously applied to the analysis of multivariate categorial time series. We establish identifiability conditions of the MTD model and compare them to those for mLTD. We further devise novel and efficient optimization algorithms for MTD based on our proposed convex formulation, and compare the MTD and mLTD in both simulated and real data experiments. Finally, we establish consistency of the convex MTD in high dimensions. Our approach simultaneously provides a comparison of methods for network inference in categorical time series and opens the door to modern, regularized inference with the MTD model.</p>","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"3 1","pages":"83-112"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10586348/pdf/nihms-1885629.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49686043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Effectiveness of Richardson Extrapolation in Data Science","authors":"Francis R. Bach","doi":"10.1137/21m1397349","DOIUrl":"https://doi.org/10.1137/21m1397349","url":null,"abstract":",","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"20 1","pages":"1251-1277"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78237825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we apply the previously introduced approximation method based on the ANOVA (analysis of variance) decomposition and Grouped Transformations to synthetic and real data. The advantage of this method is the interpretability of the approximation, i.e., the ability to rank the importance of attribute interactions or variable couplings. Moreover, we are able to generate an attribute ranking to identify unimportant variables and reduce the dimensionality of the problem. We compare the method to other approaches on publicly available benchmark datasets.
{"title":"Interpretable Approximation of High-Dimensional Data","authors":"D. Potts, Michael Schmischke","doi":"10.1137/21M1407707","DOIUrl":"https://doi.org/10.1137/21M1407707","url":null,"abstract":"In this paper we apply the previously introduced approximation method based on the ANOVA (analysis of variance) decomposition and Grouped Transformations to synthetic and real data. The advantage of this method is the interpretability of the approximation, i.e., the ability to rank the importance of the attribute interactions or the variable couplings. Moreover, we are able to generate an attribute ranking to identify unimportant variables and reduce the dimensionality of the problem. We compare the method to other approaches on publicly available benchmark datasets.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"2018 1","pages":"1301-1323"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74747897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}