We propose a hybrid resampling method to approximate finitely supported Wasserstein barycenters on large-scale datasets, which can be combined with any exact solver. Nonasymptotic bounds on the expected error of the objective value as well as the barycenters themselves allow one to calibrate computational cost against statistical accuracy. The rate of these upper bounds is shown to be optimal and independent of the underlying dimension, which appears only in the constants. Using a simple modification of the subgradient descent algorithm of Cuturi and Doucet, we showcase the applicability of our method on a variety of simulated datasets, as well as on a real-data example, all of which are out of reach for state-of-the-art algorithms for computing Wasserstein barycenters.
{"title":"Randomized Wasserstein Barycenter Computation: Resampling with Statistical Guarantees","authors":"F. Heinemann, A. Munk, Y. Zemel","doi":"10.1137/20m1385263","DOIUrl":"https://doi.org/10.1137/20m1385263","url":null,"abstract":"We propose a hybrid resampling method to approximate finitely supported Wasserstein barycenters on large-scale datasets, which can be combined with any exact solver. Nonasymptotic bounds on the expected error of the objective value as well as the barycenters themselves allow to calibrate computational cost and statistical accuracy. The rate of these upper bounds is shown to be optimal and independent of the underlying dimension, which appears only in the constants. Using a simple modification of the subgradient descent algorithm of Cuturi and Doucet, we showcase the applicability of our method on a myriad of simulated datasets, as well as a real-data example which are out of reach for state of the art algorithms for computing Wasserstein barycenters.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"69 1","pages":"229-259"},"PeriodicalIF":0.0,"publicationDate":"2020-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83296297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
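The resampling idea in the abstract above can be illustrated with a small sketch. The abstract does not spell out the sampling scheme, so this assumes the simplest reading: draw i.i.d. points from each finitely supported measure and pass the resulting empirical measures to an exact solver. The function name `resample_measure`, the toy measures, and the sample size are all hypothetical, and the solver step itself is not shown.

```python
import random
from collections import Counter


def resample_measure(support, weights, size, rng):
    """Draw `size` i.i.d. points from a discrete measure and return the
    empirical measure as a dict {support point: empirical weight}."""
    draws = rng.choices(support, weights=weights, k=size)
    counts = Counter(draws)
    return {point: count / size for point, count in counts.items()}


# Two toy measures in the plane (illustrative data, not from the paper).
measures = [
    ([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], [0.5, 0.3, 0.2]),
    ([(1.0, 1.0), (2.0, 1.0)], [0.6, 0.4]),
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
resampled = [resample_measure(s, w, size=50, rng=rng) for s, w in measures]
# `resampled` would then be handed to whichever exact barycenter solver is used.
```

A smaller sample size lowers the cost of the exact solve at the price of a larger statistical error, which is the trade-off the nonasymptotic bounds are meant to quantify.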
We present new stochastic geometry theorems that give bounds on the probability that $m$ random data classes all contain a point in common in their convex hulls. These theorems relate to the existe...
{"title":"Stochastic Tverberg Theorems With Applications in Multiclass Logistic Regression, Separability, and Centerpoints of Data","authors":"J. D. Loera, T. A. Hogan","doi":"10.1137/19m1277102","DOIUrl":"https://doi.org/10.1137/19m1277102","url":null,"abstract":"We present new stochastic geometry theorems that give bounds on the probability that $m$ random data classes all contain a point in common in their convex hulls. These theorems relate to the existe...","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"116 1","pages":"1151-1166"},"PeriodicalIF":0.0,"publicationDate":"2020-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79373299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep neural networks generalize well despite being exceedingly overparameterized and being trained without explicit regularization. This curious phenomenon has inspired extensive research activity in establishing its statistical principles: Under what conditions is it observed? How do these depend on the data and on the training algorithm? When does regularization benefit generalization? While such questions remain wide open for deep neural nets, recent works have attempted to gain insights by studying simpler, often linear, models. Our paper contributes to this growing line of work by examining binary linear classification under a generative Gaussian mixture model. Motivated by recent results on the implicit bias of gradient descent, we study both max-margin SVM classifiers (corresponding to logistic loss) and min-norm interpolating classifiers (corresponding to least-squares loss). First, we leverage an idea introduced in [V. Muthukumar et al., arXiv:2005.08054, (2020)] to relate the SVM solution to the min-norm interpolating solution. Second, we derive novel non-asymptotic bounds on the classification error of the latter. Combining the two, we present novel sufficient conditions on the covariance spectrum and on the signal-to-noise ratio (SNR) under which interpolating estimators achieve asymptotically optimal performance as overparameterization increases. Interestingly, our results extend to a noisy model with constant-probability noise flips. In contrast to previously studied discriminative data models, our results emphasize the crucial role of the SNR and its interplay with the data covariance. Finally, via a combination of analytical arguments and numerical demonstrations, we identify conditions under which the interpolating estimator performs better than the corresponding regularized estimates.
{"title":"Binary Classification of Gaussian Mixtures: Abundance of Support Vectors, Benign Overfitting, and Regularization","authors":"Ke Wang, Christos Thrampoulidis","doi":"10.1137/21m1415121","DOIUrl":"https://doi.org/10.1137/21m1415121","url":null,"abstract":"Deep neural networks generalize well despite being exceedingly overparameterized and being trained without explicit regularization. This curious phenomenon has inspired extensive research activity in establishing its statistical principles: Under what conditions is it observed? How do these depend on the data and on the training algorithm? When does regularization benefit generalization? While such questions remain wide open for deep neural nets, recent works have attempted gaining insights by studying simpler, often linear, models. Our paper contributes to this growing line of work by examining binary linear classification under a generative Gaussian mixture model. Motivated by recent results on the implicit bias of gradient descent, we study both max-margin SVM classifiers (corresponding to logistic loss) and min-norm interpolating classifiers (corresponding to least-squares loss). First, we leverage an idea introduced in [V. Muthukumar et al., arXiv:2005.08054, (2020)] to relate the SVM solution to the min-norm interpolating solution. Second, we derive novel non-asymptotic bounds on the classification error of the latter. Combining the two, we present novel sufficient conditions on the covariance spectrum and on the signal-to-noise ratio (SNR) under which interpolating estimators achieve asymptotically optimal performance as overparameterization increases. Interestingly, our results extend to a noisy model with constant probability noise flips. Contrary to previously studied discriminative data models, our results emphasize the crucial role of the SNR and its interplay with the data covariance. 
Finally, via a combination of analytical arguments and numerical demonstrations we identify conditions under which the interpolating estimator performs better than corresponding regularized estimates.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"20 1","pages":"260-284"},"PeriodicalIF":0.0,"publicationDate":"2020-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75813460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
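The link between the two classifiers named in this abstract can be written out as follows. This is a sketch in notation that is assumed here rather than taken from the paper: training pairs $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$, $i = 1, \dots, n$, and a linear classifier $w$.

$$
\hat{w}_{\mathrm{SVM}} = \arg\min_{w} \|w\|_2 \quad \text{s.t.} \quad y_i\, x_i^\top w \ge 1, \qquad
\hat{w}_{\mathrm{LS}} = \arg\min_{w} \|w\|_2 \quad \text{s.t.} \quad x_i^\top w = y_i, \qquad i = 1, \dots, n.
$$

Because $y_i^2 = 1$, the interpolation constraint $x_i^\top w = y_i$ is the same as $y_i\, x_i^\top w = 1$. The two solutions therefore coincide whenever every margin constraint of the SVM problem is active, that is, when every training point is a support vector.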
Overwhelming theoretical and empirical evidence shows that mildly overparametrized neural networks---those with more connections than the size of the training data---are often able to memorize the ...
{"title":"Memory Capacity of Neural Networks with Threshold and Rectified Linear Unit Activations","authors":"R. Vershynin","doi":"10.1137/20m1314884","DOIUrl":"https://doi.org/10.1137/20m1314884","url":null,"abstract":"Overwhelming theoretical and empirical evidence shows that mildly overparametrized neural networks---those with more connections than the size of the training data---are often able to memorize the ...","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"19 1","pages":"1004-1033"},"PeriodicalIF":0.0,"publicationDate":"2020-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76907136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
While post-training model compression can greatly reduce the inference cost of a deep neural network, uncompressed training still consumes a huge amount of hardware resources, run-time and energy. It is highly desirable to directly train a compact neural network from scratch with low memory and low computational cost. Low-rank tensor decomposition is one of the most effective approaches to reduce the memory and computing requirements of large-size neural networks. However, directly training a low-rank tensorized neural network is a very challenging task because it is hard to determine a proper tensor rank a priori, which controls the model complexity and compression ratio in the training process. This paper presents a novel end-to-end framework for low-rank tensorized training of neural networks. We first develop a flexible Bayesian model that can handle various low-rank tensor formats (e.g., CP, Tucker, tensor train and tensor-train matrix) that compress neural network parameters in training. This model can automatically determine the tensor ranks inside a nonlinear forward model, which is beyond the capability of existing Bayesian tensor methods. We further develop a scalable stochastic variational inference solver to estimate the posterior density of large-scale problems in training. Our work provides the first general-purpose rank-adaptive framework for end-to-end tensorized training. Our numerical results on various neural network architectures show orders-of-magnitude parameter reduction and little accuracy loss (or even better accuracy) in the training process. Specifically, on a very large deep learning recommendation system with over $4.2\times 10^9$ model parameters, our method can reduce the variables to only $1.6\times 10^5$ automatically in the training process (i.e., by a factor of $2.6\times 10^4$) while achieving almost the same accuracy.
{"title":"Towards Compact Neural Networks via End-to-End Training: A Bayesian Tensor Approach with Automatic Rank Determination","authors":"Cole Hawkins, Xing-er Liu, Zheng Zhang","doi":"10.1137/21m1391444","DOIUrl":"https://doi.org/10.1137/21m1391444","url":null,"abstract":"While post-training model compression can greatly reduce the inference cost of a deep neural network, uncompressed training still consumes a huge amount of hardware resources, run-time and energy. It is highly desirable to directly train a compact neural network from scratch with low memory and low computational cost. Low-rank tensor decomposition is one of the most effective approaches to reduce the memory and computing requirements of large-size neural networks. However, directly training a low-rank tensorized neural network is a very challenging task because it is hard to determine a proper tensor rank a priori, which controls the model complexity and compression ratio in the training process. This paper presents a novel end-to-end framework for low-rank tensorized training of neural networks. We first develop a flexible Bayesian model that can handle various low-rank tensor formats (e.g., CP, Tucker, tensor train and tensor-train matrix) that compress neural network parameters in training. This model can automatically determine the tensor ranks inside a nonlinear forward model, which is beyond the capability of existing Bayesian tensor methods. We further develop a scalable stochastic variational inference solver to estimate the posterior density of large-scale problems in training. Our work provides the first general-purpose rank-adaptive framework for end-to-end tensorized training. Our numerical results on various neural network architectures show orders-of-magnitude parameter reduction and little accuracy loss (or even better accuracy) in the training process. 
Specifically, on a very large deep learning recommendation system with over $4.2\\times 10^9$ model parameters, our method can reduce the variables to only $1.6\\times 10^5$ automatically in the training process (i.e., by $2.6\\times 10^4$ times) while achieving almost the same accuracy.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"45 1","pages":"46-71"},"PeriodicalIF":0.0,"publicationDate":"2020-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80791672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
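The reduction factor quoted in this abstract follows directly from the two parameter counts it reports. A short check, with variable names that are purely illustrative:

```python
# Parameter counts as quoted in the abstract (the first is a lower bound: "over 4.2e9").
full_params = 4.2e9
compressed_params = 1.6e5

# Ratio of the uncompressed count to the compressed count.
ratio = full_params / compressed_params
print(f"reduction factor: about {ratio:.3g}")
```

The ratio comes out near 26,000, consistent with the stated factor of $2.6\times 10^4$.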
Archetypal analysis is an unsupervised learning method that uses a convex polytope to summarize multivariate data. For fixed $k$, the method finds a convex polytope with $k$ vertices, called archetype points, such that the polytope is contained in the convex hull of the data and the mean squared distance between the data and the polytope is minimal. In this paper, we prove a consistency result showing that if the data are independently sampled from a probability measure with bounded support, then the archetype points converge to a solution of the continuum version of the problem, for which we identify and establish several properties. We also obtain the convergence rate of the optimal objective values under appropriate assumptions on the distribution. If the data are independently sampled from a distribution with unbounded support, we also prove a consistency result for a modified method that penalizes the dispersion of the archetype points. Our analysis is supported by detailed computational experiments on the archetype points for data sampled from the uniform distribution in a disk, the normal distribution, an annular distribution, and a Gaussian mixture model.
{"title":"Consistency of Archetypal Analysis","authors":"B. Osting, Dong Wang, Yiming Xu, Dominique Zosso","doi":"10.1137/20M1331792","DOIUrl":"https://doi.org/10.1137/20M1331792","url":null,"abstract":"Archetypal analysis is an unsupervised learning method that uses a convex polytope to summarize multivariate data. For fixed $k$, the method finds a convex polytope with $k$ vertices, called archetype points, such that the polytope is contained in the convex hull of the data and the mean squared distance between the data and the polytope is minimal. In this paper, we prove a consistency result that shows if the data is independently sampled from a probability measure with bounded support, then the archetype points converge to a solution of the continuum version of the problem, of which we identify and establish several properties. We also obtain the convergence rate of the optimal objective values under appropriate assumptions on the distribution. If the data is independently sampled from a distribution with unbounded support, we also prove a consistency result for a modified method that penalizes the dispersion of the archetype points. Our analysis is supported by detailed computational experiments of the archetype points for data sampled from the uniform distribution in a disk, the normal distribution, an annular distribution, and a Gaussian mixture model.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"50 1","pages":"1-30"},"PeriodicalIF":0.0,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90037347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
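The finite-sample problem described in this abstract can be stated as an optimization problem. The notation below is assumed for illustration and does not come from the paper: data points $x_1, \dots, x_n$, archetype points $a_1, \dots, a_k$, and $\operatorname{conv}$ for the convex hull.

$$
\min_{a_1, \dots, a_k \,\in\, \operatorname{conv}\{x_1, \dots, x_n\}} \; \frac{1}{n} \sum_{i=1}^{n} \operatorname{dist}\bigl(x_i, \operatorname{conv}\{a_1, \dots, a_k\}\bigr)^2
$$

Requiring each $a_j$ to lie in the convex hull of the data ensures the polytope $\operatorname{conv}\{a_1, \dots, a_k\}$ is contained in that hull, and the objective is the mean squared distance between the data and the polytope.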
Abhishek K. Gupta, Hao Chen, Jianzong Pi, Gaurav Tendolkar
Recursive stochastic algorithms have gained significant attention in the recent past due to data-driven applications. Examples include stochastic gradient descent for solving large-scale optimizati...
{"title":"Some Limit Properties of Markov Chains Induced by Recursive Stochastic Algorithms","authors":"Abhishek K. Gupta, Hao Chen, Jianzong Pi, Gaurav Tendolkar","doi":"10.1137/19m1258104","DOIUrl":"https://doi.org/10.1137/19m1258104","url":null,"abstract":"Recursive stochastic algorithms have gained significant attention in the recent past due to data-driven applications. Examples include stochastic gradient descent for solving large-scale optimizati...","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"97 1","pages":"967-1003"},"PeriodicalIF":0.0,"publicationDate":"2020-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79472710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
W. Weir, Benjamin Walker, Lenka Zdeborová, P. Mucha
Modularity-based community detection encompasses a number of widely used, efficient heuristics for identification of structure in networks. Recently, a belief propagation approach to modularity opt...
{"title":"Multilayer Modularity Belief Propagation to Assess Detectability of Community Structure","authors":"W. Weir, Benjamin Walker, Lenka Zdeborová, P. Mucha","doi":"10.1137/19m1279812","DOIUrl":"https://doi.org/10.1137/19m1279812","url":null,"abstract":"Modularity-based community detection encompasses a number of widely used, efficient heuristics for identification of structure in networks. Recently, a belief propagation approach to modularity opt...","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"1 1","pages":"872-900"},"PeriodicalIF":0.0,"publicationDate":"2020-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90478008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accounting for inequality constraints, such as boundedness, monotonicity or convexity, is challenging when modeling costly-to-evaluate black box functions. In this regard, finite-dimensional Gaussian process (GP) models bring a valuable solution, as they guarantee that the inequality constraints are satisfied everywhere. Nevertheless, these models are currently restricted to low-dimensional settings (up to dimension 5). Addressing this issue, we introduce the MaxMod algorithm, which sequentially inserts one-dimensional knots or adds active variables, thereby performing dimension reduction and efficient knot allocation at the same time. We prove the convergence of this algorithm. In intermediate steps of the proof, we propose the notion of multi-affine extension and study its properties. We also prove the convergence of finite-dimensional GPs when the knots are not dense in the input space, extending the recent literature. With simulated and real data, we demonstrate that the MaxMod algorithm remains efficient in higher dimension (at least in dimension 20) and has a smaller computational complexity than other state-of-the-art constrained GP models for reaching a given approximation error.
{"title":"Sequential Construction and Dimension Reduction of Gaussian Processes Under Inequality Constraints","authors":"F. Bachoc, A. F. López-Lopera, O. Roustant","doi":"10.1137/21m1407513","DOIUrl":"https://doi.org/10.1137/21m1407513","url":null,"abstract":"Accounting for inequality constraints, such as boundedness, monotonicity or convexity, is challenging when modeling costly-to-evaluate black box functions. In this regard, finite-dimensional Gaussian process (GP) models bring a valuable solution, as they guarantee that the inequality constraints are satisfied everywhere. Nevertheless, these models are currently restricted to small dimensional situations (up to dimension 5). Addressing this issue, we introduce the MaxMod algorithm that sequentially inserts one-dimensional knots or adds active variables, thereby performing at the same time dimension reduction and efficient knot allocation. We prove the convergence of this algorithm. In intermediary steps of the proof, we propose the notion of multi-affine extension and study its properties. We also prove the convergence of finite-dimensional GPs, when the knots are not dense in the input space, extending the recent literature. 
With simulated and real data, we demonstrate that the MaxMod algorithm remains efficient in higher dimension (at least in dimension 20), and has a smaller computational complexity than other constrained GP models from the state-of-the-art, to reach a given approximation error.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45269075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Emmanuel Chevallier, Didong Li, Yulong Lu, D. Dunson
In many applications, the curvature of the space supporting the data makes the statistical modelling challenging. In this paper we discuss the construction and use of probability distributions wrapped around manifolds using exponential maps. These distributions have already been used on specific manifolds. We describe their construction in the unifying framework of affine locally symmetric spaces. Affine locally symmetric spaces are a broad class of manifolds containing many manifolds encountered in data sciences. We show that on these spaces, exponential-wrapped distributions enjoy interesting properties for practical use. We provide the generic expression of the Jacobian appearing in these distributions and compute it on two particular examples: Grassmannians and pseudo-hyperboloids. We illustrate the interest of such distributions in a classification experiment on simulated data.
{"title":"Exponential-Wrapped Distributions on Symmetric Spaces","authors":"Emmanuel Chevallier, Didong Li, Yulong Lu, D. Dunson","doi":"10.1137/21m1461551","DOIUrl":"https://doi.org/10.1137/21m1461551","url":null,"abstract":"In many applications, the curvature of the space supporting the data makes the statistical modelling challenging. In this paper we discuss the construction and use of probability distributions wrapped around manifolds using exponential maps. These distributions have already been used on specific manifolds. We describe their construction in the unifying framework of affine locally symmetric spaces. Affine locally symmetric spaces are a broad class of manifolds containing many manifolds encountered in data sciences. We show that on these spaces, exponential-wrapped distributions enjoy interesting properties for practical use. We provide the generic expression of the Jacobian appearing in these distributions and compute it on two particular examples: Grassmannians and pseudo-hyperboloids. We illustrate the interest of such distributions in a classification experiment on simulated data.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47949469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
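The wrapping construction described in this abstract can be sketched on a simple manifold. The unit sphere $S^2$ is used here only because its exponential map has a short closed form; it is an illustrative choice, not one of the two examples treated in the paper (Grassmannians and pseudo-hyperboloids). The function names and the choice of an isotropic Gaussian in the tangent plane are likewise assumptions.

```python
import math
import random


def sphere_exp(p, v):
    """Riemannian exponential map on the unit sphere at base point p,
    applied to a tangent vector v (assumed orthogonal to p)."""
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_v == 0.0:
        return tuple(p)  # zero tangent vector maps to the base point
    return tuple(
        math.cos(norm_v) * p_i + math.sin(norm_v) * v_i / norm_v
        for p_i, v_i in zip(p, v)
    )


def sample_wrapped_gaussian(sigma, rng):
    """Sample a Gaussian in the tangent plane at the north pole of S^2
    and push it onto the sphere through the exponential map."""
    north_pole = (0.0, 0.0, 1.0)
    # Tangent vectors at the north pole have zero third coordinate.
    v = (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma), 0.0)
    return sphere_exp(north_pole, v)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
points = [sample_wrapped_gaussian(0.3, rng) for _ in range(5)]
```

Sampling is the easy direction. Evaluating the density of the wrapped distribution additionally requires the Jacobian of the exponential map, which is the quantity the abstract says is derived in general and computed for the two examples.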