Title: Maximum Likelihood Estimation in Gaussian Process Regression is Ill-Posed
Pub Date: 2022-03-17, DOI: 10.48550/arXiv.2203.09179
T. Karvonen, C. Oates
Gaussian process regression underpins countless academic and industrial applications of machine learning and statistics, with maximum likelihood estimation routinely used to select appropriate parameters for the covariance kernel. However, it remains an open problem to establish the circumstances in which maximum likelihood estimation is well-posed, that is, when the predictions of the regression model are insensitive to small perturbations of the data. This article identifies scenarios where the maximum likelihood estimator fails to be well-posed, in that the predictive distributions are not Lipschitz in the data with respect to the Hellinger distance. These failure cases occur in the noiseless data setting, for any Gaussian process with a stationary covariance function whose lengthscale parameter is estimated using maximum likelihood. Although the failure of maximum likelihood estimation is part of Gaussian process folklore, these rigorous theoretical results appear to be the first of their kind. The implication of these negative results is that well-posedness may need to be assessed post-hoc, on a case-by-case basis, when maximum likelihood estimation is used to train a Gaussian process model.
{"title":"Maximum Likelihood Estimation in Gaussian Process Regression is Ill-Posed","authors":"T. Karvonen, C. Oates","doi":"10.48550/arXiv.2203.09179","DOIUrl":"https://doi.org/10.48550/arXiv.2203.09179","url":null,"abstract":"Gaussian process regression underpins countless academic and industrial applications of machine learning and statistics, with maximum likelihood estimation routinely used to select appropriate parameters for the covariance kernel. However, it remains an open problem to establish the circumstances in which maximum likelihood estimation is well-posed, that is, when the predictions of the regression model are insensitive to small perturbations of the data. This article identifies scenarios where the maximum likelihood estimator fails to be well-posed, in that the predictive distributions are not Lipschitz in the data with respect to the Hellinger distance. These failure cases occur in the noiseless data setting, for any Gaussian process with a stationary covariance function whose lengthscale parameter is estimated using maximum likelihood. Although the failure of maximum likelihood estimation is part of Gaussian process folklore, these rigorous theoretical results appear to be the first of their kind. The implication of these negative results is that well-posedness may need to be assessed post-hoc, on a case-by-case basis, when maximum likelihood estimation is used to train a Gaussian process model.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"34 1","pages":"120:1-120:47"},"PeriodicalIF":0.0,"publicationDate":"2022-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78760449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Deepchecks: A Library for Testing and Validating Machine Learning Models and Data
Pub Date: 2022-03-16, DOI: 10.48550/arXiv.2203.08491
Shir Chorev, Philip Tannor, Daniel Israel, Noam Bressler, I. Gabbay, Nir Hutnik, Jonatan Liberman, Matan Perlmutter, Yurii Romanyshyn, L. Rokach
This paper presents Deepchecks, a Python library for comprehensively validating machine learning models and data. Our goal is to provide an easy-to-use library comprising many checks related to various types of issues, such as model predictive performance, data integrity, data distribution mismatches, and more. The package is distributed under the GNU Affero General Public License (AGPL) and relies on core libraries from the scientific Python ecosystem: scikit-learn, PyTorch, NumPy, pandas, and SciPy. Source code, documentation, examples, and an extensive user guide can be found at https://github.com/deepchecks/deepchecks and https://docs.deepchecks.com/.
{"title":"Deepchecks: A Library for Testing and Validating Machine Learning Models and Data","authors":"Shir Chorev, Philip Tannor, Daniel Israel, Noam Bressler, I. Gabbay, Nir Hutnik, Jonatan Liberman, Matan Perlmutter, Yurii Romanyshyn, L. Rokach","doi":"10.48550/arXiv.2203.08491","DOIUrl":"https://doi.org/10.48550/arXiv.2203.08491","url":null,"abstract":"This paper presents Deepchecks, a Python library for comprehensively validating machine learning models and data. Our goal is to provide an easy-to-use library comprising of many checks related to various types of issues, such as model predictive performance, data integrity, data distribution mismatches, and more. The package is distributed under the GNU Affero General Public License (AGPL) and relies on core libraries from the scientific Python ecosystem: scikit-learn, PyTorch, NumPy, pandas, and SciPy. Source code, documentation, examples, and an extensive user guide can be found at url{https://github.com/deepchecks/deepchecks} and url{https://docs.deepchecks.com/}.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"123 1","pages":"285:1-285:6"},"PeriodicalIF":0.0,"publicationDate":"2022-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78408553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Robust Load Balancing with Machine Learned Advice
Pub Date: 2022-01-01, DOI: 10.1137/1.9781611977073.2
Sara Ahmadian, Hossein Esfandiari, V. Mirrokni, Binghui Peng
Motivated by the exploding growth of web-based services and the importance of efficiently managing the computational resources of such systems, we introduce and study a theoretical model for load balancing of very large databases such as commercial search engines. Our model is a more realistic version of the well-received balls-into-bins model, with an additional constraint that limits the number of servers that carry each piece of the data. This additional constraint is necessary because, on one hand, the data is so large that we cannot copy all of it onto each server. On the other hand, the query response time is so limited that we cannot ignore the fact that the number of queries for each piece of the data changes over time, and hence we cannot simply split the data over different machines. In this paper, we develop an almost optimal load balancing algorithm that works given an estimate of the load of each piece of the data. Our algorithm is almost perfectly robust to wrong estimates, to the extent that even when all of the loads are adversarially chosen, the performance of our algorithm is 1 − 1/e, which is provably optimal. Along the way, we develop various techniques for analyzing the balls-into-bins process under certain correlations and build a novel connection with the multiplicative weights update scheme.
{"title":"Robust Load Balancing with Machine Learned Advice","authors":"Sara Ahmadian, Hossein Esfandiari, V. Mirrokni, Binghui Peng","doi":"10.1137/1.9781611977073.2","DOIUrl":"https://doi.org/10.1137/1.9781611977073.2","url":null,"abstract":"Motivated by the exploding growth of web-based services and the importance of efficiently managing the computational resources of such systems, we introduce and study a theoretical model for load balancing of very large databases such as commercial search engines. Our model is a more realistic version of the well-received balls-into-bins model with an additional constraint that limits the number of servers that carry each piece of the data. This additional constraint is necessary when, on one hand, the data is so large that we can not copy the whole data on each server. On the other hand, the query response time is so limited that we can not ignore the fact that the number of queries for each piece of the data changes over time, and hence we can not simply split the data over different machines. In this paper, we develop an almost optimal load balancing algorithm that works given an estimate of the load of each piece of the data. Our algorithm is almost perfectly robust to wrong estimates, to the extent that even when all of the loads are adversarially chosen the performance of our algorithm is 1 − 1 /e , which is provably optimal. Along the way, we develop various techniques for analyzing the balls-into-bins process under certain correlations and build a novel connection with the multiplicative weights update scheme.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"23 1","pages":"44:1-44:46"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83981817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms
Pub Date: 2021-11-22, DOI: 10.2139/ssrn.3969101
Yanwei Jia, X. Zhou
We study policy gradient (PG) for reinforcement learning in continuous time and space under the regularized exploratory formulation developed by Wang et al. (2020). We represent the gradient of the value function with respect to a given parameterized stochastic policy as the expected integral of an auxiliary running reward function that can be evaluated using samples and the current value function. This effectively turns PG into a policy evaluation (PE) problem, enabling us to apply the martingale approach recently developed by Jia and Zhou (2021) for PE to solve our PG problem. Based on this analysis, we propose two types of actor-critic algorithms for RL, where we learn and update value functions and policies simultaneously and alternately. The first type is based directly on the aforementioned representation, which involves future trajectories and is hence offline. The second type, designed for online learning, employs the first-order condition of the policy gradient and turns it into martingale orthogonality conditions. These conditions are then incorporated via stochastic approximation when updating policies. Finally, we demonstrate the algorithms through simulations in two concrete examples.
{"title":"Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms","authors":"Yanwei Jia, X. Zhou","doi":"10.2139/ssrn.3969101","DOIUrl":"https://doi.org/10.2139/ssrn.3969101","url":null,"abstract":"We study policy gradient (PG) for reinforcement learning in continuous time and space under the regularized exploratory formulation developed by Wang et al. (2020). We represent the gradient of the value function with respect to a given parameterized stochastic policy as the expected integration of an auxiliary running reward function that can be evaluated using samples and the current value function. This effectively turns PG into a policy evaluation (PE) problem, enabling us to apply the martingale approach recently developed by Jia and Zhou (2021) for PE to solve our PG problem. Based on this analysis, we propose two types of the actor-critic algorithms for RL, where we learn and update value functions and policies simultaneously and alternatingly. The first type is based directly on the aforementioned representation which involves future trajectories and hence is offline. The second type, designed for online learning, employs the first-order condition of the policy gradient and turns it into martingale orthogonality conditions. These conditions are then incorporated using stochastic approximation when updating policies. Finally, we demonstrate the algorithms by simulations in two concrete examples.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"104 1","pages":"275:1-275:50"},"PeriodicalIF":0.0,"publicationDate":"2021-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76109377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Existence, Stability and Scalability of Orthogonal Convolutional Neural Networks
Pub Date: 2021-08-12, DOI: 10.48550/arXiv.2108.05623
E. M. Achour, François Malgouyres, Franck Mamalet
Imposing orthogonality on the layers of neural networks is known to facilitate learning by limiting exploding/vanishing gradients, decorrelating the features, and improving robustness. This paper studies the theoretical properties of orthogonal convolutional layers. We establish necessary and sufficient conditions on the layer architecture guaranteeing the existence of an orthogonal convolutional transform. The conditions show that orthogonal convolutional transforms exist for almost all architectures used in practice with 'circular' padding. We also exhibit limitations with 'valid' boundary conditions and with 'same' boundary conditions under zero-padding. Recently, a regularization term imposing the orthogonality of convolutional layers has been proposed, and impressive empirical results have been obtained in different applications (Wang et al. 2020). The second motivation of the present paper is to specify the theory behind this. We make the link between this regularization term and orthogonality measures. In doing so, we show that this regularization strategy is stable with respect to numerical and optimization errors and that, in the presence of small errors and when the size of the signal/image is large, the convolutional layers remain close to isometric. The theoretical results are confirmed by experiments, and the landscape of the regularization term is studied. Experiments on real data sets show that when orthogonality is used to enforce robustness, the parameter multiplying the regularization term can be used to tune a tradeoff between accuracy and orthogonality, for the benefit of both accuracy and robustness. Altogether, the study guarantees that the regularization proposed in Wang et al. (2020) is an efficient, flexible, and stable numerical strategy for learning orthogonal convolutional layers.
{"title":"Existence, Stability and Scalability of Orthogonal Convolutional Neural Networks","authors":"E. M. Achour, Franccois Malgouyres, Franck Mamalet","doi":"10.48550/arXiv.2108.05623","DOIUrl":"https://doi.org/10.48550/arXiv.2108.05623","url":null,"abstract":"Imposing orthogonality on the layers of neural networks is known to facilitate the learning by limiting the exploding/vanishing of the gradient; decorrelate the features; improve the robustness. This paper studies the theoretical properties of orthogonal convolutional layers.We establish necessary and sufficient conditions on the layer architecture guaranteeing the existence of an orthogonal convolutional transform. The conditions prove that orthogonal convolutional transforms exist for almost all architectures used in practice for 'circular' padding.We also exhibit limitations with 'valid' boundary conditions and 'same' boundary conditions with zero-padding.Recently, a regularization term imposing the orthogonality of convolutional layers has been proposed, and impressive empirical results have been obtained in different applications (Wang et al. 2020).The second motivation of the present paper is to specify the theory behind this.We make the link between this regularization term and orthogonality measures. In doing so, we show that this regularization strategy is stable with respect to numerical and optimization errors and that, in the presence of small errors and when the size of the signal/image is large, the convolutional layers remain close to isometric.The theoretical results are confirmed with experiments and the landscape of the regularization term is studied. Experiments on real data sets show that when orthogonality is used to enforce robustness, the parameter multiplying the regularization termcan be used to tune a tradeoff between accuracy and orthogonality, for the benefit of both accuracy and robustness.Altogether, the study guarantees that the regularization proposed in Wang et al. (2020) is an efficient, flexible and stable numerical strategy to learn orthogonal convolutional layers.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"27 1","pages":"347:1-347:56"},"PeriodicalIF":0.0,"publicationDate":"2021-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81025245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences
Pub Date: 2021-07-17, DOI: 10.7939/R3-M4YX-N678
Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Mahmood, Martha White
Approximate Policy Iteration (API) algorithms alternate between (approximate) policy evaluation and (approximate) greedification. Many different approaches have been explored for approximate policy evaluation, but less is understood about approximate greedification and what choices guarantee policy improvement. In this work, we investigate approximate greedification when reducing the KL divergence between the parameterized policy and the Boltzmann distribution over action values. In particular, we investigate the difference between the forward and reverse KL divergences, with varying degrees of entropy regularization. We show that the reverse KL has stronger policy improvement guarantees, but that reducing the forward KL can result in a worse policy. We also demonstrate, however, that a large enough reduction of the forward KL can induce improvement under additional assumptions. Empirically, we show on simple continuous-action environments that the forward KL can induce more exploration, but at the cost of a more suboptimal policy. No significant differences were observed in the discrete-action setting or on a suite of benchmark problems. Throughout, we highlight that many policy gradient methods can be seen as instances of API, with either the forward or reverse KL for the policy update, and discuss next steps for understanding and improving our policy optimization algorithms.
{"title":"Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences","authors":"Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Mahmood, Martha White","doi":"10.7939/R3-M4YX-N678","DOIUrl":"https://doi.org/10.7939/R3-M4YX-N678","url":null,"abstract":"Approximate Policy Iteration (API) algorithms alternate between (approximate) policy evaluation and (approximate) greedification. Many different approaches have been explored for approximate policy evaluation, but less is understood about approximate greedification and what choices guarantee policy improvement. In this work, we investigate approximate greedification when reducing the KL divergence between the parameterized policy and the Boltzmann distribution over action values. In particular, we investigate the difference between the forward and reverse KL divergences, with varying degrees of entropy regularization. We show that the reverse KL has stronger policy improvement guarantees, but that reducing the forward KL can result in a worse policy. We also demonstrate, however, that a large enough reduction of the forward KL can induce improvement under additional assumptions. Empirically, we show on simple continuous-action environments that the forward KL can induce more exploration, but at the cost of a more suboptimal policy. No significant differences were observed in the discrete-action setting or on a suite of benchmark problems. Throughout, we highlight that many policy gradient methods can be seen as an instance of API, with either the forward or reverse KL for the policy update, and discuss next steps for understanding and improving our policy optimization algorithms.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"60 1","pages":"253:1-253:79"},"PeriodicalIF":0.0,"publicationDate":"2021-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80463716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Combinatorial optimization and reasoning with graph neural networks
Pub Date: 2021-02-18, DOI: 10.24963/ijcai.2021/595
Quentin Cappart, D. Chételat, Elias Boutros Khalil, Andrea Lodi, Christopher Morris, Petar Velickovic
Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have mostly focused on solving problem instances in isolation, ignoring the fact that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning, especially graph neural networks, as a key building block for combinatorial tasks, either directly as solvers or by enhancing exact solvers. This paper presents a conceptual review of recent key advancements in this emerging field, aimed at researchers in both optimization and machine learning.
{"title":"Combinatorial optimization and reasoning with graph neural networks","authors":"Quentin Cappart, D. Chételat, Elias Boutros Khalil, Andrea Lodi, Christopher Morris, Petar Velickovic","doi":"10.24963/ijcai.2021/595","DOIUrl":"https://doi.org/10.24963/ijcai.2021/595","url":null,"abstract":"Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have mostly focused on solving problem instances in isolation, ignoring the fact that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning, especially graph neural networks, as a key building block for combinatorial tasks, either directly as solvers or by enhancing the former. This paper presents a conceptual review of recent key advancements in this emerging field, aiming at researchers in both optimization and machine learning.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"44 1","pages":"130:1-130:61"},"PeriodicalIF":0.0,"publicationDate":"2021-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90057379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Matrix-wise ℓ0-constrained sparse nonnegative least squares
Pub Date: 2020-11-22, DOI: 10.1007/s10994-022-06260-2
Nicolas Nadisic, Jérémy E. Cohen, A. Vandaele, Nicolas Gillis
{"title":"Matrix-wise ℓ 0-constrained sparse nonnegative least squares","authors":"Nicolas Nadisic, Jérémy E. Cohen, A. Vandaele, Nicolas Gillis","doi":"10.1007/s10994-022-06260-2","DOIUrl":"https://doi.org/10.1007/s10994-022-06260-2","url":null,"abstract":"","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"118 1","pages":"4453-4495"},"PeriodicalIF":0.0,"publicationDate":"2020-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86199179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Distributed Learning of Finite Gaussian Mixtures
Pub Date: 2020-10-20, DOI: 10.11159/icsta21.001
Qiong Zhang, Jiahua Chen
Advances in information technology have led to extremely large datasets that are often kept in different storage centers. Existing statistical methods must be adapted to overcome the resulting computational obstacles while retaining statistical validity and efficiency. Split-and-conquer approaches have been applied in many areas, including quantile processes, regression analysis, principal eigenspaces, and exponential families. We study split-and-conquer approaches for the distributed learning of finite Gaussian mixtures. We recommend a reduction strategy and develop an effective MM algorithm. The new estimator is shown to be consistent and to retain root-n consistency under some general conditions. Experiments based on simulated and real-world data show that the proposed split-and-conquer approach has statistical performance comparable to the global estimator based on the full dataset, when the latter is feasible. It can even slightly outperform the global estimator if the model assumption does not match the real-world data. It also has better statistical and computational performance than some existing methods.
{"title":"Distributed Learning of Finite Gaussian Mixtures","authors":"Qiong Zhang, Jiahua Chen","doi":"10.11159/icsta21.001","DOIUrl":"https://doi.org/10.11159/icsta21.001","url":null,"abstract":"Advances in information technology have led to extremely large datasets that are often kept in different storage centers. Existing statistical methods must be adapted to overcome the resulting computational obstacles while retaining statistical validity and efficiency. Split-and-conquer approaches have been applied in many areas, including quantile processes, regression analysis, principal eigenspaces, and exponential families. We study split-and-conquer approaches for the distributed learning of finite Gaussian mixtures. We recommend a reduction strategy and develop an effective MM algorithm. The new estimator is shown to be consistent and retains root-n consistency under some general conditions. Experiments based on simulated and real-world data show that the proposed split-and-conquer approach has comparable statistical performance with the global estimator based on the full dataset, if the latter is feasible. It can even slightly outperform the global estimator if the model assumption does not match the real-world data. It also has better statistical and computational performance than some existing methods.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"165 2 1","pages":"99:1-99:40"},"PeriodicalIF":0.0,"publicationDate":"2020-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86702121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Unlinked Monotone Regression
Pub Date: 2020-07-02, DOI: 10.3929/ETHZ-B-000501663
F. Balabdaoui, Charles R. Doss, C. Durot
We consider so-called univariate unlinked (sometimes "decoupled" or "shuffled") regression when the unknown regression curve is monotone. In standard monotone regression, one observes a pair $(X,Y)$ where a response $Y$ is linked to a covariate $X$ through the model $Y = m_0(X) + \epsilon$, with $m_0$ the (unknown) monotone regression function and $\epsilon$ the unobserved error (assumed to be independent of $X$). In the unlinked regression setting, one only observes a vector of realizations from the response $Y$ and from the covariate $X$, where now $Y \stackrel{d}{=} m_0(X) + \epsilon$. There is no (observed) pairing of $X$ and $Y$. Despite this, it is still possible to derive a consistent nonparametric estimator of $m_0$ under the assumption of monotonicity of $m_0$ and knowledge of the distribution of the noise $\epsilon$. In this paper, we establish an upper bound on the rate of convergence of such an estimator under minimal assumptions on the distribution of the covariate $X$. We discuss extensions to the case in which the distribution of the noise is unknown. We develop a gradient-descent-based algorithm for its computation, and we demonstrate its use on synthetic data. Finally, we apply our method (in a fully data-driven way, without knowledge of the error distribution) to longitudinal data from the US Consumer Expenditure Survey.
{"title":"Unlinked Monotone Regression","authors":"F. Balabdaoui, Charles R. Doss, C. Durot","doi":"10.3929/ETHZ-B-000501663","DOIUrl":"https://doi.org/10.3929/ETHZ-B-000501663","url":null,"abstract":"We consider so-called univariate unlinked (sometimes \"decoupled,\" or \"shuffled\") regression when the unknown regression curve is monotone. In standard monotone regression, one observes a pair $(X,Y)$ where a response $Y$ is linked to a covariate $X$ through the model $Y= m_0(X) + epsilon$, with $m_0$ the (unknown) monotone regression function and $epsilon$ the unobserved error (assumed to be independent of $X$). In the unlinked regression setting one gets only to observe a vector of realizations from both the response $Y$ and from the covariate $X$ where now $Y stackrel{d}{=} m_0(X) + epsilon$. There is no (observed) pairing of $X$ and $Y$. Despite this, it is actually still possible to derive a consistent non-parametric estimator of $m_0$ under the assumption of monotonicity of $m_0$ and knowledge of the distribution of the noise $epsilon$. In this paper, we establish an upper bound on the rate of convergence of such an estimator under minimal assumption on the distribution of the covariate $X$. We discuss extensions to the case in which the distribution of the noise is unknown. We develop a gradient-descent-based algorithm for its computation, and we demonstrate its use on synthetic data. Finally, we apply our method (in a fully data driven way, without knowledge of the error distribution) on longitudinal data from the US Consumer Expenditure Survey.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"28 6 1","pages":"172:1-172:60"},"PeriodicalIF":0.0,"publicationDate":"2020-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78190400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}