Journal of Machine Learning Research最新文献

英文中文

Reinforcement Learning with Function Approximation: From Linear to Nonlinear 函数逼近的强化学习:从线性到非线性

3区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2023-06-01 DOI: 10.4208/jml.230105

Jihao Long and Jiequn Han

引用次数: 0

Why Self-Attention is Natural for Sequence-to-Sequence Problems? A Perspective from Symmetries 为什么自我关注是序列对序列问题的自然表现?从对称角度看问题

3区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2023-06-01 DOI: 10.4208/jml.221206

Chao Ma and Lexing Ying null

引用次数: 0

Selective inference for k-means clustering. k-means 聚类的选择性推理。

IF 4.3 3区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2023-05-01

Yiqun T Chen, Daniela M Witten

We consider the problem of testing for a difference in means between clusters of observations identified via $k$ -means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate. In recent work, Gao et al. (2022) considered a related problem in the context of hierarchical clustering. Unfortunately, their solution is highly-tailored to the context of hierarchical clustering, and thus cannot be applied in the setting of $k$ -means clustering. In this paper, we propose a p-value that conditions on all of the intermediate clustering assignments in the $k$ -means algorithm. We show that the p-value controls the selective Type I error for a test of the difference in means between a pair of clusters obtained using $k$ -means clustering in finite samples, and can be efficiently computed. We apply our proposal on hand-written digits data and on single-cell RNA-sequencing data.

我们考虑的问题是检验通过 k-means 聚类确定的观测数据聚类之间的均值差异。在这种情况下，经典的假设检验会导致 I 类错误率上升。在最近的工作中，Gao 等人（2022 年）考虑了分层聚类背景下的相关问题。遗憾的是，他们的解决方案与分层聚类的背景高度契合，因此无法应用于 k-means 聚类。在本文中，我们提出了一个 p 值，它是 k-means 算法中所有中间聚类分配的条件。我们证明，该 p 值可以控制在有限样本中使用 k-means 聚类对一对聚类的均值差异进行检验时的选择性 I 类错误，并且可以高效计算。我们将我们的建议应用于手写数字数据和单细胞 RNA 序列数据。

引用次数: 0

RNN-Attention Based Deep Learning for Solving Inverse Boundary Problems in Nonlinear Marshak Waves 基于rnn -注意力的深度学习求解非线性马沙克波反边界问题

IF 6 3区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2023-04-01 DOI: 10.4208/jml.221209

Di Zhao, Weiming Li, Wengu Chen, Peng Song, and Han Wang null

. Radiative transfer, described by the radiative transfer equation (RTE), is one of the dominant energy exchange processes in the inertial conﬁnement fusion (ICF) experiments. The Marshak wave problem is an important benchmark for time-dependent RTE. In this work, we present a neural network architecture termed RNN-attention deep learning (RADL) as a surrogate model to solve the inverse boundary problem of the nonlinear Marshak wave in a data-driven fashion. We train the surrogate model by numerical simulation data of the forward problem, and then solve the inverse problem by minimizing the distance between the target solution and the surrogate predicted solution concerning the boundary condition. This minimization is made efﬁcient because the surrogate model by-passes the expensive numerical solution, and the model is differentiable so the gradient-based optimization algorithms are adopted. The effectiveness of our approach is demonstrated by solving the inverse boundary problems of the Marshak wave benchmark in two case studies: where the transport process is modeled by RTE and where it is modeled by its nonlinear diffusion approximation (DA). Last but not least, the importance of using both the RNN and the factor-attention blocks in the RADL model is illustrated, and the data efﬁciency of our model is investigated in this work.

。辐射传递是惯性约束聚变(ICF)实验中主要的能量交换过程之一，用辐射传递方程(RTE)来描述。马沙克波问题是时变RTE的一个重要基准。在这项工作中，我们提出了一种称为rnn -注意力深度学习(RADL)的神经网络架构作为代理模型，以数据驱动的方式解决非线性马沙克波的逆边界问题。我们利用正演问题的数值模拟数据训练代理模型，然后在边界条件下通过最小化目标解与代理预测解之间的距离来求解逆问题。由于替代模型绕过了昂贵的数值解，并且模型是可微的，因此采用了基于梯度的优化算法，从而使这种最小化变得高效。通过在两个案例研究中解决马沙克波基准的逆边界问题，我们的方法的有效性得到了证明:其中输运过程是由RTE建模的，而它是由其非线性扩散近似(DA)建模的。最后，说明了在RADL模型中同时使用RNN和因子注意块的重要性，并对我们的模型的数据效率进行了研究。

{"title":"RNN-Attention Based Deep Learning for Solving Inverse Boundary Problems in Nonlinear Marshak Waves","authors":"Di Zhao, Weiming Li, Wengu Chen, Peng Song, and Han Wang null","doi":"10.4208/jml.221209","DOIUrl":"https://doi.org/10.4208/jml.221209","url":null,"abstract":". Radiative transfer, described by the radiative transfer equation (RTE), is one of the dominant energy exchange processes in the inertial conﬁnement fusion (ICF) experiments. The Marshak wave problem is an important benchmark for time-dependent RTE. In this work, we present a neural network architecture termed RNN-attention deep learning (RADL) as a surrogate model to solve the inverse boundary problem of the nonlinear Marshak wave in a data-driven fashion. We train the surrogate model by numerical simulation data of the forward problem, and then solve the inverse problem by minimizing the distance between the target solution and the surrogate predicted solution concerning the boundary condition. This minimization is made efﬁcient because the surrogate model by-passes the expensive numerical solution, and the model is differentiable so the gradient-based optimization algorithms are adopted. The effectiveness of our approach is demonstrated by solving the inverse boundary problems of the Marshak wave benchmark in two case studies: where the transport process is modeled by RTE and where it is modeled by its nonlinear diffusion approximation (DA). Last but not least, the importance of using both the RNN and the factor-attention blocks in the RADL model is illustrated, and the data efﬁciency of our model is investigated in this work.","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"75 1","pages":""},"PeriodicalIF":6.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74640699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Inference for Gaussian Processes with Matérn Covariogram on Compact Riemannian Manifolds. 紧凑黎曼曼形上具有马特恩协方差的高斯过程推理

IF 6 3区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2023-03-01

Didong Li, Wenpin Tang, Sudipto Banerjee

Gaussian processes are widely employed as versatile modelling and predictive tools in spatial statistics, functional data analysis, computer modelling and diverse applications of machine learning. They have been widely studied over Euclidean spaces, where they are specified using covariance functions or covariograms for modelling complex dependencies. There is a growing literature on Gaussian processes over Riemannian manifolds in order to develop richer and more flexible inferential frameworks for non-Euclidean data. While numerical approximations through graph representations have been well studied for the Matérn covariogram and heat kernel, the behaviour of asymptotic inference on the parameters of the covariogram has received relatively scant attention. We focus on asymptotic behaviour for Gaussian processes constructed over compact Riemannian manifolds. Building upon a recently introduced Matérn covariogram on a compact Riemannian manifold, we employ formal notions and conditions for the equivalence of two Matérn Gaussian random measures on compact manifolds to derive the parameter that is identifiable, also known as the microergodic parameter, and formally establish the consistency of the maximum likelihood estimate and the asymptotic optimality of the best linear unbiased predictor. The circle is studied as a specific example of compact Riemannian manifolds with numerical experiments to illustrate and corroborate the theory.

高斯过程是空间统计学、函数数据分析、计算机建模和机器学习各种应用中广泛使用的通用建模和预测工具。人们对欧几里得空间上的高斯过程进行了广泛的研究，利用协方差函数或协方差图对复杂的依赖关系进行建模。关于黎曼流形上的高斯过程的文献越来越多，以便为非欧几里得数据开发更丰富、更灵活的推理框架。虽然通过图形表示对马特恩协方差和热核的数值近似进行了深入研究，但对协方差参数的渐近推断行为的关注却相对较少。我们重点研究在紧凑黎曼流形上构建的高斯过程的渐近行为。以最近引入的紧凑黎曼流形上的马特恩协变图为基础，我们采用紧凑流形上两个马特恩高斯随机度量等价的形式化概念和条件，推导出可识别的参数（也称为微角参数），并正式建立最大似然估计的一致性和最佳线性无偏预测器的渐近最优性。我们将圆作为紧凑黎曼流形的一个具体实例进行研究，并通过数值实验来说明和证实这一理论。

{"title":"Inference for Gaussian Processes with Matérn Covariogram on Compact Riemannian Manifolds.","authors":"Didong Li, Wenpin Tang, Sudipto Banerjee","doi":"","DOIUrl":"","url":null,"abstract":"Gaussian processes are widely employed as versatile modelling and predictive tools in spatial statistics, functional data analysis, computer modelling and diverse applications of machine learning. They have been widely studied over Euclidean spaces, where they are specified using covariance functions or covariograms for modelling complex dependencies. There is a growing literature on Gaussian processes over Riemannian manifolds in order to develop richer and more flexible inferential frameworks for non-Euclidean data. While numerical approximations through graph representations have been well studied for the Matérn covariogram and heat kernel, the behaviour of asymptotic inference on the parameters of the covariogram has received relatively scant attention. We focus on asymptotic behaviour for Gaussian processes constructed over compact Riemannian manifolds. Building upon a recently introduced Matérn covariogram on a compact Riemannian manifold, we employ formal notions and conditions for the equivalence of two Matérn Gaussian random measures on compact manifolds to derive the parameter that is identifiable, also known as the microergodic parameter, and formally establish the consistency of the maximum likelihood estimate and the asymptotic optimality of the best linear unbiased predictor. The circle is studied as a specific example of compact Riemannian manifolds with numerical experiments to illustrate and corroborate the theory.","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"24 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10361735/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9876354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Bayesian Data Selection. 贝叶斯数据选择。

IF 6 3区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2023-01-01

Eli N Weinstein, Jeffrey W Miller

Insights into complex, high-dimensional data can be obtained by discovering features of the data that match or do not match a model of interest. To formalize this task, we introduce the "data selection" problem: finding a lower-dimensional statistic-such as a subset of variables-that is well fit by a given parametric model of interest. A fully Bayesian approach to data selection would be to parametrically model the value of the statistic, nonparametrically model the remaining "background" components of the data, and perform standard Bayesian model selection for the choice of statistic. However, fitting a nonparametric model to high-dimensional data tends to be highly inefficient, statistically and computationally. We propose a novel score for performing data selection, the "Stein volume criterion (SVC)", that does not require fitting a nonparametric model. The SVC takes the form of a generalized marginal likelihood with a kernelized Stein discrepancy in place of the Kullback-Leibler divergence. We prove that the SVC is consistent for data selection, and establish consistency and asymptotic normality of the corresponding generalized posterior on parameters. We apply the SVC to the analysis of single-cell RNA sequencing data sets using probabilistic principal components analysis and a spin glass model of gene regulation.

通过发现与感兴趣的模型匹配或不匹配的数据特征，可以获得对复杂高维数据的洞察。为了形式化这个任务，我们引入了“数据选择”问题:找到一个较低维的统计量——比如变量的子集——它与给定的参数模型很好地拟合。数据选择的完全贝叶斯方法是对统计值进行参数化建模，对数据的剩余“背景”成分进行非参数化建模，并对统计值的选择执行标准贝叶斯模型选择。然而，拟合一个非参数模型到高维数据往往是非常低效的，统计和计算。我们提出了一种用于执行数据选择的新评分，即“Stein体积准则(SVC)”，它不需要拟合非参数模型。SVC采用广义边际似然的形式，用核化的Stein差异代替Kullback-Leibler散度。证明了SVC在数据选择上是一致的，并建立了相应的广义后验在参数上的一致性和渐近正态性。我们使用概率主成分分析和基因调控的自旋玻璃模型将SVC应用于单细胞RNA测序数据集的分析。

{"title":"Bayesian Data Selection.","authors":"Eli N Weinstein, Jeffrey W Miller","doi":"","DOIUrl":"","url":null,"abstract":"Insights into complex, high-dimensional data can be obtained by discovering features of the data that match or do not match a model of interest. To formalize this task, we introduce the \"data selection\" problem: finding a lower-dimensional statistic-such as a subset of variables-that is well fit by a given parametric model of interest. A fully Bayesian approach to data selection would be to parametrically model the value of the statistic, nonparametrically model the remaining \"background\" components of the data, and perform standard Bayesian model selection for the choice of statistic. However, fitting a nonparametric model to high-dimensional data tends to be highly inefficient, statistically and computationally. We propose a novel score for performing data selection, the \"Stein volume criterion (SVC)\", that does not require fitting a nonparametric model. The SVC takes the form of a generalized marginal likelihood with a kernelized Stein discrepancy in place of the Kullback-Leibler divergence. We prove that the SVC is consistent for data selection, and establish consistency and asymptotic normality of the corresponding generalized posterior on parameters. We apply the SVC to the analysis of single-cell RNA sequencing data sets using probabilistic principal components analysis and a spin glass model of gene regulation.","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"24 23","pages":""},"PeriodicalIF":6.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10194814/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9574086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

DART: Distance Assisted Recursive Testing. DART：距离辅助递归测试。

IF 4.3 3区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2023-01-01

Xuechan Li, Anthony D Sung, Jichun Xie

Multiple testing is a commonly used tool in modern data science. Sometimes, the hypotheses are embedded in a space; the distances between the hypotheses reflect their co-null/co-alternative patterns. Properly incorporating the distance information in testing will boost testing power. Hence, we developed a new multiple testing framework named Distance Assisted Recursive Testing (DART). DART features in joint artificial intelligence (AI) and statistics modeling. It has two stages. The first stage uses AI models to construct an aggregation tree that reflects the distance information. The second stage uses statistical models to embed the testing on the tree and control the false discovery rate. Theoretical analysis and numerical experiments demonstrated that DART generates valid, robust, and powerful results. We applied DART to a clinical trial in the allogeneic stem cell transplantation study to identify the gut microbiota whose abundance was impacted by post-transplant care.

多重测试是现代数据科学常用的工具。有时，假设被嵌入一个空间；假设之间的距离反映了它们的共空/共变模式。在测试中适当纳入距离信息将提高测试能力。因此，我们开发了一种新的多重测试框架，名为 "距离辅助递归测试（DART）"。DART 的特点是联合人工智能（AI）和统计建模。它分为两个阶段。第一阶段使用人工智能模型构建反映距离信息的聚合树。第二阶段使用统计模型对聚合树进行嵌入测试并控制误发现率。理论分析和数值实验证明，DART 能生成有效、稳健和强大的结果。我们将 DART 应用于异体干细胞移植研究中的一项临床试验，以确定其丰度受移植后护理影响的肠道微生物群。

引用次数: 0

Inference for a Large Directed Acyclic Graph with Unspecified Interventions. 具有未指定干预的大有向非循环图的推理。

IF 6 3区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2023-01-01

Chunlin Li, Xiaotong Shen, Wei Pan

Statistical inference of directed relations given some unspecified interventions (i.e., the intervention targets are unknown) is challenging. In this article, we test hypothesized directed relations with unspecified interventions. First, we derive conditions to yield an identifiable model. Unlike classical inference, testing directed relations requires to identify the ancestors and relevant interventions of hypothesis-specific primary variables. To this end, we propose a peeling algorithm based on nodewise regressions to establish a topological order of primary variables. Moreover, we prove that the peeling algorithm yields a consistent estimator in low-order polynomial time. Second, we propose a likelihood ratio test integrated with a data perturbation scheme to account for the uncertainty of identifying ancestors and interventions. Also, we show that the distribution of a data perturbation test statistic converges to the target distribution. Numerical examples demonstrate the utility and effectiveness of the proposed methods, including an application to infer gene regulatory networks. The R implementation is available at https://github.com/chunlinli/intdag.

在给定一些未指明的干预措施（即干预目标未知）的情况下，对定向关系进行统计推断是具有挑战性的。在这篇文章中，我们测试了假设的直接关系与未指明的干预措施。首先，我们导出了产生可识别模型的条件。与经典推理不同，测试定向关系需要识别特定假设的主要变量的祖先和相关干预。为此，我们提出了一种基于节点回归的剥离算法来建立主变量的拓扑顺序。此外，我们证明了剥离算法在低阶多项式时间内产生了一致的估计量。其次，我们提出了一种与数据扰动方案相结合的似然比检验，以解释识别祖先和干预措施的不确定性。此外，我们还证明了数据扰动测试统计量的分布收敛于目标分布。数值例子证明了所提出的方法的实用性和有效性，包括推断基因调控网络的应用。R的实施可在https://github.com/chunlinli/intdag.

{"title":"Inference for a Large Directed Acyclic Graph with Unspecified Interventions.","authors":"Chunlin Li, Xiaotong Shen, Wei Pan","doi":"","DOIUrl":"","url":null,"abstract":"Statistical inference of directed relations given some unspecified interventions (i.e., the intervention targets are unknown) is challenging. In this article, we test hypothesized directed relations with unspecified interventions. First, we derive conditions to yield an identifiable model. Unlike classical inference, testing directed relations requires to identify the ancestors and relevant interventions of hypothesis-specific primary variables. To this end, we propose a peeling algorithm based on nodewise regressions to establish a topological order of primary variables. Moreover, we prove that the peeling algorithm yields a consistent estimator in low-order polynomial time. Second, we propose a likelihood ratio test integrated with a data perturbation scheme to account for the uncertainty of identifying ancestors and interventions. Also, we show that the distribution of a data perturbation test statistic converges to the target distribution. Numerical examples demonstrate the utility and effectiveness of the proposed methods, including an application to infer gene regulatory networks. The R implementation is available at https://github.com/chunlinli/intdag.","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"24 ","pages":""},"PeriodicalIF":6.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10497226/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10242964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Fair Data Representation for Machine Learning at the Pareto Frontier. 帕累托前沿机器学习的公平数据表示

IF 4.3 3区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2023-01-01

Shizhou Xu, Thomas Strohmer

As machine learning powered decision-making becomes increasingly important in our daily lives, it is imperative to strive for fairness in the underlying data processing. We propose a pre-processing algorithm for fair data representation via which $L^{2} (ℙ)$ -objective supervised learning results in estimations of the Pareto frontier between prediction error and statistical disparity. Particularly, the present work applies the optimal affine transport to approach the post-processing Wasserstein barycenter characterization of the optimal fair $L^{2}$ -objective supervised learning via a pre-processing data deformation. Furthermore, we show that the Wasserstein geodesics from learning outcome marginals to their barycenter characterizes the Pareto frontier between $L^{2}$ -loss and total Wasserstein distance among the marginals. Numerical simulations underscore the advantages: (1) the pre-processing step is compositive with arbitrary conditional expectation estimation supervised learning methods and unseen data; (2) the fair representation protects the sensitive information by limiting the inference capability of the remaining data with respect to the sensitive data; (3) the optimal affine maps are computationally efficient even for high-dimensional data.

随着机器学习驱动的决策在我们的日常生活中变得越来越重要，在底层数据处理中力求公平势在必行。我们提出了一种用于公平数据表示的预处理算法，通过这种算法，目标监督学习可以估计预测误差和统计差异之间的帕累托前沿。特别是，本研究应用最优仿射传输，通过预处理数据变形，接近最优公平 L 2 目标监督学习的后处理 Wasserstein barycenter 特性。此外，我们还证明了从学习结果边际到其原点的瓦瑟斯坦大地线表征了边际间的 L 2 -损失和总瓦瑟斯坦距离之间的帕累托前沿。数值模拟证明了该方法的优势：（1）预处理步骤与任意条件期望估计监督学习方法和未见数据具有可比性；（2）公平表示通过限制其余数据相对于敏感数据的推理能力来保护敏感信息；（3）即使对于高维数据，最优仿射图的计算效率也很高。

{"title":"Fair Data Representation for Machine Learning at the Pareto Frontier.","authors":"Shizhou Xu, Thomas Strohmer","doi":"","DOIUrl":"","url":null,"abstract":"As machine learning powered decision-making becomes increasingly important in our daily lives, it is imperative to strive for fairness in the underlying data processing. We propose a pre-processing algorithm for fair data representation via which <math> <mrow><msup><mi>L</mi> <mn>2</mn></msup> <mo>(</mo> <mtext>ℙ</mtext> <mo>)</mo></mrow> </math> -objective supervised learning results in estimations of the Pareto frontier between prediction error and statistical disparity. Particularly, the present work applies the optimal affine transport to approach the post-processing Wasserstein barycenter characterization of the optimal fair <math> <mrow><msup><mi>L</mi> <mn>2</mn></msup> </mrow> </math> -objective supervised learning via a pre-processing data deformation. Furthermore, we show that the Wasserstein geodesics from learning outcome marginals to their barycenter characterizes the Pareto frontier between <math> <mrow><msup><mi>L</mi> <mn>2</mn></msup> </mrow> </math> -loss and total Wasserstein distance among the marginals. Numerical simulations underscore the advantages: (1) the pre-processing step is compositive with arbitrary conditional expectation estimation supervised learning methods and unseen data; (2) the fair representation protects the sensitive information by limiting the inference capability of the remaining data with respect to the sensitive data; (3) the optimal affine maps are computationally efficient even for high-dimensional data.","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"24 ","pages":""},"PeriodicalIF":4.3,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11494318/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142512129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Minimax Estimation for Personalized Federated Learning: An Alternative between FedAvg and Local Training? 个性化联合学习的最小估计：FedAvg 和本地训练之间的替代方案？

IF 4.3 3区计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS

Journal of Machine Learning Research

Pub Date : 2023-01-01

Shuxiao Chen, Qinqing Zheng, Qi Long, Weijie J Su

A widely recognized difficulty in federated learning arises from the statistical heterogeneity among clients: local datasets often originate from distinct yet not entirely unrelated probability distributions, and personalization is, therefore, necessary to achieve optimal results from each individual's perspective. In this paper, we show how the excess risks of personalized federated learning using a smooth, strongly convex loss depend on data heterogeneity from a minimax point of view, with a focus on the FedAvg algorithm (McMahan et al., 2017) and pure local training (i.e., clients solve empirical risk minimization problems on their local datasets without any communication). Our main result reveals an approximate alternative between these two baseline algorithms for federated learning: the former algorithm is minimax rate optimal over a collection of instances when data heterogeneity is small, whereas the latter is minimax rate optimal when data heterogeneity is large, and the threshold is sharp up to a constant. As an implication, our results show that from a worst-case point of view, a dichotomous strategy that makes a choice between the two baseline algorithms is rate-optimal. Another implication is that the popular FedAvg following by local fine tuning strategy is also minimax optimal under additional regularity conditions. Our analysis relies on a new notion of algorithmic stability that takes into account the nature of federated learning.

联合学习中一个公认的难题来自于客户之间的统计异质性：本地数据集通常来自不同但并非完全无关的概率分布，因此，要想从每个人的角度获得最佳结果，就必须实现个性化。在本文中，我们从最小化的角度展示了使用平滑、强凸损失的个性化联合学习的超额风险如何取决于数据异质性，重点关注 FedAvg 算法（McMahan 等人，2017 年）和纯本地训练（即客户在不进行任何交流的情况下解决其本地数据集上的经验风险最小化问题）。我们的主要结果揭示了这两种联合学习基线算法之间的近似替代方案：当数据异质性较小时，前一种算法在实例集合上是最小率最优的，而当数据异质性较大且阈值尖锐到一个常数时，后一种算法是最小率最优的。我们的结果表明，从最坏情况的角度来看，在两种基准算法之间做出选择的二分法策略是速率最优的。另一个含义是，在额外的规则性条件下，流行的 FedAvg 跟随局部微调策略也是最小最优的。我们的分析依赖于一个新的算法稳定性概念，它考虑到了联合学习的本质。

{"title":"Minimax Estimation for Personalized Federated Learning: An Alternative between FedAvg and Local Training?","authors":"Shuxiao Chen, Qinqing Zheng, Qi Long, Weijie J Su","doi":"","DOIUrl":"","url":null,"abstract":"A widely recognized difficulty in federated learning arises from the statistical heterogeneity among clients: local datasets often originate from distinct yet not entirely unrelated probability distributions, and personalization is, therefore, necessary to achieve optimal results from each individual's perspective. In this paper, we show how the excess risks of personalized federated learning using a smooth, strongly convex loss depend on data heterogeneity from a minimax point of view, with a focus on the FedAvg algorithm (McMahan et al., 2017) and pure local training (i.e., clients solve empirical risk minimization problems on their local datasets without any communication). Our main result reveals an approximate alternative between these two baseline algorithms for federated learning: the former algorithm is minimax rate optimal over a collection of instances when data heterogeneity is small, whereas the latter is minimax rate optimal when data heterogeneity is large, and the threshold is sharp up to a constant. As an implication, our results show that from a worst-case point of view, a dichotomous strategy that makes a choice between the two baseline algorithms is rate-optimal. Another implication is that the popular FedAvg following by local fine tuning strategy is also minimax optimal under additional regularity conditions. Our analysis relies on a new notion of algorithmic stability that takes into account the nature of federated learning.","PeriodicalId":50161,"journal":{"name":"Journal of Machine Learning Research","volume":"24 ","pages":""},"PeriodicalIF":4.3,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11299893/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141895178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

首页上一页

下一页尾页

类型

全部化学•材料生命科学医学物理工程技术环境•农林材料科学地球科学法学管理学化学环境科学与生态学计算机科学教育学经济学农林科学人文科学生物学数学物理与天体物理心理学综合性期刊其他工业工程理学历史学农学文学信息工程

数据库

全部 ACS Publications Elsevier ieeexplore Springer The Royal Society of Chemistry Wiley

期刊

Journal of Machine Learning Research

全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.

﹀