
J. Mach. Learn. Res.: Latest Publications

FLIP: A Utility Preserving Privacy Mechanism for Time Series
Pub Date : 2022-07-15 DOI: 10.48550/arXiv.2207.07721
T. McElroy, A. Roy, Gaurab Hore
Guaranteeing privacy in released data is an important goal for data-producing agencies. There has been extensive research on developing suitable privacy mechanisms in recent years. Particularly notable is the idea of noise addition with the guarantee of differential privacy. There are, however, concerns about compromising data utility when very stringent privacy mechanisms are applied. Such compromises can be quite stark in correlated data, such as time series data. Adding white noise to a stochastic process may significantly change the correlation structure, a facet of the process that is essential to optimal prediction. We propose the use of all-pass filtering as a privacy mechanism for regularly sampled time series data, showing that this procedure preserves utility while also providing sufficient privacy guarantees to entity-level time series.
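The property that makes all-pass filtering attractive here is easy to demonstrate: an all-pass filter has unit gain at every frequency, so it leaves the spectrum, and hence the autocovariance, of a stationary series essentially unchanged while altering the sample path itself. A minimal Python sketch (the AR(1) test series and the filter coefficient theta are illustrative choices, not the paper's FLIP procedure):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)

# Illustrative AR(1) series standing in for an entity-level time series.
n, phi = 2000, 0.8
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()

# First-order all-pass filter H(z) = (theta + z^{-1}) / (1 + theta z^{-1}).
# |H| = 1 at every frequency, so the spectrum and autocovariance are preserved.
theta = 0.5
y = signal.lfilter([theta, 1.0], [1.0, theta], x)

def acf(z, nlags=5):
    """Sample autocorrelation at lags 0..nlags."""
    z = z - z.mean()
    c0 = np.dot(z, z) / len(z)
    return np.array([np.dot(z[: len(z) - k], z[k:]) / (len(z) * c0) for k in range(nlags + 1)])

print("ACF of original:", np.round(acf(x), 3))
print("ACF of filtered:", np.round(acf(y), 3))   # close to the original, up to sampling error
print("original path:", np.round(x[:5], 2))
print("released path:", np.round(y[:5], 2))      # the path itself is altered
```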
Citations: 0
The d-separation criterion in Categorical Probability
Pub Date : 2022-07-12 DOI: 10.48550/arXiv.2207.05740
T. Fritz, Andreas Klingler
The d-separation criterion detects the compatibility of a joint probability distribution with a directed acyclic graph through certain conditional independences. In this work, we study this problem in the context of categorical probability theory by introducing a categorical definition of causal models, a categorical notion of d-separation, and proving an abstract version of the d-separation criterion. This approach has two main benefits. First, categorical d-separation is a very intuitive criterion based on topological connectedness. Second, our results apply both to measure-theoretic probability (with standard Borel spaces) and beyond probability theory, including to deterministic and possibilistic networks. It therefore provides a clean proof of the equivalence of local and global Markov properties with causal compatibility for continuous and mixed random variables as well as deterministic and possibilistic variables.
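For readers who want to connect this to the classical graph-theoretic criterion the paper generalizes, the standard d-separation check can be computed as ordinary graph separation in a moralized ancestral subgraph. A self-contained sketch (the collider DAG and the queries are chosen purely for illustration):

```python
import networkx as nx

def d_separated(G, xs, ys, zs):
    """Classical criterion: X and Y are d-separated by Z in the DAG G iff X and Y
    are graph-separated by Z in the moral graph of the ancestral subgraph of X, Y and Z."""
    relevant = set(xs) | set(ys) | set(zs)
    for v in list(relevant):
        relevant |= nx.ancestors(G, v)
    H = G.subgraph(relevant)
    M = nx.Graph(H.to_undirected())            # drop edge directions
    for v in H.nodes:                          # "marry" the parents of every node
        parents = list(H.predecessors(v))
        for i in range(len(parents)):
            for j in range(i + 1, len(parents)):
                M.add_edge(parents[i], parents[j])
    M.remove_nodes_from(zs)                    # condition on Z by deleting it
    return not any(
        M.has_node(x) and M.has_node(y) and nx.has_path(M, x, y)
        for x in xs for y in ys
    )

# Collider example: A -> C <- B, C -> D.
G = nx.DiGraph([("A", "C"), ("B", "C"), ("C", "D")])
print(d_separated(G, {"A"}, {"B"}, set()))     # True: A and B are marginally independent
print(d_separated(G, {"A"}, {"B"}, {"D"}))     # False: conditioning on the collider's descendant opens the path
```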
Citations: 10
q-Learning in Continuous Time
Pub Date : 2022-07-02 DOI: 10.48550/arXiv.2207.00713
Yanwei Jia, X. Zhou
We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term "(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a "q-learning" theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2022b). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms.
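One way to read the "first-order approximation" step, in our own notation rather than the paper's: for a small time step $\Delta t$, the conventional Q-function of the discretized problem differs from the value function only at order $\Delta t$, and the little q-function is the coefficient of that first-order term. A heuristic sketch:

```latex
% Heuristic reading only; J denotes the value function of the stochastic policy
% and Q^{\Delta t} the conventional Q-function with time step \Delta t.
\[
  Q^{\Delta t}(t,x,a) \;=\; J(t,x) \;+\; q(t,x,a)\,\Delta t \;+\; o(\Delta t),
  \qquad \Delta t \to 0 .
\]
```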
Citations: 14
Self-Healing Robust Neural Networks via Closed-Loop Control
Pub Date : 2022-06-26 DOI: 10.48550/arXiv.2206.12963
Zhuotong Chen, Qianxiao Li, Zheng Zhang
Despite the wide applications of neural networks, there have been increasing concerns about their vulnerability issue. While numerous attack and defense techniques have been developed, this work investigates the robustness issue from a new angle: can we design a self-healing neural network that can automatically detect and fix the vulnerability issue by itself? A typical self-healing mechanism is the immune system of a human body. This biology-inspired idea has been used in many engineering designs but is rarely investigated in deep learning. This paper considers the post-training self-healing of a neural network, and proposes a closed-loop control formulation to automatically detect and fix the errors caused by various attacks or perturbations. We provide a margin-based analysis to explain how this formulation can improve the robustness of a classifier. To speed up the inference of the proposed self-healing network, we solve the control problem via improving the Pontryagin Maximum Principle-based solver. Lastly, we present an error estimation of the proposed framework for neural networks with nonlinear activation functions. We validate the performance on several network architectures against various perturbations. Since the self-healing method does not need a-priori information about data perturbations/attacks, it can handle a broad class of unforeseen perturbations.
Citations: 3
tntorch: Tensor Network Learning with PyTorch
Pub Date : 2022-06-22 DOI: 10.48550/arXiv.2206.11128
Mikhail (Misha) Usvyatsov, R. Ballester-Ripoll, K. Schindler
We present tntorch, a tensor learning framework that supports multiple decompositions (including Candecomp/Parafac, Tucker, and Tensor Train) under a unified interface. With our library, the user can learn and handle low-rank tensors with automatic differentiation, seamless GPU support, and the convenience of PyTorch's API. Besides decomposition algorithms, tntorch implements differentiable tensor algebra, rank truncation, cross-approximation, batch processing, comprehensive tensor arithmetics, and more.
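A minimal usage sketch. The calls below (tn.Tensor with a ranks_tt argument, and .torch() to decompress) follow the tntorch documentation as we recall it and should be treated as assumptions rather than a verbatim API reference:

```python
import torch
import tntorch as tn

# Compress a dense 4-way tensor into the Tensor Train (TT) format.
full = torch.randn(16, 16, 16, 16)
t = tn.Tensor(full, ranks_tt=8)      # TT decomposition with TT-ranks capped at 8
print(t)                             # summary of the TT cores and their ranks

# Decompress and check the approximation error with plain PyTorch.
approx = t.torch()
print("relative error:", (torch.norm(full - approx) / torch.norm(full)).item())
```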
Citations: 12
A General Theory for Federated Optimization with Asynchronous and Heterogeneous Clients Updates
Pub Date : 2022-06-21 DOI: 10.48550/arXiv.2206.10189
Yann Fraboni, Richard Vidal, Laetitia Kameni, Marco Lorenzi
We propose a novel framework to study asynchronous federated learning optimization with delays in gradient updates. Our theoretical framework extends the standard FedAvg aggregation scheme by introducing stochastic aggregation weights to represent the variability of the clients update time, due for example to heterogeneous hardware capabilities. Our formalism applies to the general federated setting where clients have heterogeneous datasets and perform at least one step of stochastic gradient descent (SGD). We demonstrate convergence for such a scheme and provide sufficient conditions for the related minimum to be the optimum of the federated problem. We show that our general framework applies to existing optimization schemes including centralized learning, FedAvg, asynchronous FedAvg, and FedBuff. The theory here provided allows drawing meaningful guidelines for designing a federated learning experiment in heterogeneous conditions. In particular, we develop in this work FedFix, a novel extension of FedAvg enabling efficient asynchronous federated training while preserving the convergence stability of synchronous aggregation. We empirically demonstrate our theory on a series of experiments showing that asynchronous FedAvg leads to fast convergence at the expense of stability, and we finally demonstrate the improvements of FedFix over synchronous and asynchronous FedAvg.
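To make "stochastic aggregation weights" concrete, here is a toy version of one server round: every client runs a few local SGD steps, and the server averages whatever updates arrive, with weights that are random because arrivals are. The Bernoulli delay model and the least-squares clients are illustrative assumptions, not the paper's FedFix scheme:

```python
import numpy as np

rng = np.random.default_rng(1)

def local_sgd(w, data, lr=0.1, steps=5):
    """A few local SGD steps on a client's least-squares objective ||Xw - y||^2."""
    X, y = data
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# Heterogeneous client datasets (each client has a different optimum).
d, n_clients = 3, 4
datasets = []
for c in range(n_clients):
    X = rng.normal(size=(50, d))
    w_c = rng.normal(size=d) + c                 # deliberately non-i.i.d. across clients
    datasets.append((X, X @ w_c + 0.1 * rng.normal(size=50)))

w_global = np.zeros(d)
for rnd in range(20):
    updates = [local_sgd(w_global, data) for data in datasets]
    # Stochastic aggregation weights: a client contributes this round only if its
    # update arrived in time (Bernoulli "responsiveness"), a toy stand-in for
    # heterogeneous hardware and asynchronous delays.
    arrived = rng.random(n_clients) < 0.7
    if arrived.any():
        weights = arrived / arrived.sum()
        w_global = sum(wt * upd for wt, upd in zip(weights, updates))
print("final global model:", np.round(w_global, 3))
```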
Citations: 4
Preconditioned Gradient Descent for Overparameterized Nonconvex Burer-Monteiro Factorization with Global Optimality Certification
Pub Date : 2022-06-07 DOI: 10.48550/arXiv.2206.03345
G. Zhang, S. Fattahi, Richard Y. Zhang
We consider using gradient descent to minimize the nonconvex function $f(X)=\phi(XX^{T})$ over an $n\times r$ factor matrix $X$, in which $\phi$ is an underlying smooth convex cost function defined over $n\times n$ matrices. While only a second-order stationary point $X$ can be provably found in reasonable time, if $X$ is additionally rank deficient, then its rank deficiency certifies it as being globally optimal. This way of certifying global optimality necessarily requires the search rank $r$ of the current iterate $X$ to be overparameterized with respect to the rank $r^{\star}$ of the global minimizer $X^{\star}$. Unfortunately, overparameterization significantly slows down the convergence of gradient descent, from a linear rate with $r=r^{\star}$ to a sublinear rate when $r>r^{\star}$, even when $\phi$ is strongly convex. In this paper, we propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the overparameterized case, while also making it agnostic to possible ill-conditioning in the global minimizer $X^{\star}$.
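As a toy illustration of the kind of right-preconditioning described above, take the simplest cost $\phi(M)=\|M-M^{\star}\|_F^2$ with an overparameterized search rank. The damping rule below (tying $\epsilon$ to the current residual) is our simplification of the idea, not the paper's exact choice:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r_star, r = 30, 2, 6                 # the search rank r overparameterizes r_star

# Ground-truth low-rank matrix M* = Z Z^T, scaled so its spectral norm is O(1).
Z = rng.normal(size=(n, r_star)) / np.sqrt(n)
M_star = Z @ Z.T

def grad_f(X):
    # For phi(M) = ||M - M*||_F^2, f(X) = phi(X X^T) has gradient 4 (X X^T - M*) X.
    return 4.0 * (X @ X.T - M_star) @ X

X_gd = 0.1 * rng.normal(size=(n, r))
X_pgd = X_gd.copy()
eta = 0.05

for _ in range(1000):
    # Plain gradient descent.
    X_gd = X_gd - eta * grad_f(X_gd)
    # Preconditioned step: right-multiply the gradient by (X^T X + eps I)^{-1},
    # with the damping eps tied to the current residual (our simplification).
    eps = max(np.linalg.norm(X_pgd @ X_pgd.T - M_star), 1e-12)
    X_pgd = X_pgd - eta * grad_f(X_pgd) @ np.linalg.inv(X_pgd.T @ X_pgd + eps * np.eye(r))

# The preconditioned run typically reaches a far smaller residual in this overparameterized setting.
for name, Xf in (("plain GD", X_gd), ("preconditioned GD", X_pgd)):
    print(name, "residual:", np.linalg.norm(Xf @ Xf.T - M_star))
```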
Citations: 4
Infinite-dimensional optimization and Bayesian nonparametric learning of stochastic differential equations
Pub Date : 2022-05-30 DOI: 10.48550/arXiv.2205.15368
A. Ganguly, Riten Mitra, Jin Zhou
The paper has two major themes. The first part of the paper establishes certain general results for infinite-dimensional optimization problems on Hilbert spaces. These results cover the classical representer theorem and many of its variants as special cases and offer a wider scope of applications. The second part of the paper then develops a systematic approach for learning the drift function of a stochastic differential equation by integrating the results of the first part with a Bayesian hierarchical framework. Importantly, our Bayesian approach incorporates low-cost sparse learning through proper use of shrinkage priors while allowing proper quantification of uncertainty through posterior distributions. Several examples at the end illustrate the accuracy of our learning scheme.
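A finite-dimensional toy version of the second theme: discretize the SDE with an Euler scheme, expand the drift in a handful of basis functions, and place a Gaussian shrinkage prior on the coefficients so the posterior is available in closed form. The basis, prior scale, and data-generating drift are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Euler-discretized sample path of dX_t = b(X_t) dt + sigma dW_t (true drift b(x) = -2x).
def b_true(v):
    return -2.0 * v

dt, sigma, T = 0.01, 0.5, 20000
x = np.zeros(T)
for t in range(T - 1):
    x[t + 1] = x[t] + b_true(x[t]) * dt + sigma * np.sqrt(dt) * rng.standard_normal()

# Finite basis expansion of the drift: b(x) ~ sum_j beta_j phi_j(x).
def basis(v):
    return np.column_stack([np.ones_like(v), v, np.sin(v), np.cos(v)])

y = np.diff(x)                      # increments: y_t ~ b(x_t) dt + N(0, sigma^2 dt)
D = basis(x[:-1]) * dt              # design matrix of the induced linear model
noise_var = sigma ** 2 * dt

# Gaussian shrinkage prior beta ~ N(0, tau^2 I) gives a closed-form Gaussian posterior,
# a finite-dimensional stand-in for the paper's shrinkage priors.
tau2 = 1.0
post_cov = np.linalg.inv(D.T @ D / noise_var + np.eye(D.shape[1]) / tau2)
post_mean = post_cov @ D.T @ y / noise_var

grid = np.linspace(-0.5, 0.5, 5)    # the region the process actually visits
print("posterior-mean drift:", np.round(basis(grid) @ post_mean, 2))
print("true drift:          ", np.round(b_true(grid), 2))
```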
Citations: 0
Metrics of calibration for probabilistic predictions
Pub Date : 2022-05-19 DOI: 10.48550/arXiv.2205.09680
Imanol Arrieta Ibarra, Paman Gujral, Jonathan Tannen, M. Tygert, Cherie Xu
Predictions are often probabilities; e.g., a prediction could be for precipitation tomorrow, but with only a 30% chance. Given such probabilistic predictions together with the actual outcomes, "reliability diagrams" help detect and diagnose statistically significant discrepancies -- so-called "miscalibration" -- between the predictions and the outcomes. The canonical reliability diagrams histogram the observed and expected values of the predictions; replacing the hard histogram binning with soft kernel density estimation is another common practice. But, which widths of bins or kernels are best? Plots of the cumulative differences between the observed and expected values largely avoid this question, by displaying miscalibration directly as the slopes of secant lines for the graphs. Slope is easy to perceive with quantitative precision, even when the constant offsets of the secant lines are irrelevant; there is no need to bin or perform kernel density estimation. The existing standard metrics of miscalibration each summarize a reliability diagram as a single scalar statistic. The cumulative plots naturally lead to scalar metrics for the deviation of the graph of cumulative differences away from zero; good calibration corresponds to a horizontal, flat graph which deviates little from zero. The cumulative approach is currently unconventional, yet offers many favorable statistical properties, guaranteed via mathematical theory backed by rigorous proofs and illustrative numerical examples. In particular, metrics based on binning or kernel density estimation unavoidably must trade off statistical confidence for the ability to resolve variations as a function of the predicted probability or vice versa. Widening the bins or kernels averages away random noise while giving up some resolving power. Narrowing the bins or kernels enhances resolving power while not averaging away as much noise.
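The cumulative-differences construction itself takes only a few lines: sort by predicted probability, accumulate observed-minus-predicted, and read miscalibration off the slopes of the resulting graph. The scalar summaries printed below (maximum absolute deviation and max-minus-min range) are standard choices, though the exact normalization is our assumption:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic predictions and outcomes; the predictor is deliberately overconfident.
n = 5000
p_pred = rng.uniform(0.05, 0.95, size=n)
p_true = 0.5 + 0.7 * (p_pred - 0.5)            # true probabilities shrunk toward 1/2
outcomes = (rng.random(n) < p_true).astype(float)

# Cumulative differences between observed outcomes and predicted probabilities,
# ordered by the predictions; no bin widths or kernel widths are needed.
order = np.argsort(p_pred)
cumulative = np.cumsum(outcomes[order] - p_pred[order]) / n

# Scalar summaries of how far the cumulative graph strays from zero.
print("max |cumulative deviation|:", np.abs(cumulative).max())
print("range (max - min):         ", cumulative.max() - cumulative.min())
# For a well-calibrated predictor both summaries would be on the order of 1/sqrt(n).
```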
Citations: 11
An Inexact Augmented Lagrangian Algorithm for Training Leaky ReLU Neural Network with Group Sparsity
Pub Date : 2022-05-11 DOI: 10.48550/arXiv.2205.05428
Wei Liu, Xin Liu, Xiaojun Chen
The leaky ReLU network with a group sparse regularization term has been widely used in recent years. However, training such a network yields a nonsmooth nonconvex optimization problem and there exists a lack of approaches to compute a stationary point deterministically. In this paper, we first resolve the multi-layer composite term in the original optimization problem by introducing auxiliary variables and additional constraints. We show the new model has a nonempty and bounded solution set and its feasible set satisfies the Mangasarian-Fromovitz constraint qualification. Moreover, we show the relationship between the new model and the original problem. Remarkably, we propose an inexact augmented Lagrangian algorithm for solving the new model and show the convergence of the algorithm to a KKT point. Numerical experiments demonstrate that our algorithm is more efficient for training sparse leaky ReLU neural networks than some well-known algorithms.
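For context, a group sparse regularization term of this kind is typically a sum of Euclidean norms over groups of weights, for example all weights entering one hidden unit, which encourages entire units to be switched off. A small PyTorch sketch of the penalized objective trained with plain subgradient SGD (the grouping by rows and the training loop are illustrative; the paper's inexact augmented Lagrangian method is not reproduced here):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small leaky ReLU network.
net = nn.Sequential(
    nn.Linear(10, 32), nn.LeakyReLU(0.01),
    nn.Linear(32, 32), nn.LeakyReLU(0.01),
    nn.Linear(32, 1),
)

def group_sparsity(model, lam=1e-2):
    """Group lasso penalty: sum of l2 norms of incoming-weight rows, so that an
    entire hidden unit can be driven toward zero at once."""
    penalty = 0.0
    for module in model:
        if isinstance(module, nn.Linear):
            penalty = penalty + module.weight.norm(dim=1).sum()
    return lam * penalty

# Toy regression data: the target depends on only 3 of the 10 inputs.
X = torch.randn(256, 10)
y = X[:, :3].sum(dim=1, keepdim=True)

opt = torch.optim.SGD(net.parameters(), lr=0.05)
for step in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(X), y) + group_sparsity(net)
    loss.backward()
    opt.step()

rows = net[0].weight.detach().norm(dim=1)
print("smallest first-layer row norms:", rows.sort().values[:8].numpy().round(3))
# Rows pushed toward zero by the penalty correspond to hidden units that can be pruned.
```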
Citations: 2