Latest articles in Information and Inference: A Journal of the IMA

Exit Time Analysis for Approximations of Gradient Descent Trajectories Around Saddle Points

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference: A Journal of the IMA

Pub Date: 2022-08-01 DOI: 10.1093/imaiai/iaac025

Rishabh Dixit; Mert Gürbüzbalaban; Waheed U Bajwa

This paper considers the problem of understanding the exit time for trajectories of gradient-related first-order methods from saddle neighborhoods under some initial boundary conditions. Given the ‘flat’ geometry around saddle points, first-order methods can struggle to escape these regions quickly because the gradients encountered there are small in magnitude. In particular, while it is known that gradient-related first-order methods escape strict-saddle neighborhoods, existing analytic techniques do not explicitly leverage the local geometry around saddle points to control the behavior of gradient trajectories. It is in this context that this paper puts forth a rigorous geometric analysis of the gradient-descent method around strict-saddle neighborhoods using matrix perturbation theory. In doing so, it provides a key result that can be used to generate an approximate gradient trajectory for any given initial conditions. In addition, the analysis leads to a linear exit-time solution for the gradient-descent method under certain necessary initial conditions, which explicitly brings out the dependence on problem dimension, conditioning of the saddle neighborhood, and more, for a class of strict-saddle functions.
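As a hedged illustration of the slow escape described above (not the paper's analysis or its function class), the sketch below runs plain gradient descent on the hypothetical strict saddle $f(x, y) = (x^2 - y^2)/2$ and counts iterations until the iterate leaves a ball of radius `radius` around the saddle at the origin. The objective, the step size `eta`, the radius and the helper name `gd_exit_time` are all assumptions chosen for illustration.

```python
import math


def gd_exit_time(x0, y0, eta=0.1, radius=1.0, max_iter=10_000):
    """Count gradient-descent steps on the toy saddle f(x, y) = (x**2 - y**2) / 2
    until the iterate leaves the ball of the given radius around (0, 0).

    Returns None if the iterate has not exited after max_iter steps.
    """
    x, y = x0, y0
    for k in range(max_iter):
        if math.hypot(x, y) > radius:
            return k
        # grad f(x, y) = (x, -y): x contracts by (1 - eta), y expands by (1 + eta)
        x, y = x - eta * x, y + eta * y
    return None


if __name__ == "__main__":
    # Smaller components along the escape direction give longer exit times.
    for y0 in (1e-2, 1e-4, 1e-6):
        print(y0, gd_exit_time(0.5, y0))
```

Along the expanding direction $y_k = (1+\eta)^k y_0$, so in this toy case the exit time grows roughly like $\log(\text{radius}/|y_0|)/\log(1+\eta)$, and an iterate starting exactly on the stable manifold ($y_0 = 0$) never exits.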

Vol. 12, No. 2, pp. 714–786

Citations: 3

Uncertainty quantification in the Bradley–Terry–Luce model

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference: A Journal of the IMA

Pub Date: 2022-08-01 DOI: 10.1093/imaiai/iaac032

Chao Gao; Yandi Shen; Anderson Y Zhang

The Bradley–Terry–Luce (BTL) model is a benchmark model for pairwise comparisons between individuals. Despite recent progress on the first-order asymptotics of several popular procedures, the understanding of uncertainty quantification in the BTL model remains largely incomplete, especially when the underlying comparison graph is sparse. In this paper, we fill this gap by focusing on two estimators that have received much recent attention: the maximum likelihood estimator (MLE) and the spectral estimator. Using a unified proof strategy, we derive sharp and uniform non-asymptotic expansions for both estimators in the sparsest possible regime (up to some poly-logarithmic factors) of the underlying comparison graph. These expansions allow us to obtain: (i) finite-dimensional central limit theorems for both estimators; (ii) confidence intervals for individual ranks; (iii) the optimal constant of $\ell_2$ estimation, which is achieved by the MLE but not by the spectral estimator. Our proof is based on a self-consistent equation of the second-order remainder vector and a novel leave-two-out analysis.
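Under the BTL model, item $i$ beats item $j$ with probability $w_i/(w_i + w_j)$ for positive scores $w$. A minimal sketch of computing the MLE uses the classical Zermelo (minorize–maximize) fixed-point iteration below; this is a standard textbook method chosen for illustration, not the spectral estimator or the proof technique of the paper. It assumes the directed "who beat whom" graph is strongly connected, the usual condition for the MLE to exist, and the helper name `btl_mle` is hypothetical.

```python
def btl_mle(wins, n_iter=10_000, tol=1e-12):
    """Bradley–Terry–Luce MLE via the Zermelo / MM fixed-point iteration.

    wins[i][j] is the number of comparisons in which item i beat item j.
    Returns scores normalized to sum to 1.
    """
    n = len(wins)
    w = [1.0 / n] * n
    for _ in range(n_iter):
        new = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Sum over opponents of (number of i-vs-j comparisons) / (w_i + w_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (w[i] + w[j])
                for j in range(n)
                if j != i
            )
            new.append(total_wins / denom)
        total = sum(new)
        new = [v / total for v in new]  # fix the scale: BTL scores are scale-free
        if max(abs(a - b) for a, b in zip(new, w)) < tol:
            return new
        w = new
    return w


if __name__ == "__main__":
    # Item 0 beat item 1 three times and lost once, so the MLE odds are 3:1.
    print(btl_mle([[0, 3], [1, 0]]))  # prints [0.75, 0.25]
```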

Citations: 11

Optimal orthogonal group synchronization and rotation group synchronization

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference: A Journal of the IMA

Pub Date: 2022-08-01 DOI: 10.1093/imaiai/iaac022

Chao Gao; Anderson Y Zhang

We study the statistical estimation problem of orthogonal group synchronization and rotation group synchronization. The model is $Y_{ij} = Z_i^* Z_j^{*T} + \sigma W_{ij} \in \mathbb{R}^{d\times d}$, where $W_{ij}$ is a Gaussian random matrix, $Z_i^*$ is either an orthogonal matrix or a rotation matrix, and each $Y_{ij}$ is observed independently with probability $p$. We analyze an iterative polar decomposition algorithm for the estimation of $Z^*$ and show that it has an error of $(1+o(1))\frac{\sigma^2 d(d-1)}{2np}$ when initialized by spectral methods. A matching minimax lower bound is further established, which shows that the proposed algorithm is optimal, as it achieves the exact minimax risk.
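To make the iterative projection idea concrete, here is a hedged sketch for the special case $d = 2$ (rotation group $SO(2)$) with every $Y_{ij}$ observed ($p = 1$). It repeats $Z_i \leftarrow \mathcal{P}\big(\sum_{j \ne i} Y_{ij} Z_j\big)$, where $\mathcal{P}$ maps a $2\times 2$ matrix $M$ to its nearest rotation in Frobenius norm, whose angle is $\operatorname{atan2}(M_{21} - M_{12}, M_{11} + M_{22})$. For brevity it starts from the identity rather than from the spectral initialization used in the paper and draws independent noise for every ordered pair $(i, j)$; both choices and all helper names are assumptions of this sketch.

```python
import math
import random


def rot(t):
    """2x2 rotation matrix with angle t."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def nearest_rotation_angle(m):
    """Angle of the rotation closest to m in Frobenius norm (maximizes tr(R^T m))."""
    return math.atan2(m[1][0] - m[0][1], m[0][0] + m[1][1])


def make_observations(thetas, sigma, seed=0):
    """Y[i][j] = R(theta_i) R(theta_j)^T + sigma * Gaussian noise, all pairs observed."""
    rng = random.Random(seed)
    n = len(thetas)
    return [
        [
            [[v + sigma * rng.gauss(0.0, 1.0) for v in row] for row in rot(thetas[i] - thetas[j])]
            for j in range(n)
        ]
        for i in range(n)
    ]


def synchronize(Y, n_iter=30):
    """Iterative projection Z_i <- P_SO(2)(sum_{j != i} Y_ij Z_j); returns angles."""
    n = len(Y)
    angles = [0.0] * n  # identity initialization (assumption; the paper uses spectral init)
    for _ in range(n_iter):
        new_angles = []
        for i in range(n):
            acc = [[0.0, 0.0], [0.0, 0.0]]
            for j in range(n):
                if j != i:
                    prod = matmul(Y[i][j], rot(angles[j]))
                    acc = [[acc[r][c] + prod[r][c] for c in range(2)] for r in range(2)]
            new_angles.append(nearest_rotation_angle(acc))
        angles = new_angles  # synchronous update of all Z_i
    return angles


def max_error_up_to_global_rotation(est, truth):
    """Compare angles relative to item 0, wrapped to [-pi, pi)."""
    def wrap(a):
        return (a + math.pi) % (2 * math.pi) - math.pi
    return max(abs(wrap((e - est[0]) - (t - truth[0]))) for e, t in zip(est, truth))


if __name__ == "__main__":
    thetas = [0.1 * k for k in range(8)]
    est = synchronize(make_observations(thetas, sigma=0.2))
    print(max_error_up_to_global_rotation(est, thetas))
```

The error is measured only up to a common global rotation, since $Z_i^* Z_j^{*T}$ is unchanged when every $Z_i^*$ is multiplied on the right by the same rotation.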

Citations: 7

Fast splitting algorithms for sparsity-constrained and noisy group testing

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference: A Journal of the IMA

Pub Date: 2022-08-01 DOI: 10.1093/imaiai/iaac031

Eric Price; Jonathan Scarlett; Nelvin Tan

In group testing, the goal is to identify a subset of defective items within a larger set of items based on tests whose outcomes indicate whether at least one defective item is present. This problem is relevant in areas such as medical testing, DNA sequencing and communication protocols, among others. In this paper, we study (i) a sparsity-constrained version of the problem, in which the testing procedure is subject to one of the following two constraints: items are finitely divisible and thus may participate in at most $\gamma$ tests, or tests are size-constrained to pool no more than $\rho$ items per test; and (ii) a noisy version of the problem, where each test outcome is independently flipped with some constant probability. Under each of these settings, considering the for-each recovery guarantee with asymptotically vanishing error probability, we introduce a fast splitting algorithm and establish its near-optimality in terms of both the number of tests and the decoding time. While the most basic formulations of our algorithms require $\varOmega(n)$ storage, we also provide low-storage variants based on hashing, with similar recovery guarantees.
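The basic test model above (a pooled test is positive exactly when it contains at least one defective item) can be illustrated with a simple adaptive binary-splitting search. This is a deliberately simplified stand-in: noiseless, adaptive, with no $\gamma$ or $\rho$ constraints, and with a hypothetical `find_defectives` helper. It is not the paper's algorithm, but it shows why repeatedly splitting positive pools needs far fewer than $n$ tests when defectives are few.

```python
def find_defectives(n, defective):
    """Adaptive binary splitting over items 0..n-1.

    `defective` is the (hidden) set of defective items; each call to `test`
    reveals only whether a pool contains at least one of them.
    Returns (sorted list of identified defectives, number of tests used).
    """
    tests_used = 0
    found = []

    def test(pool):
        nonlocal tests_used
        tests_used += 1
        return any(item in defective for item in pool)

    def split(pool):
        if not test(pool):
            return  # negative pool: every item in it is non-defective
        if len(pool) == 1:
            found.append(pool[0])
            return
        mid = len(pool) // 2
        split(pool[:mid])
        split(pool[mid:])

    split(list(range(n)))
    return sorted(found), tests_used


if __name__ == "__main__":
    print(find_defectives(64, {3, 17, 42}))
```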

Vol. 12, No. 2, pp. 1141–1171

Citations: 4

On the robustness to adversarial corruption and to heavy-tailed data of the Stahel–Donoho median of means

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference: A Journal of the IMA

Pub Date: 2022-08-01 DOI: 10.1093/imaiai/iaac026

Jules Depersin; Guillaume Lecué

We consider median of means (MOM) versions of the Stahel–Donoho outlyingness (SDO) [23, 66] and of the Median Absolute Deviation (MAD) [30] functions to construct subgaussian estimators of a mean vector under adversarial contamination and heavy-tailed data. We develop a single analysis of the MOM version of the SDO, which covers all cases ranging from the Gaussian case to the $L_2$ case. It is based on isomorphic and almost isometric properties of the MOM versions of SDO and MAD. This analysis also covers cases where the mean does not exist but a location parameter does; in those cases we still recover the same subgaussian rates and the same price for adversarial contamination, even though not even a first moment exists. These properties are achieved by the classical SDO median, and the resulting bounds are therefore the first non-asymptotic statistical bounds on the Stahel–Donoho median, complementing the $\sqrt{n}$-consistency [58] and asymptotic normality [74] of the Stahel–Donoho estimators. We also show that the MOM version of MAD can be used to construct an estimator of the covariance matrix assuming only the existence of a second moment, or of a scatter matrix if a second moment does not exist.
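The scalar median-of-means building block is easy to state: split the sample into $K$ blocks, average each block, and return the median of the block means. The sketch below covers only this one-dimensional estimator (the paper's estimators combine MOM with the SDO and MAD functions in higher dimensions), and the helper name `median_of_means` is hypothetical. It assumes i.i.d. data in arbitrary order; an adversary controlling fewer than $K/2$ points can corrupt fewer than half of the blocks, so the median is not dragged arbitrarily far.

```python
from statistics import median


def median_of_means(xs, k):
    """Median of the means of k consecutive, equal-size blocks of xs.

    Leftover points (len(xs) % k of them) are dropped for simplicity.
    """
    if not 1 <= k <= len(xs):
        raise ValueError("need 1 <= k <= len(xs)")
    b = len(xs) // k
    block_means = [sum(xs[i * b:(i + 1) * b]) / b for i in range(k)]
    return median(block_means)


if __name__ == "__main__":
    data = [1.0] * 95 + [1000.0] * 5  # 5 gross outliers among 100 points
    print(sum(data) / len(data))      # empirical mean is dragged to 50.95
    print(median_of_means(data, 10))  # 1.0
```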

Vol. 12, No. 2, pp. 814–850

Citations: 0

Sparse recovery by reduced variance stochastic approximation

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference: A Journal of the IMA

Pub Date: 2022-08-01 DOI: 10.1093/imaiai/iaac028

Anatoli Juditsky; Andrei Kulunchakov; Hlib Tsyntseus

In this paper, we discuss the application of iterative Stochastic Optimization routines to the problem of sparse signal recovery from noisy observations. Using the Stochastic Mirror Descent algorithm as a building block, we develop a multistage procedure for recovering sparse solutions to a Stochastic Optimization problem under the assumptions of smoothness and quadratic minoration of the expected objective. An interesting feature of the proposed algorithm is the linear convergence of the approximate solution during the preliminary phase of the routine, when the component of the stochastic error in the gradient observation that is due to a poor initial approximation of the optimal solution is larger than the ‘ideal’ asymptotic error component owing to observation noise ‘at the optimal solution’. We also show how one can straightforwardly enhance the reliability of the corresponding solution using Median-of-Means-like techniques. We illustrate the performance of the proposed algorithms in application to classical problems of recovery of sparse and low-rank signals in the generalized linear regression framework. We show, under rather weak assumptions on the regressor and noise distributions, how they lead to parameter estimates that obey (up to factors logarithmic in problem dimension and confidence level) the best known accuracy bounds.
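One common way to boost reliability in the spirit of the Median-of-Means-like techniques mentioned above (not necessarily the paper's exact procedure) is to compute $K$ candidate solutions on independent data batches and return the candidate whose median distance to the other candidates is smallest. Intuitively, if most candidates are accurate, the chosen one lies close to an accurate candidate even when a few runs fail badly. The sketch assumes Euclidean distance and a hypothetical `select_reliable` helper.

```python
import math
from statistics import median


def select_reliable(candidates):
    """Return the candidate vector with the smallest median distance to the others."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates")

    def score(i):
        # Median Euclidean distance from candidate i to every other candidate.
        return median(
            math.dist(candidates[i], candidates[j])
            for j in range(len(candidates))
            if j != i
        )

    best = min(range(len(candidates)), key=score)
    return candidates[best]


if __name__ == "__main__":
    # Three runs agree closely; one run (e.g. an unlucky batch) is far off.
    runs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0]]
    print(select_reliable(runs))
```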

Vol. 12, No. 2, pp. 851–896

Citations: 5

The geometry of adversarial training in binary classification

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference: A Journal of the IMA

Pub Date: 2022-08-01 DOI: 10.1093/imaiai/iaac029

Leon Bungert; Nicolás García Trillos; Ryan Murray

We establish an equivalence between a family of adversarial training problems for non-parametric binary classification and a family of regularized risk minimization problems where the regularizer is a nonlocal perimeter functional. The resulting regularized risk minimization problems admit exact convex relaxations of the type $L^1 + \text{(nonlocal)}\operatorname{TV}$, a form frequently studied in image analysis and graph-based learning. This reformulation reveals a rich geometric structure, which in turn allows us to establish a series of properties of optimal solutions of the original problem, including the existence of minimal and maximal solutions (interpreted in a suitable sense) and the existence of regular solutions (also interpreted in a suitable sense). In addition, we highlight how the connection between adversarial training and perimeter minimization problems provides a novel, directly interpretable, statistical motivation for a family of regularized risk minimization problems involving perimeter/total variation. The majority of our theoretical results are independent of the distance used to define adversarial attacks.
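A one-dimensional toy example (an assumption for illustration, not the paper's general setting) shows where a perimeter-type penalty comes from. For a threshold classifier that predicts 1 iff $x \ge t$ and an adversary allowed to move each input anywhere in $[x - \varepsilon, x + \varepsilon]$, every misclassified point stays misclassified, and a correctly classified point becomes an error exactly when it lies within $\varepsilon$ of the threshold (with the endpoint conventions in the code). The adversarial risk therefore equals the standard risk plus a term that depends only on the data near the boundary of the set $\{x \ge t\}$. The helper name is hypothetical.

```python
def standard_and_adversarial_risk(points, t, eps):
    """0-1 risks of the classifier 1[x >= t] on labeled points (x, y), y in {0, 1}.

    The adversarial risk counts a point as an error if some perturbation
    x' in [x - eps, x + eps] is misclassified.
    """
    std_errors = adv_errors = 0
    for x, y in points:
        pred = 1 if x >= t else 0
        std_errors += pred != y
        if y == 1:
            adv_errors += (x - eps) < t   # adversary can push x below t
        else:
            adv_errors += (x + eps) >= t  # adversary can push x up to t or beyond
    n = len(points)
    return std_errors / n, adv_errors / n


if __name__ == "__main__":
    data = [(-2.0, 0), (-0.5, 0), (0.3, 1), (2.0, 1), (0.2, 0)]
    print(standard_and_adversarial_risk(data, t=0.0, eps=0.5))  # (0.2, 0.6)
```

In this example the extra $0.4$ comes from the two correctly classified points at $-0.5$ and $0.3$, which sit within $\varepsilon = 0.5$ of the decision boundary $t = 0$.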

Vol. 12, No. 2, pp. 921–968

Citations: 13

Nearly minimax-optimal rates for noisy sparse phase retrieval via early-stopped mirror descent

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference: A Journal of the IMA

Pub Date: 2022-08-01 DOI: 10.1093/imaiai/iaac024

Fan Wu; Patrick Rebeschini

This paper studies early-stopped mirror descent applied to noisy sparse phase retrieval, which is the problem of recovering a $k$-sparse signal $\mathbf{x}^\star \in \mathbb{R}^n$ from a set of quadratic Gaussian measurements corrupted by sub-exponential noise. We consider the (non-convex) unregularized empirical risk minimization problem and show that early-stopped mirror descent, when equipped with the hypentropy mirror map and proper initialization, achieves a nearly minimax-optimal rate of convergence, provided the sample size is at least of order $k^2$ (modulo logarithmic terms) and the minimum (in modulus) non-zero entry of the signal is on the order of $\|\mathbf{x}^\star\|_2/\sqrt{k}$. Our theory leads to a simple algorithm that does not rely on explicit regularization or thresholding steps to promote sparsity. More generally, our results establish a connection between mirror descent and sparsity in the non-convex problem of noisy sparse phase retrieval, adding to the literature on early stopping, which has mostly focused on non-sparse, Euclidean and convex settings via gradient descent. Our proof combines a potential-based analysis of mirror descent with quantitative control of a variational coherence property that we establish along the path of mirror descent, up to a prescribed stopping time.
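The hypentropy mirror map with parameter $\beta > 0$ is $\Phi_\beta(x) = \sum_i \big(x_i \operatorname{arcsinh}(x_i/\beta) - \sqrt{x_i^2 + \beta^2}\big)$, whose gradient is $\operatorname{arcsinh}(x/\beta)$ coordinate-wise, so a mirror-descent step reads $x_{t+1} = \beta \sinh\big(\operatorname{arcsinh}(x_t/\beta) - \eta \nabla f(x_t)\big)$. The sketch below applies this update to a hypothetical separable objective $f(x) = \tfrac12\|x - x^\star\|_2^2$ with a sparse $x^\star$, not to the phase-retrieval loss of the paper, just to show the mechanism: from a small initialization, coordinates where $x^\star_i = 0$ stay at the scale of $\beta$ while the support coordinates grow roughly multiplicatively. The step size, $\beta$ and helper names are assumptions; with no noise here, early stopping plays no role.

```python
import math


def hypentropy_md_step(x, grad, eta, beta):
    """One mirror-descent step with the hypentropy mirror map (coordinate-wise)."""
    return [
        beta * math.sinh(math.asinh(xi / beta) - eta * gi)
        for xi, gi in zip(x, grad)
    ]


def run(x_star, eta=0.5, beta=1e-3, steps=200):
    """Minimize f(x) = 0.5 * ||x - x_star||^2 from the small init x0 = beta * ones."""
    x = [beta] * len(x_star)
    for _ in range(steps):
        grad = [xi - si for xi, si in zip(x, x_star)]  # gradient of f
        x = hypentropy_md_step(x, grad, eta, beta)
    return x


if __name__ == "__main__":
    print(run([1.0, 0.0, 0.0, -0.5]))
```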

Vol. 12, No. 2, pp. 633–713. Open-access PDF: https://ieeexplore.ieee.org/iel7/8016800/10058586/10058608.pdf

Citations: 0

Perturbation bounds for (nearly) orthogonally decomposable tensors with statistical applications

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference: A Journal of the IMA

Pub Date: 2022-08-01 DOI: 10.1093/imaiai/iaac033

Arnab Auddy; Ming Yuan

We develop deterministic perturbation bounds for singular values and vectors of orthogonally decomposable tensors, in a spirit similar to classical results for matrices such as those due to Weyl, Davis, Kahan and Wedin. Our bounds demonstrate intriguing differences between matrices and higher order tensors. Most notably, they indicate that for higher order tensors, perturbation affects each essential singular value/vector in isolation, and its effect on an essential singular vector does not depend on the multiplicity of its corresponding singular value or its distance from other singular values. Our results can be readily applied and provide a unified treatment of many different problems involving higher order orthogonally decomposable tensors. In particular, we illustrate the implications of our bounds through two connected yet seemingly different high-dimensional data analysis tasks, the unsupervised learning scenario of tensor SVD and the supervised task of tensor regression, leading to new insights in both settings.
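For reference, the objects involved can be written out as follows. These are standard definitions, restricted here to third-order tensors for concreteness; the paper's own bounds are not reproduced. A tensor is orthogonally decomposable when it admits a decomposition

```latex
T = \sum_{k=1}^{r} \lambda_k \, u_k \otimes v_k \otimes w_k,
\qquad \{u_k\},\ \{v_k\},\ \{w_k\} \ \text{each orthonormal},
```

and the classical matrix results it is compared with include Weyl's inequality $|\sigma_k(A + E) - \sigma_k(A)| \le \|E\|$ for every $k$, together with Davis–Kahan and Wedin bounds on singular subspaces, whose right-hand sides scale like $\|E\|/\delta$ for a gap $\delta$ between singular values. The abstract's point is that the tensor analogues for essential singular vectors do not depend on such a gap.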

Citations: 0

Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning

IF 1.6 | Zone 4 (Mathematics) | Q2 MATHEMATICS, APPLIED

Information and Inference: A Journal of the IMA

Pub Date: 2022-08-01 DOI: 10.1093/imaiai/iaac034

Gen Li; Laixi Shi; Yuxin Chen; Yuejie Chi

Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic Markov decision process with $S$ states, $A$ actions and horizon length $H$, substantial progress has been achieved toward characterizing the minimax-optimal regret, which scales on the order of $\sqrt{H^2SAT}$ (modulo log factors), where $T$ is the total number of samples. While several competing solution paradigms have been proposed to minimize regret, they are either memory-inefficient or fall short of optimality unless the sample size exceeds an enormous threshold (e.g. $S^6A^4\,\mathrm{poly}(H)$ for existing model-free methods). To overcome such a large sample size barrier to efficient RL, we design a novel model-free algorithm, with space complexity $O(SAH)$, that achieves near-optimal regret as soon as the sample size exceeds the order of $SA\,\mathrm{poly}(H)$. In terms of this sample size requirement (also referred to as the initial burn-in cost), our method improves by at least a factor of $S^5A^3$ upon any prior memory-efficient algorithm that is asymptotically regret-optimal. Leveraging the recently introduced variance reduction strategy (also called reference-advantage decomposition), the proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences with upper and lower confidence bounds. The design principle of our early-settled variance reduction method might be of independent interest to other RL settings that involve intricate exploration–exploitation trade-offs.
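For orientation, the kind of "Q-learning sequence with an upper confidence bound" the abstract builds on can be sketched as the single-entry update of the optimistic Q-learning of Jin et al. (2018), with learning rate $\alpha_t = (H+1)/(H+t)$ and a Hoeffding-style bonus proportional to $\sqrt{H^3 \iota / t}$. This is only a building block: the paper's algorithm additionally maintains a lower-confidence sequence and the early-settled reference-advantage decomposition, which are omitted. The constants `c` and `iota` and the helper name are assumptions.

```python
import math


def ucb_q_update(q, n_visits, reward, next_value, H, c=1.0, iota=1.0):
    """One optimistic Q-learning update for a single (h, s, a) entry.

    q:          current estimate Q_h(s, a)
    n_visits:   number of visits t to (h, s, a), including this one (t >= 1)
    next_value: V_{h+1}(s') at the observed next state
    c, iota:    bonus constant and log factor (assumed values)
    """
    t = n_visits
    alpha = (H + 1) / (H + t)                 # learning rate; equals 1 on the first visit
    bonus = c * math.sqrt(H ** 3 * iota / t)  # Hoeffding-style exploration bonus
    target = reward + next_value + bonus
    return (1 - alpha) * q + alpha * target


if __name__ == "__main__":
    # First visit: the estimate is replaced by the optimistic target.
    print(ucb_q_update(q=5.0, n_visits=1, reward=0.5, next_value=1.0, H=5))
```

The weights $\alpha_t = (H+1)/(H+t)$ put most mass on recent visits, which is what keeps the accumulated error from growing exponentially in $H$ in that line of analysis.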

Vol. 12, No. 2, pp. 969–1043. Open-access PDF: https://ieeexplore.ieee.org/iel7/8016800/10058586/10058618.pdf

Citations: 35
