Tractability from overparametrization: the example of the negative perceptron

Probability Theory and Related Fields; published 2024-01-22; DOI: 10.1007/s00440-023-01248-y; impact factor 1.5; Zone 1 (Mathematics); JCR Q2 (Statistics & Probability)

Andrea Montanari, Yiqiao Zhong, Kangjie Zhou

Citations: 0

Abstract

In the negative perceptron problem we are given n data points \((\boldsymbol{x}_i,y_i)\), where \(\boldsymbol{x}_i\) is a d-dimensional vector and \(y_i\in \{+1,-1\}\) is a binary label. The data are not linearly separable, and hence we content ourselves with finding a linear classifier with the largest possible negative margin. In other words, we want to find a unit-norm vector \(\boldsymbol{\theta}\) that maximizes \(\min_{i\le n}y_i\langle \boldsymbol{\theta},\boldsymbol{x}_i\rangle\). This is a non-convex optimization problem (it is equivalent to finding a maximum-norm vector in a polytope), and we study its typical properties under two random models for the data. We consider the proportional asymptotics in which \(n,d\rightarrow \infty\) with \(n/d\rightarrow \delta\), and prove upper and lower bounds on the maximum margin \(\kappa_{\textrm{s}}(\delta)\) or, equivalently, on its inverse function \(\delta_{\textrm{s}}(\kappa)\). In other words, \(\delta_{\textrm{s}}(\kappa)\) is the overparametrization threshold: for \(n/d\le \delta_{\textrm{s}}(\kappa)-\varepsilon\) a classifier achieving vanishing training error exists with high probability, while for \(n/d\ge \delta_{\textrm{s}}(\kappa)+\varepsilon\) it does not. Our bounds on \(\delta_{\textrm{s}}(\kappa)\) match to leading order as \(\kappa\rightarrow -\infty\). We then analyze a linear programming algorithm to find a solution, and characterize the corresponding threshold \(\delta_{\textrm{lin}}(\kappa)\). We observe a gap between the interpolation threshold \(\delta_{\textrm{s}}(\kappa)\) and the linear programming threshold \(\delta_{\textrm{lin}}(\kappa)\), raising the question of the behavior of other algorithms.
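One way to see the polytope equivalence mentioned in the abstract: if the data are not linearly separable, every unit vector \(\boldsymbol{\theta}\) has margin \(\kappa(\boldsymbol{\theta}) = \min_{i\le n} y_i\langle\boldsymbol{\theta},\boldsymbol{x}_i\rangle < 0\), and \(\boldsymbol{w} = \boldsymbol{\theta}/|\kappa(\boldsymbol{\theta})|\) lies in the bounded polytope \(P = \{\boldsymbol{w} : y_i\langle\boldsymbol{w},\boldsymbol{x}_i\rangle \ge -1 \text{ for all } i\}\) with \(\|\boldsymbol{w}\| = 1/|\kappa(\boldsymbol{\theta})|\). Conversely, any nonzero \(\boldsymbol{w}\in P\) gives \(\boldsymbol{\theta} = \boldsymbol{w}/\|\boldsymbol{w}\|\) with margin at least \(-1/\|\boldsymbol{w}\|\), so the maximum margin equals \(-1/\max_{\boldsymbol{w}\in P}\|\boldsymbol{w}\|\). The Python sketch below computes the margin, performs this rescaling, and runs a naive random-search baseline. It assumes i.i.d. standard Gaussian features and uniformly random labels, an illustrative noise model that is not necessarily either of the paper's two random models; the function names are hypothetical, and random search is not the paper's linear programming algorithm.

```python
import math
import random


def margin(theta, X, y):
    """Margin min_i y_i <theta_hat, x_i> of the unit vector theta_hat = theta / ||theta||."""
    norm = math.sqrt(sum(t * t for t in theta))
    if norm == 0.0:
        raise ValueError("theta must be nonzero")
    return min(
        yi * sum(t * xij for t, xij in zip(theta, xi)) / norm
        for xi, yi in zip(X, y)
    )


def to_polytope_point(theta, X, y):
    """Rescale a direction with margin kappa < 0 to w = theta_hat / |kappa|.

    The result satisfies y_i <w, x_i> >= -1 for every i (up to rounding),
    i.e. it lies in the polytope P, and has Euclidean norm 1 / |kappa|.
    """
    kappa = margin(theta, X, y)
    if kappa >= 0.0:
        raise ValueError("this direction has nonnegative margin; no rescaling into P")
    norm = math.sqrt(sum(t * t for t in theta))
    return [t / (norm * abs(kappa)) for t in theta]


def random_search_margin(X, y, trials, rng):
    """Naive baseline (hypothetical): best margin over `trials` random Gaussian directions."""
    d = len(X[0])
    best = -math.inf
    for _ in range(trials):
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        best = max(best, margin(theta, X, y))
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 200, 50  # aspect ratio n/d = 4, chosen only for illustration
    # Assumed noise model: x_i ~ N(0, I_d), y_i uniform on {-1, +1}, all independent.
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [rng.choice([-1, 1]) for _ in range(n)]
    kappa_hat = random_search_margin(X, y, trials=200, rng=rng)
    print(f"best margin found by random search: {kappa_hat:.3f}")
```

Random search only certifies a lower bound on the maximum margin, and a weak one in high dimension; it is included to make the objective concrete, not to approximate \(\kappa_{\textrm{s}}(\delta)\).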

Source journal

Probability Theory and Related Fields (Mathematics: Statistics & Probability)

CiteScore: 3.70

Self-citation rate: 5.00%

Review time: 6-12 weeks

Journal description: Probability Theory and Related Fields publishes research papers in modern probability theory and its various fields of application. Thus, subjects of interest include: mathematical statistical physics, mathematical statistics, mathematical biology, theoretical computer science, and applications of probability theory to other areas of mathematics such as combinatorics, analysis, ergodic theory and geometry. Survey papers on emerging areas of importance may be considered for publication. The main languages of publication are English, French and German.
