Tractability from overparametrization: the example of the negative perceptron

Probability Theory and Related Fields. Published 2024-01-22. DOI: 10.1007/s00440-023-01248-y
Andrea Montanari, Yiqiao Zhong, Kangjie Zhou

Abstract

In the negative perceptron problem we are given n data points \((\boldsymbol{x}_i,y_i)\), where \(\boldsymbol{x}_i\) is a d-dimensional vector and \(y_i\in \{+1,-1\}\) is a binary label. The data are not linearly separable, and hence we content ourselves with finding a linear classifier with the largest possible negative margin. In other words, we want to find a unit-norm vector \(\boldsymbol{\theta}\) that maximizes \(\min_{i\le n} y_i\langle \boldsymbol{\theta},\boldsymbol{x}_i\rangle\). This is a non-convex optimization problem (it is equivalent to finding a maximum-norm vector in a polytope), and we study its typical properties under two random models for the data. We consider the proportional asymptotics in which \(n,d\rightarrow \infty\) with \(n/d\rightarrow \delta\), and prove upper and lower bounds on the maximum margin \(\kappa_{\mathrm{s}}(\delta)\) or, equivalently, on its inverse function \(\delta_{\mathrm{s}}(\kappa)\). In other words, \(\delta_{\mathrm{s}}(\kappa)\) is the overparametrization threshold: for \(n/d\le \delta_{\mathrm{s}}(\kappa)-\varepsilon\) a classifier achieving vanishing training error exists with high probability, while for \(n/d\ge \delta_{\mathrm{s}}(\kappa)+\varepsilon\) it does not. Our bounds on \(\delta_{\mathrm{s}}(\kappa)\) match to leading order as \(\kappa\rightarrow -\infty\).

We then analyze a linear programming algorithm to find a solution, and characterize the corresponding threshold \(\delta_{\mathrm{lin}}(\kappa)\). We observe a gap between the interpolation threshold \(\delta_{\mathrm{s}}(\kappa)\) and the linear programming threshold \(\delta_{\mathrm{lin}}(\kappa)\), raising the question of how other algorithms behave.
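
The parenthetical remark that the problem is equivalent to finding a maximum-norm vector in a polytope can be checked with a few lines of code. For \(\kappa < 0\), let \(P_\kappa = \{\boldsymbol{\theta} : y_i\langle \boldsymbol{\theta},\boldsymbol{x}_i\rangle \ge \kappa \text{ for all } i\le n\}\); this polytope contains the origin. If a unit vector \(\boldsymbol{u}\) has margin \(m = \min_{i\le n} y_i\langle \boldsymbol{u},\boldsymbol{x}_i\rangle < 0\), then for \(t \ge 0\) the point \(t\boldsymbol{u}\) lies in \(P_\kappa\) exactly when \(t \le \kappa/m\). Hence \(P_\kappa\) contains a vector of norm at least 1 if and only if some unit vector has margin at least \(\kappa\). The Python sketch below illustrates this scaling argument on synthetic data. It is not code from the paper: the Gaussian "pure noise" model (\(\boldsymbol{x}_i \sim \mathsf{N}(0, I_d)\), labels independent of the features) and all function names are hypothetical choices made for the example.

```python
import math
import random


def sample_pure_noise(n, d, rng):
    # Hypothetical data model for illustration only: x_i ~ N(0, I_d) and
    # labels y_i = +1/-1 uniformly at random, independent of x_i.
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [rng.choice((-1, 1)) for _ in range(n)]
    return X, y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def margin(theta, X, y):
    # min_i y_i <theta, x_i>; theta is NOT normalized here.
    return min(yi * dot(theta, xi) for xi, yi in zip(X, y))


def in_polytope(theta, X, y, kappa, tol=1e-9):
    # Membership in P_kappa = {theta : y_i <theta, x_i> >= kappa for all i}.
    return margin(theta, X, y) >= kappa - tol


def farthest_point_on_ray(u, X, y, kappa):
    # For a unit vector u with margin m < 0 and kappa < 0, the point t*u (t >= 0)
    # lies in P_kappa exactly when t <= kappa / m; return the endpoint t = kappa / m.
    m = margin(u, X, y)
    if m >= 0:
        raise ValueError("u has nonnegative margin: the whole ray lies in P_kappa")
    return [(kappa / m) * ui for ui in u]


rng = random.Random(0)
n, d = 60, 20  # n/d = 3, an arbitrary illustrative ratio
X, y = sample_pure_noise(n, d, rng)
v = [rng.gauss(0.0, 1.0) for _ in range(d)]
u = [vi / norm(v) for vi in v]  # a random unit direction
m = margin(u, X, y)             # negative with overwhelming probability here
kappa = -1.0
p = farthest_point_on_ray(u, X, y, kappa)
print(f"margin of u: {m:.3f}, norm of farthest point: {norm(p):.3f}")
# The farthest point has norm kappa / m, which is >= 1 exactly when m >= kappa.
```

The check `norm(p) >= 1` agrees with `m >= kappa`, and rescaling `p` back to unit norm recovers the margin `m` of `u`. Searching for the unit vector with the largest margin, which is the non-convex problem itself, is not attempted here.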

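As a point of reference for the threshold \(\delta_{\mathrm{s}}(\kappa)\), the boundary case \(\kappa = 0\) is classical (Cover's function-counting theorem, equivalently Wendel's theorem): if the points \(y_i\boldsymbol{x}_i\) are i.i.d., symmetric about the origin and in general position, the probability that some \(\boldsymbol{\theta}\) achieves a strictly positive margin is \(2^{-(n-1)}\sum_{k=0}^{d-1}\binom{n-1}{k}\), which drops from 1 to 0 around \(n/d = 2\). Since requiring margin at least \(\kappa < 0\) is a weaker constraint, models of this symmetric type have \(\delta_{\mathrm{s}}(\kappa) \ge 2\). The Python sketch below only evaluates this classical formula exactly; it is illustrative context, the function name is hypothetical, and it does not compute \(\delta_{\mathrm{s}}(\kappa)\) for \(\kappa < 0\), which is what the paper bounds.

```python
from fractions import Fraction
from math import comb


def prob_separable(n, d):
    # Cover/Wendel: for n i.i.d. points z_i = y_i x_i in R^d that are symmetric
    # about the origin and in general position, the probability that some theta
    # satisfies <theta, z_i> > 0 for all i equals 2^{-(n-1)} * sum_{k<d} C(n-1, k).
    if n <= 0 or d <= 0:
        raise ValueError("n and d must be positive")
    # math.comb returns 0 when k > n - 1, so the sum is valid for n <= d too.
    total = sum(comb(n - 1, k) for k in range(d))
    return Fraction(total, 2 ** (n - 1))


d = 50
for delta in (1.5, 2.0, 2.5):
    n = int(delta * d)
    # Exact probability as a Fraction, printed as a float for readability.
    print(f"n/d = {delta}: P(separable) = {float(prob_separable(n, d)):.4f}")
```

At \(n/d = 2\) the sum covers exactly half of the binomial coefficients of order \(n-1\), so the printed probability is exactly 0.5000; at \(n/d = 1.5\) it is close to 1 and at \(n/d = 2.5\) close to 0, and the transition sharpens as d grows.
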
Source journal
Probability Theory and Related Fields (Mathematics: Statistics & Probability)
CiteScore: 3.70
Self-citation rate: 5.00%
Articles published: 71
Review time: 6-12 weeks
Journal description: Probability Theory and Related Fields publishes research papers in modern probability theory and its various fields of application. Subjects of interest thus include mathematical statistical physics, mathematical statistics, mathematical biology, theoretical computer science, and applications of probability theory to other areas of mathematics such as combinatorics, analysis, ergodic theory and geometry. Survey papers on emerging areas of importance may be considered for publication. The main languages of publication are English, French and German.

Latest articles in this journal
Homogenisation of nonlinear Dirichlet problems in randomly perforated domains under minimal assumptions on the size of perforations
On questions of uniqueness for the vacant set of Wiener sausages and Brownian interlacements
Sharp metastability transition for two-dimensional bootstrap percolation with symmetric isotropic threshold rules
Subexponential lower bounds for f-ergodic Markov processes
Weighted sums and Berry-Esseen type estimates in free probability theory