Spectral Methods for Data Science: A Statistical Perspective

Yuxin Chen, Yuejie Chi, Jianqing Fan, Cong Ma
Foundations and Trends in Machine Learning, published 2020-12-15. DOI: 10.1561/2200000079. Cited by: 111.

Abstract

Spectral methods have emerged as a simple yet surprisingly effective approach to extracting information from massive, noisy, and incomplete data. In a nutshell, spectral methods refer to a collection of algorithms built upon the eigenvalues (resp. singular values) and eigenvectors (resp. singular vectors) of properly designed matrices constructed from data. A diverse array of applications has been found in machine learning, data science, and signal processing. Owing to their simplicity and effectiveness, spectral methods are not only used as stand-alone estimators, but are also frequently employed to initialize other, more sophisticated algorithms in order to improve performance. While the study of spectral methods can be traced back to classical matrix perturbation theory and the method of moments, the past decade has witnessed tremendous theoretical advances in demystifying their efficacy through the lens of statistical modeling, with the aid of non-asymptotic random matrix theory. This monograph aims to present a systematic, comprehensive, yet accessible introduction to spectral methods from a modern statistical perspective, highlighting their algorithmic implications in diverse large-scale applications. In particular, our exposition gravitates around several central questions that span various applications: how to characterize the sample efficiency of spectral methods in reaching a target level of statistical accuracy, and how to assess their stability in the face of random noise, missing data, and adversarial corruptions? In addition to conventional $\ell_2$ perturbation analysis, we present a systematic $\ell_{\infty}$ and $\ell_{2,\infty}$ perturbation theory for eigenspaces and singular subspaces, which has only recently become available owing to a powerful "leave-one-out" analysis framework.
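To make the abstract concrete, the following minimal sketch (not from the monograph itself; the model, signal strength, and noise scaling are illustrative choices) estimates the leading eigenvector of a rank-1 signal matrix corrupted by symmetric Gaussian noise, then reports both the $\ell_2$ and $\ell_\infty$ errors of the spectral estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Ground truth: a rank-1 signal lambda * u u^T with a random unit vector u.
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
signal_strength = 30.0

# Symmetric (Wigner-type) Gaussian noise; entries have variance 1/n, so the
# spectral norm of W concentrates around 2.
W = rng.standard_normal((n, n))
W = (W + W.T) / np.sqrt(2 * n)

M = signal_strength * np.outer(u, u) + W

# Spectral method: take the leading eigenvector of the observed matrix.
eigvals, eigvecs = np.linalg.eigh(M)
u_hat = eigvecs[:, -1]
u_hat *= np.sign(u_hat @ u)  # resolve the global sign ambiguity

l2_err = np.linalg.norm(u_hat - u)
linf_err = np.max(np.abs(u_hat - u))
print(f"l2 error:   {l2_err:.4f}")
print(f"linf error: {linf_err:.4f}")
```

Classical $\ell_2$ (Davis-Kahan-type) bounds control `l2_err` by roughly the noise-to-signal ratio $\|W\|/\lambda$, whereas the entrywise error `linf_err` is typically far smaller when the signal eigenvector is delocalized; it is this finer entrywise behavior that the monograph's $\ell_\infty$ and $\ell_{2,\infty}$ theory addresses.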