Optimal discriminant analysis in high-dimensional latent factor models

Xin Bing, M. Wegkamp
{"title":"Optimal discriminant analysis in high-dimensional latent factor models","authors":"Xin Bing, M. Wegkamp","doi":"10.1214/23-aos2289","DOIUrl":null,"url":null,"abstract":"In high-dimensional classification problems, a commonly used approach is to first project the high-dimensional features into a lower dimensional space, and base the classification on the resulting lower dimensional projections. In this paper, we formulate a latent-variable model with a hidden low-dimensional structure to justify this two-step procedure and to guide which projection to choose. We propose a computationally efficient classifier that takes certain principal components (PCs) of the observed features as projections, with the number of retained PCs selected in a data-driven way. A general theory is established for analyzing such two-step classifiers based on any projections. We derive explicit rates of convergence of the excess risk of the proposed PC-based classifier. The obtained rates are further shown to be optimal up to logarithmic factors in the minimax sense. Our theory allows the lower-dimension to grow with the sample size and is also valid even when the feature dimension (greatly) exceeds the sample size. Extensive simulations corroborate our theoretical findings. The proposed method also performs favorably relative to other existing discriminant methods on three real data examples.","PeriodicalId":22375,"journal":{"name":"The Annals of Statistics","volume":"370 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2022-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Annals of Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1214/23-aos2289","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

In high-dimensional classification problems, a commonly used approach is to first project the high-dimensional features into a lower dimensional space, and base the classification on the resulting lower dimensional projections. In this paper, we formulate a latent-variable model with a hidden low-dimensional structure to justify this two-step procedure and to guide which projection to choose. We propose a computationally efficient classifier that takes certain principal components (PCs) of the observed features as projections, with the number of retained PCs selected in a data-driven way. A general theory is established for analyzing such two-step classifiers based on any projections. We derive explicit rates of convergence of the excess risk of the proposed PC-based classifier. The obtained rates are further shown to be optimal up to logarithmic factors in the minimax sense. Our theory allows the lower-dimension to grow with the sample size and is also valid even when the feature dimension (greatly) exceeds the sample size. Extensive simulations corroborate our theoretical findings. The proposed method also performs favorably relative to other existing discriminant methods on three real data examples.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
高维潜在因子模型的最优判别分析
在高维分类问题中,一种常用的方法是首先将高维特征投影到低维空间中,然后根据得到的低维投影进行分类。在本文中,我们制定了一个隐藏低维结构的潜变量模型来证明这两步过程,并指导选择哪个投影。我们提出了一种计算效率高的分类器,它将观察到的特征的某些主成分(PCs)作为投影,并以数据驱动的方式选择保留的PCs的数量。建立了基于任意投影的两步分类器分析的一般理论。我们推导了基于pc的分类器的超额风险的显式收敛率。得到的速率进一步证明是最优的,直到对数因子在极小极大意义上。我们的理论允许低维随着样本量的增长而增长,即使特征维(大大)超过样本量也有效。大量的模拟证实了我们的理论发现。在3个实际数据实例上,该方法的性能优于其他判别方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Maximum likelihood for high-noise group orbit estimation and single-particle cryo-EM Local Whittle estimation of high-dimensional long-run variance and precision matrices Efficient estimation of the maximal association between multiple predictors and a survival outcome The impacts of unobserved covariates on covariate-adaptive randomized experiments Estimation of expected Euler characteristic curves of nonstationary smooth random fields
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1