HIGH-DIMENSIONAL FACTOR REGRESSION FOR HETEROGENEOUS SUBPOPULATIONS.

IF 1.5 3区 数学 Q2 STATISTICS & PROBABILITY Statistica Sinica Pub Date : 2023-01-01 DOI:10.5705/ss.202020.0145
Peiyao Wang, Quefeng Li, Dinggang Shen, Yufeng Liu
{"title":"HIGH-DIMENSIONAL FACTOR REGRESSION FOR HETEROGENEOUS SUBPOPULATIONS.","authors":"Peiyao Wang, Quefeng Li, Dinggang Shen, Yufeng Liu","doi":"10.5705/ss.202020.0145","DOIUrl":null,"url":null,"abstract":"<p><p>In modern scientific research, data heterogeneity is commonly observed owing to the abundance of complex data. We propose a factor regression model for data with heterogeneous subpopulations. The proposed model can be represented as a decomposition of heterogeneous and homogeneous terms. The heterogeneous term is driven by latent factors in different subpopulations. The homogeneous term captures common variation in the covariates and shares common regression coefficients across subpopulations. Our proposed model attains a good balance between a global model and a group-specific model. The global model ignores the data heterogeneity, while the group-specific model fits each subgroup separately. We prove the estimation and prediction consistency for our proposed estimators, and show that it has better convergence rates than those of the group-specific and global models. We show that the extra cost of estimating latent factors is asymptotically negligible and the minimax rate is still attainable. We further demonstrate the robustness of our proposed method by studying its prediction error under a mis-specified group-specific model. Finally, we conduct simulation studies and analyze a data set from the Alzheimer's Disease Neuroimaging Initiative and an aggregated microarray data set to further demonstrate the competitiveness and interpretability of our proposed factor regression model.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"33 1","pages":"27-53"},"PeriodicalIF":1.5000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10583735/pdf/nihms-1892524.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistica Sinica","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.5705/ss.202020.0145","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0

Abstract

In modern scientific research, data heterogeneity is commonly observed owing to the abundance of complex data. We propose a factor regression model for data with heterogeneous subpopulations. The proposed model can be represented as a decomposition of heterogeneous and homogeneous terms. The heterogeneous term is driven by latent factors in different subpopulations. The homogeneous term captures common variation in the covariates and shares common regression coefficients across subpopulations. Our proposed model attains a good balance between a global model and a group-specific model. The global model ignores the data heterogeneity, while the group-specific model fits each subgroup separately. We prove the estimation and prediction consistency for our proposed estimators, and show that it has better convergence rates than those of the group-specific and global models. We show that the extra cost of estimating latent factors is asymptotically negligible and the minimax rate is still attainable. We further demonstrate the robustness of our proposed method by studying its prediction error under a mis-specified group-specific model. Finally, we conduct simulation studies and analyze a data set from the Alzheimer's Disease Neuroimaging Initiative and an aggregated microarray data set to further demonstrate the competitiveness and interpretability of our proposed factor regression model.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
异质亚群的高维因子回归。
在现代科学研究中,由于复杂数据的丰富性,通常会观察到数据的异质性。我们为具有异质亚群的数据提出了一个因子回归模型。所提出的模型可以表示为异构项和齐次项的分解。异质性术语是由不同亚群中的潜在因素驱动的。齐次项捕捉了协变量的共同变化,并在子群体中共享共同的回归系数。我们提出的模型在全局模型和特定群体模型之间取得了良好的平衡。全局模型忽略了数据的异质性,而特定于组的模型分别适用于每个子组。我们证明了我们提出的估计量的估计和预测的一致性,并表明它比特定群体和全局模型具有更好的收敛速度。我们证明了估计潜在因素的额外成本是渐近可忽略的,并且极小极大率仍然是可实现的。我们通过研究在错误指定的特定群体模型下的预测误差,进一步证明了我们提出的方法的稳健性。最后,我们进行了模拟研究,并分析了阿尔茨海默病神经成像倡议的数据集和汇总的微阵列数据集,以进一步证明我们提出的因子回归模型的竞争力和可解释性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Statistica Sinica
Statistica Sinica 数学-统计学与概率论
CiteScore
2.10
自引率
0.00%
发文量
82
审稿时长
10.5 months
期刊介绍: Statistica Sinica aims to meet the needs of statisticians in a rapidly changing world. It provides a forum for the publication of innovative work of high quality in all areas of statistics, including theory, methodology and applications. The journal encourages the development and principled use of statistical methodology that is relevant for society, science and technology.
期刊最新文献
Multi-response Regression for Block-missing Multi-modal Data without Imputation. On the Efficiency of Composite Likelihood Estimation for Gaussian Spatial Processes Adaptive Randomization via Mahalanobis Distance Unbiased Boosting Estimation for Censored Survival Data Parsimonious Tensor Discriminant Analysis
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1