PLS for Big Data: A unified parallel algorithm for regularised group PLS

Statistics Surveys | IF 11 | Q1 (Statistics & Probability) | Pub Date: 2019-01-01 | DOI: 10.1214/19-ss125
P. L. D. Micheaux, B. Liquet, Matthew Sutton
Citations: 6

Abstract

Partial Least Squares (PLS) methods have been heavily exploited to analyse the association between two blocks of data. These powerful approaches can be applied to data sets where the number of variables is greater than the number of observations and in the presence of high collinearity between variables. Different sparse versions of PLS have been developed to integrate multiple data sets while simultaneously selecting the contributing variables. Sparse modeling is a key factor in obtaining better estimators and identifying associations between multiple data sets. The cornerstone of the sparse PLS methods is the link between the singular value decomposition (SVD) of a matrix (constructed from deflated versions of the original data) and least squares minimization in linear regression. We review four popular PLS methods for two blocks of data. A unified algorithm is proposed to perform all four types of PLS including their regularised versions. We present various approaches to decrease the computation time and show how the whole procedure can be scaled to big data sets. The bigsgPLS R package implements our unified algorithm and is available at https://github.com/matt-sutton/bigsgPLS.

MSC 2010 subject classifications: Primary 62-02, 62J99.
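
The link between the SVD and least squares minimization mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the bigsgPLS implementation: the function names `soft_threshold` and `rank_one_weights`, the penalty levels `lam_u` and `lam_v`, and the starting vector are assumptions made purely for illustration, and no deflation step is shown. With both penalties at zero, the alternating least squares updates typically converge to the leading singular pair of M = X^T Y; a positive threshold sets small loadings exactly to zero, which is one way variable selection can enter.

```python
import numpy as np


def soft_threshold(z, lam):
    """Elementwise soft-thresholding: shrink each entry towards zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def rank_one_weights(X, Y, lam_u=0.0, lam_v=0.0, n_iter=200, tol=1e-10):
    """Alternating updates for a rank-one fit u v^T of M = X^T Y with ||u|| = 1.

    Hypothetical helper for illustration only.
    For fixed unit-norm u, the least squares solution in v is M^T u;
    for fixed v, the best unit-norm u is proportional to M v.
    Soft-thresholding each update is a common sparse heuristic.
    """
    M = X.T @ Y
    p = M.shape[0]
    u = np.ones(p) / np.sqrt(p)  # assumed deterministic starting vector
    v = M.T @ u
    for _ in range(n_iter):
        u_old = u
        v = soft_threshold(M.T @ u, lam_v)    # least squares step in v, then shrink
        u_new = soft_threshold(M @ v, lam_u)  # least squares step in u, then shrink
        norm = np.linalg.norm(u_new)
        if norm == 0.0:
            break  # penalty removed every loading; keep the last valid u
        u = u_new / norm
        if np.linalg.norm(u - u_old) < tol:
            break
    return u, v


# Small synthetic example: Y depends on the first column of X only.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
Y = X[:, :1] @ np.array([[2.0, -1.0, 0.5, 0.0]]) + 0.1 * rng.standard_normal((200, 4))

u_dense, v_dense = rank_one_weights(X, Y)               # no penalty
u_sparse, v_sparse = rank_one_weights(X, Y, lam_v=50.0)  # thresholded Y-loadings
print("dense v: ", np.round(v_dense, 2))
print("sparse v:", np.round(v_sparse, 2))
```

In the unpenalised case the product of `u` and `v` can be compared against the leading term of `np.linalg.svd(X.T @ Y)`; the thresholded run shows how a loading that carries little signal can be dropped.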
Source journal
Statistics Surveys (Statistics & Probability)
CiteScore: 11.70
Self-citation rate: 0.00%
Articles published: 5
Journal description: Statistics Surveys publishes survey articles in theoretical, computational, and applied statistics. The style of articles may range from reviews of recent research to graduate textbook exposition. Articles may be broad or narrow in scope. The essential requirements are a well specified topic and target audience, together with clear exposition. Statistics Surveys is sponsored by the American Statistical Association, the Bernoulli Society, the Institute of Mathematical Statistics, and the Statistical Society of Canada.
Latest articles from this journal
White noise testing for functional time series
Spline local basis methods for nonparametric density estimation
Core-periphery structure in networks: A statistical exposition
Kronecker-structured covariance models for multiway data
A brief and understandable guide to pseudo-random number generators and specific models for security