Regularization Methods for Fitting Linear Models with Small Sample Sizes: Fitting the Lasso Estimator Using R.

Practical Assessment, Research and Evaluation (Q2, Social Sciences) Pub Date: 2016-05-01 DOI: 10.7275/JR3D-CQ04

W. H. Finch, M. Finch


Citations: 24

Abstract



Regularization Methods for Fitting Linear Models with Small Sample Sizes: Fitting the Lasso Estimator Using R.

Researchers and data analysts are sometimes faced with very small samples in which the number of variables approaches or exceeds the overall sample size, i.e., high-dimensional data. In such cases, standard statistical models such as regression or analysis of variance cannot be used, either because the resulting parameter estimates exhibit very high variance and therefore cannot be trusted, or because the statistical algorithm cannot converge on parameter estimates at all. There exists an alternative set of model estimation procedures, known collectively as regularization methods, which can be used in such circumstances and which have been shown through simulation research to yield accurate parameter estimates. The purpose of this paper is to describe, for those unfamiliar with them, the most popular of these regularization methods, the lasso, and to demonstrate its use on an actual high-dimensional dataset involving adults with autism, using the R software language. Results of analyses relating measures of executive functioning to a full-scale intelligence test score are presented, and implications of using these models are discussed.
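
For readers who want to see the estimator the abstract refers to, a standard way of writing the lasso criterion for a linear model is sketched below. The notation is assumed here for illustration and is not taken from the paper: n observations, p predictors, outcome y_i, predictor values x_ij, intercept β_0, and a non-negative tuning parameter λ.

```latex
\hat{\beta}^{\text{lasso}}
  = \underset{\beta_0,\,\beta}{\arg\min}
    \left\{
      \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\},
  \qquad \lambda \ge 0 .
```

With λ = 0 the criterion reduces to ordinary least squares, which has no unique solution once p exceeds n. Larger values of λ shrink the coefficients toward zero and can set some of them exactly to zero, which is what makes the method usable when the number of variables approaches or exceeds the sample size.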


Source journal

Practical Assessment, Research and Evaluation Social Sciences-Education

CiteScore

2.60

Self-citation rate

0.00%

Articles published