{"title":"用模拟方法比较两种收缩方法(脊-套索)","authors":"Z. Ghareeb, Suhad Ali Shaheed Al-Temimi","doi":"10.21533/pen.v11i2.3472","DOIUrl":null,"url":null,"abstract":"The general linear model is widely used in many scientific fields, especially biological ones. The Ordinary Least Squares (OLS) estimators for the coefficients of the general linear model are characterized by good specifications symbolized by the acronym BLUE (Best Linear Unbiased Estimator), provided that the basic assumptions for building the model under study are met. The failure to achieve one of the basic assumptions or hypotheses required to build the model can lead to the emergence of estimators with low bias and high variance, which results in poor performance in both prediction and explanation of the model in question. The hypothesis that there are no multiple linear relationships between the explanatory variables is considered one of the leading hypotheses on which the model is based. Thus, the emergence of this problem leads to misleading results and high (Wide) confidence limits for the estimators associated with those variables due to problems characterizing the model. Shrinkage methods are considered one of the most effective and preferable ways to eliminate the multicollinearity problem. These methods are based on addressing the multicollinearity problems by reducing the variance of estimators in the model. Ridge and Lasso methods represent the most and most common of these methods of shrinkage. The simulation was carried out for different sample sizes (40, 120, 200) and some variables (P=30, 60) in the first and second experiments arbitrarily and at the level of low, medium, and high correlation coefficients (0.2, 0.5, 0.8). When (p=30, 60) Lasso method has the smallest (MSE) than the Ridge method. The Lasso method proved its efficiency by obtaining the least MSE. Optimal Penalty parameter (λ) chosen from Cross-Validation through minimizing (MSE) of prediction. We see a rapid increase for (MSE) for both (Ridge-Lasso) where the top axis indicates the number of model variables, and when the correlation between variables increases and sample size too, we can see the (MSE) values increase in the Ridge method than the Lasso method. A ridge method gives greater efficiency when the sample size is more significant than variables (p<n), but the Ridge method cannot shrink coefficients to precisely zero. So, the elasticity of ridge coefficients decreases, but variance increases bias, also (MSE) first remains relatively constant and then increases fast.","PeriodicalId":37519,"journal":{"name":"Periodicals of Engineering and Natural Sciences","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A comparative study between shrinkage methods (ridge-lasso) using simulation\",\"authors\":\"Z. Ghareeb, Suhad Ali Shaheed Al-Temimi\",\"doi\":\"10.21533/pen.v11i2.3472\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The general linear model is widely used in many scientific fields, especially biological ones. The Ordinary Least Squares (OLS) estimators for the coefficients of the general linear model are characterized by good specifications symbolized by the acronym BLUE (Best Linear Unbiased Estimator), provided that the basic assumptions for building the model under study are met. 
The failure to achieve one of the basic assumptions or hypotheses required to build the model can lead to the emergence of estimators with low bias and high variance, which results in poor performance in both prediction and explanation of the model in question. The hypothesis that there are no multiple linear relationships between the explanatory variables is considered one of the leading hypotheses on which the model is based. Thus, the emergence of this problem leads to misleading results and high (Wide) confidence limits for the estimators associated with those variables due to problems characterizing the model. Shrinkage methods are considered one of the most effective and preferable ways to eliminate the multicollinearity problem. These methods are based on addressing the multicollinearity problems by reducing the variance of estimators in the model. Ridge and Lasso methods represent the most and most common of these methods of shrinkage. The simulation was carried out for different sample sizes (40, 120, 200) and some variables (P=30, 60) in the first and second experiments arbitrarily and at the level of low, medium, and high correlation coefficients (0.2, 0.5, 0.8). When (p=30, 60) Lasso method has the smallest (MSE) than the Ridge method. The Lasso method proved its efficiency by obtaining the least MSE. Optimal Penalty parameter (λ) chosen from Cross-Validation through minimizing (MSE) of prediction. We see a rapid increase for (MSE) for both (Ridge-Lasso) where the top axis indicates the number of model variables, and when the correlation between variables increases and sample size too, we can see the (MSE) values increase in the Ridge method than the Lasso method. A ridge method gives greater efficiency when the sample size is more significant than variables (p<n), but the Ridge method cannot shrink coefficients to precisely zero. So, the elasticity of ridge coefficients decreases, but variance increases bias, also (MSE) first remains relatively constant and then increases fast.\",\"PeriodicalId\":37519,\"journal\":{\"name\":\"Periodicals of Engineering and Natural Sciences\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-03-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Periodicals of Engineering and Natural Sciences\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21533/pen.v11i2.3472\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Engineering\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Periodicals of Engineering and Natural Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21533/pen.v11i2.3472","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Engineering","Score":null,"Total":0}
A comparative study between shrinkage methods (ridge-lasso) using simulation
The general linear model is widely used in many scientific fields, especially the biological sciences. Provided the basic assumptions behind the model are met, the Ordinary Least Squares (OLS) estimators of its coefficients have the desirable properties summarized by the acronym BLUE (Best Linear Unbiased Estimator). When one of these assumptions fails, the resulting estimators can retain low bias while suffering from high variance, which degrades both the predictive and the explanatory performance of the model. The assumption of no multicollinearity, i.e., no linear relationships among the explanatory variables, is one of the principal assumptions on which the model rests; when it is violated, the results become misleading and the confidence intervals of the estimators associated with the affected variables become very wide.

Shrinkage methods are among the most effective and preferred ways of dealing with the multicollinearity problem. They work by reducing the variance of the estimators in the model, and Ridge and Lasso regression are the best known and most widely used of them.

A simulation study was carried out for sample sizes n = 40, 120, and 200 and numbers of explanatory variables p = 30 and 60 (the first and second experiments, respectively), at low, medium, and high correlation levels (0.2, 0.5, 0.8). For both p = 30 and p = 60 the Lasso method achieved a smaller mean squared error (MSE) than the Ridge method, demonstrating its efficiency. The optimal penalty parameter (λ) was chosen by cross-validation, minimizing the prediction MSE. The MSE of both methods rises rapidly as the number of model variables grows, and as the correlation between variables and the sample size increase, the MSE of the Ridge method increases more than that of the Lasso method. Ridge regression is more efficient when the sample size exceeds the number of variables (p < n), but it cannot shrink coefficients to exactly zero. As the flexibility of the ridge fit decreases, variance decreases at the cost of increased bias, and the MSE first remains relatively constant and then increases rapidly.
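For reference, the two shrinkage criteria being compared take the standard forms below. The abstract itself gives no formulas, so this notation is the conventional one, not the paper's:

```latex
% Ridge: least-squares loss plus an L2 penalty; shrinks coefficients toward
% zero but never exactly to zero.
\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta}\;
    \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2

% Lasso: least-squares loss plus an L1 penalty; can set coefficients
% exactly to zero, performing variable selection.
\hat{\beta}^{\mathrm{lasso}} = \arg\min_{\beta}\;
    \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1
```

The L1 penalty is what allows the lasso to zero out coefficients, which matches the abstract's remark that the Ridge method cannot shrink coefficients to precisely zero.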
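A minimal sketch of one simulation cell in Python with scikit-learn, under assumptions the abstract does not pin down: equicorrelated Gaussian predictors, a sparse illustrative true coefficient vector, and λ selected by cross-validation over a log-spaced grid. It illustrates the ridge-vs-lasso MSE comparison rather than reproducing the paper's exact design:

```python
import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def simulate_cell(n=40, p=30, rho=0.5, seed=0):
    """One simulation cell: correlated predictors, ridge vs. lasso test MSE."""
    rng = np.random.default_rng(seed)
    # Equicorrelated covariance: 1 on the diagonal, rho off-diagonal (an
    # assumption; the abstract only states the correlation levels 0.2/0.5/0.8).
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    beta = np.zeros(p)
    beta[:5] = 1.0  # sparse truth (illustrative choice, not from the paper)
    y = X @ beta + rng.normal(size=n)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=seed)
    lambdas = np.logspace(-3, 3, 50)
    # Penalty parameter chosen by cross-validation, minimizing prediction MSE.
    ridge = RidgeCV(alphas=lambdas).fit(X_tr, y_tr)
    lasso = LassoCV(alphas=lambdas, max_iter=50_000).fit(X_tr, y_tr)
    return (mean_squared_error(y_te, ridge.predict(X_te)),
            mean_squared_error(y_te, lasso.predict(X_te)))

# Sweep the design grid reported in the abstract.
for n in (40, 120, 200):
    for p in (30, 60):
        for rho in (0.2, 0.5, 0.8):
            mse_r, mse_l = simulate_cell(n, p, rho)
            print(f"n={n:3d} p={p:2d} rho={rho}: "
                  f"ridge MSE={mse_r:.3f}, lasso MSE={mse_l:.3f}")
```

With a sparse true coefficient vector like this one, the lasso would typically attain the lower test MSE, in line with the abstract's finding; under a dense truth the comparison can go the other way.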