J. A. D. Costa, Carolina Azevedo, M. Nascimento, F. F. Silva, M. Resende, A. C. Nascimento
DOI: 10.4238/GMR18877 (https://doi.org/10.4238/GMR18877)
Journal: Genetics and Molecular Research
Publication date: 2021-01-01
Publication type: Journal Article
Research Article: A comparison of regression methods based on dimensional reduction for genomic prediction
Fitting a multiple linear regression model is often hampered by multicollinearity and high dimensionality, which make it impossible to obtain stable estimates through traditional ordinary least squares estimation. To overcome these challenges, dimensionality reduction methods have been proposed because of their simple theory and easy application. We compared three dimensionality reduction methods: Principal Components Regression (PCR), Partial Least Squares (PLS), and Independent Components Regression (ICR). An important step in dimensionality reduction and prediction is selecting the number of components, as it affects the linear combinations of the explanatory variables. These linear combinations are entered into the model to predict the response from a reduced number of parameters. We examined the criteria for selecting the number of components. The methods were applied to genomic and phenotypic data: 370 accessions of Asian rice (Oryza sativa), genotyped for 36,901 SNP markers, were used to predict genomic values for the trait number of panicles per plant. This data set presented multicollinearity and high dimensionality. The computational time for each method was also recorded. Among the methods, PCR and ICR gave the highest accuracy values, with ICR standing out for producing the least biased estimates of genomic values. However, ICR required more computational time than the other methods.
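To make the role of the number of components concrete, the following is a minimal sketch of principal components regression with the number of components chosen by k-fold cross-validation. It is an illustration under stated assumptions, not the authors' procedure: the data are synthetic (not the rice data set), the function names `pcr_fit`, `pcr_predict`, and `cv_rmse` are hypothetical, and cross-validated RMSE is only one possible selection criterion; the abstract does not say which criteria were examined. PLS and ICR are not shown.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Fit PCR with k components; return (x_mean, y_mean, beta) on the marker scale."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Rows of Vt are the principal directions of the centered predictor matrix.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                        # p x k loadings
    T = Xc @ V                          # n x k component scores
    # Scores are mutually orthogonal with squared norms s**2, so least squares
    # on the scores reduces to k separate univariate regressions.
    gamma = (T.T @ yc) / (s[:k] ** 2)
    return x_mean, y_mean, V @ gamma    # map coefficients back to the predictors

def pcr_predict(model, X):
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta

def cv_rmse(X, y, k, n_folds=5, seed=0):
    """Cross-validated root mean squared error of PCR with k components."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sq_err = 0.0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        model = pcr_fit(X[train_idx], y[train_idx], k)
        sq_err += np.sum((y[test_idx] - pcr_predict(model, X[test_idx])) ** 2)
    return float(np.sqrt(sq_err / len(y)))

# Synthetic stand-in for a "more predictors than observations" problem:
# 60 individuals, 200 strongly correlated predictors driven by 5 latent factors.
rng = np.random.default_rng(1)
n, p = 60, 200
Z = rng.normal(size=(n, 5))
X = Z @ rng.normal(size=(5, p)) + 0.5 * rng.normal(size=(n, p))
y = 2.0 * Z[:, 0] + Z[:, 1] + 0.5 * rng.normal(size=n)

# Evaluate each candidate number of components and keep the best one.
cv = {k: cv_rmse(X, y, k) for k in range(1, 16)}
best_k = min(cv, key=cv.get)
print("selected number of components:", best_k)
```

With more predictors than observations, using every available component reproduces the training responses exactly, which is the instability the abstract attributes to ordinary least squares; limiting the number of components trades that perfect in-sample fit for more stable predictions.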
About the journal:
Genetics and Molecular Research (GMR), maintained by the Research Foundation of Ribeirão Preto (Fundação de Pesquisas Científicas de Ribeirão Preto), publishes high-quality research in genetics and molecular biology. GMR reflects the full breadth and interdisciplinary nature of this research by publishing outstanding original contributions in all areas of biology.
GMR publishes human studies, as well as research on model organisms, from mice and flies to plants and bacteria. Our emphasis is on studies of broad interest that provide significant insight into a biological process or processes. Topics include, but are not limited to, gene discovery and function, population genetics, evolution, genome projects, comparative and functional genomics, molecular analysis of simple and complex genetic traits, cancer genetics, medical genetics, disease biology, agricultural genomics, developmental genetics, regulatory variation in gene expression, pharmacological genomics, gene expression, chromosome biology, and epigenetics.