Ridge regression and deep learning models for genome-wide selection of complex traits in New Mexican Chile peppers.

BMC Genomic Data, 24(1):80 (IF 2.5, JCR Q3, Genetics & Heredity). Pub Date: 2023-12-18. DOI: 10.1186/s12863-023-01179-6

Dennis N Lozada, Karansher Singh Sandhu, Madhav Bhatta



Abstract

Background: Genomewide prediction estimates the genomic breeding values of selection candidates, which can be used for population improvement and cultivar development. Ridge regression and deep learning-based selection models were implemented for yield and agronomic traits of 204 chile pepper genotypes evaluated in multi-environment trials in New Mexico, USA.

Results: Prediction accuracy differed among models under ten-fold cross-validation, and was high for highly heritable traits such as plant height and plant width. No single model was superior across all traits when 14,922 SNP markers were used for genomewide selection. Bayesian ridge regression had the highest average accuracy for first pod date (0.77) and total yield per plant (0.33). Multilayer perceptron (MLP) performed best for flowering time (0.76) and plant height (0.73), whereas the genomic BLUP model had the highest accuracy for plant width (0.62). Using a subset of 7,690 SNP loci, obtained by grouping markers based on linkage disequilibrium coefficients, improved accuracy for first pod date, ten pod weight, and total yield per plant, even under a relatively small training population size for MLP and random forest models. Genomic and ridge regression BLUP models were sufficient for optimal prediction accuracy with a small training population. Combining phenotypic selection and genomewide selection improved the selection response for yield-related traits, indicating that integrated approaches can increase the gains achieved through selection.
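As an illustration of how a ten-fold cross-validated prediction accuracy of this kind can be computed, the following is a minimal sketch rather than the authors' pipeline. It assumes a plain ridge regression on marker allele counts, accuracy measured as the Pearson correlation between predicted and observed phenotypes, a fixed penalty `lam`, and synthetic stand-in data; the function names `ridge_predict` and `cv_accuracy` are hypothetical.

```python
import numpy as np

def ridge_predict(X_train, y_train, X_test, lam):
    """Fit ridge regression on centered markers and predict the test genotypes."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    # Dual (n x n) form of the ridge solution, convenient when markers >> genotypes:
    # beta = Xc' (Xc Xc' + lam * I)^-1 (y - mean(y))
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - y_mean)
    beta = Xc.T @ alpha
    return (X_test - x_mean) @ beta + y_mean

def cv_accuracy(X, y, lam, k=10, seed=0):
    """Mean Pearson correlation between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred = ridge_predict(X[train_idx], y[train_idx], X[test_idx], lam)
        accs.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(accs))

# Synthetic stand-in data (assumption: NOT the chile pepper data set).
rng = np.random.default_rng(1)
n_genotypes, n_markers = 204, 300
X = rng.integers(0, 3, size=(n_genotypes, n_markers)).astype(float)  # 0/1/2 allele counts
effects = rng.normal(size=n_markers) * (rng.random(n_markers) < 0.1)  # roughly 10% causal markers
g = X @ effects                                                       # simulated genetic values
y = g + rng.normal(scale=0.5 * g.std(), size=n_genotypes)             # heritability of about 0.8

acc = cv_accuracy(X, y, lam=50.0)
print(f"mean ten-fold accuracy: {acc:.2f}")
```

In practice the penalty would be tuned or estimated from variance components (as in ridge regression BLUP), and the cross-validation would typically be repeated over several random partitions.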

Conclusions: Accuracy values for ridge regression and deep learning prediction models demonstrate the potential of implementing genomewide selection for genetic improvement in chile pepper breeding programs. Ultimately, a large training data set is relevant for improving the genomic selection accuracy of the deep learning models.
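The marker-subset step mentioned in the Results (grouping markers by linkage disequilibrium) can be sketched as follows. The abstract does not state the exact grouping procedure, so this is only one plausible reading: a greedy pass that keeps one representative per group of markers whose pairwise r² exceeds a threshold. The function name `ld_prune`, the threshold of 0.8, and the synthetic genotypes are all assumptions.

```python
import numpy as np

def ld_prune(X, r2_threshold=0.8):
    """Greedy LD grouping: keep a marker, then drop every marker in high LD with it.

    X is a genotypes-by-markers matrix of allele counts. Assumes every marker is
    polymorphic (a constant column would give an undefined correlation).
    """
    r2 = np.corrcoef(X, rowvar=False) ** 2   # pairwise squared correlations
    removed = np.zeros(X.shape[1], dtype=bool)
    keep = []
    for j in range(X.shape[1]):
        if removed[j]:
            continue
        keep.append(j)                       # j represents its LD group
        removed |= r2[j] > r2_threshold      # mark everything in high LD with j
    return np.array(keep)

# Synthetic stand-in: 50 independent markers, each duplicated three times
# to mimic blocks of markers in complete LD.
rng = np.random.default_rng(0)
base = rng.integers(0, 3, size=(204, 50)).astype(float)
X_ld = np.repeat(base, 3, axis=1)            # 150 markers in 50 LD blocks

kept = ld_prune(X_ld)
print(f"kept {len(kept)} of {X_ld.shape[1]} markers")
```

The all-pairs correlation matrix is only workable for small marker sets; for thousands of markers a sliding-window or per-chromosome computation would normally be used instead.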

