Genomic prediction based on selective linkage disequilibrium pruning of low-coverage whole-genome sequence variants in a pure Duroc population.

Genetics Selection Evolution, 55(1), 72 (IF 3.1; CAS Zone 1, Agricultural and Forestry Sciences; JCR Q1, Agriculture, Dairy & Animal Science). Pub Date: 2023-10-18. DOI: 10.1186/s12711-023-00843-w

Di Zhu, Yiqiang Zhao, Ran Zhang, Hanyu Wu, Gengyuan Cai, Zhenfang Wu, Yuzhe Wang, Xiaoxiang Hu



Abstract

Background: Although the accumulation of whole-genome sequencing (WGS) data has accelerated the identification of mutations underlying complex traits, its impact on the accuracy of genomic predictions is limited. Reliable genotyping data and pre-selected beneficial loci can be used to improve prediction accuracy. Previously, we reported a low-coverage sequencing genotyping method that yielded 11.3 million highly accurate single-nucleotide polymorphisms (SNPs) in pigs. Here, we introduce a method termed selective linkage disequilibrium pruning (SLDP), which refines the set of SNPs that show a large gain during prediction of complex traits using whole-genome SNP data.
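For readers less familiar with the quantity being pruned on, the linkage disequilibrium r² between two SNPs is often approximated from unphased genotypes as the squared Pearson correlation of their allele dosages (0, 1, or 2 copies). The sketch below illustrates that approximation only. The function name `ld_r2` is hypothetical, and the abstract does not state which r² estimator the authors used.

```python
from statistics import mean


def ld_r2(dosages_a, dosages_b):
    """Squared Pearson correlation between two SNP dosage vectors.

    Each vector holds one 0/1/2 allele count per animal. This is a common
    approximation of LD r2 for unphased genotypes, assumed here for
    illustration; it is not taken from the paper.
    """
    mean_a, mean_b = mean(dosages_a), mean(dosages_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        # A monomorphic SNP has no variance, so r2 is undefined;
        # this sketch reports 0.0 rather than raising an error.
        return 0.0
    return cov * cov / (var_a * var_b)
```

Because the correlation is squared, two SNPs whose alleles are coded in opposite directions still give r² = 1, which is why r² rather than r is the usual redundancy measure when pruning.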

Results: We used the SLDP method to identify and select markers among millions of SNPs based on genome-wide association study (GWAS) prior information. We evaluated the performance of SLDP with respect to three real traits and six simulated traits with varying genetic architectures using two representative models (genomic best linear unbiased prediction and BayesR) on samples from 3579 Duroc boars. The SLDP configuration was determined by testing 180 combinations of two core parameters (GWAS P-value thresholds and linkage disequilibrium r²). The parameters for each trait were optimized in the training population by five-fold cross-validation and then tested in the validation population. Similar to previous GWAS prior-based methods, the performance of SLDP was mainly affected by the genetic architecture of the traits analyzed. Specifically, SLDP performed better for traits controlled by major quantitative trait loci (QTL) or a small number of quantitative trait nucleotides (QTN). Compared with two commercial SNP chips, genotyping-by-sequencing data, and an unselected whole-genome SNP panel, the SLDP strategy led to significant improvements in prediction accuracy, which ranged from 0.84 to 3.22% for real traits controlled by major or moderate QTL and from 1.23 to 11.47% for simulated traits controlled by a small number of QTN.
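The abstract names the two tuning parameters but not the exact selection algorithm, so the following is only one plausible reading, not the authors' implementation. It keeps SNPs whose GWAS P-value passes a threshold, visits them from most to least significant, and drops any SNP whose r² with an already retained SNP exceeds the cutoff. It then scans a grid of (P-value threshold, r²) pairs. The names `select_markers`, `tune`, and the `score` callback are hypothetical; `score` stands in for a five-fold cross-validated accuracy from a model such as GBLUP or BayesR, and the grid values are left to the caller.

```python
from itertools import product
from statistics import mean


def r2(a, b):
    """Squared Pearson correlation of two 0/1/2 dosage vectors (assumed LD proxy)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov * cov / (va * vb)


def select_markers(genotypes, p_values, p_threshold, r2_threshold):
    """Greedy GWAS-guided LD pruning (illustrative assumption).

    genotypes: dict mapping SNP id -> list of dosages, one per animal.
    p_values:  dict mapping SNP id -> GWAS P-value.
    Returns retained SNP ids, most significant first.
    """
    candidates = sorted(
        (snp for snp in p_values if p_values[snp] <= p_threshold),
        key=lambda snp: (p_values[snp], snp),
    )
    kept = []
    for snp in candidates:
        # Retain the SNP only if it is not in high LD with any retained SNP.
        # All pairs are compared here; a real pipeline over millions of SNPs
        # would restrict comparisons to a genomic window.
        if all(r2(genotypes[snp], genotypes[k]) <= r2_threshold for k in kept):
            kept.append(snp)
    return kept


def tune(genotypes, p_values, p_grid, r2_grid, score):
    """Return (best_score, p_threshold, r2_threshold) over the parameter grid.

    score: caller-supplied function mapping a marker list to an accuracy,
    e.g. a five-fold cross-validation wrapper (hypothetical, not shown).
    Returns None if no grid point retains any marker.
    """
    best = None
    for p_thr, r2_thr in product(p_grid, r2_grid):
        markers = select_markers(genotypes, p_values, p_thr, r2_thr)
        if not markers:
            continue  # nothing passed the P-value threshold
        accuracy = score(markers)
        if best is None or accuracy > best[0]:
            best = (accuracy, p_thr, r2_thr)
    return best
```

In a setup like the one described, the GWAS P-values passed to `select_markers` would need to come from the training folds only, so that the validation animals do not leak into marker selection.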

Conclusions: The SLDP marker selection method can be incorporated into mainstream prediction models to yield accuracy improvements for traits with a relatively simple genetic architecture; however, it has no significant advantage for traits not controlled by major QTL. The main factors that affect its performance are the genetic architecture of traits and the reliability of GWAS prior information. Our findings can facilitate the application of WGS-based genomic selection.



Source journal: Genetics Selection Evolution (Biology: Dairy & Animal Science)

CiteScore: 6.50

Self-citation rate: 9.80%

Review time: 1 month

About the journal: Genetics Selection Evolution invites basic, applied and methodological content that will aid the current understanding and the utilization of genetic variability in domestic animal species. Although the focus is on domestic animal species, research on other species is invited if it contributes to the understanding of the use of genetic variability in domestic animals. Genetics Selection Evolution publishes results from all levels of study, from the gene to the quantitative trait, from the individual to the population, the breed or the species. Contributions concerning both the biological approach, from molecular genetics to quantitative genetics, as well as the mathematical approach, from population genetics to statistics, are welcome. Specific areas of interest include but are not limited to: gene and QTL identification, mapping and characterization, analysis of new phenotypes, high-throughput SNP data analysis, functional genomics, cytogenetics, genetic diversity of populations and breeds, genetic evaluation, applied and experimental selection, genomic selection, selection efficiency, and statistical methodology for the genetic analysis of phenotypes with quantitative and mixed inheritance.