Genetics Selection Evolution: latest articles (page 10)

Sex identification in rainbow trout using genomic information and machine learning

IF 4.1 Zone 1 (Agriculture and Forestry Sciences) Q1 AGRICULTURE, DAIRY & ANIMAL SCIENCE

Genetics Selection Evolution

Pub Date : 2024-12-30 DOI: 10.1186/s12711-024-00944-0

Andrei A. Kudinov, Antti Kause

Sex identification in farmed fish is important for the management of fish stocks and breeding programs, but identification based on visual characteristics is typically difficult or impossible in juvenile or immature fish. The amount of genomic data obtained from farmed fish is growing rapidly with the implementation of genomic selection in aquaculture. Compared with mammals and birds, ray-finned fishes exhibit a greater diversity of sex determination systems and lack conserved genomic regions. A group of genomic markers located on a standard genotyping array has been reported to be potentially linked with sex determination in rainbow trout. However, the set of markers suitable for sex identification may vary between populations. Sex identification from genomic data is usually performed using probabilistic methods, where suitable markers are known beforehand. In our study, we demonstrated the use of Extreme Gradient Boosting, a supervised machine learning method from the gradient boosting framework, to predict sex from unimputed genomic data when the suitability of the markers was unknown a priori. The accuracy of the method was assessed using four simulated datasets with different genotyping error rates and one real dataset from the Finnish Rainbow Trout Breeding Program. The method showed high prediction quality on both simulated and real datasets. For simulated datasets with low (5%) and high (50%) genotyping error rates, the accuracies were 1.0 and 0.60, respectively. In the real data, the method achieved a prediction accuracy of 98%, which is suitable for routine use.
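
The idea of letting a boosted model discover sex-informative markers can be illustrated with a minimal sketch. It is not the authors' pipeline: it swaps the real Extreme Gradient Boosting library for a tiny hand-written gradient boosting of decision stumps, and the simulated data (marker count, the two sex-linked marker positions, the error and missing rates, the male-heterozygous pattern) are all assumptions made for illustration. Missing calls are coded as -1, a crude stand-in for native missing-value handling.

```python
import math
import random

def simulate(n, n_markers, linked, error_rate, missing_rate, rng):
    """Simulate 0/1/2 genotypes; markers in `linked` are heterozygous in males."""
    X, y = [], []
    for _ in range(n):
        sex = rng.randint(0, 1)                    # 0 = female, 1 = male
        row = []
        for j in range(n_markers):
            g = sex if j in linked else rng.choice([0, 1, 2])
            if rng.random() < error_rate:          # genotyping error: random call
                g = rng.choice([0, 1, 2])
            if rng.random() < missing_rate:        # unimputed missing call
                g = -1
            row.append(g)
        X.append(row)
        y.append(sex)
    return X, y

def fit_stump(X, residuals):
    """Least-squares regression stump: returns (marker, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        for t in (-1, 0, 1):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]

def fit_boosting(X, y, rounds=20, lr=0.5):
    """Gradient boosting on the logistic loss; each round fits a stump to y - p."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - 1.0 / (1.0 + math.exp(-s)) for yi, s in zip(y, scores)]
        j, t, ml, mr = fit_stump(X, residuals)
        stumps.append((j, t, lr * ml, lr * mr))    # store shrunken leaf values
        scores = [s + (lr * ml if row[j] <= t else lr * mr)
                  for s, row in zip(scores, X)]
    return stumps

def predict(stumps, row):
    """Predict 1 (male) when the summed stump scores are positive."""
    score = sum(left if row[j] <= t else right for j, t, left, right in stumps)
    return 1 if score > 0 else 0

def accuracy(stumps, X, y):
    return sum(predict(stumps, row) == yi for row, yi in zip(X, y)) / len(y)

rng = random.Random(1)
linked = {3, 7}                                    # hypothetical sex-linked markers
X_train, y_train = simulate(300, 30, linked, 0.05, 0.02, rng)
X_test, y_test = simulate(200, 30, linked, 0.05, 0.02, rng)
model = fit_boosting(X_train, y_train)
acc = accuracy(model, X_test, y_test)
print(f"first split on marker {model[0][0]}, test accuracy {acc:.2f}")
```

The model is never told which markers are sex-linked; the first stump picks one of them because it explains the most variation in sex. Raising `error_rate` towards 0.5 in this toy setting gives a feel for how genotyping errors erode accuracy.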

Citations: 0

Haplotype analysis incorporating ancestral origins identified novel genetic loci associated with chicken body weight using an advanced intercross line

Pub Date : 2024-12-20 DOI: 10.1186/s12711-024-00946-y

Lina Bu, Yuzhe Wang, Lizhi Tan, Zilong Wen, Xiaoxiang Hu, Zhiwu Zhang, Yiqiang Zhao

The genome-wide association study (GWAS) is a powerful method for mapping quantitative trait loci (QTL). However, standard GWAS can detect only QTL that segregate in the mapping population. Crossing populations with different characteristics increases genetic variability, but F2 or back-crosses lack mapping resolution due to the limited number of recombination events. This drawback can be overcome with advanced intercross line (AIL) populations, which increase the number of recombination events and provide a more accurate mapping resolution. Recent studies in humans have revealed ancestry-dependent genetic architecture and shown the effectiveness of admixture mapping in admixed populations. Through the incorporation of line-of-origin effects and GWAS on an F9 AIL population, we identified genes that affect body weight at eight weeks of age (BW8) in chickens. The proposed ancestral-haplotype-based GWAS (testing only the origin regardless of the alleles) revealed three new QTL on GGA12, GGA15, and GGA20. By using the concepts of ancestral homozygotes (individuals that carry two haplotypes of the same origin) and ancestral heterozygotes (individuals that carry one haplotype of each origin), we identified 632 loci that exhibited high-parent (the heterozygote is better than both parents) and mid-parent (the heterozygote is better than the median of the parents) dominance across 12 chromosomes. Out of the 199 genes associated with BW8, EYA1, PDE1C, and MYC were identified as the best candidate genes for further validation. In addition to the candidate genes reported in this study, our research demonstrates the effectiveness of incorporating ancestral information in population genetic analyses, which can be broadly applicable for genetic mapping in populations generated by ancestors with distinct phenotypes and genetic backgrounds. Our methods can benefit both geneticists and biologists interested in the genetic determinism of complex traits.
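
The definitions of ancestral homozygotes, ancestral heterozygotes, and high-parent versus mid-parent dominance can be made concrete with a small sketch. It only compares group means at a single locus and performs no significance test, so it is far simpler than the analysis summarized above. The origin labels `A` and `B`, the function names, and the body weights are hypothetical.

```python
from statistics import mean

def ancestral_class(hap1, hap2):
    """Return 'AA', 'BB' or 'AB' from the line of origin of the two haplotypes."""
    return "".join(sorted((hap1, hap2)))

def dominance_pattern(records):
    """Classify one locus from (origin_hap1, origin_hap2, body_weight) records.

    All three ancestral classes must be present, otherwise mean() raises.
    """
    groups = {"AA": [], "AB": [], "BB": []}
    for h1, h2, weight in records:
        groups[ancestral_class(h1, h2)].append(weight)
    means = {cls: mean(values) for cls, values in groups.items()}
    high_parent = max(means["AA"], means["BB"])     # the better homozygote
    mid_parent = (means["AA"] + means["BB"]) / 2    # midpoint of the homozygotes
    if means["AB"] > high_parent:
        return "high-parent", means
    if means["AB"] > mid_parent:
        return "mid-parent", means
    return "none", means

# Hypothetical BW8 records (grams) at one locus.
records = [
    ("A", "A", 900), ("A", "A", 940),
    ("B", "B", 700), ("B", "B", 720),
    ("A", "B", 960), ("B", "A", 980),
]
pattern, means = dominance_pattern(records)
print(pattern, means)
```

Because only the origin of each haplotype is used, two individuals with identical alleles but different ancestral origins fall into different classes, which is the point of testing origin rather than allele.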

Citations: 0

Predicted breeding values for relative scrapie susceptibility for genotyped and ungenotyped sheep

Pub Date : 2024-12-18 DOI: 10.1186/s12711-024-00947-x

Jón H. Eiríksson, Þórdís Þórarinsdóttir, Egill Gautason

Scrapie is an infectious prion disease in sheep. Selective breeding for resistant genotypes of the prion protein gene (PRNP) is an effective way to prevent scrapie outbreaks. Genotyping all selection candidates in a population is expensive, but existing pedigree records can help infer the probabilities of genotypes in relatives of genotyped animals. We used linear models to predict allele content for the various PRNP alleles found in Icelandic sheep and compiled the available estimates of relative scrapie susceptibility (RSS) associated with PRNP genotypes from the literature. Using the predicted allele content and the genotypic RSS, we calculated estimated breeding values (EBV) for RSS. We tested the predictions on simulated data under different scenarios that varied in the proportion of genotyped sheep, genotyping strategy, pedigree recording accuracy, genotyping error rates and assumed heritability of allele content. Prediction of allele content for rare alleles was less successful than for alleles with moderate frequencies. The accuracy of allele content and RSS EBV predictions was not affected by the assumed heritability, but the dispersion of the predictions was. In a scenario where 40% of rams were genotyped and there were no errors in genotyping or recorded pedigree, the accuracy of RSS EBV for ungenotyped selection candidates was 0.49. If only 20% of rams were genotyped, or rams and ewes were genotyped randomly, or there were 10% pedigree errors, or there were 2% genotyping errors, the accuracy decreased by 0.07, 0.08, 0.03 and 0.04, respectively. With empirical data, the accuracy of RSS EBV for ungenotyped sheep was 0.46–0.65. A linear model for predicting allele content for the PRNP gene, combined with estimates of relative susceptibility associated with PRNP genotypes, can provide RSS EBV for scrapie resistance for ungenotyped selection candidates with accuracy up to 0.65. These RSS EBV can complement selection strategies based on PRNP genotypes, especially in populations where resistant genotypes are rare.
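
To give a feel for what "allele content" means for an ungenotyped animal, here is a minimal sketch under strong assumptions. It uses only the Mendelian expectation that a lamb's expected allele content is the mean of its parents' contents, not the linear models fitted across the whole pedigree in the study. The per-allele susceptibility weights and the additive index are hypothetical and are not the genotypic RSS values or the EBV calculation from the paper; ARR, ARQ and VRQ are used merely as example PRNP allele names.

```python
ALLELES = ("ARR", "ARQ", "VRQ")

def allele_content(genotype):
    """Count copies (0, 1 or 2) of each allele in a genotype such as ('ARR', 'ARQ')."""
    return {a: genotype.count(a) for a in ALLELES}

def expected_offspring_content(sire, dam):
    """Mendelian expectation: offspring content is the mean of the parental contents."""
    return {a: 0.5 * (sire[a] + dam[a]) for a in ALLELES}

# Hypothetical per-allele susceptibility weights (NOT taken from the paper).
ALLELE_WEIGHT = {"ARR": 0.0, "ARQ": 0.5, "VRQ": 1.0}

def susceptibility_index(content):
    """Additive index: expected allele copies weighted by the hypothetical weights."""
    return sum(content[a] * ALLELE_WEIGHT[a] for a in ALLELES)

sire = allele_content(("ARR", "ARQ"))   # genotyped ram
dam = allele_content(("ARQ", "VRQ"))    # genotyped ewe
lamb = expected_offspring_content(sire, dam)   # ungenotyped selection candidate
print(lamb, susceptibility_index(lamb))
```

The expected contents are fractional and always sum to two across alleles, which is why rare alleles, seldom observed in genotyped relatives, are harder to predict well.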

Citations: 0

Changes in allele frequencies and genetic architecture due to selection in two pig populations

Pub Date : 2024-12-17 DOI: 10.1186/s12711-024-00941-3

Yvonne C. J. Wientjes, Katrijn Peeters, Piter Bijma, Abe E. Huisman, Mario P. L. Calus

Genetic selection improves a population by increasing the frequency of favorable alleles. Understanding and monitoring allele frequency changes is, therefore, important to obtain more insight into the long-term effects of selection. This study aimed to investigate changes in allele frequencies and in results of genome-wide association studies (GWAS), and how those two are related to each other. This was studied in two maternal pig lines where selection was based on a broad selection index. Genotypes and phenotypes were available from 2015 to 2021. Several large changes in allele frequencies over the years were observed in both lines. The largest allele frequency changes were not larger than expected under drift based on gene dropping simulations, but the average allele frequency change was larger with selection than under drift alone. Moreover, several significant regions were found in the GWAS for the traits under selection, but those regions did not overlap with regions with larger allele frequency changes. No significant GWAS regions were found in either line for the selection index, which included multiple traits, indicating that the index is affected by many loci of small effect. Additionally, many significant regions showed pleiotropic, and often antagonistic, associations with other traits under selection. This reduces the selection pressure on those regions, which can explain why those regions are still segregating, although the traits have been under selection for several generations. Across the years, only small changes in Manhattan plots were found, indicating that the genetic architecture was reasonably constant. No significant GWAS regions were found for any of the traits under selection among the regions with the largest changes in allele frequency, and the correlation between significance level of marker associations and changes in allele frequency over one generation was close to zero for all traits. Moreover, the largest changes in allele frequency could be explained by drift and were not necessarily a result of selection. This is probably because selection acted on a broad index for which no significant GWAS regions were found. Our results show that selecting on a broad index spreads the selection pressure across the genome, thereby limiting allele frequency changes.
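
The comparison of observed allele frequency changes with the changes expected under drift can be sketched as follows. Gene dropping typically passes founder alleles down the recorded pedigree; this sketch replaces that with an idealized Wright-Fisher population, which is a deliberate simplification. The population size, number of generations, starting frequency and the 99% cutoff are assumptions chosen only for illustration.

```python
import random

def drift_changes(p0, n_diploid, generations, replicates, rng):
    """Simulate |allele frequency change| under pure drift (Wright-Fisher)."""
    changes = []
    for _ in range(replicates):
        p = p0
        for _ in range(generations):
            # Draw 2N gametes, each carrying the allele with probability p.
            copies = sum(rng.random() < p for _ in range(2 * n_diploid))
            p = copies / (2 * n_diploid)
        changes.append(abs(p - p0))
    return sorted(changes)

def exceeds_drift(observed_change, changes, quantile=0.99):
    """True if the observed change is larger than the chosen drift quantile."""
    cutoff = changes[int(quantile * (len(changes) - 1))]
    return abs(observed_change) > cutoff

rng = random.Random(42)
changes = drift_changes(p0=0.3, n_diploid=100, generations=5,
                        replicates=500, rng=rng)
for observed in (0.01, 0.50):
    verdict = "exceeds" if exceeds_drift(observed, changes) else "is within"
    print(f"observed change {observed:.2f} {verdict} the simulated drift range")
```

A change that stays within the simulated range cannot be attributed to selection on that evidence alone, which mirrors the conclusion that even the largest observed changes were compatible with drift.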

Citations: 0

On the inverse association between the number of QTL and the trait-specific genomic relationship of a candidate to the training set

Pub Date : 2024-12-13 DOI: 10.1186/s12711-024-00940-4

Christian Stricker, Rohan L. Fernando, Albrecht Melchinger, Hans-Juergen Auinger, Chris-Carolin Schoen

Accuracy of genomic prediction depends on the heritability of the trait, the size of the training set, the relationship of the candidates to the training set, and $$\text{Min}(N_{\text{QTL}}, M_e)$$, where $$N_{\text{QTL}}$$ is the number of QTL and $$M_e$$ is the number of independently segregating chromosomal segments. Due to LD, the number $$Q_e$$ of independently segregating QTL (effective QTL) can be lower than $$\text{Min}(N_{\text{QTL}}, M_e)$$. In this paper, we show that $$Q_e$$ is inversely associated with the trait-specific genomic relationship of a candidate to the training set. This provides an explanation for the inverse association between $$Q_e$$ and the accuracy of prediction. To quantify the genomic relationship of a candidate to all members of the training set, we considered the $$k^2$$ statistic that has been previously used for this purpose. It quantifies how well the marker covariate vector of a candidate can be represented as a linear combination of the rows of the marker covariate matrix of the training set. In this paper, we used Bayesian regression to make this statistic trait specific and argue that the trait-specific genomic relationship of a candidate to the training set is inversely associated with $$Q_e$$. Simulation was used to demonstrate the dependence of the trait-specific $$k^2$$ statistic on $$Q_e$$, which is related to $$N_{\text{QTL}}$$. The posterior distributions of the trait-specific $$k^2$$ statistic showed that the trait-specific genomic relationship between a candidate and the training set is inversely associated with $$Q_e$$ and $$N_{\text{QTL}}$$. Further, we show that the trait-specific genomic relationship between a candidate and the training set is directly related to the size of the training set.
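
One way to write down the verbal description of the $$k^2$$ statistic is as the squared length of a projection, relative to the squared length of the original vector. The notation below is an assumption made for illustration: $$X$$ stands for the training-set marker covariate matrix (one row per training individual), $$x$$ for the candidate's marker covariate vector, and $$(\cdot)^{-}$$ for a generalized inverse. The trait-specific version, which the abstract says is obtained through Bayesian regression, is not reproduced here.

```latex
% Projection of the candidate's marker vector onto the row space of X
\hat{x} = X^{\top}\left(X X^{\top}\right)^{-} X\, x

% Share of the candidate's vector that the training rows can reproduce
k^2 = \frac{\hat{x}^{\top}\hat{x}}{x^{\top}x}, \qquad 0 \le k^2 \le 1
```

Under this formulation $$k^2 = 1$$ when the candidate's vector is an exact linear combination of the training rows, and $$k^2 = 0$$ when it is orthogonal to all of them.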

Citations: 0

A computationally efficient algorithm to leverage average information REML for (co)variance component estimation in the genomic era

Pub Date : 2024-11-21 DOI: 10.1186/s12711-024-00939-x

Ismo Strandén, Esa A. Mäntysaari, Martin H. Lidauer, Robin Thompson, Hongding Gao

Methods for estimating variance components (VC) using restricted maximum likelihood (REML) typically require elements from the inverse of the coefficient matrix of the mixed model equations (MME). As genomic information becomes more prevalent, the coefficient matrix of the MME becomes denser, presenting a challenge for analyzing large datasets. Thus, computational algorithms based on iterative solving and Monte Carlo approximation of the inverse of the coefficient matrix become appealing. While the standard average information REML (AI-REML) is known for its rapid convergence, its computational intensity imposes limitations. In particular, the standard AI-REML requires solving the MME for each VC, which can be computationally demanding, especially when dealing with complex models with many VC. To bridge this gap, here we (1) present a computationally efficient and tractable algorithm, named the augmented AI-REML, which facilitates the AI-REML by solving an augmented MME only once within each REML iteration; and (2) implement this approach for VC estimation in a general framework of a multi-trait GBLUP model. VC estimation was investigated based on the number of VC in the model, including two-, three-, four-, and five-trait GBLUP models. We compared the augmented AI-REML with the standard AI-REML in terms of computing time per REML iteration. Direct and iterative solving methods were used to assess the advantages of the augmented AI-REML. When using the direct solving method, the augmented AI-REML and the standard AI-REML required similar computing times for models with a small number of VC (the two- and three-trait GBLUP models), while the augmented AI-REML demonstrated more notable reductions in computing time as the number of VC in the model increased. When using the iterative solving method, the augmented AI-REML demonstrated substantial improvements in computational efficiency compared to the standard AI-REML. The elapsed time of each REML iteration was reduced by 75%, 84%, and 86% for the two-, three-, and four-trait GBLUP models, respectively. The augmented AI-REML can considerably reduce the computing time within each REML iteration, particularly when using an iterative solver. Our results demonstrate the potential of the augmented AI-REML as an appealing approach for large-scale VC estimation in the genomic era.
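
The reported percentage reductions can be restated as speed-up factors, which some readers find easier to compare. The short calculation below uses only the three percentages quoted above; the function name is arbitrary and the conversion assumes that "reduced by r%" means the new time is (100 - r)% of the old one.

```python
def speedup(reduction_pct):
    """Convert a percentage reduction in elapsed time into a speed-up factor."""
    return 1.0 / (1.0 - reduction_pct / 100.0)

# Reductions per REML iteration with the iterative solver, as quoted above.
reported = {"two-trait": 75, "three-trait": 84, "four-trait": 86}
factors = {model: speedup(pct) for model, pct in reported.items()}

for model, factor in factors.items():
    print(f"{model}: about {factor:.1f}x faster per REML iteration")
```

Read this way, the gain grows with the number of traits, consistent with the statement that the standard AI-REML has to solve the MME once per variance component.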

Citations: 0

On the ability of the LR method to detect bias when there is pedigree misspecification and lack of connectedness

IF 4.1 Zone 1 (Agricultural and Forestry Sciences) Q1 AGRICULTURE, DAIRY & ANIMAL SCIENCE

Genetics Selection Evolution

Pub Date : 2024-11-21 DOI: 10.1186/s12711-024-00943-1

Alan M. Pardo, Andres Legarra, Zulma G. Vitezica, Natalia S. Forneris, Daniel O. Maizon, Sebastián Munilla

Cross-validation techniques in genetic evaluations encounter limitations due to the unobservable nature of breeding values and the challenge of validating estimated breeding values (EBVs) against pre-corrected phenotypes; the Linear Regression (LR) method addresses these challenges as an alternative. Furthermore, beef cattle genetic evaluation programs confront challenges with connectedness among herds and pedigree errors. The objective of this work was to evaluate, through simulation, the LR method's performance under the pedigree errors and weak connectedness typical of beef cattle genetic evaluations. We simulated a beef cattle population resembling the Argentinean Brangus, including a quantitative trait selected over six pseudo-generations with a heritability of 0.4. This study considered various scenarios, including: 25% and 40% pedigree errors (PE-25 and PE-40), weak and strong connectedness among herds (WCO and SCO, respectively), and a benchmark scenario (BEN) with complete pedigree and optimal herd connections. Over six pseudo-generations of selection, genetic gain was simulated to be under- and over-estimated in PE-40 and WCO, respectively, contrary to the BEN scenario, which was unbiased. In genetic evaluations with PE-25 and PE-40, true biases of −0.13 and −0.18 genetic standard deviations were simulated, respectively. In the BEN scenario, the LR method accurately estimated bias; however, in the PE-25 and PE-40 scenarios, it overestimated biases by 0.17 and 0.25 genetic standard deviations, respectively. In herds facing WCO, significant true bias due to confounding environmental and genetic effects was simulated, and the corresponding LR statistic failed to accurately estimate the magnitude and direction of this bias. On average, true dispersion values were close to one for BEN, PE-40, SCO and WCO, showing no significant inflation or deflation, and the values were accurately estimated by LR. However, PE-25 exhibited inflation of EBVs, which was slightly underestimated by LR. Accuracies and reliabilities showed good agreement between true and LR estimated values for the scenarios evaluated. The LR method demonstrated limitations in identifying biases induced by incomplete pedigrees, including scenarios with as many as 40% pedigree errors, or lack of connectedness, but it was effective in assessing dispersion, and population accuracies and reliabilities, even in the challenging scenarios addressed.
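The abstract does not spell out how the LR statistics are computed. As a minimal sketch, assuming the definitions commonly used in the LR literature (bias as the difference between mean EBVs from a partial and a whole evaluation, expected to be 0; dispersion as the regression slope of whole on partial EBVs, expected to be 1), the statistics for a set of validation animals could be computed as follows. The function and variable names are illustrative and do not come from the study.

```python
import numpy as np


def lr_statistics(ebv_partial, ebv_whole):
    """LR-style statistics for validation animals (assumed standard definitions).

    ebv_partial: EBVs from the evaluation with reduced (partial) data.
    ebv_whole:   EBVs of the same animals from the evaluation with all data.
    """
    p = np.asarray(ebv_partial, dtype=float)
    w = np.asarray(ebv_whole, dtype=float)
    if p.shape != w.shape or p.ndim != 1:
        raise ValueError("inputs must be 1-D arrays of equal length")

    bias = p.mean() - w.mean()                               # expected value: 0
    dispersion = np.cov(w, p, ddof=1)[0, 1] / p.var(ddof=1)  # expected value: 1
    correlation = np.corrcoef(w, p)[0, 1]                    # consistency between evaluations
    return {"bias": bias, "dispersion": dispersion, "correlation": correlation}
```

A dispersion value below 1 under this definition indicates inflated partial EBVs, and a value above 1 indicates deflated ones; to express bias in genetic standard deviations, as in the abstract, the difference would additionally be divided by the genetic standard deviation.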

Citations: 0

Empirical versus estimated accuracy of imputation: optimising filtering thresholds for sequence imputation

IF 4.1 Zone 1 (Agricultural and Forestry Sciences) Q1 AGRICULTURE, DAIRY & ANIMAL SCIENCE

Genetics Selection Evolution

Pub Date : 2024-11-15 DOI: 10.1186/s12711-024-00942-2

Tuan V. Nguyen, Sunduimijid Bolormaa, Coralie M. Reich, Amanda J. Chamberlain, Christy J. Vander Jagt, Hans D. Daetwyler, Iona M. MacLeod

Genotype imputation is a cost-effective method for obtaining sequence genotypes for downstream analyses such as genome-wide association studies (GWAS). However, low imputation accuracy can increase the risk of false positives, so it is important to pre-filter data or at least assess the potential limitations due to imputation accuracy. In this study, we benchmarked three different imputation programs (Beagle 5.2, Minimac4 and IMPUTE5) and compared the empirical accuracy of imputation with the software-estimated accuracy of imputation (Rsqsoft). We also tested the accuracy of imputation in cattle for autosomal and X chromosomes, and for SNP and INDEL, when imputing from either low-density or high-density genotypes. The accuracy of imputing sequence variants from real high-density genotypes was higher than from low-density genotypes. In our software benchmark, all programs performed well with only minor differences in accuracy. While there was a close relationship between empirical imputation accuracy and the imputation Rsqsoft, this relationship differed considerably for Minimac4 compared to Beagle 5.2 and IMPUTE5. We found that the Rsqsoft threshold for removing poorly imputed variants must be customised according to the software and this should be accounted for when merging data from multiple studies, such as in meta-GWAS studies. We also found that imposing an Rsqsoft filter has a positive impact on genomic regions with poor imputation accuracy due to large segmental duplications that are susceptible to error-prone alignment. Overall, our results showed that on average the imputation accuracy for INDEL was approximately 6% lower than for SNP for all software programs. Importantly, the imputation accuracy for the non-PAR (non-Pseudo-Autosomal Region) of the X chromosome was comparable to autosomal imputation accuracy, while for the PAR it was substantially lower, particularly when starting from low-density genotypes. This study provides an empirically derived approach to apply customised software-specific Rsqsoft thresholds for downstream analyses of imputed variants, such as those needed for a meta-GWAS. The very poor empirical imputation accuracy for variants on the PAR when starting from low-density genotypes demonstrates that this region should be imputed starting from a higher density of real genotypes.
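The idea of comparing empirical accuracy with Rsqsoft and then filtering with a software-specific threshold can be sketched as below. This is a minimal illustration under stated assumptions: empirical accuracy is taken here to be the per-variant squared Pearson correlation between true genotypes and imputed dosages (the abstract does not give the exact definition), and the software labels and threshold values are placeholders, not values from the study.

```python
import numpy as np


def empirical_r2(true_genotypes, imputed_dosages):
    """Per-variant squared correlation; rows are variants, columns are animals."""
    t = np.asarray(true_genotypes, dtype=float)
    d = np.asarray(imputed_dosages, dtype=float)
    tc = t - t.mean(axis=1, keepdims=True)
    dc = d - d.mean(axis=1, keepdims=True)
    num = (tc * dc).sum(axis=1)
    den = np.sqrt((tc ** 2).sum(axis=1) * (dc ** 2).sum(axis=1))
    r = np.full(t.shape[0], np.nan)   # undefined when a variant has no variation
    ok = den > 0
    r[ok] = num[ok] / den[ok]
    return r ** 2


def keep_mask(rsq_soft, software, thresholds):
    """Boolean mask of variants passing the threshold chosen for this software."""
    return np.asarray(rsq_soft, dtype=float) >= thresholds[software]


# Placeholder thresholds, purely hypothetical; in practice each would be
# calibrated against empirical accuracy for the specific program.
HYPOTHETICAL_THRESHOLDS = {"software_a": 0.8, "software_b": 0.6}
```

Keeping the thresholds in a per-software mapping makes it explicit that a single cut-off should not be reused when merging variants imputed by different programs.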

Citations: 0

The effect of phenotyping, adult selection, and mating strategies on genetic gain and rate of inbreeding in black soldier fly breeding programs

IF 4.1 Zone 1 (Agricultural and Forestry Sciences) Q1 AGRICULTURE, DAIRY & ANIMAL SCIENCE

Genetics Selection Evolution

Pub Date : 2024-11-04 DOI: 10.1186/s12711-024-00938-y

Margot Slagboom, Hanne Marie Nielsen, Morten Kargo, Mark Henryon, Laura Skrubbeltrang Hansen

The aim of this study was to compare genetic gain and rate of inbreeding for different mass selection breeding programs designed to increase larval body weight (LBW) in black soldier flies. The breeding programs differed in: (1) sampling of individuals for phenotyping (either random over the whole population or a fixed number per full-sib family), (2) selection of adult flies for breeding (based on an adult individual’s phenotype for LBW or random from larvae preselected based on LBW), and (3) mating strategy (mating in a group with unequal male contributions or controlled between two females and one male). In addition, the numbers of phenotyped and preselected larvae were varied. The sex of an individual was unknown during preselection and females had higher LBW, resulting in more females being preselected. Selecting adult flies based on their phenotype for LBW increased genetic gain by 0.06 genetic standard deviation units compared to randomly selecting from the preselected larvae. Fixing the number of phenotyped larvae per family increased the rate of inbreeding by 0.15 to 0.20% per generation. Controlled mating compared to group mating decreased the rate of inbreeding by 0.02 to 0.03% per generation. Phenotyping more than 4000 larvae resulted in a lack of preselected males due to the sexual dimorphism. Preselecting either too few or too many larvae could negatively impact genetic gain, depending on the breeding program. A mass selection breeding program in which adult flies are selected based on their larval phenotype, breeding animals mate in a group, and larvae are sampled for phenotyping at random over the whole population is recommended for black soldier flies, considering the positive effect on rates of genetic gain and inbreeding. The number of phenotyped and preselected larvae should be calculated based on the expected female weight deviation to ensure sufficient male and female candidates are selected.
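The effect of sexual dimorphism on preselection can be illustrated with a toy simulation: when sex is unknown and females are heavier on average, truncation selection on LBW yields a female-biased preselected group. The sketch below is not the authors' simulation; the 1:1 sex ratio, the standardised normal weights, and the size of the female weight shift are all assumptions chosen for illustration.

```python
import numpy as np


def preselected_female_fraction(n_larvae, n_preselected, female_shift, seed=0):
    """Fraction of females among the heaviest larvae under truncation selection.

    female_shift: assumed mean LBW advantage of females, in phenotypic SD units.
    """
    rng = np.random.default_rng(seed)
    is_female = rng.random(n_larvae) < 0.5              # assumed 1:1 sex ratio
    weight = rng.standard_normal(n_larvae) + female_shift * is_female
    heaviest = np.argsort(weight)[-n_preselected:]      # sex is not used here
    return float(is_female[heaviest].mean())


if __name__ == "__main__":
    for shift in (0.0, 0.5, 1.0):
        frac = preselected_female_fraction(4000, 400, shift)
        print(f"female shift {shift:.1f} SD -> {frac:.2f} of preselected larvae are female")
```

Under these assumptions, the number of preselected males shrinks as the weight difference grows, which is the reason the abstract ties the number of phenotyped and preselected larvae to the expected female weight deviation.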

Citations: 0

Investigating genotype by environment interaction for beef cattle fertility traits in commercial herds in northern Australia with multi-trait analysis

IF 4.1 Zone 1 (Agricultural and Forestry Sciences) Q1 AGRICULTURE, DAIRY & ANIMAL SCIENCE

Genetics Selection Evolution

Pub Date : 2024-10-31 DOI: 10.1186/s12711-024-00936-0

James P. Copley, Benjamin J. Hayes, Elizabeth M. Ross, Shannon Speight, Geoffry Fordyce, Benjamin J. Wood, Bailey N. Engle

Genotype by environment interactions (GxE) affect a range of production traits in beef cattle. Quantifying the effect of GxE in commercial and multi-breed herds is challenging due to unknown genetic linkage between animals across environment levels. The primary aim of this study was to use multi-trait models to investigate GxE for three heifer fertility traits, corpus luteum (CL) presence, first pregnancy and second pregnancy, in a large tropical beef multi-breed dataset (n = 21,037). Environmental levels were defined by two different descriptors, burden of heat load (temperature humidity index, THI) and nutritional availability (based on mean average daily gain for the herd, ADWG). To separate the effects of genetic linkage and real GxE across the environments, 1000 replicates of a simulated phenotype were generated by simulating QTL effects with no GxE onto real marker genotypes from the population, to determine the genetic correlations that could be expected across environments due to the existing genetic linkage only. Correlations from the real phenotypes were then compared to the empirical distribution under the null hypothesis from the simulated data. By adopting this approach, this study attempted to establish whether low genetic correlations between environmental levels were due to GxE or to insufficient genetic linkage between animals in each environmental level. The correlations for the real phenotypes (less than 0.8) were indicative of GxE for CL presence between ADWG environmental levels and in pregnancy traits. However, none of the correlations for CL presence or first pregnancy between ADWG levels were below the 5th percentile value for the empirical distribution under the null hypothesis from the simulated data. Only one statistically significant (P < 0.05) indication of GxE for first pregnancy was found between THI environmental levels, where rg = 0.28 and 5th percentile value = 0.29, and this result was marginal. Only one case of statistically significant GxE for fertility traits was detected, for first pregnancy between THI environmental levels 2 and 3. Other initial indications of GxE that were observed from the real phenotypes did not prove significant when compared to an empirical null distribution from simulated phenotypes. The lack of compelling evidence of GxE indicates that direct selection for fertility traits can be made accurately, using a single evaluation, regardless of environment.
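The comparison against an empirical null distribution can be sketched as follows: an observed genetic correlation is declared evidence of GxE only if it falls below the lower-tail percentile of the correlations obtained from replicates simulated without GxE. This is a minimal sketch assuming a one-sided test at the 5th percentile, as described in the abstract; the function name and return structure are illustrative, and the null correlations would come from the simulated replicates.

```python
import numpy as np


def gxe_empirical_test(observed_rg, null_rgs, alpha=0.05):
    """One-sided comparison of an observed genetic correlation with a simulated null.

    null_rgs: genetic correlations estimated from phenotypes simulated with no GxE.
    """
    null = np.asarray(null_rgs, dtype=float)
    cutoff = float(np.percentile(null, 100 * alpha))       # e.g. the 5th percentile
    fraction_below = float(np.mean(null <= observed_rg))   # empirical lower-tail share
    return {
        "cutoff": cutoff,
        "fraction_below": fraction_below,
        "significant": bool(observed_rg < cutoff),
    }
```

With the values quoted in the abstract, an observed rg of 0.28 against a 5th percentile of 0.29 falls just below the cut-off, which is why the result is described as marginal.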

Citations: 0