Pub Date: 2025-01-13 DOI: 10.1186/s12711-024-00945-z
Simona Antonios, Silvia T. Rodríguez-Ramilo, Andres Legarra, Jean-Michel Astruc, Luis Varona, Zulma G. Vitezica
The magnitude of inbreeding depression depends on the recessive burden of the individual, which can be traced back to the hidden (recessive) inbreeding load among ancestors. However, these ancestors carry different alleles at potentially deleterious loci and therefore there is individual variability of this inbreeding load. Estimation of the additive genetic value for inbreeding load is possible using a decomposition of inbreeding into partial inbreeding components due to ancestors. Both the magnitude of variation in partial inbreeding components and the additive genetic variance of inbreeding loads are largely unknown. Our study had three objectives. First, based on the substitution effect under non-random mating, we showed analytically that the inbreeding load of an ancestor can be expressed as an additive genetic effect. Second, we analysed the structure of individual inbreeding by examining the contributions of specific ancestors/founders using the concept of partial inbreeding coefficients in three French dairy sheep populations (Basco-Béarnaise, Manech Tête Noire and Manech Tête Rousse). Third, we included these coefficients in a mixed model as random regression covariates, to predict genetic variance and breeding values of the inbreeding load for milk yield in the same breeds. Pedigrees included 190,276, 166,028 and 633,655 animals of Basco-Béarnaise, Manech Tête Noire and Manech Tête Rousse, respectively, born between 1985 and 2021. In all breeds, 99.1% of the partial inbreeding coefficients were lower than 0.01, meaning that in practice inbreeding occurs in pedigree loops that span several generations backwards. Fewer than 5% of ancestors generate inbreeding, because mating is essentially between unrelated individuals. Inbreeding load estimations involved 658,731, 541,180 and 2,168,454 records of yearly milk yield from 178,123, 151,863 and 596,586 females in Basco-Béarnaise, Manech Tête Noire and Manech Tête Rousse, respectively.
Adding the inbreeding load effect to the model improved the fit (likelihood ratio test statistics between 132 and 383) for milk yield in the three breeds. The inbreeding load variances were equal to 11,804 and 9435 L squared of milk yield for a fully inbred (100%) descendant in Manech Tête Noire and Manech Tête Rousse, respectively. In Basco-Béarnaise, the estimate of the inbreeding load variance (11,804) was not significantly different from zero. The correlations between (direct effect) additive genetic and inbreeding load effects were −0.09, −0.08 and −0.12 in Basco-Béarnaise, Manech Tête Noire and Manech Tête Rousse, respectively. The decomposition of inbreeding into partial coefficients in these populations shows that inbreeding is mostly due to several small contributions of ancestors (lower than 0.001) going back several generations (5 to 7 generations), which is consistent with the policy of avoiding close matings. There is variation of inbreeding load among animals, although its magnitude does not seem enough to warrant selection based on this criterion.
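The decomposition of inbreeding into partial coefficients can be illustrated on a toy pedigree. The following is a minimal sketch, not the authors' software: it assumes founders are animals with unknown parents, that each animal has either both parents known or both unknown, and that the pedigree lists parents before offspring. The function name `partial_inbreeding` and the toy pedigree are hypothetical. It uses a recursive partial kinship in which only alleles descending from one founder count as identical by descent, so the partial coefficients sum to the total inbreeding coefficient.

```python
from functools import lru_cache


def partial_inbreeding(pedigree, animal):
    """Split the inbreeding coefficient of `animal` by founder.

    pedigree: dict id -> (sire, dam), parents listed before offspring;
    founders are entered as (None, None).
    Returns a dict founder -> partial inbreeding coefficient (non-zero only).
    """
    order = {a: k for k, a in enumerate(pedigree)}
    founders = [a for a, (s, _) in pedigree.items() if s is None]
    sire, dam = pedigree[animal]
    result = {}
    if sire is None:  # founders are assumed non-inbred
        return result
    for f in founders:

        @lru_cache(maxsize=None)
        def contribution(i):
            # Probability that a random allele of i descends from founder f.
            if i == f:
                return 1.0
            s, d = pedigree[i]
            if s is None:
                return 0.0
            return 0.5 * (contribution(s) + contribution(d))

        @lru_cache(maxsize=None)
        def kinship(i, j):
            # Kinship of i and j restricted to alleles coming from founder f.
            if order[i] < order[j]:
                i, j = j, i  # always recurse on the younger animal
            s, d = pedigree[i]
            if s is None:
                return 0.5 if i == j == f else 0.0
            if i == j:
                return 0.5 * contribution(i) + 0.5 * kinship(s, d)
            return 0.5 * (kinship(s, j) + kinship(d, j))

        value = kinship(sire, dam)
        if value > 0:
            result[f] = value
    return result


# Toy example: E is the offspring of full sibs C and D (total F = 0.25).
toy = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("C", "D"),
}
print(partial_inbreeding(toy, "E"))  # {'A': 0.125, 'B': 0.125}
```

In a model such as the one described above, these partial coefficients would enter as covariates of a random regression on each ancestor's inbreeding load; the recursion here is only practical for small pedigrees.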
{"title":"Genetic inbreeding load and its individual prediction for milk yield in French dairy sheep","authors":"Simona Antonios, Silvia T. Rodríguez-Ramilo, Andres Legarra, Jean-Michel Astruc, Luis Varona, Zulma G. Vitezica","doi":"10.1186/s12711-024-00945-z","DOIUrl":"https://doi.org/10.1186/s12711-024-00945-z","url":null,"PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"50 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142968270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-30 DOI: 10.1186/s12711-024-00944-0
Andrei A. Kudinov, Antti Kause
Sex identification in farmed fish is important for the management of fish stocks and breeding programs, but identification based on visual characteristics is typically difficult or impossible in juvenile or premature fish. The amount of genomic data obtained from farmed fish is rapidly growing with the implementation of genomic selection in aquaculture. In comparison to mammals and birds, ray-finned fishes exhibit a greater diversity of sex determination systems, with an absence of conserved genomic regions. A group of genomic markers located on a standard genotyping array has been reported to potentially be linked with sex determination in rainbow trout. However, the set of markers suitable for sex identification may vary between populations. Sex identification from genomic data is usually performed using probabilistic methods, where suitable markers are known beforehand. In our study, we demonstrated the use of the Extreme Gradient Boosting approach from the supervised machine learning gradient boost framework to predict sex from unimputed genomic data, when the suitability of the markers was unknown a priori. The accuracy of the method was assessed using four simulated datasets with different genotyping error rates and one real dataset from the Finnish Rainbow Trout Breeding Program. The method showed high prediction quality on both simulated and real datasets. For simulated datasets with low (5%) and high (50%) genotyping error rates, the accuracies were 1.0 and 0.60, respectively. In the real data, the method achieved a prediction accuracy of 98%, which is suitable for routine use.
{"title":"Sex identification in rainbow trout using genomic information and machine learning","authors":"Andrei A. Kudinov, Antti Kause","doi":"10.1186/s12711-024-00944-0","DOIUrl":"https://doi.org/10.1186/s12711-024-00944-0","url":null,"PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"4 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142901813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-20 DOI: 10.1186/s12711-024-00946-y
Lina Bu, Yuzhe Wang, Lizhi Tan, Zilong Wen, Xiaoxiang Hu, Zhiwu Zhang, Yiqiang Zhao
The genome-wide association study (GWAS) is a powerful method for mapping quantitative trait loci (QTL). However, standard GWAS can detect only QTL that segregate in the mapping population. Crossing populations with different characteristics increases genetic variability, but F2 or back-crosses lack mapping resolution due to the limited number of recombination events. This drawback can be overcome with advanced intercross line (AIL) populations, which increase the number of recombination events and provide a more accurate mapping resolution. Recent studies in humans have revealed ancestry-dependent genetic architecture and shown the effectiveness of admixture mapping in admixed populations. Through the incorporation of line-of-origin effects and GWAS on an F9 AIL population, we identified genes that affect body weight at eight weeks of age (BW8) in chickens. The proposed ancestral-haplotype-based GWAS (testing only the origin regardless of the alleles) revealed three new QTLs on GGA12, GGA15, and GGA20. By using the concepts of ancestral homozygotes (individuals that carry two haplotypes of the same origin) and ancestral heterozygotes (carrying one haplotype of each origin), we identified 632 loci that exhibited high-parent (the heterozygote is better than both parents) and mid-parent (the heterozygote is better than the median of the parents) dominance across 12 chromosomes. Out of the 199 genes associated with BW8, EYA1, PDE1C, and MYC were identified as the best candidate genes for further validation. In addition to the candidate genes reported in this study, our research demonstrates the effectiveness of incorporating ancestral information in population genetic analyses, which can be broadly applicable for genetic mapping in populations generated by ancestors with distinct phenotypes and genetic backgrounds. Our methods can benefit both geneticists and biologists interested in the genetic determinism of complex traits.
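The high-parent and mid-parent definitions given in the abstract can be written as a small decision rule. This is only an illustrative sketch: it assumes that "better" means a higher mean body weight, it compares class means without any significance test, and the function name `classify_dominance` and the example weights are hypothetical.

```python
def classify_dominance(mean_hom_a, mean_hom_b, mean_het):
    """Classify a locus from the mean phenotype of its ancestral genotype classes.

    mean_hom_a, mean_hom_b: means of the two ancestral homozygote classes
    mean_het: mean of the ancestral heterozygote class
    Assumption: a higher mean is better (e.g. body weight).
    """
    mid_parent = (mean_hom_a + mean_hom_b) / 2
    if mean_het > max(mean_hom_a, mean_hom_b):
        return "high-parent"  # heterozygote exceeds both homozygote classes
    if mean_het > mid_parent:
        return "mid-parent"  # heterozygote exceeds the parental midpoint only
    return "none"


# Hypothetical class means for body weight in grams.
print(classify_dominance(900.0, 700.0, 950.0))  # high-parent
print(classify_dominance(900.0, 700.0, 850.0))  # mid-parent
print(classify_dominance(900.0, 700.0, 780.0))  # none
```

A real analysis would test these contrasts statistically at each locus rather than compare raw means.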
{"title":"Haplotype analysis incorporating ancestral origins identified novel genetic loci associated with chicken body weight using an advanced intercross line","authors":"Lina Bu, Yuzhe Wang, Lizhi Tan, Zilong Wen, Xiaoxiang Hu, Zhiwu Zhang, Yiqiang Zhao","doi":"10.1186/s12711-024-00946-y","DOIUrl":"https://doi.org/10.1186/s12711-024-00946-y","url":null,"PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"64 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142858434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-18 DOI: 10.1186/s12711-024-00947-x
Jón H. Eiríksson, Þórdís Þórarinsdóttir, Egill Gautason
Scrapie is an infectious prion disease in sheep. Selective breeding for resistant genotypes of the prion protein gene (PRNP) is an effective way to prevent scrapie outbreaks. Genotyping all selection candidates in a population is expensive, but existing pedigree records can help infer the probabilities of genotypes in relatives of genotyped animals. We used linear models to predict allele content for the various PRNP alleles found in Icelandic sheep and compiled the available estimates of relative scrapie susceptibility (RSS) associated with PRNP genotypes from the literature. Using the predicted allele content and the genotypic RSS, we calculated estimated breeding values (EBV) for RSS. We tested the predictions on simulated data under different scenarios that varied in the proportion of genotyped sheep, genotyping strategy, pedigree recording accuracy, genotyping error rates and assumed heritability of allele content. Prediction of allele content for rare alleles was less successful than for alleles with moderate frequencies. The accuracy of allele content and RSS EBV predictions was not affected by the assumed heritability, but the dispersion of prediction was affected. In a scenario where 40% of rams were genotyped and there were no errors in genotyping or recorded pedigree, the accuracy of RSS EBV for ungenotyped selection candidates was 0.49. If only 20% of rams were genotyped, or rams and ewes were genotyped randomly, or there were 10% pedigree errors, or there were 2% genotyping errors, the accuracy decreased by 0.07, 0.08, 0.03 and 0.04, respectively. With empirical data, the accuracy of RSS EBV for ungenotyped sheep was 0.46–0.65. A linear model for predicting allele content for the PRNP gene, combined with estimates of relative susceptibility associated with PRNP genotypes, can provide RSS EBV for scrapie resistance for ungenotyped selection candidates with accuracy up to 0.65.
These RSS EBV can complement selection strategies based on PRNP genotypes, especially in populations where resistant genotypes are rare.
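To see how pedigree records carry genotype information to ungenotyped relatives, consider the simplest possible rule: the expected allele content of an ungenotyped animal is the average of its parents' allele contents, with unknown parents set to the population mean of twice the allele frequency. The sketch below implements only this downward propagation and is not the linear model used in the study, which also draws on genotyped descendants and other relatives. The function name `expected_allele_content`, the animal identifiers and the allele frequency are hypothetical.

```python
def expected_allele_content(pedigree, observed, freq):
    """Propagate allele content (0, 1 or 2 copies) from parents to offspring.

    pedigree: dict id -> (sire, dam), parents listed before offspring,
              None for an unknown parent
    observed: dict id -> allele count for genotyped animals
    freq: assumed population frequency of the allele
    """
    content = {}

    def parent_value(parent):
        # An unknown parent is replaced by the population mean, 2 * freq.
        return 2 * freq if parent is None else content[parent]

    for animal, (sire, dam) in pedigree.items():
        if animal in observed:
            content[animal] = float(observed[animal])
        else:
            content[animal] = 0.5 * (parent_value(sire) + parent_value(dam))
    return content


flock = {
    "ram": (None, None),
    "ewe": (None, None),
    "lamb": ("ram", "ewe"),
    "grandlamb": ("lamb", None),
}
result = expected_allele_content(flock, {"ram": 2, "ewe": 0}, freq=0.1)
print(result["lamb"])  # 1.0
```

Combining such predicted allele contents with genotype-specific susceptibility estimates is the step that yields an RSS EBV in the abstract; that step is not reproduced here because its details are not given.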
{"title":"Predicted breeding values for relative scrapie susceptibility for genotyped and ungenotyped sheep","authors":"Jón H. Eiríksson, Þórdís Þórarinsdóttir, Egill Gautason","doi":"10.1186/s12711-024-00947-x","DOIUrl":"https://doi.org/10.1186/s12711-024-00947-x","url":null,"PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"54 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142849415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-17 DOI: 10.1186/s12711-024-00941-3
Yvonne C. J. Wientjes, Katrijn Peeters, Piter Bijma, Abe E. Huisman, Mario P. L. Calus
Genetic selection improves a population by increasing the frequency of favorable alleles. Understanding and monitoring allele frequency changes is, therefore, important to obtain more insight into the long-term effects of selection. This study aimed to investigate changes in allele frequencies and in results of genome-wide association studies (GWAS), and how those two are related to each other. This was studied in two maternal pig lines where selection was based on a broad selection index. Genotypes and phenotypes were available from 2015 to 2021. Several large changes in allele frequencies over the years were observed in both lines. The largest allele frequency changes were not larger than expected under drift based on gene dropping simulations, but the average allele frequency change was larger with selection. Moreover, several significant regions were found in the GWAS for the traits under selection, but those regions did not overlap with regions with larger allele frequency changes. No significant GWAS regions were found for the selection index in both lines, which included multiple traits, indicating that the index is affected by many loci of small effect. Additionally, many significant regions showed pleiotropic, and often antagonistic, associations with other traits under selection. This reduces the selection pressure on those regions, which can explain why those regions are still segregating, although the traits have been under selection for several generations. Across the years, only small changes in Manhattan plots were found, indicating that the genetic architecture was reasonably constant. No significant GWAS regions were found for any of the traits under selection among the regions with the largest changes in allele frequency, and the correlation between significance level of marker associations and changes in allele frequency over one generation was close to zero for all traits. 
Moreover, the largest changes in allele frequency could be explained by drift and were not necessarily a result of selection. This is probably because selection acted on a broad index for which no significant GWAS regions were found. Our results show that selecting on a broad index spreads the selection pressure across the genome, thereby limiting allele frequency changes.
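The gene dropping idea mentioned in the abstract can be sketched as follows: alleles are assigned to founders, passed down the pedigree by Mendelian sampling without any selection, and the resulting allele frequency change is recorded over many replicates to give a null distribution under drift. This is a minimal illustration and not the authors' simulation: it assumes a single biallelic locus, founder alleles sampled from an assumed frequency `p0`, and a pedigree that lists parents before offspring with both parents either known or unknown. The names `gene_drop` and `drift_null` and the toy pedigree are hypothetical.

```python
import random


def gene_drop(pedigree, p0, rng):
    """Drop alleles (0/1) through the pedigree once; returns id -> (allele, allele)."""
    genotypes = {}
    for animal, (sire, dam) in pedigree.items():
        if sire is None:
            # Founder: sample both alleles from the assumed base frequency.
            genotypes[animal] = (int(rng.random() < p0), int(rng.random() < p0))
        else:
            # Offspring: one randomly chosen allele from each parent.
            genotypes[animal] = (rng.choice(genotypes[sire]), rng.choice(genotypes[dam]))
    return genotypes


def drift_null(pedigree, cohort, p0, n_rep=1000, seed=1):
    """Allele frequency change from founders to `cohort` under drift alone."""
    rng = random.Random(seed)
    founders = [a for a, (s, _) in pedigree.items() if s is None]
    changes = []
    for _ in range(n_rep):
        g = gene_drop(pedigree, p0, rng)
        start = sum(sum(g[a]) for a in founders) / (2 * len(founders))
        end = sum(sum(g[a]) for a in cohort) / (2 * len(cohort))
        changes.append(end - start)
    return changes


toy = {
    "f1": (None, None), "f2": (None, None), "f3": (None, None), "f4": (None, None),
    "a": ("f1", "f2"), "b": ("f3", "f4"),
    "c": ("a", "b"), "d": ("a", "b"),
}
null_changes = drift_null(toy, cohort=["c", "d"], p0=0.3, n_rep=500, seed=42)
print(max(abs(x) for x in null_changes))
```

An observed allele frequency change would then be compared with the tails of `null_changes`; a change that falls inside this distribution, as reported above for the largest changes, cannot be distinguished from drift.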
{"title":"Changes in allele frequencies and genetic architecture due to selection in two pig populations","authors":"Yvonne C. J. Wientjes, Katrijn Peeters, Piter Bijma, Abe E. Huisman, Mario P. L. Calus","doi":"10.1186/s12711-024-00941-3","DOIUrl":"https://doi.org/10.1186/s12711-024-00941-3","url":null,"PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"26 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142832236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-13 DOI: 10.1186/s12711-024-00940-4
Christian Stricker, Rohan L. Fernando, Albrecht Melchinger, Hans-Juergen Auinger, Chris-Carolin Schoen
Accuracy of genomic prediction depends on the heritability of the trait, the size of the training set, the relationship of the candidates to the training set, and $$\text{Min}(N_{\text{QTL}}, M_e)$$, where $$N_{\text{QTL}}$$ is the number of QTL and $$M_e$$ is the number of independently segregating chromosomal segments. Due to LD, the number $$Q_e$$ of independently segregating QTL (effective QTL) can be lower than $$\text{Min}(N_{\text{QTL}}, M_e)$$. In this paper, we show that $$Q_e$$ is inversely associated with the trait-specific genomic relationship of a candidate to the training set. This provides an explanation for the inverse association between $$Q_e$$ and the accuracy of prediction. To quantify the genomic relationship of a candidate to all members of the training set, we considered the $$k^2$$ statistic that has been previously used for this purpose. It quantifies how well the marker covariate vector of a candidate can be represented as a linear combination of the rows of the marker covariate matrix of the training set. In this paper, we used Bayesian regression to make this statistic trait specific and argue that the trait-specific genomic relationship of a candidate to the training set is inversely associated with $$Q_e$$. Simulation was used to demonstrate the dependence of the trait-specific $$k^2$$ statistic on $$Q_e$$, which is related to $$N_{\text{QTL}}$$. The posterior distributions of the trait-specific $$k^2$$ statistic showed that the trait-specific genomic relationship between a candidate and the training set is inversely associated to $$Q_e$$ and $$N_{\text{QTL}}$$. Further, we show that trait-specific genomic relationship between a candidate and the training set is directly related to the size of the training set.
{"title":"On the inverse association between the number of QTL and the trait-specific genomic relationship of a candidate to the training set.","authors":"Christian Stricker, Rohan L. Fernando, Albrecht Melchinger, Hans-Juergen Auinger, Chris-Carolin Schoen","doi":"10.1186/s12711-024-00940-4","DOIUrl":"https://doi.org/10.1186/s12711-024-00940-4","url":null,"PeriodicalId":55120,"journal":{"name":"Genetics Selection Evolution","volume":"73 1","pages":""},"PeriodicalIF":4.1,"publicationDate":"2024-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142815891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-21 DOI: 10.1186/s12711-024-00939-x
Ismo Strandén, Esa A. Mäntysaari, Martin H. Lidauer, Robin Thompson, Hongding Gao
Methods for estimating variance components (VC) using restricted maximum likelihood (REML) typically require elements from the inverse of the coefficient matrix of the mixed model equations (MME). As genomic information becomes more prevalent, the coefficient matrix of the MME becomes denser, presenting a challenge for analyzing large datasets. Thus, computational algorithms based on iterative solving and Monte Carlo approximation of the inverse of the coefficient matrix become appealing. While the standard average information REML (AI-REML) is known for its rapid convergence, its computational intensity imposes limitations. In particular, the standard AI-REML requires solving the MME for each VC, which can be computationally demanding, especially when dealing with complex models with many VC. To bridge this gap, here we (1) present a computationally efficient and tractable algorithm, named the augmented AI-REML, which facilitates the AI-REML by solving an augmented MME only once within each REML iteration; and (2) implement this approach for VC estimation in a general framework of a multi-trait GBLUP model. VC estimation was investigated based on the number of VC in the model, including a two-trait, three-trait, four-trait, and five-trait GBLUP model. We compared the augmented AI-REML with the standard AI-REML in terms of computing time per REML iteration. Direct and iterative solving methods were used to assess the advances of the augmented AI-REML. When using the direct solving method, the augmented AI-REML and the standard AI-REML required similar computing times for models with a small number of VC (the two- and three-trait GBLUP model), while the augmented AI-REML demonstrated more notable reductions in computing time as the number of VC in the model increased. When using the iterative solving method, the augmented AI-REML demonstrated substantial improvements in computational efficiency compared to the standard AI-REML. 
The elapsed time of each REML iteration was reduced by 75%, 84%, and 86% for the two-, three-, and four-trait GBLUP models, respectively. The augmented AI-REML can considerably reduce the computing time within each REML iteration, particularly when using an iterative solver. Our results demonstrate the potential of the augmented AI-REML as an appealing approach for large-scale VC estimation in the genomic era.
Title: A computationally efficient algorithm to leverage average information REML for (co)variance component estimation in the genomic era (Genetics Selection Evolution)
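The abstract mentions iterative solving and Monte Carlo approximation of the inverse of the coefficient matrix. As a generic illustration of that family of techniques (not the augmented AI-REML itself, whose details the abstract does not give), the sketch below estimates the trace of an inverse with Hutchinson's estimator, using a plain conjugate gradient solver so that the inverse is never formed. The matrix `C` and all function names are hypothetical.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A without forming its inverse."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def mc_trace_of_inverse(A, n_probes, rng):
    """Hutchinson estimate of trace(A^-1) using random +1/-1 probe vectors."""
    total = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=A.shape[0])
        total += z @ conjugate_gradient(A, z)  # z' A^-1 z via one iterative solve
    return total / n_probes

rng = np.random.default_rng(1)
B = rng.normal(size=(50, 50))
C = B @ B.T + 50.0 * np.eye(50)  # toy symmetric positive definite "coefficient matrix"

estimate = mc_trace_of_inverse(C, n_probes=200, rng=rng)
exact = np.trace(np.linalg.inv(C))  # affordable only because the toy matrix is small
print(estimate, exact)
```

Each probe costs one iterative solve, which is why the number of solves per REML iteration matters for computing time when the coefficient matrix is large and dense.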
Pub Date: 2024-11-21 DOI: 10.1186/s12711-024-00943-1
Alan M. Pardo, Andres Legarra, Zulma G. Vitezica, Natalia S. Forneris, Daniel O. Maizon, Sebastián Munilla
Cross-validation techniques in genetic evaluations are limited by the unobservable nature of breeding values and by the difficulty of validating estimated breeding values (EBVs) against pre-corrected phenotypes; the Linear Regression (LR) method offers an alternative that addresses these limitations. Furthermore, beef cattle genetic evaluation programs confront challenges with connectedness among herds and pedigree errors. The objective of this work was to evaluate, through simulation, the LR method's performance under the pedigree errors and weak connectedness typical of beef cattle genetic evaluations. We simulated a beef cattle population resembling the Argentinean Brangus, including a quantitative trait with a heritability of 0.4 selected over six pseudo-generations. This study considered various scenarios, including: 25% and 40% pedigree errors (PE-25 and PE-40), weak and strong connectedness among herds (WCO and SCO, respectively), and a benchmark scenario (BEN) with complete pedigree and optimal herd connections. Over six pseudo-generations of selection, genetic gain was simulated to be under- and over-estimated in PE-40 and WCO, respectively, in contrast to the BEN scenario, which was unbiased. In genetic evaluations with PE-25 and PE-40, true biases of −0.13 and −0.18 genetic standard deviations were simulated, respectively. In the BEN scenario, the LR method accurately estimated bias; however, in the PE-25 and PE-40 scenarios, it overestimated biases by 0.17 and 0.25 genetic standard deviations, respectively. In herds facing WCO, significant true bias due to confounding environmental and genetic effects was simulated, and the corresponding LR statistic failed to accurately estimate the magnitude and direction of this bias. On average, true dispersion values were close to one for BEN, PE-40, SCO and WCO, showing no significant inflation or deflation, and the values were accurately estimated by LR. However, PE-25 exhibited inflation of EBVs, which was slightly underestimated by LR. Accuracies and reliabilities showed good agreement between true and LR estimated values for the scenarios evaluated. The LR method demonstrated limitations in identifying biases induced by incomplete pedigrees, including scenarios with as many as 40% pedigree errors, or by lack of connectedness, but it was effective in assessing dispersion, and population accuracies and reliabilities, even in the challenging scenarios addressed.
Title: On the ability of the LR method to detect bias when there is pedigree misspecification and lack of connectedness (Genetics Selection Evolution)
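The abstract refers to LR estimates of bias, dispersion and accuracy without giving formulas. The sketch below uses the definitions under which the LR statistics are commonly described: they compare EBVs of the same focal individuals obtained from a partial (earlier) and a whole (later) dataset. Treat these definitions as an assumption, since the abstract does not confirm them; the function name `lr_statistics` and the simulated EBVs are hypothetical.

```python
import numpy as np

def lr_statistics(ebv_partial, ebv_whole):
    """LR-style statistics for one set of focal individuals.

    ebv_partial: EBVs from the evaluation with partial (earlier) data.
    ebv_whole:   EBVs of the same individuals from the whole (later) data.
    """
    p = np.asarray(ebv_partial, dtype=float)
    w = np.asarray(ebv_whole, dtype=float)
    cov_wp = np.cov(w, p)[0, 1]
    bias = p.mean() - w.mean()            # expected 0 for an unbiased evaluation
    dispersion = cov_wp / p.var(ddof=1)   # slope of whole on partial, expected 1
    rho = np.corrcoef(w, p)[0, 1]         # relates partial accuracy to whole accuracy
    return bias, dispersion, rho

rng = np.random.default_rng(2)
partial = rng.normal(size=500)
whole = partial + rng.normal(scale=0.5, size=500)  # whole = partial + new information

bias, dispersion, rho = lr_statistics(partial, whole)
# Inflate and shift the partial EBVs to mimic a biased, over-dispersed evaluation.
bias_infl, dispersion_infl, rho_infl = lr_statistics(1.25 * partial + 0.2, whole)
print(bias, dispersion, rho)
print(bias_infl, dispersion_infl, rho_infl)
```

In this toy setting a slope below one signals inflated EBVs, while the correlation is unchanged by the linear rescaling. Note that these statistics only see the two sets of EBVs, so errors shared by both evaluations (such as a misspecified pedigree) need not show up in them.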
Pub Date: 2024-11-15 DOI: 10.1186/s12711-024-00942-2
Tuan V. Nguyen, Sunduimijid Bolormaa, Coralie M. Reich, Amanda J. Chamberlain, Christy J. Vander Jagt, Hans D. Daetwyler, Iona M. MacLeod
Genotype imputation is a cost-effective method for obtaining sequence genotypes for downstream analyses such as genome-wide association studies (GWAS). However, low imputation accuracy can increase the risk of false positives, so it is important to pre-filter data or at least assess the potential limitations due to imputation accuracy. In this study, we benchmarked three different imputation programs (Beagle 5.2, Minimac4 and IMPUTE5) and compared the empirical accuracy of imputation with the software-estimated accuracy of imputation (Rsqsoft). We also tested the accuracy of imputation in cattle for autosomal and X chromosomes, SNP and INDEL, when imputing from either low-density or high-density genotypes. The accuracy of imputing sequence variants from real high-density genotypes was higher than from low-density genotypes. In our software benchmark, all programs performed well with only minor differences in accuracy. While there was a close relationship between empirical imputation accuracy and the imputation Rsqsoft, this relationship differed considerably for Minimac4 compared to Beagle 5.2 and IMPUTE5. We found that the Rsqsoft threshold for removing poorly imputed variants must be customised according to the software and that this should be accounted for when merging data from multiple studies, such as in meta-GWAS. We also found that imposing an Rsqsoft filter has a positive impact on genomic regions with poor imputation accuracy due to large segmental duplications that are susceptible to error-prone alignment. Overall, our results showed that on average the imputation accuracy for INDEL was approximately 6% lower than for SNP for all software programs. Importantly, the imputation accuracy for the non-PAR (non-Pseudo-Autosomal Region) of the X chromosome was comparable to autosomal imputation accuracy, while for the PAR it was substantially lower, particularly when starting from low-density genotypes. This study provides an empirically derived approach to apply customised software-specific Rsqsoft thresholds for downstream analyses of imputed variants, such as those needed for a meta-GWAS. The very poor empirical imputation accuracy for variants on the PAR when starting from low-density genotypes demonstrates that this region should be imputed starting from a higher density of real genotypes.
Title: Empirical versus estimated accuracy of imputation: optimising filtering thresholds for sequence imputation (Genetics Selection Evolution)
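To make the distinction between empirical accuracy and a software-specific filter concrete, the sketch below computes a per-variant empirical accuracy and applies a per-program threshold. It assumes that empirical accuracy is measured as the squared Pearson correlation between true genotypes and imputed dosages, which the abstract does not state. The program labels, the threshold values and the Rsqsoft values are placeholders, not results from the study.

```python
import numpy as np

def empirical_r2(true_geno, imputed_dosage):
    """Per-variant squared Pearson correlation between true and imputed genotypes.

    Both arrays have shape (n_variants, n_animals). Variants without variation
    in either array get NaN.
    """
    t = true_geno - true_geno.mean(axis=1, keepdims=True)
    d = imputed_dosage - imputed_dosage.mean(axis=1, keepdims=True)
    num = (t * d).sum(axis=1)
    den = np.sqrt((t ** 2).sum(axis=1) * (d ** 2).sum(axis=1))
    r = np.full(num.shape, np.nan)
    np.divide(num, den, out=r, where=den > 0)
    return r ** 2

def keep_variants(rsq_soft, software, thresholds):
    """Boolean mask of variants that pass the threshold chosen for this software."""
    return np.asarray(rsq_soft) >= thresholds[software]

rng = np.random.default_rng(3)
true_geno = rng.integers(0, 3, size=(5, 100)).astype(float)
noise_sd = np.array([0.1, 0.3, 0.6, 1.0, 2.0])[:, None]  # increasingly poor imputation
dosage = np.clip(true_geno + rng.normal(size=true_geno.shape) * noise_sd, 0.0, 2.0)

r2 = empirical_r2(true_geno, dosage)

# Placeholder software-reported accuracies and thresholds (hypothetical values).
rsq_soft = np.array([0.95, 0.85, 0.70, 0.50, 0.20])
thresholds = {"program_a": 0.8, "program_b": 0.6}
mask_a = keep_variants(rsq_soft, "program_a", thresholds)
mask_b = keep_variants(rsq_soft, "program_b", thresholds)
print(r2, mask_a, mask_b)
```

Keeping the thresholds in a per-program mapping reflects the abstract's point that a single cut-off should not be reused across programs when datasets are merged.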
Pub Date: 2024-11-04 DOI: 10.1186/s12711-024-00938-y
Margot Slagboom, Hanne Marie Nielsen, Morten Kargo, Mark Henryon, Laura Skrubbeltrang Hansen
The aim of this study was to compare genetic gain and rate of inbreeding for different mass selection breeding programs designed to increase larval body weight (LBW) in black soldier flies. The breeding programs differed in: (1) sampling of individuals for phenotyping (either random over the whole population or a fixed number per full sib family), (2) selection of adult flies for breeding (based on an adult individual’s phenotype for LBW or random from larvae preselected based on LBW), and (3) mating strategy (mating in a group with unequal male contributions or controlled between two females and one male). In addition, the numbers of phenotyped and preselected larvae were varied. The sex of an individual was unknown during preselection and females had higher LBW, resulting in more females being preselected. Selecting adult flies based on their phenotype for LBW increased genetic gain by 0.06 genetic standard deviation units compared to randomly selecting from the preselected larvae. Fixing the number of phenotyped larvae per family increased the rate of inbreeding by 0.15 to 0.20% per generation. Controlled mating compared to group mating decreased the rate of inbreeding by 0.02 to 0.03% per generation. Phenotyping more than 4000 larvae resulted in a lack of preselected males due to the sexual dimorphism. Preselecting either too few or too many larvae could negatively impact genetic gain, depending on the breeding program. A mass selection breeding program in which adult flies are selected based on their larval phenotype, breeding animals mate in a group, and larvae are sampled for phenotyping at random over the whole population is recommended for black soldier flies, considering the positive effect on rates of genetic gain and inbreeding. The number of phenotyped and preselected larvae should be calculated based on the expected female weight deviation to ensure sufficient male and female candidates are selected.
Title: The effect of phenotyping, adult selection, and mating strategies on genetic gain and rate of inbreeding in black soldier fly breeding programs (Genetics Selection Evolution)
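The abstract notes that preselection on LBW with unknown sex favours females and can leave too few males when many larvae are phenotyped. The toy simulation below shows that mechanism under stated assumptions: a 1:1 sex ratio, normally distributed weights, and a female advantage of one phenotypic standard deviation. All of these values, and the function name `preselect`, are hypothetical and are not taken from the study.

```python
import numpy as np

def preselect(n_phenotyped, n_preselected, female_advantage, rng):
    """Truncation preselection on larval body weight when sex is unknown.

    Weights are in phenotypic standard deviation units.
    Returns (number of males, number of females) among the heaviest larvae.
    """
    is_female = rng.random(n_phenotyped) < 0.5
    weight = rng.normal(size=n_phenotyped) + female_advantage * is_female
    top = np.argsort(weight)[-n_preselected:]  # indices of the heaviest larvae
    n_females = int(is_female[top].sum())
    return n_preselected - n_females, n_females

rng = np.random.default_rng(4)
results = {}
for n_phenotyped in (1000, 4000, 16000):
    # Keep the number of preselected larvae fixed so selection intensity increases.
    results[n_phenotyped] = preselect(n_phenotyped, 200, female_advantage=1.0, rng=rng)

for n_phenotyped, (n_males, n_females) in results.items():
    print(n_phenotyped, n_males, n_females)
```

With a fixed number of preselected larvae, phenotyping more larvae raises the truncation point, so the share of males among the preselected shrinks. This is why the numbers of phenotyped and preselected larvae need to be set with the expected female weight deviation in mind.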