Pub Date : 2024-12-30 DOI: 10.1186/s12711-024-00944-0
Andrei A. Kudinov, Antti Kause
Sex identification in farmed fish is important for the management of fish stocks and breeding programs, but identification based on visual characteristics is typically difficult or impossible in juvenile or premature fish. The amount of genomic data obtained from farmed fish is rapidly growing with the implementation of genomic selection in aquaculture. In comparison to mammals and birds, ray-finned fishes exhibit a greater diversity of sex determination systems, with an absence of conserved genomic regions. A group of genomic markers located on a standard genotyping array has been reported to potentially be linked with sex determination in rainbow trout. However, the set of markers suitable for sex identification may vary between populations. Sex identification from genomic data is usually performed using probabilistic methods, where suitable markers are known beforehand. In our study, we demonstrated the use of the Extreme Gradient Boosting approach from the supervised machine learning gradient boost framework to predict sex from unimputed genomic data, when the suitability of the markers was unknown a priori. The accuracy of the method was assessed using four simulated datasets with different genotyping error rates and one real dataset from the Finnish Rainbow Trout Breeding Program. The method showed high prediction quality on both simulated and real datasets. For simulated datasets with low (5%) and high (50%) genotyping error rates, the accuracies were 1.0 and 0.60, respectively. In the real data, the method achieved a prediction accuracy of 98%, which is suitable for routine use.
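The idea of learning which markers carry the sex signal, rather than fixing them beforehand, can be illustrated with a toy example. The sketch below is not XGBoost itself and not the authors' pipeline: it is a minimal pure-Python gradient boosting of depth-1 trees on the logistic loss. The simulation design (one fully sex-linked marker among 20, errors drawn as random genotype calls, 200 training and 100 test animals) and all function names are assumptions made for illustration only.

```python
import math
import random

def simulate(n, n_markers, error_rate, rng):
    """Simulate sex (1 = male) and 0/1/2 genotypes; marker 0 is sex-linked."""
    X, y = [], []
    for _ in range(n):
        sex = rng.randint(0, 1)
        row = [sex] + [rng.randint(0, 2) for _ in range(n_markers - 1)]
        # Genotyping error: replace the call with a random genotype.
        row = [rng.randint(0, 2) if rng.random() < error_rate else g for g in row]
        X.append(row)
        y.append(sex)
    return X, y

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_boosted_stumps(X, y, rounds=20, eta=0.3, lam=1.0):
    """Second-order gradient boosting of depth-1 trees on the logistic loss."""
    n, m = len(X), len(X[0])
    score = [0.0] * n
    stumps = []
    for _ in range(rounds):
        p = [sigmoid(s) for s in score]
        grad = [pi - yi for pi, yi in zip(p, y)]
        hess = [pi * (1.0 - pi) for pi in p]
        best = None
        for j in range(m):
            for thr in (0.5, 1.5):  # the two possible cuts for 0/1/2 genotypes
                gl = hl = gr = hr = 0.0
                for i in range(n):
                    if X[i][j] < thr:
                        gl += grad[i]
                        hl += hess[i]
                    else:
                        gr += grad[i]
                        hr += hess[i]
                gain = gl * gl / (hl + lam) + gr * gr / (hr + lam)
                if best is None or gain > best[0]:
                    best = (gain, j, thr, -gl / (hl + lam), -gr / (hr + lam))
        _, j, thr, wl, wr = best
        stumps.append((j, thr, eta * wl, eta * wr))
        for i in range(n):
            score[i] += eta * (wl if X[i][j] < thr else wr)
    return stumps

def predict(stumps, row):
    """Return 1 (male) if the summed leaf scores are positive, else 0."""
    z = sum(wl if row[j] < thr else wr for j, thr, wl, wr in stumps)
    return 1 if z > 0 else 0

def accuracy(error_rate, seed=1):
    """Train and test on independently simulated animals with the same error rate."""
    rng = random.Random(seed)
    X_train, y_train = simulate(200, 20, error_rate, rng)
    X_test, y_test = simulate(100, 20, error_rate, rng)
    stumps = fit_boosted_stumps(X_train, y_train)
    hits = sum(predict(stumps, r) == t for r, t in zip(X_test, y_test))
    return hits / len(y_test)
```

Calling `accuracy(0.0)` and `accuracy(0.5)` contrasts an error-free panel with a heavily corrupted one. The values obtained from this toy simulation are not comparable to the accuracies reported in the abstract.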
Title : Sex identification in rainbow trout using genomic information and machine learning
Journal : Genetics Selection Evolution (IF 4.1)
The genome-wide association study (GWAS) is a powerful method for mapping quantitative trait loci (QTL). However, standard GWAS can detect only QTL that segregate in the mapping population. Crossing populations with different characteristics increases genetic variability, but F2 or back-crosses lack mapping resolution due to the limited number of recombination events. This drawback can be overcome with advanced intercross line (AIL) populations, which increase the number of recombination events and provide a more accurate mapping resolution. Recent studies in humans have revealed ancestry-dependent genetic architecture and shown the effectiveness of admixture mapping in admixed populations. Through the incorporation of line-of-origin effects and GWAS on an F9 AIL population, we identified genes that affect body weight at eight weeks of age (BW8) in chickens. The proposed ancestral-haplotype-based GWAS (testing only the origin regardless of the alleles) revealed three new QTLs on GGA12, GGA15, and GGA20. By using the concepts of ancestral homozygotes (individuals that carry two haplotypes of the same origin) and ancestral heterozygotes (carrying one haplotype of each origin), we identified 632 loci that exhibited high-parent (the heterozygote is better than both parents) and mid-parent (the heterozygote is better than the median of the parents) dominance across 12 chromosomes. Out of the 199 genes associated with BW8, EYA1, PDE1C, and MYC were identified as the best candidate genes for further validation. In addition to the candidate genes reported in this study, our research demonstrates the effectiveness of incorporating ancestral information in population genetic analyses, which can be broadly applicable for genetic mapping in populations generated by ancestors with distinct phenotypes and genetic backgrounds. Our methods can benefit both geneticists and biologists interested in the genetic determinism of complex traits.
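The definitions of ancestral homozygotes, ancestral heterozygotes, and high-parent versus mid-parent dominance can be written down for a single locus as follows. This is a minimal sketch that compares group means only. The function name, the origin labels "A" and "B", and the absence of any significance test are assumptions, not the authors' procedure.

```python
from statistics import mean

def classify_dominance(origins, phenotypes, line_a="A", line_b="B"):
    """Classify one locus from per-individual haplotype origins and phenotypes.

    origins: list of (hap1_origin, hap2_origin) tuples, e.g. ("A", "B").
    Returns "high-parent", "mid-parent" or "none".
    """
    groups = {"AA": [], "AB": [], "BB": []}
    for (h1, h2), value in zip(origins, phenotypes):
        if h1 == h2 == line_a:
            groups["AA"].append(value)   # ancestral homozygote, line A
        elif h1 == h2 == line_b:
            groups["BB"].append(value)   # ancestral homozygote, line B
        else:
            groups["AB"].append(value)   # ancestral heterozygote
    if not all(groups.values()):
        return "none"  # a class is empty, so the contrast is undefined
    aa, ab, bb = (mean(groups[k]) for k in ("AA", "AB", "BB"))
    if ab > max(aa, bb):
        return "high-parent"
    if ab > (aa + bb) / 2:
        return "mid-parent"
    return "none"
```

With only two parental classes, the "median of the parents" in the abstract coincides with their midpoint, which is what `(aa + bb) / 2` computes.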
Pub Date : 2024-12-20 DOI: 10.1186/s12711-024-00946-y
Lina Bu, Yuzhe Wang, Lizhi Tan, Zilong Wen, Xiaoxiang Hu, Zhiwu Zhang, Yiqiang Zhao
Title : Haplotype analysis incorporating ancestral origins identified novel genetic loci associated with chicken body weight using an advanced intercross line
Journal : Genetics Selection Evolution (IF 4.1)
Pub Date : 2024-12-18 DOI: 10.1186/s12711-024-00947-x
Jón H. Eiríksson, Þórdís Þórarinsdóttir, Egill Gautason
Scrapie is an infectious prion disease in sheep. Selective breeding for resistant genotypes of the prion protein gene (PRNP) is an effective way to prevent scrapie outbreaks. Genotyping all selection candidates in a population is expensive, but existing pedigree records can help infer the probabilities of genotypes in relatives of genotyped animals. We used linear models to predict allele content for the various PRNP alleles found in Icelandic sheep and compiled the available estimates of relative scrapie susceptibility (RSS) associated with PRNP genotypes from the literature. Using the predicted allele content and the genotypic RSS, we calculated estimated breeding values (EBV) for RSS. We tested the predictions on simulated data under different scenarios that varied in the proportion of genotyped sheep, genotyping strategy, pedigree recording accuracy, genotyping error rates and assumed heritability of allele content. Prediction of allele content for rare alleles was less successful than for alleles with moderate frequencies. The accuracy of allele content and RSS EBV predictions was not affected by the assumed heritability, but the dispersion of the predictions was affected. In a scenario where 40% of rams were genotyped and there were no errors in genotyping or in the recorded pedigree, the accuracy of RSS EBV for ungenotyped selection candidates was 0.49. If only 20% of rams were genotyped, or rams and ewes were genotyped randomly, or there were 10% pedigree errors, or there were 2% genotyping errors, the accuracy decreased by 0.07, 0.08, 0.03 and 0.04, respectively. With empirical data, the accuracy of RSS EBV for ungenotyped sheep was 0.46–0.65. A linear model for predicting allele content for the PRNP gene, combined with estimates of relative susceptibility associated with PRNP genotypes, can provide RSS EBV for scrapie resistance for ungenotyped selection candidates with accuracy up to 0.65. 
These RSS EBV can complement selection strategies based on PRNP genotypes, especially in populations where resistant genotypes are rare.
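The link between pedigree, allele content and susceptibility is easiest to see in the simplest case, an ungenotyped lamb with two genotyped parents. The sketch below uses plain Mendelian expectations and is not the authors' linear model, which propagates information through the whole pedigree. The function names are illustrative and the RSS values in the table are hypothetical placeholders, not estimates from the literature.

```python
from itertools import product

def offspring_genotype_probs(sire, dam):
    """Mendelian genotype probabilities for an ungenotyped offspring.

    sire, dam: tuples of two PRNP alleles, e.g. ("ARR", "VRQ").
    Returns {sorted genotype tuple: probability}.
    """
    probs = {}
    for a, b in product(sire, dam):
        geno = tuple(sorted((a, b)))
        probs[geno] = probs.get(geno, 0.0) + 0.25
    return probs

def expected_allele_content(probs, allele):
    """Expected number of copies (0 to 2) of one allele."""
    return sum(p * geno.count(allele) for geno, p in probs.items())

def expected_rss(probs, rss_by_genotype):
    """Probability-weighted relative scrapie susceptibility."""
    return sum(p * rss_by_genotype[geno] for geno, p in probs.items())

# HYPOTHETICAL susceptibility values, for illustration only.
RSS = {
    ("ARR", "ARR"): 0.0,
    ("ARQ", "ARR"): 0.2,
    ("ARR", "VRQ"): 0.4,
    ("ARQ", "VRQ"): 1.0,
}

probs = offspring_genotype_probs(("ARR", "VRQ"), ("ARR", "ARQ"))
arr_content = expected_allele_content(probs, "ARR")
lamb_rss = expected_rss(probs, RSS)
```

Here each of the four possible lamb genotypes has probability 0.25, so the expected ARR content is the average of the parental contents.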
Title : Predicted breeding values for relative scrapie susceptibility for genotyped and ungenotyped sheep
Journal : Genetics Selection Evolution (IF 4.1)
Pub Date : 2024-12-17 DOI: 10.1186/s12711-024-00941-3
Yvonne C. J. Wientjes, Katrijn Peeters, Piter Bijma, Abe E. Huisman, Mario P. L. Calus
Genetic selection improves a population by increasing the frequency of favorable alleles. Understanding and monitoring allele frequency changes is, therefore, important to obtain more insight into the long-term effects of selection. This study aimed to investigate changes in allele frequencies and in results of genome-wide association studies (GWAS), and how those two are related to each other. This was studied in two maternal pig lines where selection was based on a broad selection index. Genotypes and phenotypes were available from 2015 to 2021. Several large changes in allele frequencies over the years were observed in both lines. The largest allele frequency changes were not larger than expected under drift based on gene dropping simulations, but the average allele frequency change was larger with selection. Moreover, several significant regions were found in the GWAS for the traits under selection, but those regions did not overlap with regions with larger allele frequency changes. No significant GWAS regions were found in either line for the selection index, which included multiple traits, indicating that the index is affected by many loci of small effect. Additionally, many significant regions showed pleiotropic, and often antagonistic, associations with other traits under selection. This reduces the selection pressure on those regions, which can explain why those regions are still segregating, although the traits have been under selection for several generations. Across the years, only small changes in Manhattan plots were found, indicating that the genetic architecture was reasonably constant. No significant GWAS regions were found for any of the traits under selection among the regions with the largest changes in allele frequency, and the correlation between significance level of marker associations and changes in allele frequency over one generation was close to zero for all traits. 
Moreover, the largest changes in allele frequency could be explained by drift and were not necessarily a result of selection. This is probably because selection acted on a broad index for which no significant GWAS regions were found. Our results show that selecting on a broad index spreads the selection pressure across the genome, thereby limiting allele frequency changes.
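Gene dropping, mentioned above as the basis for the drift expectation, can be sketched in a few lines: founder alleles are sampled, passed down the real pedigree by Mendelian sampling without any selection, and the resulting allele frequency change is recorded over many replicates. The data layout, function names and the single biallelic locus are assumptions for illustration. The study's own simulation settings are not given in the abstract.

```python
import random

def gene_drop(pedigree, founder_freq, rng):
    """Drop alleles at one biallelic locus through a pedigree.

    pedigree: list of (animal, sire, dam) ordered so parents precede offspring;
    founders have sire and dam set to None.
    Returns {animal: (paternal_allele, maternal_allele)} with alleles coded 0/1.
    """
    genotypes = {}
    for animal, sire, dam in pedigree:
        if sire is None or dam is None:
            # Founder: sample both alleles from the base allele frequency.
            genotypes[animal] = tuple(int(rng.random() < founder_freq) for _ in range(2))
        else:
            # Non-founder: one random allele from each parent.
            genotypes[animal] = (rng.choice(genotypes[sire]), rng.choice(genotypes[dam]))
    return genotypes

def allele_freq(genotypes, animals):
    """Frequency of allele 1 among the listed animals."""
    return sum(sum(genotypes[a]) for a in animals) / (2 * len(animals))

def drift_changes(pedigree, founders, last_gen, founder_freq, n_rep=1000, seed=1):
    """Null distribution of allele-frequency change caused by drift alone."""
    rng = random.Random(seed)
    changes = []
    for _ in range(n_rep):
        g = gene_drop(pedigree, founder_freq, rng)
        changes.append(allele_freq(g, last_gen) - allele_freq(g, founders))
    return changes
```

An observed frequency change can then be compared with the tails of the list returned by `drift_changes`. A change inside that range is compatible with drift alone.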
Title : Changes in allele frequencies and genetic architecture due to selection in two pig populations
Journal : Genetics Selection Evolution (IF 4.1)
Pub Date : 2024-12-13 DOI: 10.1186/s12711-024-00940-4
Christian Stricker, Rohan L. Fernando, Albrecht Melchinger, Hans-Juergen Auinger, Chris-Carolin Schoen
Accuracy of genomic prediction depends on the heritability of the trait, the size of the training set, the relationship of the candidates to the training set, and $$\text{Min}(N_{\text{QTL}}, M_e)$$, where $$N_{\text{QTL}}$$ is the number of QTL and $$M_e$$ is the number of independently segregating chromosomal segments. Due to LD, the number $$Q_e$$ of independently segregating QTL (effective QTL) can be lower than $$\text{Min}(N_{\text{QTL}}, M_e)$$. In this paper, we show that $$Q_e$$ is inversely associated with the trait-specific genomic relationship of a candidate to the training set. This provides an explanation for the inverse association between $$Q_e$$ and the accuracy of prediction. To quantify the genomic relationship of a candidate to all members of the training set, we considered the $$k^2$$ statistic that has been previously used for this purpose. It quantifies how well the marker covariate vector of a candidate can be represented as a linear combination of the rows of the marker covariate matrix of the training set. In this paper, we used Bayesian regression to make this statistic trait specific and argue that the trait-specific genomic relationship of a candidate to the training set is inversely associated with $$Q_e$$. Simulation was used to demonstrate the dependence of the trait-specific $$k^2$$ statistic on $$Q_e$$, which is related to $$N_{\text{QTL}}$$. The posterior distributions of the trait-specific $$k^2$$ statistic showed that the trait-specific genomic relationship between a candidate and the training set is inversely associated with $$Q_e$$ and $$N_{\text{QTL}}$$. Further, we show that the trait-specific genomic relationship between a candidate and the training set is directly related to the size of the training set.
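One literal reading of "how well the marker covariate vector of a candidate can be represented as a linear combination of the rows of the marker covariate matrix of the training set" is the share of the candidate's squared length that lies in the row space of the training matrix. The sketch below computes that share with Gram-Schmidt orthogonalisation. It is an assumption about the general form only: the exact definition of the statistic, any centering of covariates, and the trait-specific Bayesian weighting described in the abstract are not reproduced here.

```python
def k_squared(candidate, training_rows, tol=1e-10):
    """Share of the candidate's squared length lying in the row space of the training set."""
    basis = []
    for row in training_rows:
        v = [float(x) for x in row]
        for b in basis:  # modified Gram-Schmidt: remove components along earlier vectors
            coef = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm > tol:  # skip rows that are linearly dependent on earlier ones
            basis.append([vi / norm for vi in v])
    total = sum(x * x for x in candidate)
    if total == 0:
        raise ValueError("candidate covariate vector is all zeros")
    # With an orthonormal basis, the squared length of the projection is the
    # sum of squared inner products with the basis vectors.
    projected = sum(sum(c * b for c, b in zip(candidate, bvec)) ** 2 for bvec in basis)
    return projected / total
```

A value of 1 means the candidate is an exact linear combination of training individuals, and a value of 0 means it is orthogonal to all of them. Adding training rows can only keep or enlarge the row space, which is consistent with the statement that the relationship grows with the size of the training set.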
Title : On the inverse association between the number of QTL and the trait-specific genomic relationship of a candidate to the training set.
Journal : Genetics Selection Evolution (IF 4.1)
Pub Date : 2024-11-21 DOI: 10.1186/s12711-024-00939-x
Ismo Strandén, Esa A. Mäntysaari, Martin H. Lidauer, Robin Thompson, Hongding Gao
Methods for estimating variance components (VC) using restricted maximum likelihood (REML) typically require elements from the inverse of the coefficient matrix of the mixed model equations (MME). As genomic information becomes more prevalent, the coefficient matrix of the MME becomes denser, presenting a challenge for analyzing large datasets. Thus, computational algorithms based on iterative solving and Monte Carlo approximation of the inverse of the coefficient matrix become appealing. While the standard average information REML (AI-REML) is known for its rapid convergence, its computational intensity imposes limitations. In particular, the standard AI-REML requires solving the MME for each VC, which can be computationally demanding, especially when dealing with complex models with many VC. To bridge this gap, here we (1) present a computationally efficient and tractable algorithm, named the augmented AI-REML, which facilitates the AI-REML by solving an augmented MME only once within each REML iteration; and (2) implement this approach for VC estimation in a general framework of a multi-trait GBLUP model. VC estimation was investigated based on the number of VC in the model, including two-trait, three-trait, four-trait, and five-trait GBLUP models. We compared the augmented AI-REML with the standard AI-REML in terms of computing time per REML iteration. Direct and iterative solving methods were used to assess the advances of the augmented AI-REML. When using the direct solving method, the augmented AI-REML and the standard AI-REML required similar computing times for models with a small number of VC (the two- and three-trait GBLUP models), while the augmented AI-REML demonstrated more notable reductions in computing time as the number of VC in the model increased. When using the iterative solving method, the augmented AI-REML demonstrated substantial improvements in computational efficiency compared to the standard AI-REML. 
The elapsed time of each REML iteration was reduced by 75%, 84%, and 86% for the two-, three-, and four-trait GBLUP models, respectively. The augmented AI-REML can considerably reduce the computing time within each REML iteration, particularly when using an iterative solver. Our results demonstrate the potential of the augmented AI-REML as an appealing approach for large-scale VC estimation in the genomic era.
Title : A computationally efficient algorithm to leverage average information REML for (co)variance component estimation in the genomic era
Journal : Genetics Selection Evolution (IF 4.1)
Pub Date : 2024-11-21 DOI: 10.1186/s12711-024-00943-1
Alan M. Pardo, Andres Legarra, Zulma G. Vitezica, Natalia S. Forneris, Daniel O. Maizon, Sebastián Munilla
Cross-validation techniques in genetic evaluations encounter limitations due to the unobservable nature of breeding values and the challenge of validating estimated breeding values (EBVs) against pre-corrected phenotypes; the Linear Regression (LR) method offers an alternative that addresses these challenges. Furthermore, beef cattle genetic evaluation programs confront challenges with connectedness among herds and pedigree errors. The objective of this work was to evaluate, through simulation, the LR method's performance under the pedigree errors and weak connectedness typical of beef cattle genetic evaluations. We simulated a beef cattle population resembling the Argentinean Brangus, including a quantitative trait with a heritability of 0.4 that was selected over six pseudo-generations. This study considered various scenarios, including: 25% and 40% pedigree errors (PE-25 and PE-40), weak and strong connectedness among herds (WCO and SCO, respectively), and a benchmark scenario (BEN) with complete pedigree and optimal herd connections. Over six pseudo-generations of selection, genetic gain was simulated to be under- and over-estimated in PE-40 and WCO, respectively, contrary to the BEN scenario, which was unbiased. In genetic evaluations with PE-25 and PE-40, true biases of −0.13 and −0.18 genetic standard deviations were simulated, respectively. In the BEN scenario, the LR method accurately estimated bias; however, in the PE-25 and PE-40 scenarios, it overestimated biases by 0.17 and 0.25 genetic standard deviations, respectively. In herds facing WCO, significant true bias due to confounding environmental and genetic effects was simulated, and the corresponding LR statistic failed to accurately estimate the magnitude and direction of this bias. On average, true dispersion values were close to one for BEN, PE-40, SCO and WCO, showing no significant inflation or deflation, and the values were accurately estimated by LR. 
However, PE-25 exhibited inflation of EBVs, and this inflation was slightly underestimated by LR. Accuracies and reliabilities showed good agreement between true and LR-estimated values for the scenarios evaluated. The LR method demonstrated limitations in identifying biases induced by incomplete pedigrees, including scenarios with up to 40% pedigree errors, or by lack of connectedness, but it was effective in assessing dispersion and population accuracies and reliabilities, even in the challenging scenarios addressed.
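For readers unfamiliar with the LR statistics named in the abstract above, a minimal sketch follows. It assumes the definitions commonly used for the LR method, which the abstract itself does not spell out: bias as the difference between mean EBVs from a partial and a whole dataset, dispersion as the regression slope of whole-data EBVs on partial-data EBVs, and the correlation between the two sets of EBVs as a ratio of accuracies. The function name `lr_statistics` and the toy numbers are hypothetical, and the bias is left in trait units rather than genetic standard deviations.

```python
from statistics import mean


def lr_statistics(ebv_partial, ebv_whole):
    """Compute LR-style statistics for the same validation animals.

    ebv_partial: EBVs estimated from the partial (truncated) dataset.
    ebv_whole:   EBVs estimated from the whole (complete) dataset.
    Definitions are assumed, not taken from the abstract.
    """
    n = len(ebv_partial)
    if n != len(ebv_whole) or n < 2:
        raise ValueError("need two equally long lists with at least 2 animals")

    mean_p, mean_w = mean(ebv_partial), mean(ebv_whole)
    cov_wp = sum((w - mean_w) * (p - mean_p)
                 for w, p in zip(ebv_whole, ebv_partial)) / (n - 1)
    var_p = sum((p - mean_p) ** 2 for p in ebv_partial) / (n - 1)
    var_w = sum((w - mean_w) ** 2 for w in ebv_whole) / (n - 1)
    if var_p == 0 or var_w == 0:
        raise ValueError("EBVs must vary among animals")

    return {
        # Expected value 0 when the evaluation is unbiased.
        "bias": mean_p - mean_w,
        # Slope of whole on partial; expected value 1, below 1 suggests inflation.
        "dispersion": cov_wp / var_p,
        # Correlation between partial and whole EBVs (ratio of accuracies).
        "accuracy_ratio": cov_wp / (var_p * var_w) ** 0.5,
    }


# Toy data (hypothetical): whole-data EBVs are the partial EBVs shifted by 0.5.
partial = [0.0, 1.0, 2.0, 3.0]
whole = [0.5, 1.5, 2.5, 3.5]
print(lr_statistics(partial, whole))
```

In the toy data the shift shows up only in the bias term, while the slope and correlation stay at one, which mirrors how the abstract treats bias and dispersion as separate properties.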
{"title":"On the ability of the LR method to detect bias when there is pedigree misspecification and lack of connectedness","authors":"Alan M. Pardo, Andres Legarra, Zulma G. Vitezica, Natalia S. Forneris, Daniel O. Maizon, Sebastián Munilla","doi":"10.1186/s12711-024-00943-1","DOIUrl":"https://doi.org/10.1186/s12711-024-00943-1","journal":{"name":"Genetics Selection Evolution"},"publicationDate":"2024-11-21"}
Pub Date: 2024-11-15 DOI: 10.1186/s12711-024-00942-2
Tuan V. Nguyen, Sunduimijid Bolormaa, Coralie M. Reich, Amanda J. Chamberlain, Christy J. Vander Jagt, Hans D. Daetwyler, Iona M. MacLeod
Genotype imputation is a cost-effective method for obtaining sequence genotypes for downstream analyses such as genome-wide association studies (GWAS). However, low imputation accuracy can increase the risk of false positives, so it is important to pre-filter data or at least assess the potential limitations due to imputation accuracy. In this study, we benchmarked three different imputation programs (Beagle 5.2, Minimac4 and IMPUTE5) and compared the empirical accuracy of imputation with the software estimated accuracy of imputation (Rsqsoft). We also tested the accuracy of imputation in cattle for autosomal and X chromosomes, SNP and INDEL, when imputing from either low-density or high-density genotypes. The accuracy of imputing sequence variants from real high-density genotypes was higher than from low-density genotypes. In our software benchmark, all programs performed well with only minor differences in accuracy. While there was a close relationship between empirical imputation accuracy and the imputation Rsqsoft, this differed considerably for Minimac4 compared to Beagle 5.2 and IMPUTE5. We found that the Rsqsoft threshold for removing poorly imputed variants must be customised according to the software and this should be accounted for when merging data from multiple studies, such as in meta-GWAS studies. We also found that imposing an Rsqsoft filter has a positive impact on genomic regions with poor imputation accuracy due to large segmental duplications that are susceptible to error-prone alignment. Overall, our results showed that on average the imputation accuracy for INDEL was approximately 6% lower than SNP for all software programs. Importantly, the imputation accuracy for the non-PAR (non-Pseudo-Autosomal Region) of the X chromosome was comparable to autosomal imputation accuracy, while for the PAR it was substantially lower, particularly when starting from low-density genotypes. 
This study provides an empirically derived approach for applying customised, software-specific Rsqsoft thresholds in downstream analyses of imputed variants, such as those needed for a meta-GWAS. The very poor empirical imputation accuracy for variants on the PAR when starting from low-density genotypes demonstrates that this region should be imputed starting from a higher density of real genotypes.
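The software-specific filtering described above can be illustrated with a short sketch. It assumes that empirical accuracy is measured as the squared Pearson correlation between true genotypes and imputed dosages, a common convention that the abstract does not state explicitly. The program labels and cutoffs in `RSQ_CUTOFFS`, the `rsq_soft` field name, and the variant records are placeholders, not values from the study.

```python
def empirical_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2) and dosages."""
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equally long lists with at least 2 samples")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    sxy = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    sxx = sum((t - mean_t) ** 2 for t in true_genotypes)
    syy = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if sxx == 0 or syy == 0:
        return None  # monomorphic variant: correlation is undefined
    return (sxy * sxy) / (sxx * syy)


# Placeholder cutoffs: each imputation program gets its own threshold.
RSQ_CUTOFFS = {"program_a": 0.8, "program_b": 0.6}


def filter_variants(variants, program, cutoffs=RSQ_CUTOFFS):
    """Keep variants whose software-estimated Rsq meets the program's cutoff."""
    cutoff = cutoffs[program]
    return [v for v in variants if v["rsq_soft"] >= cutoff]


variants = [
    {"id": "var1", "rsq_soft": 0.95},
    {"id": "var2", "rsq_soft": 0.70},
    {"id": "var3", "rsq_soft": 0.40},
]
print([v["id"] for v in filter_variants(variants, "program_a")])
print([v["id"] for v in filter_variants(variants, "program_b")])
```

Keeping the cutoffs in a per-program table makes the choice explicit when datasets imputed with different software are merged, which is the situation the abstract highlights for meta-GWAS.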
{"title":"Empirical versus estimated accuracy of imputation: optimising filtering thresholds for sequence imputation","authors":"Tuan V. Nguyen, Sunduimijid Bolormaa, Coralie M. Reich, Amanda J. Chamberlain, Christy J. Vander Jagt, Hans D. Daetwyler, Iona M. MacLeod","doi":"10.1186/s12711-024-00942-2","DOIUrl":"https://doi.org/10.1186/s12711-024-00942-2","journal":{"name":"Genetics Selection Evolution"},"publicationDate":"2024-11-15"}
Pub Date: 2024-11-04 DOI: 10.1186/s12711-024-00938-y
Margot Slagboom, Hanne Marie Nielsen, Morten Kargo, Mark Henryon, Laura Skrubbeltrang Hansen
The aim of this study was to compare genetic gain and rate of inbreeding for different mass selection breeding programs with the aim of increasing larval body weight (LBW) in black soldier flies. The breeding programs differed in: (1) sampling of individuals for phenotyping (either random over the whole population or a fixed number per full sib family), (2) selection of adult flies for breeding (based on an adult individual’s phenotype for LBW or random from larvae preselected based on LBW), and (3) mating strategy (mating in a group with unequal male contributions or controlled between two females and one male). In addition, the numbers of phenotyped and preselected larvae were varied. The sex of an individual was unknown during preselection and females had higher LBW, resulting in more females being preselected. Selecting adult flies based on their phenotype for LBW increased genetic gain by 0.06 genetic standard deviation units compared to randomly selecting from the preselected larvae. Fixing the number of phenotyped larvae per family increased the rate of inbreeding by 0.15 to 0.20% per generation. Controlled mating compared to group mating decreased the rate of inbreeding by 0.02 to 0.03% per generation. Phenotyping more than 4000 larvae resulted in a lack of preselected males due to the sexual dimorphism. Preselecting both too few and too many larvae could negatively impact genetic gain, depending on the breeding program. A mass selection breeding program in which adult flies are selected based on their larval phenotype, breeding animals mate in a group, and larvae are sampled for phenotyping at random over the whole population is recommended for black soldier flies, considering the positive effect on rates of genetic gain and inbreeding. The number of phenotyped and preselected larvae should be calculated based on the expected female weight deviation to ensure sufficient male and female candidates are selected.
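The effect of sexual dimorphism on preselection can be explored with a small simulation. The sketch below is not the authors' simulation; it assumes a 50:50 sex ratio, normally distributed LBW with unit phenotypic standard deviation, and a hypothetical female advantage expressed in standard deviation units. The function name `preselect_larvae` and all parameter values are illustrative.

```python
import random


def preselect_larvae(n_phenotyped, n_preselected, female_advantage_sd, seed=1):
    """Preselect the heaviest larvae without knowing their sex.

    Returns (number of males, number of females) among the preselected.
    female_advantage_sd is a hypothetical mean LBW difference (females minus
    males) in phenotypic standard deviation units.
    """
    rng = random.Random(seed)
    larvae = []
    for _ in range(n_phenotyped):
        sex = "F" if rng.random() < 0.5 else "M"
        mean_lbw = female_advantage_sd if sex == "F" else 0.0
        larvae.append((rng.gauss(mean_lbw, 1.0), sex))

    larvae.sort(reverse=True)          # heaviest larvae first
    top = larvae[:n_preselected]       # truncation selection on LBW only
    n_males = sum(1 for _, sex in top if sex == "M")
    return n_males, len(top) - n_males


# Illustrative run: 4000 phenotyped larvae, 400 preselected.
n_males, n_females = preselect_larvae(4000, 400, female_advantage_sd=1.0)
print(n_males, n_females)
```

Varying `n_preselected` and `female_advantage_sd` shows how quickly males become scarce among the preselected larvae, which is the reason the abstract recommends sizing both numbers from the expected female weight deviation.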
{"title":"The effect of phenotyping, adult selection, and mating strategies on genetic gain and rate of inbreeding in black soldier fly breeding programs","authors":"Margot Slagboom, Hanne Marie Nielsen, Morten Kargo, Mark Henryon, Laura Skrubbeltrang Hansen","doi":"10.1186/s12711-024-00938-y","DOIUrl":"https://doi.org/10.1186/s12711-024-00938-y","journal":{"name":"Genetics Selection Evolution"},"publicationDate":"2024-11-04"}
Pub Date: 2024-10-31 DOI: 10.1186/s12711-024-00936-0
James P. Copley, Benjamin J. Hayes, Elizabeth M. Ross, Shannon Speight, Geoffry Fordyce, Benjamin J. Wood, Bailey N. Engle
Genotype by environment interactions (GxE) affect a range of production traits in beef cattle. Quantifying the effect of GxE in commercial and multi-breed herds is challenging due to unknown genetic linkage between animals across environment levels. The primary aim of this study was to use multi-trait models to investigate GxE for three heifer fertility traits, corpus luteum (CL) presence, first pregnancy and second pregnancy, in a large tropical beef multibreed dataset (n = 21,037). Environmental levels were defined by two different descriptors, burden of heat load (temperature humidity index, THI) and nutritional availability (based on mean average daily gain for the herd, ADWG). To separate the effects of genetic linkage and real GxE across the environments, 1000 replicates of a simulated phenotype were generated by simulating QTL effects with no GxE onto real marker genotypes from the population, to determine the genetic correlations that could be expected across environments due to the existing genetic linkage only. Correlations from the real phenotypes were then compared to the empirical distribution under the null hypothesis from the simulated data. By adopting this approach, this study attempted to establish whether low genetic correlations between environmental levels were due to GxE or to insufficient genetic linkage between animals in each environmental level. The correlations for the real phenotypes (less than 0.8) were indicative of GxE for CL presence between ADWG environmental levels and in pregnancy traits. However, none of the correlations for CL presence or first pregnancy between ADWG levels were below the 5th percentile value for the empirical distribution under the null hypothesis from the simulated data. Only one statistically significant (P < 0.05) indication of GxE for first pregnancy was found between THI environmental levels, where rg = 0.28 and 5th percentile value = 0.29, and this result was marginal. 
Only one case of statistically significant GxE for fertility traits was detected for first pregnancy between THI environmental levels 2 and 3. Other initial indications of GxE that were observed from the real phenotypes did not prove significant when compared to an empirical null distribution from simulated phenotypes. The lack of compelling evidence of GxE indicates that direct selection for fertility traits can be made accurately, using a single evaluation, regardless of environment.
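The comparison against an empirical null distribution can be sketched as follows. This is a minimal illustration, assuming a nearest-rank percentile and a one-sided test in which GxE is indicated only when the observed genetic correlation falls below the chosen percentile of the correlations simulated without GxE. The function names and the synthetic null values are hypothetical and do not come from the study.

```python
import math


def nearest_rank_percentile(values, q):
    """Return the q-th percentile (0 < q <= 100) using the nearest-rank rule."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def gxe_indicated(observed_rg, null_rgs, q=5):
    """Flag GxE when the observed correlation is below the null percentile.

    null_rgs holds genetic correlations estimated from phenotypes simulated
    with no GxE, so low values there reflect weak genetic linkage alone.
    """
    cutoff = nearest_rank_percentile(null_rgs, q)
    return observed_rg < cutoff, cutoff


# Synthetic null distribution (hypothetical): 1000 evenly spaced correlations.
null_rgs = [0.20 + 0.0008 * i for i in range(1000)]
flag, cutoff = gxe_indicated(0.22, null_rgs)
print(flag, round(cutoff, 4))
```

With the values reported in the abstract, an observed rg of 0.28 against a 5th percentile of 0.29 would be flagged by this rule, although only by a narrow margin.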
{"title":"Investigating genotype by environment interaction for beef cattle fertility traits in commercial herds in northern Australia with multi-trait analysis","authors":"James P. Copley, Benjamin J. Hayes, Elizabeth M. Ross, Shannon Speight, Geoffry Fordyce, Benjamin J. Wood, Bailey N. Engle","doi":"10.1186/s12711-024-00936-0","DOIUrl":"https://doi.org/10.1186/s12711-024-00936-0","journal":{"name":"Genetics Selection Evolution"},"publicationDate":"2024-10-31"}