The integration of mitochondrial DNA (mtDNA) into mammalian nuclear genomes is an ongoing, yet rare, evolutionary process that produces nuclear sequences of mitochondrial origin (NUMT). In this study, we identified and analysed NUMT inserted into the pig (Sus scrofa) genome and into the genomes of a few other Suinae species. First, we constructed a comparative distribution map of NUMT in the Sscrofa11.1 reference genome and in 22 other assembled S. scrofa genomes (from Asian and European pig breeds and populations), as well as in the assembled genomes of the Visayan warty pig (Sus cebifrons) and the warthog (Phacochoerus africanus). We then analysed a total of 485 whole-genome sequencing datasets, from different breeds, populations, or Sus species, to discover polymorphic NUMT (inserted/deleted in the pig genome). The insertion age was inferred from the presence or absence of orthologous NUMT in the genomes of different species, taking into account their evolutionary divergence. Additionally, the age of each NUMT was calculated from its sequence degradation relative to the authentic mtDNA sequence. We also validated a selected set of representative NUMT via PCR amplification. We constructed an atlas of 418 NUMT regions, 70 of which were not present in any assembled genome. We identified ancient NUMT regions (inserted more than 55 million years ago, Mya) and NUMT that appeared at different time points along the Suinae evolutionary lineage. We identified very recent polymorphic NUMT (private to S. scrofa, inserted < 1 Mya) and more ancient polymorphic NUMT (3.5–10 Mya) present in various Sus species. These latter polymorphic NUMT regions, which segregate in European and Asian pig breeds and populations, are likely the result of interspecies admixture within the Sus genus. This study provides a first comprehensive analysis of NUMT present in the Sus scrofa genome, comparing them to NUMT found in other species within the order Cetartiodactyla. The NUMT-based evolutionary window that we reconstructed from NUMT integration ages could help to better understand the micro-evolutionary events that shaped the modern pig genome and enriched the genetic diversity of this species.
Title: A comprehensive atlas of nuclear sequences of mitochondrial origin (NUMT) inserted into the pig genome. Authors: Matteo Bolner, Samuele Bovo, Mohamad Ballan, Giuseppina Schiavo, Valeria Taurisano, Anisa Ribani, Francesca Bertolini, Luca Fontanesi. Journal: Genetics Selection Evolution (IF 4.1). Pub Date: 2024-09-16 DOI: 10.1186/s12711-024-00930-6
Pub Date: 2024-09-12 DOI: 10.1186/s12711-024-00931-5
Jigme Dorji, Amanda J. Chamberlain, Coralie M. Reich, Christy J. VanderJagt, Tuan V. Nguyen, Hans D. Daetwyler, Iona M. MacLeod
Mitochondrial genomes differ from the nuclear genome, and in humans mitochondrial variants are known to contribute to genetic disorders. Prior to genomics, some livestock studies assessed the role of the mitochondrial genome, but these were limited and inconclusive. Modern genome sequencing provides an opportunity to re-evaluate the potential impact of mitochondrial variation on livestock traits. This study first evaluated the empirical accuracy of mitochondrial sequence imputation and then used real and imputed mitochondrial sequence genotypes to study the role of mitochondrial variants in milk production traits of dairy cattle. The empirical accuracy of imputation from single nucleotide polymorphism (SNP) panels to mitochondrial sequence genotypes was assessed in 516 test animals of Holstein, Jersey and Red breeds using Beagle software and a sequence reference of 1883 animals. The overall accuracy, estimated as the squared Pearson correlation (R²) between all imputed and real genotypes across all animals, was 0.454. The low accuracy was attributed partly to the majority of variants having a low minor allele frequency (MAF < 0.005) and partly to the poor imputation accuracy of variants in the hypervariable D-loop region. Beagle provides an internal estimate of imputation accuracy (DR2), and about 10 percent of the 1927 imputed positions (N = 201) showed DR2 greater than 0.9. There were 151 sites with empirical R² > 0.9 (of 954 variants segregating in the test animals), and 138 of these overlapped the sites with DR2 > 0.9. This suggests that the DR2 statistic is a reasonable proxy for selecting sites that are imputed with higher accuracy for downstream analyses. Accordingly, in the second part of the study, mitochondrial sequence variants were imputed from real mitochondrial SNP panel genotypes of 9515 Australian Holstein, Jersey and Red dairy cattle. Then, using only real genotypes and imputed sites with DR2 > 0.9, we undertook a genome-wide association study (GWAS) for milk, fat and protein yields. The mitochondrial SNP effects in the GWAS were not significant. The accuracy of imputation of mitochondrial genotypes from the SNP panel to sequence was generally low. The Beagle DR2 statistic enabled selection of sites imputed with higher empirical accuracy. We recommend building larger reference populations with mitochondrial sequence to improve the accuracy of imputing less common variants, and ensuring that SNP panels include common variants in the D-loop region.
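The two accuracy measures in this abstract can be illustrated with a minimal sketch. It assumes haploid mitochondrial genotypes coded 0/1 and imputed allele dosages between 0 and 1. The function names (`pearson_r2`, `select_sites`), the site labels, and the toy values are hypothetical; they are not taken from the study or from Beagle.

```python
def pearson_r2(real, imputed):
    """Squared Pearson correlation between real genotypes and imputed dosages."""
    n = len(real)
    if n != len(imputed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_real = sum(real) / n
    mean_imp = sum(imputed) / n
    sxy = sum((a - mean_real) * (b - mean_imp) for a, b in zip(real, imputed))
    sxx = sum((a - mean_real) ** 2 for a in real)
    syy = sum((b - mean_imp) ** 2 for b in imputed)
    if sxx == 0 or syy == 0:
        # Monomorphic site: the correlation is undefined.
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


def select_sites(dr2_by_site, threshold=0.9):
    """Keep sites whose internal accuracy estimate (DR2) exceeds the threshold."""
    return [site for site, dr2 in dr2_by_site.items() if dr2 > threshold]


# Toy data (hypothetical): real 0/1 genotypes and imputed dosages for one site.
real_genotypes = [0, 0, 1, 1, 0, 1]
imputed_dosages = [0.1, 0.0, 0.9, 0.8, 0.2, 1.0]
site_r2 = pearson_r2(real_genotypes, imputed_dosages)

# Toy DR2 values (hypothetical) keyed by site label.
kept = select_sites({"site_a": 0.95, "site_b": 0.90, "site_c": 0.50})
```

Because empirical R² needs real sequence genotypes, it can only be computed in test animals, whereas a DR2-style filter can be applied to every imputed animal; that is why the overlap between the two site lists matters.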
Title: Mitochondrial sequence variants: testing imputation accuracy and their association with dairy cattle milk traits. Journal: Genetics Selection Evolution (IF 4.1). Pub Date: 2024-09-12 DOI: 10.1186/s12711-024-00931-5
Pub Date: 2024-09-12 DOI: 10.1186/s12711-024-00929-z
Leteisha A. Prescott, Megan R. Scholtens, Seumas P. Walker, Shannon M. Clarke, Ken G. Dodds, Matthew R. Miller, Jayson M. Semmens, Chris G. Carter, Jane E. Symonds
A genotype-by-environment (G × E) interaction is defined as genotypes responding differently to different environments. In salmonids, G × E interactions can occur in different rearing conditions, including changes in salinity or temperature. However, water flow, an important variable that can influence metabolism, has yet to be considered for potential G × E interactions, although water flows differ across production stages. The salmonid industry is now manipulating flow in tanks to improve welfare and production performance, and expanding sea pen farming offshore, where flow dynamics are substantially greater. Therefore, there is a need to test whether G × E interactions occur under low and higher flow regimes, to determine whether industry should consider modifying its performance evaluation and selection criteria to account for different flow environments. Here, we used genotyping-by-sequencing to create a genomic relationship matrix of 37 Chinook salmon, Oncorhynchus tshawytscha, families to assess possible G × E interactions for production performance under two flow environments: a low flow regime (0.3 body lengths per second; bl s⁻¹) and a moderate flow regime (0.8 bl s⁻¹). Genetic correlations for the same production performance trait between flow regimes suggest there is minimal evidence of a G × E interaction between the low and moderate flow regimes tested in this study, for Chinook salmon reared from 82.9 ± 16.8 g ($\overline{\text{x}}$ ± s.d.) to 583.2 ± 117.1 g ($\overline{\text{x}}$ ± s.d.). Estimates of genetic and phenotypic correlations between traits did not reveal any unfavorable trait correlations for size-related (weight and condition factor) and growth-related traits, regardless of the flow regime, but did suggest that measuring feed intake would be the preferred approach to improve feed efficiency because of the strong correlations between feed intake and feed efficiency, consistent with previous studies. This new information suggests that Chinook salmon families do not need to be selected separately for performance across different flow regimes. However, further studies are needed to confirm this across a wider range of fish sizes and flows. This information is key for breeding programs to determine whether separate evaluation groups are required for the different flow regimes used in production (e.g., hatchery, post-smolt recirculating aquaculture system, or offshore).
Title: Genetic parameters and genotype-by-environment interaction estimates for growth and feed efficiency related traits in Chinook salmon, Oncorhynchus tshawytscha, reared under low and moderate flow regimes. Journal: Genetics Selection Evolution (IF 4.1). Pub Date: 2024-09-12 DOI: 10.1186/s12711-024-00929-z
Pub Date: 2024-09-03 DOI: 10.1186/s12711-024-00928-0
Naomi Duijvesteijn, Julius H. J. van der Werf, Brian P. Kinghorn
The objective of this study was to introduce a genome-wide association study (GWAS) in conjunction with segregation analysis on monogenic categorical traits. Genotype probabilities, calculated from phenotypes, mode of inheritance and pedigree information, are expressed as the expected allele count (EAC; range 0 to 2) and are, by definition, inherited additively, unlike the original phenotypes, which are non-additive and may show incomplete penetrance. The EAC are regressed on the single nucleotide polymorphism (SNP) genotypes, similar to an additive GWAS. In this study, horn status in Merino sheep, a trait believed to be monogenic and affected by dominance, sex-dependent expression and probably incomplete penetrance, is used to illustrate the advantages of the segregation GWAS. We also used simulation to investigate whether incomplete penetrance can cause prediction errors for horn status in Merino sheep. Estimated penetrance values differed between the sexes: males showed almost complete penetrance, especially for horned and polled phenotypes, while females had low penetrance values for the horned status. This suggests that females homozygous for the ‘horned allele’ have a horned phenotype in only 22% of cases, while 78% will be knobbed or have scurs. The GWAS using EAC on 4001 animals and 510,174 SNP genotypes from the Illumina Ovine high-density (600k) chip gave a stronger association than the GWAS using actual phenotypes. The correlation between the EAC and the allele count of the SNP with the highest –log10(p-value) was 0.73 in males and 0.67 in females. Simulations using penetrance values found by the segregation analyses resulted in higher correlations between the EAC and the causative mutation (0.95 for males and 0.89 for females), suggesting that the most predictive SNP is not in full linkage disequilibrium (LD) with the causative mutation. Our results show clear differences in penetrance values between male and female Merino sheep for horn status. Segregation analysis for a trait with mutually exclusive phenotypes, non-additive inheritance, and/or incomplete penetrance can lead to considerably more power in a GWAS because the linearized genotype probabilities are additive and can accommodate incomplete penetrance. This method can be extended to any monogenically controlled categorical trait whose phenotypes are mutually exclusive.
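The linearization step described in this abstract can be sketched as follows. The sketch assumes that genotype probabilities (for 0, 1 and 2 copies of the allele) are already available from a segregation analysis, which is not shown here; the function names and toy values are hypothetical, and the single-SNP least-squares fit stands in for whatever GWAS model the authors actually used.

```python
def expected_allele_count(probs):
    """Expected allele count (0 to 2) from probabilities of 0, 1 and 2 allele copies."""
    p0, p1, p2 = probs
    if abs((p0 + p1 + p2) - 1.0) > 1e-8:
        raise ValueError("genotype probabilities must sum to 1")
    return p1 + 2.0 * p2


def regress_eac_on_snp(eac, snp):
    """Ordinary least-squares slope and intercept of EAC on SNP allele counts."""
    n = len(snp)
    if n != len(eac) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(snp) / n
    mean_y = sum(eac) / n
    sxx = sum((x - mean_x) ** 2 for x in snp)
    if sxx == 0:
        raise ValueError("SNP is monomorphic; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(snp, eac))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy genotype probabilities (hypothetical) for four animals.
genotype_probs = [(1.0, 0.0, 0.0), (0.25, 0.5, 0.25), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
eac_values = [expected_allele_count(p) for p in genotype_probs]
snp_counts = [0, 1, 1, 2]  # toy SNP allele counts for the same animals
slope, intercept = regress_eac_on_snp(eac_values, snp_counts)
```

The point of the design is that the EAC is a continuous, additive quantity even when the observed phenotype classes are not, so a standard additive regression can be applied to it.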
Title: Segregation GWAS to linearize a non-additive locus with incomplete penetrance: an example of horn status in sheep. Journal: Genetics Selection Evolution (IF 4.1). Pub Date: 2024-09-03 DOI: 10.1186/s12711-024-00928-0
Pub Date: 2024-09-03 DOI: 10.1186/s12711-024-00927-1
Chang-heng Zhao, Dan Wang, Cheng Yang, Yan Chen, Jun Teng, Xin-yi Zhang, Zhi Cao, Xian-ming Wei, Chao Ning, Qi-en Yang, Wen-fa Lv, Qin Zhang
Accurate breed identification is essential for the conservation and sustainable use of indigenous farm animal genetic resources. In this study, we evaluated the phylogenetic relationships and genomic breed compositions of 13 sheep breeds using SNP and InDel data from whole genome sequencing. The breeds included 11 Chinese indigenous and 2 foreign commercial breeds. We compared different strategies for breed identification with respect to different marker types, i.e. SNPs, InDels, and a combination of SNPs and InDels (named SIs), different breed-informative marker detection methods, and different machine learning classification methods. Using WGS-based SNPs and InDels, we revealed the phylogenetic relationships between 11 Chinese indigenous and two foreign sheep breeds and quantified their purities through estimated genomic breed compositions. We found that the optimal strategy for identifying these breeds was the combination of DFI_union for breed-informative marker detection, which integrated the methods of Delta, Pairwise Wright's FST, and Informativeness for Assignment (namely DFI) by merging the breed-informative markers derived from the three methods, and KSR for breed assignment, which integrated the methods of K-Nearest Neighbor, Support Vector Machine, and Random Forest (namely KSR) by intersecting their results. Using SI markers improved the identification accuracy compared to using SNPs or InDels alone. We achieved accuracies over 97.5% when using at least the 1000 most breed-informative (MBI) SI markers and even 100% when using 5000 SI markers. Our results provide not only an important foundation for conservation of these Chinese local sheep breeds, but also general approaches for breed identification of indigenous farm animal breeds.
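The two combination steps named in this abstract (merging marker sets, and intersecting classifier results) can be sketched with plain set logic. The sketch assumes that "intersecting" means a breed is assigned only when all classifiers agree, which is one possible reading of the abstract; the function names, marker labels and breed labels are hypothetical.

```python
def union_markers(*marker_sets):
    """Merge breed-informative markers from several detection methods (union)."""
    merged = set()
    for markers in marker_sets:
        merged |= set(markers)
    return merged


def consensus_assignment(*predictions):
    """Return the breed only if every classifier predicts the same one, else None."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    first = predictions[0]
    return first if all(p == first for p in predictions) else None


# Toy marker sets (hypothetical) from three detection methods.
delta_markers = {"m1", "m2"}
fst_markers = {"m2", "m3"}
informativeness_markers = {"m4"}
panel = union_markers(delta_markers, fst_markers, informativeness_markers)

# Toy predictions (hypothetical) from three classifiers for two animals.
agreed = consensus_assignment("breed_A", "breed_A", "breed_A")
disputed = consensus_assignment("breed_A", "breed_B", "breed_A")
```

Under this reading, the union step favours sensitivity (no informative marker is discarded), while the consensus step favours specificity (an animal is left unassigned rather than assigned on a split vote).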
Title: Population structure and breed identification of Chinese indigenous sheep breeds using whole genome SNPs and InDels. Journal: Genetics Selection Evolution (IF 4.1). Pub Date: 2024-09-03 DOI: 10.1186/s12711-024-00927-1
Single-nucleotide polymorphism (SNP) effects can be backsolved from ssGBLUP genomic estimated breeding values (GEBV) and used for genome-wide association studies (ssGWAS). However, obtaining p-values for those SNP effects relies on the inversion of dense matrices, which poses computational limitations in large genotyped populations. In this study, we present a method to approximate SNP p-values for ssGWAS with many genotyped animals. This method relies on the combination of a sparse approximation of the inverse of the genomic relationship matrix ($\mathbf{G}_{\mathbf{APY}}^{-1}$) built with the algorithm for proven and young (APY) and an approximation of the prediction error variance of SNP effects which does not require the inversion of the left-hand side (LHS) of the mixed model equations. To test the proposed p-value computing method, we used a reduced genotyped population of 50K genotyped animals and compared the approximated SNP p-values with benchmark p-values obtained with the direct inverse of the LHS built with an exact genomic relationship matrix ($\mathbf{G}^{-1}$). Then, we applied the proposed approximation method to obtain SNP p-values for a larger genotyped population composed of 450K genotyped animals. The same genomic regions on chromosomes 7 and 20 were identified across all p-value computing methods when using 50K genotyped animals. In terms of computational requirements, obtaining p-values with the proposed approximation reduced the wall-clock time 38-fold and the memory requirement tenfold compared to using the exact inversion of the LHS. When the approximation was applied to a population of 450K genotyped animals, two new significant regions on chromosomes 6 and 14 were uncovered, indicating an increase in GWAS detection power when including more genotypes in the analyses. The process of obtaining p-values with the approximation and 450K genotyped individuals took 24.5 wall-clock hours and 87.66 GB of memory, which is expected to increase linearly with the addition of noncore genotyped individuals. With the proposed method, obtaining p-values for SNP effects in ssGWAS is computationally feasible in large genotyped populations. The computational cost of obtaining p-values in ssGWAS may no longer be a limitation in extensive populations with many genotyped animals.
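As a rough illustration of the last step of such an analysis, a two-sided p-value can be derived from a SNP effect estimate and its standard deviation under a normal approximation. The sketch below assumes that form of the test. The function name `snp_p_value` and its arguments are hypothetical, the input values are made up, and the APY-based approximation of the variance described in the abstract is not reproduced here.

```python
import math


def snp_p_value(effect, effect_sd):
    """Two-sided p-value for a SNP effect, assuming a standard normal test statistic."""
    if effect_sd <= 0:
        raise ValueError("effect_sd must be positive")
    z = abs(effect) / effect_sd
    # 2 * (1 - Phi(z)) is identical to erfc(z / sqrt(2)); erfc keeps more
    # precision for the very small p-values typical of GWAS peaks.
    return math.erfc(z / math.sqrt(2.0))


# Hypothetical values, for illustration only.
p = snp_p_value(effect=0.12, effect_sd=0.03)
minus_log10_p = -math.log10(p)
```

Working with `erfc` rather than `1 - cdf` avoids the loss of precision that occurs when the cumulative probability is very close to 1.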
Title: Marker effect p-values for single-step GWAS with the algorithm for proven and young in large genotyped populations. Authors: Natália Galoro Leite, Matias Bermann, Shogo Tsuruta, Ignacy Misztal, Daniela Lourenco. Genetics Selection Evolution, published 2024-08-22. https://doi.org/10.1186/s12711-024-00925-3
Pub Date: 2024-08-16 DOI: 10.1186/s12711-024-00926-2
Ismo Strandén, Janez Jenko
Regions of genome-wide marker data may have differing influences on the evaluated traits. This can be reflected in the genomic models by assigning different weights to the markers, which can enhance the accuracy of genomic prediction. However, the standard multi-trait single-step genomic evaluation model can be computationally infeasible when the traits are allowed to have different marker weights. In this study, we developed and implemented a multi-trait single-step single nucleotide polymorphism best linear unbiased prediction (SNPBLUP) model for large genomic data evaluations that allows for the use of precomputed trait-specific marker weights. The modifications to the standard single-step SNPBLUP model were minor and did not significantly increase the preprocessing workload. The model was tested using simulated data and marker weights precomputed using BayesA. Based on the results, memory requirements and computing time per iteration slightly increased compared to the standard single-step model without weights. Moreover, convergence of the model was slower when using marker weights, which resulted in longer total computing time. The use of marker weights, however, improved prediction accuracy. We investigated a single-step SNPBLUP model that can be used to accommodate trait-specific marker weights. The marker-weighted single-step model improved prediction accuracy. The approach can be used for large genomic data evaluations using precomputed marker weights.
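One common way to express marker weights is through a diagonal weight matrix D applied to the centred genotype matrix Z, giving a weighted relationship matrix G = Z D Z' / (2 Σ d_k p_k (1 − p_k)) for a single trait. The sketch below shows only this relationship-matrix view, as an assumption made for illustration. It is not the SNPBLUP formulation or the implementation described in the abstract, and the function names, genotypes and weights are hypothetical.

```python
def centre(genotypes, allele_freqs):
    """Centre 0/1/2 genotype codes by twice the allele frequency of each marker."""
    return [[g - 2.0 * p for g, p in zip(row, allele_freqs)] for row in genotypes]


def weighted_grm(z, weights, allele_freqs):
    """Weighted genomic relationship matrix Z D Z' / (2 * sum_k d_k p_k (1 - p_k))."""
    n_markers = len(weights)
    if len(allele_freqs) != n_markers or any(len(row) != n_markers for row in z):
        raise ValueError("marker dimensions do not match")
    scale = 2.0 * sum(d * p * (1.0 - p) for d, p in zip(weights, allele_freqs))
    if scale <= 0:
        raise ValueError("scaling factor must be positive")
    return [
        [sum(a * d * b for a, d, b in zip(row_i, weights, row_j)) / scale for row_j in z]
        for row_i in z
    ]


# Hypothetical data: two animals, two markers, allele frequency 0.5 at both.
freqs = [0.5, 0.5]
z = centre([[0, 2], [2, 2]], freqs)
g_equal = weighted_grm(z, [1.0, 1.0], freqs)      # all markers weighted equally
g_trait = weighted_grm(z, [3.0, 1.0], freqs)      # first marker up-weighted for one trait
```

With trait-specific weights, each trait would get its own weight vector, which is why a single shared relationship matrix no longer suffices in the multi-trait case.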
Title: A computationally feasible multi-trait single-step genomic prediction model with trait-specific marker weights. Genetics Selection Evolution, published 2024-08-16. https://doi.org/10.1186/s12711-024-00926-2
Pub Date: 2024-08-06 DOI: 10.1186/s12711-024-00924-4
Erin G. Smith, Dominic L. Waters, Samuel F. Walkom, Sam A. Clark
The effects of environmental disturbances on livestock are often observed indirectly through the variability patterns of repeated performance records over time. Sheep are frequently exposed to diverse extensive environments but currently lack appropriate measures of resilience (or sensitivity) towards environmental disturbance. In this study, random regression models were used to analyse repeated records of the fibre diameter of wool taken along the wool staple (bundle of wool fibres) to investigate how the genetic and environmental variance of fibre diameter changes with different growing environments. A model containing a fifth, fourth and second-order Legendre polynomial applied to the fixed, additive and permanent environmental effects, respectively, was optimal for modelling fibre diameter along the wool staple. The additive genetic and permanent environmental variance both showed variability across the staple length trajectory. The ranking of sire estimated breeding values (EBV) for fibre diameter was shown to change along the staple and the genetic correlations decreased as the distance between measurements along the staple increased. This result suggests that some genotypes were potentially more resilient towards the changes in the growing environment compared to others. In addition, the eigenfunctions of the random regression model implied the ability to change the fibre diameter trajectory to reduce its variability along the wool staple. These results show that genetic variation in fibre diameter measured along the wool staple exists and this could be used to provide greater insight into the ability to select for resilience in extensively raised sheep populations.
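To make the role of the Legendre polynomials concrete, the sketch below evaluates normalised Legendre covariates at a standardised position along the staple and uses them to turn a covariance matrix of random regression coefficients K into a covariance between two positions, φ(t1)' K φ(t2). This is the textbook random regression construction and is assumed here for illustration. The function names and the example K matrix are hypothetical and do not come from the study.

```python
import math


def standardise(position, lowest, highest):
    """Map a position along the staple onto the interval [-1, 1]."""
    if highest <= lowest:
        raise ValueError("highest must exceed lowest")
    return -1.0 + 2.0 * (position - lowest) / (highest - lowest)


def legendre_basis(t, order):
    """Normalised Legendre polynomials phi_0(t) .. phi_order(t) for t in [-1, 1]."""
    values = [1.0, t][: order + 1]
    for n in range(1, order):
        # Bonnet recursion: (n + 1) P_{n+1}(t) = (2n + 1) t P_n(t) - n P_{n-1}(t)
        values.append(((2 * n + 1) * t * values[n] - n * values[n - 1]) / (n + 1))
    return [math.sqrt((2 * k + 1) / 2.0) * p for k, p in enumerate(values)]


def covariance(t1, t2, k_matrix):
    """Covariance between two standardised positions: phi(t1)' K phi(t2)."""
    order = len(k_matrix) - 1
    phi1 = legendre_basis(t1, order)
    phi2 = legendre_basis(t2, order)
    return sum(
        phi1[i] * k_matrix[i][j] * phi2[j]
        for i in range(order + 1)
        for j in range(order + 1)
    )


# Hypothetical 2 x 2 coefficient covariance matrix (intercept and slope).
k_example = [[1.0, 0.0], [0.0, 1.0]]
t_mid = standardise(5.0, 0.0, 10.0)
var_at_tip = covariance(1.0, 1.0, k_example)
```

Evaluating `covariance` over a grid of positions gives the variance trajectory and the correlations between measurement points that the abstract refers to.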
Title: Analysis of the genetic variance of fibre diameter measured along the wool staple for use as a potential indicator of resilience in sheep. Genetics Selection Evolution, published 2024-08-06. https://doi.org/10.1186/s12711-024-00924-4
Pub Date: 2024-07-30 DOI: 10.1186/s12711-024-00905-7
Gabrielle M. Becker, Jacob W. Thorne, Joan M. Burke, Ronald M. Lewis, David R. Notter, James L. M. Morgan, Christopher S. Schauer, Whit C. Stewart, R. R. Redden, Brenda M. Murdoch
Managing genetic diversity is critically important for maintaining species fitness. Excessive homozygosity caused by the loss of genetic diversity can have detrimental effects on the reproduction and production performance of a breed. Analysis of genetic diversity can facilitate the identification of signatures of selection which may contribute to the specific characteristics regarding the health, production and physical appearance of a breed or population. In this study, breeds with well-characterized traits such as fine wool production (Rambouillet, N = 745), parasite resistance (Katahdin, N = 581) and environmental hardiness (Dorper, N = 265) were evaluated for inbreeding, effective population size (Ne) and runs of homozygosity (ROH), and a Wright’s fixation index (FST) outlier approach was used to identify differential signatures of selection at 36,113 autosomal single nucleotide polymorphisms (SNPs). Katahdin sheep had the largest current Ne at the most recent generation, as estimated with both the GONe and NeEstimator software. The most highly conserved ROH island was identified in Rambouillet, with a signature of selection on chromosome 6 containing 202 SNPs called in an ROH in 50 to 94% of the individuals. This region contained the DCAF16, LCORL and NCAPG genes, which have been previously reported to be under selection and have biological roles related to milk production and growth traits. The outlier regions identified through the FST comparisons of Katahdin with Rambouillet and Dorper contained genes with known roles in milk production and mastitis resistance or susceptibility, and the FST comparisons of Rambouillet with Katahdin and Dorper identified genes related to wool growth, suggesting these traits have been under natural or artificial selection pressure in these populations. 
Genes involved in the cytokine-cytokine receptor interaction pathways were identified in all FST breed comparisons, which indicates the presence of allelic diversity between these breeds in genomic regions controlling cytokine signaling mechanisms. In this paper, we describe signatures of selection within diverse and economically important U.S. sheep breeds. The genes contained within these signatures are proposed for further study to understand their relevance to biological traits and improve understanding of breed diversity.
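For readers unfamiliar with the fixation index, the sketch below computes Wright's FST for a single biallelic SNP from the allele frequencies of two populations as (H_T − H_S) / H_T, with both populations weighted equally. This is a textbook definition assumed for illustration. The abstract does not state which estimator or software was used for the FST outlier scan, and the function names and frequencies are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def wright_fst(p1, p2):
    """FST = (H_T - H_S) / H_T for one SNP and two equally weighted populations."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    h_total = expected_heterozygosity((p1 + p2) / 2.0)
    if h_total == 0.0:
        return 0.0  # SNP is monomorphic for the same allele in both populations
    h_within = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two breeds.
fst_example = wright_fst(0.2, 0.8)
```

In an outlier approach, SNPs whose FST falls in the extreme upper tail of the genome-wide distribution are flagged as candidate signatures of selection.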
Title: Genetic diversity of United States Rambouillet, Katahdin and Dorper sheep. Genetics Selection Evolution, published 2024-07-30. https://doi.org/10.1186/s12711-024-00905-7
Pub Date: 2024-07-27 DOI: 10.1186/s12711-024-00923-5
Elena Petretto, Maria Luisa Dettori, María Gracia Luigi-Sierra, Antonia Noce, Michele Pazzola, Giuseppe Massimo Vacca, Antonio Molina, Amparo Martínez, Félix Goyache, Sean Carolan, Marcel Amills
Goats were domesticated in the Fertile Crescent about 10,000 years before present (YBP) and subsequently spread across Eurasia and Africa. This dispersal is expected to generate a gradient of declining genetic diversity with increasing distance from the areas of early livestock management. Previous studies have reported the existence of such a genetic cline in European goat populations, but they were based on a limited number of microsatellite markers. Here, we have analyzed data generated by the AdaptMap project and other studies. More specifically, we have used the geographic coordinates and estimates of the observed (Ho) and expected (He) heterozygosities of 1077 European, 1187 African and 617 Asian goats belonging to 38, 43 and 22 different breeds, respectively, to find out whether genetic diversity is significantly correlated with distance to Ganj Dareh, a Neolithic settlement in western Iran for which evidence of early management of domestic goats has been obtained. Principal component and ADMIXTURE analyses revealed an incomplete regional differentiation of European breeds, but two genetic clusters representing Northern Europe and the British-Irish Isles were remarkably differentiated from the remaining European populations. In African breeds, we observed five main clusters: (1) North Africa, (2) West Africa, (3) East Africa, (4) South Africa, and (5) Madagascar. Regarding Asian breeds, three well-differentiated West Asian, South Asian and East Asian groups were observed. For European and Asian goats, no strong evidence of significant correlations between Ho or He and distance to Ganj Dareh was found. In contrast, in African breeds we detected a significant gradient of diversity, which decreased with distance to Ganj Dareh. The detection of a genetic cline associated with distance to Ganj Dareh in African but not in European or Asian goat breeds might reflect differences in the post-domestication dispersal process and subsequent migratory movements associated with the management of caprine populations from these three continents.
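The kind of test described here can be sketched as two steps: compute the distance from each breed's sampling location to a reference site, then correlate that distance with a diversity estimate such as He. The example below assumes great-circle (haversine) distance and a Pearson correlation; the abstract does not specify which distance measure or correlation statistic was used. The function names are hypothetical, and coordinates and heterozygosity values must be supplied by the user.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # min() guards against rounding pushing the argument just above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_xy / math.sqrt(s_xx * s_yy)
```

A negative correlation between distance and heterozygosity across breeds would be consistent with the declining diversity gradient reported for African goats.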
Title: Investigating the footprint of post-domestication dispersal on the diversity of modern European, African and Asian goats. Genetics Selection Evolution, published 2024-07-27. https://doi.org/10.1186/s12711-024-00923-5