Pub Date: 2024-12-19 DOI: 10.1093/genetics/iyae211
Vivaswat Shastry, Jeremy J Berg
For many problems in population genetics, it is useful to characterize the distribution of fitness effects (DFE) of de novo mutations among a certain class of sites. A DFE is typically estimated by fitting an observed site frequency spectrum (SFS) to an expected SFS given a hypothesized distribution of selection coefficients and demographic history. The development of tools to infer gene trees from haplotype alignments, along with ancient DNA resources, provides us with additional information about the frequency trajectories of segregating mutations. Here, we ask how useful this additional information is for learning about the DFE, using the joint distribution on allele frequency and age to summarize information about the trajectory. To this end, we introduce an accurate and efficient numerical method for computing the density on the age of a segregating variant found at a given sample frequency, given the strength of selection and an arbitrarily complex population size history. We then use this framework to show that the unconditional age distribution of negatively selected alleles is very closely approximated by re-weighting the neutral age distribution in terms of the negatively selected SFS, suggesting that allele ages provide little information about the DFE beyond that already contained in the present-day frequency. To confirm this prediction, we extend the standard Poisson Random Field (PRF) method to incorporate the joint distribution of frequency and age in estimating selection coefficients, and test its performance using simulations. We find that when the full SFS is observed and the true allele ages are known, including ages in the estimation provides only small increases in the accuracy of estimated selection coefficients. However, if only sites with frequencies above a certain threshold are observed, then the true ages can provide substantial information about the selection coefficients, especially when the selection coefficient is large. When ages are estimated from haplotype data using state-of-the-art tools, uncertainty about the age abrogates most of the additional information in the fully observed SFS case, while the neutral prior assumed in these tools when estimating ages induces a downward bias in the case of the thresholded SFS.
Title: Allele ages provide limited information about the strength of negative selection. Journal: Genetics.
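For readers unfamiliar with the frequency-only baseline that the abstract above extends, the following is a minimal sketch of a Poisson Random Field likelihood on the SFS alone. It is not the authors' method: it assumes a constant population size, the classical stationary density for a scaled selection coefficient `gamma` (taken here as 2Ns, a convention that may differ from the paper's), and plain midpoint integration. The function names `expected_sfs` and `poisson_loglik` are hypothetical and chosen for illustration only.

```python
import math


def expected_sfs(n, theta, gamma, grid=4000):
    """Expected number of sites with i derived copies (i = 1..n-1) in a sample of n.

    Assumes constant population size and the stationary density
    theta * w(x) / (x * (1 - x)), where
    w(x) = (1 - exp(-2*gamma*(1-x))) / (1 - exp(-2*gamma)).
    Intended for moderate |gamma|; very strong negative selection would
    overflow math.expm1.
    """
    def weight(x):
        # w(x) tends to (1 - x) as gamma -> 0, which recovers the neutral theta / x density.
        if abs(gamma) < 1e-8:
            return 1.0 - x
        return math.expm1(-2.0 * gamma * (1.0 - x)) / math.expm1(-2.0 * gamma)

    h = 1.0 / grid
    sfs = []
    for i in range(1, n):
        total = 0.0
        for k in range(grid):
            x = (k + 0.5) * h  # midpoint rule, never evaluates x = 0 or x = 1
            # Binomial sampling times the density, with one factor of x and
            # one factor of (1 - x) cancelled analytically.
            total += x ** (i - 1) * (1.0 - x) ** (n - i - 1) * weight(x)
        sfs.append(theta * math.comb(n, i) * total * h)
    return sfs


def poisson_loglik(observed, expected):
    """Log-likelihood of observed SFS counts as independent Poisson draws."""
    return sum(
        k * math.log(mu) - mu - math.lgamma(k + 1.0)
        for k, mu in zip(observed, expected)
    )
```

In the neutral limit (`gamma = 0`) the expected count in class `i` reduces to `theta / i`, and negative `gamma` lowers every class relative to neutrality, most strongly at high frequencies. An estimator in this spirit would maximize `poisson_loglik` over `gamma`; the paper's extension additionally uses allele ages and non-constant demography, which this sketch does not attempt.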
Pub Date: 2024-12-18 DOI: 10.1093/genetics/iyae210
Roshni A Patel, Clemens L Weiß, Huisheng Zhu, Hakhamanesh Mostafavi, Yuval B Simons, Jeffrey P Spence, Jonathan K Pritchard
Natural selection on complex traits is difficult to study in part due to the ascertainment inherent to genome-wide association studies (GWAS). The power to detect a trait-associated variant in GWAS is a function of frequency and effect size, but for traits under selection, the effect size of a variant determines the strength of selection against it, constraining its frequency. Recognizing the biases inherent to GWAS ascertainment, we propose studying the joint distribution of allele frequencies across populations, conditional on the frequencies in the GWAS cohort. Before considering these conditional frequency spectra, we first characterized the impact of selection and non-equilibrium demography on allele frequency dynamics forwards and backwards in time. We then used these results to understand conditional frequency spectra under realistic human demography. Finally, we investigated empirical conditional frequency spectra for GWAS variants associated with 106 complex traits, finding compelling evidence for either stabilizing or purifying selection. Our results provide insight into polygenic score portability and other properties of variants ascertained with GWAS, highlighting the utility of conditional frequency spectra.
Title: Characterizing selection on complex traits through conditional frequency spectra. Journal: Genetics.
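The statement in the abstract above that GWAS power depends on both frequency and effect size can be made concrete with a standard textbook approximation. The sketch below is not taken from the paper: it assumes an additive model, a standardized trait, Hardy-Weinberg proportions, and effects small enough that the non-centrality of the 1-df test is roughly the sample size times the variance explained, 2p(1-p)β². The function name `gwas_power` and its default significance threshold are illustrative choices.

```python
from statistics import NormalDist


def gwas_power(n, freq, beta, alpha=5e-8):
    """Approximate power of a two-sided 1-df association test.

    n     : GWAS sample size
    freq  : allele frequency p in the GWAS cohort
    beta  : per-allele effect in trait standard deviations
    alpha : significance threshold (5e-8 is the usual genome-wide value)
    """
    std = NormalDist()
    # Non-centrality parameter: sample size times variance explained.
    ncp = n * 2.0 * freq * (1.0 - freq) * beta ** 2
    shift = ncp ** 0.5
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)
    # The test statistic is approximately Normal(shift, 1); sum both rejection tails.
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)
```

Under this approximation a variant with a given effect is far easier to detect when common than when rare, so if selection keeps large-effect variants rare, the set of GWAS hits is a biased sample of frequencies and effect sizes. That is the ascertainment the conditional frequency spectra are designed to account for.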
Pub Date: 2024-12-12 DOI: 10.1093/genetics/iyae207
Jonathan P Harbin, Yongquan Shen, Shin-Yi Lin, Kevin Kemper, Eric S Haag, Erich M Schwarz, Ronald E Ellis
Sexual characteristics and reproductive systems are dynamic traits in many taxa, but the developmental modifications that allow change and innovation are largely unknown. A leading model for this process is the evolution of self-fertile hermaphrodites from male/female ancestors. However, these studies require direct analysis of sex determination in male/female species, as well as in the hermaphroditic species that are related to them. In Caenorhabditis nematodes this has only become possible recently, with the discovery of new species. Here, we use gene editing to characterize major sex-determination genes in C. nigoni, a sister to the widely studied hermaphroditic species C. briggsae. These two species are close enough to mate and form partially fertile hybrids. First, we find that tra-1 functions as the master regulator of sex in C. nigoni, in both the soma and the germ line. Surprisingly, these mutants make only sperm, in contrast to tra-1 mutants in related hermaphroditic species. Moreover, the XX mutants display a unique defect in somatic gonad development that is not seen elsewhere in the genus. Second, the fem-3 gene acts upstream of tra-1 in C. nigoni, and the mutants are females, unlike in the sister species C. briggsae, where they develop as hermaphrodites. This result points to a divergence in the role of fem-3 in the germ line of these species. Third, tra-2 encodes a transmembrane receptor that acts upstream of fem-3 in C. nigoni. Outside of the germ line, tra-2 mutations in all species cause a similar pattern of partial masculinization. However, heterozygosity for tra-2 does not alter germ cell fates in C. nigoni, as it can in sensitized backgrounds of two hermaphroditic species of Caenorhabditis. Finally, the epistatic relationships point to a simple, linear germline pathway in which tra-2 regulates fem-3 which regulates tra-1, unlike the more complex relationships seen in hermaphrodite germ cell development. Taking these results together, the regulation of sex determination is more robust and streamlined in the male/female species C. nigoni than in related species that make self-fertile hermaphrodites, a conclusion supported by studies of interspecies hybrids using sex-determination mutations. Thus, we infer that the origin of self-fertility not only required mutations that activated the spermatogenesis program in XX germ lines, but prior to these there must have been mutations that decanalized the sex-determination process, allowing for subsequent changes to germ cell fates.
Title: Robust sex determination in the Caenorhabditis nigoni germ line. Journal: Genetics.
Pub Date: 2024-12-09 DOI: 10.1093/genetics/iyae205
John P Hamilton, Julia Brose, C Robin Buell
Potato is a key food crop with a complex, polyploid genome. Advancements in sequencing technologies coupled with improvements in genome assembly algorithms have enabled generation of phased, chromosome-scale genome assemblies for cultivated tetraploid potato. The SpudDB database houses potato genome sequence and annotation, with the doubled monoploid DM 1-3 516 R44 (hereafter DM) genome serving as the reference genome and haplotype. Diverse annotation data types for DM genes are provided through a suite of Gene Report Pages including gene expression profiles across 438 potato samples. To further annotate potato genes based on expression, 65 gene co-expression modules were constructed that permit identification of tightly co-regulated genes within DM across development and responses to wounding, abiotic stress, and biotic stress. Genome browser views of DM and 28 other potato genomes are provided along with a download page for genome sequence and annotation. To link syntenic genes within and between haplotypes, syntelogs were identified across 25 cultivated potato genomes. Through access to potato genome sequences and associated annotations, SpudDB can enable potato biologists, geneticists and breeders to continue to improve this key food crop.
Title: SpudDB: A database for accessing potato genomic data. Journal: Genetics.
Pub Date: 2024-11-28 DOI: 10.1093/genetics/iyae199
Clarence Zheng, Curtis Furukawa, Jerry Liu, Srishti Sankaran, Han Lin, Nidhi Munugeti, Meranda Wang, Gerald R Smith
For decades, it has been repeatedly claimed that the potent bacterial helicase-nuclease RecBCD (exonuclease V) destroys foreign (non-self) DNA, such as that of phages, but repairs and recombines cellular (self) DNA. While this would constitute a strong host-survival mechanism, no phage destroyed by RecBCD is ever specified in those claims. To determine which phages are destroyed by RecBCD, we searched for phage isolates that grow on Escherichia coli ΔrecBCD but not on recBCD+. In contrast to the prevailing claim, we found none among >80 new isolates from nature and >80 from previous collections. Based on these and previous observations, we conclude that RecBCD repairs broken DNA that can recombine but destroys DNA that cannot recombine and recycles the nucleotides.
Title: Debunking the dogma that RecBCD nuclease destroys phage. Journal: Genetics.
Pub Date: 2024-11-28 DOI: 10.1093/genetics/iyae170
Nicholas R Powell, Renee C Geck, Dongbing Lai, Tyler Shugg, Todd C Skaar, Maitreya J Dunham
The glucose-6-phosphate dehydrogenase (G6PD) enzyme protects red blood cells against oxidative damage. Individuals with G6PD-impairing polymorphisms are at risk of hemolytic anemia from oxidative stressors. Prevention of G6PD deficiency-related hemolytic anemia is achievable by identifying affected individuals through G6PD genetic testing. However, accurately predicting the clinical consequence of G6PD variants is limited by over 800 G6PD variants which remain of uncertain significance (VUS). There also remains inconsistency in which deficiency-causing variants are included in genetic testing arrays: many institutions only test c.202G>A, though dozens of other variants can cause G6PD deficiency. Here, we improve G6PD genotype interpretations using the All of Us Research Program data and a yeast functional assay. We confirm that G6PD coding variants are the main contributor to decreased G6PD activity and that 13% of individuals in the All of Us data with deficiency-causing variants would be missed by only genotyping for c.202G>A. We expand clinical interpretation for G6PD VUS, reporting that c.595A>G ("Dagua" or "Açores") and the novel variant c.430C>G reduce activity sufficiently to lead to G6PD deficiency. We also provide evidence that 5 missense VUS are unlikely to lead to G6PD deficiency, and we applied the new World Health Organization (WHO) guidelines to recommend classifying 2 synonymous variants as WHO Class C. In total, we provide new or updated clinical interpretations for 9 G6PD variants. We anticipate these results will improve the accuracy, and prompt increased use, of G6PD genetic tests through a more complete clinical interpretation of G6PD variants.
Title: Functional analysis of G6PD variants associated with low G6PD activity in the All of Us Research Program. Journal: Genetics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631396/pdf/
Pub Date: 2024-11-19 DOI: 10.1093/genetics/iyae164
Anirban Samaddar, Tapabrata Maiti, Gustavo de Los Campos
Variable selection and large-scale hypothesis testing are techniques commonly used to analyze high-dimensional genomic data. Despite recent advances in theory and methodology, variable selection and inference with highly collinear features remain challenging. For instance, collinearity poses a great challenge in genome-wide association studies involving millions of variants, many of which may be in high linkage disequilibrium. In such settings, collinearity can significantly reduce the power of variable selection methods to identify individual variants associated with an outcome. To address such challenges, we developed Bayesian hierarchical hypothesis testing (BHHT), a novel multiresolution testing procedure that offers high power with adequate error control and fine-mapping resolution. We demonstrate through simulations that the proposed methodology has a power-FDR performance that is competitive with (and in many scenarios better than) state-of-the-art methods. Finally, we demonstrate the feasibility of using BHHT with large sample size (n ∼ 300,000) and ultra-high-dimensional genotypes (∼15 million single-nucleotide polymorphisms, or SNPs) by applying it to eight complex traits using data from the UK Biobank. Our results show that the proposed methodology leads to many more discoveries than those obtained using traditional SNP-centered inference procedures. The article is accompanied by open-source software that implements the methods described in this study using algorithms that scale to biobank-size ultra-high-dimensional data.
Title: Bayesian hierarchical hypothesis testing in large-scale genome-wide association analysis. Journal: Genetics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631447/pdf/
Pub Date: 2024-11-12 DOI: 10.1093/genetics/iyae185
Stacia R Engel, Suzi Aleksander, Robert S Nash, Edith D Wong, Shuai Weng, Stuart R Miyasato, Gavin Sherlock, J Michael Cherry
Budding yeast (Saccharomyces cerevisiae) is the most extensively characterized eukaryotic model organism and has long been used to gain insight into the fundamentals of genetics, cellular biology, and the functions of specific genes and proteins. The Saccharomyces Genome Database (SGD) is a scientific resource that provides information about the genome and biology of S. cerevisiae. For more than 30 years, SGD has maintained the genetic nomenclature, chromosome maps, and functional annotation for budding yeast along with search and analysis tools to explore these data. Here we describe recent updates at SGD, including the two most recent reference genome annotation updates, expanded biochemical pathways representation, changes to SGD search and data files, and other enhancements to the SGD website and user interface. These activities are part of our continuing effort to promote insights gained from yeast to enable the discovery of functional relationships between sequence and gene products in fungi and higher eukaryotes.
Title: Saccharomyces Genome Database: Advances in Genome Annotation, Expanded Biochemical Pathways, and Other Key Enhancements. Journal: Genetics.
Pub Date : 2024-11-06DOI: 10.1093/genetics/iyae144
Gözde Atağ, Shamam Waldman, Shai Carmi, Mehmet Somel
Patterson's f-statistics are among the most widely used tools for analyzing genome-wide allele frequency data for demographic inference. Beyond studying admixture, f3- and f4-statistics are also used to cluster populations and identify groups with similar histories. However, previous studies have noted an unexpected behavior of f-statistics: multiple populations from a certain region systematically show higher genetic affinity to a more distant population than to their neighbors, a pattern that does not match alternative measures of genetic similarity. We call this counter-intuitive pattern "sister repulsion". We first present a novel instance of sister repulsion, where genomes from Bronze Age East Anatolian sites show higher affinity toward Bronze Age Greece than toward each other. This is observed using both f3- and f4-statistics, contrasts with archaeological/historical expectation, and also contradicts genetic affinity patterns captured using principal components analysis or multidimensional scaling on genetic distances. We then propose a simple demographic model to explain this pattern, where sister populations receive gene flow from a genetically distant source. We calculate f3- and f4-statistics using simulated genetic data with varying population genetic parameters, confirming that low-level gene flow from an external source into populations from one region can create sister repulsion in f-statistics. Unidirectional gene flow between the studied regions (without an external source) can likewise create repulsion. Meanwhile, similar to our empirical observations, multidimensional scaling analyses of genetic distances still cluster sister populations together. Overall, our results highlight the impact of low-level admixture events when inferring demographic history using f-statistics.
Title: An explanation for the sister repulsion phenomenon in Patterson's f-statistics. Journal: Genetics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11538414/pdf/
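For readers unfamiliar with the statistics discussed in this abstract, the following is a minimal sketch of how f3 and f4 are commonly defined from population allele frequencies: as averages over SNPs of products of frequency differences. It is not the authors' pipeline. The function names and the toy frequency vectors are assumptions made for illustration, and the finite-sample bias corrections applied by real tools are omitted.

```python
import math

def f3(c, a, b):
    """f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    Each argument is a list of allele frequencies, one entry per SNP.
    No finite-sample bias correction is applied (simplifying assumption).
    """
    return sum((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)) / len(c)

def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    return sum((ai - bi) * (ci - di)
               for ai, bi, ci, di in zip(a, b, c, d)) / len(a)

# Hypothetical allele frequencies at five SNPs in four populations.
pop_a = [0.10, 0.40, 0.75, 0.20, 0.55]
pop_b = [0.15, 0.35, 0.80, 0.25, 0.50]
pop_c = [0.30, 0.60, 0.50, 0.10, 0.70]
pop_d = [0.05, 0.90, 0.45, 0.30, 0.65]

# f4 can be written as a difference of two f3 statistics sharing a base population:
# f4(A, B; C, D) = f3(A; B, D) - f3(A; B, C).
lhs = f4(pop_a, pop_b, pop_c, pop_d)
rhs = f3(pop_a, pop_b, pop_d) - f3(pop_a, pop_b, pop_c)
print(math.isclose(lhs, rhs, abs_tol=1e-12))
```

In the outgroup form f3(O; A, B), a larger value is usually read as more drift shared by A and B, which is the sense of "genetic affinity" used when populations are clustered with these statistics.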
Pub Date : 2024-11-06DOI: 10.1093/genetics/iyae140
Arthur Zwaenepoel, Himani Sachdeva, Christelle Fraïsse
We consider how the genetic architecture underlying locally adaptive traits determines the strength of a barrier to gene flow in a mainland-island model. Assuming a general life cycle, we derive an expression for the effective migration rate when local adaptation is due to genetic variation at many loci under directional selection on the island, allowing for arbitrary fitness and dominance effects across loci. We show how the effective migration rate can be combined with classical single-locus diffusion theory to accurately predict multilocus differentiation between the mainland and island at migration-selection-drift equilibrium and determine the migration rate beyond which local adaptation collapses, while accounting for genetic drift and weak linkage. Using our efficient numerical tools, we then present a detailed study of the effects of dominance on barriers to gene flow, showing that when total selection is sufficiently strong, more recessive local adaptation generates stronger barriers to gene flow. We then study how heterogeneous genetic architectures of local adaptation affect barriers to gene flow, characterizing adaptive differentiation at migration-selection balance for different distributions of fitness effects. We find that a more heterogeneous genetic architecture generally yields a stronger genome-wide barrier to gene flow and that the detailed genetic architecture underlying locally adaptive traits can have an important effect on observable differentiation when divergence is not too large. Lastly, we study the limits of our approach as loci become more tightly linked, showing that our predictions remain accurate over a large biologically relevant domain.
Title: The genetic architecture of polygenic local adaptation and its role in shaping barriers to gene flow. Journal: Genetics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11538419/pdf/
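The idea of a migration rate beyond which local adaptation collapses can be illustrated with a much simpler model than the one in this abstract. The sketch below assumes a single haploid locus, deterministic dynamics (no drift), selection followed by migration each generation, and a mainland fixed for the non-adaptive allele. It is a textbook simplification, not the authors' multilocus effective-migration-rate method, and all function and parameter names are hypothetical.

```python
import math

def next_freq(p, s, m):
    """One generation on the island: selection, then migration from the mainland.

    p: frequency of the locally adaptive allele (relative fitness 1 + s on the island)
    m: fraction of the island replaced each generation by mainland migrants,
       all of which are assumed to carry the other allele
    """
    p_after_selection = p * (1 + s) / (1 + s * p)
    return (1 - m) * p_after_selection

def equilibrium_freq(s, m, p0=1.0, max_generations=100_000, tol=1e-12):
    """Iterate the recursion from p0 until the frequency stops changing."""
    p = p0
    for _ in range(max_generations):
        p_new = next_freq(p, s, m)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p

def critical_migration(s):
    """Migration rate at which the adaptive allele is lost in this toy model."""
    return s / (1 + s)

def analytic_equilibrium(s, m):
    """Closed-form equilibrium of the recursion: max(0, 1 - m * (1 + s) / s)."""
    return max(0.0, 1 - m * (1 + s) / s)

s = 0.1
for m in (0.01, 0.05, 0.2):
    print(m, equilibrium_freq(s, m), analytic_equilibrium(s, m))
```

In this toy model the adaptive allele persists only while m is below s / (1 + s). The abstract's approach generalizes this kind of threshold to many loci, diploid dominance effects, genetic drift, and weak linkage.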