Pub Date: 2026-01-07 | DOI: 10.1093/genetics/iyaf232
Manas Geeta Arun, Aidan Angus-Henry, Darren J Obbard, Jarrod D Hadfield
The rate of adaptation is equal to the additive genetic variance for relative fitness (V_A) in the population. Estimating V_A typically involves obtaining suitable measures of fitness on a large number of individuals with known pairwise relatedness. Such data are hard to collect and the results are often sensitive to the definition of fitness used. Here, we present a new method for estimating V_A that does not involve making measurements of fitness on individuals, but instead tracks changes in the genetic composition of the population. First, we show that V_A can readily be expressed as a function of the genome-wide diversity/linkage disequilibrium matrix and genome-wide expected change in allele frequency due to selection. We then show how independent experimental replicates can be used to infer the expected change in allele frequency due to selection and then estimate V_A via a linear mixed model. Finally, using individual-based simulations, we demonstrate that our approach yields precise and accurate estimates over a range of biologically plausible scenarios.
Title: Estimating the additive genetic variance for relative fitness from changes in allele frequency. Journal: Genetics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12774835/pdf/
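The link between V_A, the diversity/linkage disequilibrium matrix, and the expected change in allele frequency can be illustrated with a standard quantitative-genetic identity. This is a minimal sketch under stated assumptions, not the estimator described in the abstract: relative fitness is assumed to have an additive component a = c·alpha (c = allele counts, alpha = average effects), so that V_A = alpha' L alpha with L the genotype covariance matrix, and the expected change in mean allele count due to selection is L alpha (Robertson-Price identity). Combining the two gives V_A = delta' L^{-1} delta. The function names and the toy numbers are hypothetical.

```python
import numpy as np

def va_from_effects(L, alpha):
    """V_A as the variance of breeding values: alpha' L alpha."""
    return float(alpha @ L @ alpha)

def expected_count_change(L, alpha):
    """Expected change in mean allele count due to selection: L alpha.
    Dividing by 2 gives the change in allele frequency for diploids."""
    return L @ alpha

def va_from_change(L, delta):
    """V_A recovered from the expected change alone: delta' L^{-1} delta."""
    return float(delta @ np.linalg.solve(L, delta))

# Toy data (assumption): 2000 diploid individuals, 5 loci, loci 0 and 1 in LD.
rng = np.random.default_rng(1)
freqs = np.array([0.3, 0.5, 0.4, 0.6, 0.7])
C = rng.binomial(2, freqs, size=(2000, 5)).astype(float)
copy = rng.random(2000) < 0.5
C[copy, 1] = C[copy, 0]                      # induce linkage disequilibrium

L = np.cov(C, rowvar=False)                  # diversity/LD matrix
alpha = np.array([0.02, -0.01, 0.0, 0.015, 0.005])  # assumed average effects

delta = expected_count_change(L, alpha)
print(va_from_effects(L, alpha), va_from_change(L, delta))
```

The two printed values agree up to floating-point error, which shows that, under these assumptions, V_A is determined by L and the expected change without any individual fitness measurements. In practice the expected change is not observed directly and must be separated from drift, which is the role of the replicates and the mixed model mentioned in the abstract.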
Pub Date: 2026-01-07 | DOI: 10.1093/genetics/iyaf251
Lydia K Wooldridge, Micah Pietraho, Peyton DiSiena, Sam Littman, Benjamin Clauss, Beth L Dumont
Recombination rates vary across species, populations, and sexes. House mice (Mus musculus) present a particularly extreme example. Prior studies have established large differences in global recombination rates between M. musculus subspecies and inbred strains, with males exhibiting more extensive variation than females. The observation of sex-limited variation has prompted the hypothesis that male and female recombination rates may evolve by distinct evolutionary mechanisms in M. musculus. Here, we formally evaluate this hypothesis in a phylogenetic framework. We combine cytogenetic estimates of genomic crossover counts with published data to compile a large dataset of sex-specific crossover rate estimates totaling >6,000 single meiotic cells from 31 genetically diverse inbred mouse strains representing five Mus species and four M. musculus subspecies. We show that the phylogenetic distribution of male recombination rates is well predicted by the underlying Mus phylogeny (phylogenetic heritability, H_P^2 = 0.82), contrasting with the weaker phylogenetic signal observed in females (H_P^2 = 0.24). M. m. musculus males exhibit a marked increase in recombination rate compared to males from other M. musculus subspecies, prompting us to test explicit models of lineage-specific evolution. We uncover evidence for an adaptive increase in male recombination rate along the M. m. musculus subspecies lineage but find no support for a parallel increase in females. Taken together, our findings confirm the hypothesis that recombination rate evolution in house mice is governed by distinct sex-specific evolutionary regimes and motivate future efforts to ascertain the sex-specific selective pressures and sex-specific genetic architectures that underlie these observations.
Title: Sex-specific evolutionary programs shape recombination rate evolution in house mice. Journal: Genetics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12774844/pdf/
Pub Date: 2026-01-07 | DOI: 10.1093/genetics/iyaf242
Hilde Schneemann, John J Welch
Hybridization between divergent populations places alleles in novel genomic contexts. This can inject adaptive variation (which is useful for breeders and conservationists) or reduce fitness, leading to reproductive isolation. Most theoretical work on hybrids involves haploid or diploid hybrids between two parental lineages, but real-world hybridization is often more complex. We introduce a simple fitness landscape model to predict hybrid fitness with arbitrary ploidy and an arbitrary number of hybridizing lineages. We test our model on published data from maize (Zea mays) and rye (Secale cereale), including hybrids between multiple inbred lines, both as diploids and synthetic tetraploids. Quantitative predictions for the effects of inbreeding, and the strength of progressive heterosis, are well supported. Results suggest that the model captures the important properties of dosage and genetic interactions, and may help to unify theories of heterosis and reproductive isolation.
Title: Predicting hybrid fitness: the effects of ploidy and complex ancestry. Journal: Genetics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12774836/pdf/
Pub Date: 2026-01-07 | DOI: 10.1093/genetics/iyaf068
Filip Thor, Carl Nettelblad
We introduce a framework for using contrastive learning for dimensionality reduction on genetic datasets to create principal component analysis (PCA)-like population visualizations. Contrastive learning is a self-supervised deep learning method that uses similarities between samples to train the neural network to discriminate between samples. Many of the advances in these types of models have been made for computer vision, but some common methodology does not translate well from image to genetic data. We define a loss function that outperforms loss functions commonly used in contrastive learning, and a data augmentation scheme tailored specifically towards SNP genotype datasets. We compare the performance of our method to PCA and contemporary nonlinear methods with respect to how well they preserve local and global structure, and how well they generalize to new data. Our method displays good preservation of global structure and has improved generalization properties over t-distributed stochastic neighbor embedding, Uniform Manifold Approximation and Projection, and popvae, while preserving relative distances between individuals to a high extent. A strength of the deep learning framework is the possibility of projecting new samples and fine-tuning to new datasets using a pretrained model without access to the original training data, and the ability to incorporate more domain-specific information in the model. We show examples of population classification on two datasets of dog and human genotypes.
Title: Dimensionality reduction of genetic data using contrastive learning. Journal: Genetics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12774828/pdf/
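To make the contrastive-learning idea in the abstract above concrete, the sketch below shows a generic InfoNCE-style loss and a simple genotype-masking augmentation. Both are stand-ins chosen for illustration: the abstract does not specify the authors' loss function or augmentation scheme, and the names `augment_genotypes`, `info_nce_loss`, `mask_rate`, and `temperature` are hypothetical.

```python
import numpy as np

def augment_genotypes(g, rng, mask_rate=0.1, missing=-1):
    """Return a copy of a 0/1/2 genotype matrix with a random fraction of
    calls set to `missing` (an assumed augmentation, not the paper's)."""
    g = g.copy()
    g[rng.random(g.shape) < mask_rate] = missing
    return g

def info_nce_loss(z_a, z_b, temperature=0.5):
    """Generic InfoNCE loss. Row i of z_a and row i of z_b are embeddings of
    two augmented views of the same sample (the positive pair); all other
    rows of z_b act as negatives for row i of z_a."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = (z_a @ z_b.T) / temperature             # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(8, 50))         # toy SNP matrix
view_a = augment_genotypes(genotypes, rng)
view_b = augment_genotypes(genotypes, rng)

# Stand-in "encoder": a fixed random linear projection to 2 dimensions.
W = rng.normal(size=(50, 2))
print(info_nce_loss(view_a @ W, view_b @ W))
```

In a real pipeline the random projection would be replaced by a trainable neural network, and the loss would be minimized so that two views of the same individual land close together in the low-dimensional embedding.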
Pub Date: 2026-01-07 | DOI: 10.1093/genetics/iyaf158
Ryan Christ, Xinxin Wang, Louis J M Aslett, David Steinsaltz, Ira Hall
Testing inferred haplotype genealogies for association with phenotypes has been a longstanding goal in human genetics given their potential to detect association signals driven by allelic heterogeneity (when multiple causal variants modulate a phenotype) in both coding and noncoding regions. Recent scalable methods for inferring locus-specific genealogical trees along the genome, or representations thereof, have made substantial progress towards this goal; however, the problem of testing these trees for association with phenotypes has remained unsolved due to the growth in the number of clades with increasing sample size. To address this issue, we introduce several practical improvements to the kalis ancestry inference engine, including a general optimal checkpointing algorithm for decoding hidden Markov models, thereby enabling efficient genome-wide analyses. We then propose LOCATER, a powerful new procedure based on the recently proposed Stable Distillation framework, to test local tree representations for trait association. Although LOCATER is demonstrated here in conjunction with kalis, it may be used for testing output from any ancestry inference engine, regardless of whether such engines return discrete tree structures, relatedness matrices, or some combination of the two at each locus. Using simulated quantitative phenotypes, our results indicate that LOCATER achieves substantial power gains over traditional single marker testing, ARG-Needle, and window-based testing in cases of allelic heterogeneity, while also improving causal region localization. These findings suggest that genealogy-based association testing will be a fruitful approach for gene discovery, especially for signals driven by multiple ultra-rare variants.
Title: Clade distillation for genome-wide association studies. Journal: Genetics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12667359/pdf/
Pub Date: 2026-01-07 | DOI: 10.1093/genetics/iyaf229
Takahiro Sakamoto, Sam Yeaman
Local adaptation occurs when species adapt to spatially heterogeneous environments. The stability of local adaptation is determined by migration-selection-drift balance: selection favors adaptive divergence whereas migration and random genetic drift cause the collapse of divergence. The evolutionary dynamics of this balance have been extensively studied, but most previous theories used models with simple population structure and environmental variation, precluding their applicability to complex situations in nature. To address this issue, we developed a new theoretical method to analyze complex multi-population models, allowing heterogeneity in selection, migration, and population density. In essence, our method approximates a complex spatial model with a panmictic one-population model while retaining the core stochastic structure, enabling the application of conventional diffusion methods. By comparing with simulations, we confirmed that our method accurately describes stochastic evolutionary dynamics in various spatial models when migration is sufficiently high. This method is then applied to examine the effect of the pattern of environmental variation in 2D space. Assuming landscapes with different levels of the spatial autocorrelation of the environment, we found that the maintenance of locally adaptive alleles is significantly promoted when the spatial autocorrelation is high. These results highlight how complex spatial heterogeneity, as seen in nature, could affect the qualitative outcome of evolution.
Title: Maintenance of polymorphism in spatially heterogeneous environments. Journal: Genetics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12774833/pdf/
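The tension between selection and migration described in the abstract above can be seen in a deliberately simple toy model. The sketch below is a deterministic two-deme haploid recursion (no drift) and is not the authors' multi-population diffusion method. The function name, the symmetric fitness scheme (allele A has fitness 1+s in deme 1, allele a has fitness 1+s in deme 2), and the parameter values are all assumptions made for illustration.

```python
def equilibrium_divergence(s, m, generations=5000):
    """Iterate selection then symmetric migration in two demes and return
    the difference in frequency of allele A between deme 1 and deme 2."""
    p1 = p2 = 0.5
    for _ in range(generations):
        # Selection: A is favored in deme 1, disfavored in deme 2.
        p1_sel = p1 * (1 + s) / (1 + s * p1)
        p2_sel = p2 / (1 + s * (1 - p2))
        # Migration: a fraction m of each deme is replaced by migrants.
        p1 = (1 - m) * p1_sel + m * p2_sel
        p2 = (1 - m) * p2_sel + m * p1_sel
    return p1 - p2

for m in (0.01, 0.1, 0.4):
    print(m, round(equilibrium_divergence(0.1, m), 3))
```

Divergence between demes shrinks as the migration rate rises relative to selection. Adding drift and realistic spatial structure makes the outcome stochastic and far harder to analyze, which is the setting the abstract addresses.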
Pub Date: 2026-01-07 | DOI: 10.1093/genetics/iyaf225
Moyao Wang, Payal Arora, Craig D Kaplan, David A Brow
Antitermination factors for eukaryotic RNA polymerase II (RNAP II) that are released upon binding sequences in the terminator of nascent transcripts were proposed almost 40 years ago but few candidates have been found. Here we report genetic evidence that the yeast nuclear RNA-binding protein Hrp1, also known as Nab4 and CF1B, acts as an RNAP II antitermination factor. A Lys to Glu substitution at residue 9 (K9E) of the Rpb3 subunit of RNAP II causes readthrough of Nrd1-Nab3-Sen1-dependent (NNS) terminators in a reporter gene and cold-sensitive growth, as does an Asp but not an Ala, Met, Arg, or Gln substitution. These allele-specific phenotypes and the location of Rpb3-K9 suggest substitution with Glu or Asp stabilizes binding of an antitermination factor via a salt bridge. A genome-wide selection for suppressors of the cold-sensitivity of Rpb3-K9E yielded an Arg to Gly substitution at residue 317 of Hrp1 in RNA recognition motif 2 (RRM2), consistent with the hypothesis. Nanopore direct RNA-seq revealed strong readthrough of endogenous NNS terminators due to Rpb3-K9E and confirmed their partial suppression by Hrp1-R317G. A targeted selection for suppressors of Rpb3-K9E in HRP1 yielded substitutions in RRMs 1 and 2 and in an essential Met- and Gln-rich low complexity domain, as well as early nonsense mutations. We propose that Hrp1 binds to the RNAP II elongation complex via these regions to promote elongation and, in the presence of Rpb3-K9E, is less rapidly released upon binding terminator sequences in the nascent transcript, resulting in readthrough. The Rpb3-K9E-suppressor substitutions in Hrp1 are proposed to weaken binding to the RNAP II elongation complex, compensating for Rpb3-K9E.
Title: Substitutions in RNA-binding protein Hrp1 map a potential interaction surface with the yeast RNA polymerase II elongation complex. Journal: Genetics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12671597/pdf/
Pub Date: 2026-01-07 | DOI: 10.1093/genetics/iyaf204
Anna A Nagel, Bruce Rannala
Inferring the time of origin (age) of mutations is an old question in population genetics and inferring their population of origin has become of particular interest with the sequencing of the Neanderthal genome. However, existing methods to infer mutation ages and populations of origin do not explicitly consider population structure, migration rates, and divergence times, which may bias estimates, and it is unclear how to even apply single-population estimators to structured populations. We develop a method to jointly estimate the time and population of origin of a mutation (as well as the ancestral and derived states) in a structured population using population genomic data and examine its statistical performance using simulations. Results indicate that mutation age and population of origin can be quite uncertain, even with long sequences or many samples, but this uncertainty is accurately captured using credible intervals/sets. The ancestral nucleotide state is relatively easy to infer. We apply our method to whole genome data from the 1000 Genomes Project, analyzing 7 SNP mutations from 6 genes associated with human skin pigmentation for populations from Great Britain, China, and Kenya. Our results partially support previous conclusions, with the putative ancestral alleles from the literature matching our inferences, while the mutation age estimates only overlap in some cases.
Title: Mutation ages and population origins inferred from genomes in structured populations. Journal: Genetics.
Pub Date: 2026-01-07 | DOI: 10.1093/genetics/iyaf245
Chuanke Fu, Job van Schipstal, Mario P L Calus, Pascal Duenk
Although standard genomic prediction (GP) models such as genomic best linear unbiased prediction (GBLUP) assume that single-nucleotide polymorphisms (SNPs) contribute equally to genetic variation, some SNPs may be more informative than others because they are more closely linked to causal variants. GP models could therefore be fine-tuned by incorporating biological annotations. Here, we used combined annotation dependent depletion (CADD) scores, which reflect the likelihood of a genetic variant being deleterious, as prior information in genomic prediction. Our objective was to determine the benefit of using CADD scores to select or weight SNPs in genomic prediction. We analyzed 10 traits in a dataset of 835 mice from the diversity outbred (DO) mouse population. For selecting or weighting SNPs, we either used the CADD scores at the exact position of SNPs (CADD-SNP) or the maximum CADD score in a predefined window around the SNPs (CADD-window). In addition, we employed 5 GP models (GBLUP, BayesA, BayesB, BayesC, and BayesR) to analyze different sets of selected SNPs, and a weighted GBLUP model for weighting scenarios. The results showed that selecting SNPs based on CADD-SNP did not improve prediction accuracy. In contrast, compared to using all SNPs, selecting the top 40% of SNPs based on CADD-window was the optimal scenario. This approach effectively removed noninformative SNPs and improved prediction accuracy for at least 6 out of 10 traits. The improvements among these traits ranged from an average of 0.014 for body weight at 10 weeks to 0.094 for bone mineral density across 5 GP models. Weighting (selected) SNPs based on either CADD-SNP or CADD-window had little impact on accuracy. In conclusion, using CADD-window scores to select SNPs improved prediction accuracy, but the benefit depended on the trait of interest and the GP model that was used, while using CADD scores to weight SNPs did not improve prediction accuracy.
{"title":"Genomic prediction using mCADD scores as prior information in a mouse population.","authors":"Chuanke Fu, Job van Schipstal, Mario P L Calus, Pascal Duenk","doi":"10.1093/genetics/iyaf245","DOIUrl":"10.1093/genetics/iyaf245","url":null,"PeriodicalId":48925,"journal":{"name":"Genetics","volume":" ","pages":""},"PeriodicalIF":5.1,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145514472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-01-07 DOI: 10.1093/genetics/iyaf060
Peng Wang, Pengfei Lyu, Shyamal Peddada, Hongyuan Cao
Data obtained from high-throughput experiments often exhibit complex dependencies among features. These dependencies arise from various sources, including genetic correlation, batch effects, technical replicates, and shared biological pathways. Ignoring them can lead to an inflated false discovery rate (FDR), reduced statistical power, and biased biological interpretations, so accounting for them properly is crucial for accurate detection of biological signals. We propose a new method called Analysis of Correlated Expressions (ACE) to compare the mean expression of features between two groups. ACE is based on a factor analytic model that accounts for dependence among features and also incorporates heterogeneity of variances between groups, a common feature of high-throughput data. Furthermore, ACE does not require the data to be normally distributed. It is scalable and has no unknown tuning parameters. Extensive simulation studies indicate that it is more powerful than many existing methods while controlling the FDR. Application of ACE to a microRNA dataset, a neuroblastoma gene expression dataset, and a Huntington's disease dataset yielded some novel findings that were missed by existing methods.
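For orientation, the kind of baseline that the abstract contrasts ACE with can be sketched as follows: a per-feature two-group comparison that allows unequal variances, followed by Benjamini-Hochberg FDR control. This is explicitly not ACE. It treats features as independent (the very assumption ACE is designed to relax), uses a large-sample normal approximation for the p-values, and all function names are hypothetical.

```python
import math
import numpy as np

def welch_z(x, y):
    """Per-feature Welch statistic; x and y are (samples, features) arrays.

    Variances are estimated separately in each group, so unequal
    group variances are allowed.
    """
    mx, my = x.mean(axis=0), y.mean(axis=0)
    vx, vy = x.var(axis=0, ddof=1), y.var(axis=0, ddof=1)
    return (mx - my) / np.sqrt(vx / x.shape[0] + vy / y.shape[0])

def two_sided_p(z):
    """Two-sided p-values under a standard normal approximation (assumption)."""
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean rejection mask from the Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    order = np.argsort(pvals)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        # Reject every hypothesis up to the largest rank that passes.
        k = np.max(np.nonzero(passed)[0])
        reject[order[: k + 1]] = True
    return reject
```

Because the Benjamini-Hochberg guarantee relies on independence or a specific form of positive dependence among the test statistics, correlated features are exactly where a baseline like this can lose FDR control or power.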
{"title":"Statistical analysis of correlated expression data from high throughput experiments.","authors":"Peng Wang, Pengfei Lyu, Shyamal Peddada, Hongyuan Cao","doi":"10.1093/genetics/iyaf060","DOIUrl":"10.1093/genetics/iyaf060","url":null,"PeriodicalId":48925,"journal":{"name":"Genetics","volume":" ","pages":""},"PeriodicalIF":5.1,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12774819/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143743975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}