Pub Date : 2026-01-07 DOI: 10.1093/genetics/iyaf232
Manas Geeta Arun, Aidan Angus-Henry, Darren J Obbard, Jarrod D Hadfield
The rate of adaptation is equal to the additive genetic variance for relative fitness (VA) in the population. Estimating VA typically involves obtaining suitable measures of fitness on a large number of individuals with known pairwise relatedness. Such data are hard to collect and the results are often sensitive to the definition of fitness used. Here, we present a new method for estimating VA that does not involve making measurements of fitness on individuals, but instead tracks changes in the genetic composition of the population. First, we show that VA can readily be expressed as a function of the genome-wide diversity/linkage disequilibrium matrix and genome-wide expected change in allele frequency due to selection. We then show how independent experimental replicates can be used to infer the expected change in allele frequency due to selection and then estimate VA via a linear mixed model. Finally, using individual-based simulations, we demonstrate that our approach yields precise and accurate estimates over a range of biologically plausible scenarios.
Title: Estimating the additive genetic variance for relative fitness from changes in allele frequency. Journal: Genetics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12774835/pdf/
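The link between VA and allele-frequency change described in this abstract can be sketched with a short derivation. This is a reconstruction from the Robertson-Price identity under stated assumptions, not the paper's own notation: the symbols c (an individual's vector of allele proportions, i.e. allele counts divided by ploidy), w (relative fitness, mean 1), α (average effects), L (the covariance matrix of c across loci) and Δp (change in allele frequency) are introduced here for illustration, and L is assumed to be invertible.

```latex
\begin{aligned}
\boldsymbol{\alpha} &= \mathbf{L}^{-1}\,\mathrm{cov}(\mathbf{c}, w)
  && \text{(average effects: regression of } w \text{ on } \mathbf{c}\text{)} \\
\mathrm{E}[\Delta\mathbf{p}] &= \mathrm{cov}(\mathbf{c}, w) = \mathbf{L}\,\boldsymbol{\alpha}
  && \text{(Robertson-Price identity)} \\
V_A &= \boldsymbol{\alpha}^{\top}\mathbf{L}\,\boldsymbol{\alpha}
     = \mathrm{E}[\Delta\mathbf{p}]^{\top}\,\mathbf{L}^{-1}\,\mathrm{E}[\Delta\mathbf{p}]
\end{aligned}
```

Under these assumptions the diagonal of L holds per-locus diversity and the off-diagonals hold linkage disequilibrium, which matches the "diversity/linkage disequilibrium matrix" named in the abstract. With a single locus the expression reduces to the squared expected frequency change divided by the variance in allele proportion.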
Pub Date : 2026-01-07 DOI: 10.1093/genetics/iyaf251
Lydia K Wooldridge, Micah Pietraho, Peyton DiSiena, Sam Littman, Benjamin Clauss, Beth L Dumont
Recombination rates vary across species, populations, and sexes. House mice (Mus musculus) present a particularly extreme example. Prior studies have established large differences in global recombination rates between M. musculus subspecies and inbred strains, with males exhibiting more extensive variation than females. The observation of sex-limited variation has prompted the hypothesis that male and female recombination rates may evolve by distinct evolutionary mechanisms in M. musculus. Here, we formally evaluate this hypothesis in a phylogenetic framework. We combine cytogenetic estimates of genomic crossover counts with published data to compile a large dataset of sex-specific crossover rate estimates totaling >6,000 single meiotic cells from 31 genetically diverse inbred mouse strains representing five Mus species and four M. musculus subspecies. We show that the phylogenetic distribution of male recombination rates is well predicted by the underlying Mus phylogeny (phylogenetic heritability, H_P^2 = 0.82), contrasting with the weaker phylogenetic signal observed in females (H_P^2 = 0.24). M. m. musculus males exhibit a marked increase in recombination rate compared to males from other M. musculus subspecies, prompting us to test explicit models of lineage-specific evolution. We uncover evidence for an adaptive increase in male recombination rate along the M. m. musculus subspecies lineage but find no support for a parallel increase in females. Taken together, our findings confirm the hypothesis that recombination rate evolution in house mice is governed by distinct sex-specific evolutionary regimes and motivate future efforts to ascertain the sex-specific selective pressures and sex-specific genetic architectures that underlie these observations.
Title: Sex-specific evolutionary programs shape recombination rate evolution in house mice. Journal: Genetics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12774844/pdf/
Pub Date : 2026-01-07 DOI: 10.1093/genetics/iyaf068
Filip Thor, Carl Nettelblad
We introduce a framework for using contrastive learning for dimensionality reduction on genetic datasets to create principal component analysis (PCA)-like population visualizations. Contrastive learning is a self-supervised deep learning method that uses similarities between samples to train the neural network to discriminate between samples. Many of the advances in these types of models have been made for computer vision, but some common methodology does not translate well from image to genetic data. We define a loss function that outperforms loss functions commonly used in contrastive learning, and a data augmentation scheme tailored specifically towards SNP genotype datasets. We compare the performance of our method to PCA and contemporary nonlinear methods with respect to how well they preserve local and global structure, and how well they generalize to new data. Our method displays good preservation of global structure and has improved generalization properties over t-distributed stochastic neighbor embedding, Uniform Manifold Approximation and Projection, and popvae, while preserving relative distances between individuals to a high extent. A strength of the deep learning framework is the possibility of projecting new samples and fine-tuning to new datasets using a pretrained model without access to the original training data, and the ability to incorporate more domain-specific information in the model. We show examples of population classification on two datasets of dog and human genotypes.
Title: Dimensionality reduction of genetic data using contrastive learning. Journal: Genetics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12774828/pdf/
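To make the contrastive-learning setup in this abstract more concrete, here is a minimal Python sketch of the general idea. It does not reproduce the authors' loss or augmentation scheme, which the abstract does not specify. Instead it swaps in the widely used NT-Xent loss and a simple random-masking augmentation for SNP genotypes. The names `augment`, `nt_xent`, `MISSING`, and the `mask_rate` and `temperature` parameters are all assumptions introduced for illustration.

```python
import math
import random

MISSING = -1  # hypothetical code for a masked genotype call


def augment(genotypes, mask_rate, rng):
    """Return a copy of a genotype vector with random calls masked as missing."""
    return [MISSING if rng.random() < mask_rate else g for g in genotypes]


def cosine(u, v):
    """Cosine similarity of two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """Mean NT-Xent loss for a batch of paired embeddings.

    view_a[i] and view_b[i] are embeddings of two augmented views of the
    same sample (a positive pair); every other embedding is a negative.
    """
    n = len(view_a)
    emb = view_a + view_b  # 2n embeddings; index i pairs with (i + n) mod 2n
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)
        logits = {
            j: cosine(emb[i], emb[j]) / temperature
            for j in range(2 * n)
            if j != i
        }
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total += log_denominator - logits[pos]
    return total / (2 * n)


if __name__ == "__main__":
    rng = random.Random(0)
    print(augment([0, 1, 2, 1, 0, 2], mask_rate=0.3, rng=rng))
    a = [[1.0, 0.0], [0.0, 1.0]]
    b = [[1.0, 0.1], [0.1, 1.0]]
    print(nt_xent(a, b))
```

In a real pipeline the embeddings would come from a neural network applied to the augmented genotype vectors. The sketch only shows how positive pairs are pulled together relative to the other samples in the batch.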
Pub Date : 2026-01-07 DOI: 10.1093/genetics/iyaf158
Ryan Christ, Xinxin Wang, Louis J M Aslett, David Steinsaltz, Ira Hall
Testing inferred haplotype genealogies for association with phenotypes has been a longstanding goal in human genetics given their potential to detect association signals driven by allelic heterogeneity (when multiple causal variants modulate a phenotype) in both coding and noncoding regions. Recent scalable methods for inferring locus-specific genealogical trees along the genome, or representations thereof, have made substantial progress towards this goal; however, the problem of testing these trees for association with phenotypes has remained unsolved due to the growth in the number of clades with increasing sample size. To address this issue, we introduce several practical improvements to the kalis ancestry inference engine, including a general optimal checkpointing algorithm for decoding hidden Markov models, thereby enabling efficient genome-wide analyses. We then propose LOCATER, a powerful new procedure based on the recently proposed Stable Distillation framework, to test local tree representations for trait association. Although LOCATER is demonstrated here in conjunction with kalis, it may be used for testing output from any ancestry inference engine, regardless of whether such engines return discrete tree structures, relatedness matrices, or some combination of the two at each locus. Using simulated quantitative phenotypes, our results indicate that LOCATER achieves substantial power gains over traditional single marker testing, ARG-Needle, and window-based testing in cases of allelic heterogeneity, while also improving causal region localization. These findings suggest that genealogy-based association testing will be a fruitful approach for gene discovery, especially for signals driven by multiple ultra-rare variants.
Title: Clade distillation for genome-wide association studies. Journal: Genetics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12667359/pdf/
Pub Date : 2026-01-07 DOI: 10.1093/genetics/iyaf229
Takahiro Sakamoto, Sam Yeaman
Local adaptation occurs when species adapt to spatially heterogeneous environments. The stability of local adaptation is determined by migration-selection-drift balance: selection favors adaptive divergence whereas migration and random genetic drift cause the collapse of divergence. The evolutionary dynamics of this balance have been extensively studied, but most previous theories used models with simple population structure and environmental variation, precluding their applicability to complex situations in nature. To address this issue, we developed a new theoretical method to analyze complex multi-population models, allowing heterogeneity in selection, migration, and population density. In essence, our method approximates a complex spatial model with a panmictic one-population model while retaining the core stochastic structure, enabling the application of conventional diffusion methods. By comparing with simulations, we confirmed that our method accurately describes stochastic evolutionary dynamics in various spatial models when migration is sufficiently high. This method is then applied to examine the effect of the pattern of environmental variation in 2D space. Assuming landscapes with different levels of the spatial autocorrelation of the environment, we found that the maintenance of locally adaptive alleles is significantly promoted when the spatial autocorrelation is high. These results highlight how complex spatial heterogeneity, as seen in nature, could affect the qualitative outcome of evolution.
Title: Maintenance of polymorphism in spatially heterogeneous environments. Journal: Genetics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12774833/pdf/
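The tension between selection and migration described in this abstract can be illustrated with a deliberately simple deterministic model. The sketch below is not the authors' method: it ignores drift and does not use their one-population diffusion approximation. It assumes a haploid locus in two demes, with selection acting before symmetric migration. The function names and the parameters `s1`, `s2` (selection coefficients for allele A in each deme) and `m` (migration rate) are hypothetical.

```python
def select(p, s):
    """Haploid selection: allele A has fitness 1 + s, the other allele 1."""
    return p * (1 + s) / (1 + s * p)


def step(p1, p2, s1, s2, m):
    """One generation: selection within each deme, then symmetric migration."""
    q1 = select(p1, s1)
    q2 = select(p2, s2)
    return (1 - m) * q1 + m * q2, (1 - m) * q2 + m * q1


def iterate(p1, p2, s1, s2, m, generations):
    """Iterate the two-deme recursion and return the final frequencies."""
    for _ in range(generations):
        p1, p2 = step(p1, p2, s1, s2, m)
    return p1, p2


if __name__ == "__main__":
    # Allele A is favored in deme 1 (s1 > 0) and disfavored in deme 2 (s2 < 0).
    for m in (0.0, 0.01, 0.1, 0.5):
        p1, p2 = iterate(0.5, 0.5, s1=0.1, s2=-0.1, m=m, generations=500)
        print(f"m={m}: p1={p1:.4f}, p2={p2:.4f}")
```

Without migration the demes diverge completely, while stronger migration pulls the two frequencies together. That is the collapse of divergence the abstract refers to, here without the stochastic component the paper analyzes.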
Pub Date : 2026-01-07 DOI: 10.1093/genetics/iyaf242
Hilde Schneemann, John J Welch
Hybridization between divergent populations places alleles in novel genomic contexts. This can inject adaptive variation (which is useful for breeders and conservationists) or reduce fitness, leading to reproductive isolation. Most theoretical work on hybrids involves haploid or diploid hybrids between two parental lineages, but real-world hybridization is often more complex. We introduce a simple fitness landscape model to predict hybrid fitness with arbitrary ploidy and an arbitrary number of hybridizing lineages. We test our model on published data from maize (Zea mays) and rye (Secale cereale), including hybrids between multiple inbred lines, both as diploids and synthetic tetraploids. Quantitative predictions for the effects of inbreeding, and the strength of progressive heterosis, are well supported. Results suggest that the model captures the important properties of dosage and genetic interactions, and may help to unify theories of heterosis and reproductive isolation.
Title: Predicting hybrid fitness: the effects of ploidy and complex ancestry. Journal: Genetics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12774836/pdf/
Pub Date : 2026-01-07 DOI: 10.1093/genetics/iyaf060
Peng Wang, Pengfei Lyu, Shyamal Peddada, Hongyuan Cao
Data obtained from high throughput experiments often exhibit complex dependencies among features. These dependencies arise from various sources, including genetic correlation, batch effects, technical replicates, and shared biological pathways. Ignoring these dependencies can lead to inflated false discovery rate (FDR), reduced statistical power, and biased biological interpretations. Properly accounting for these dependencies is crucial for accurate detection of biological signals. We propose a new method called Analysis of Correlated Expressions (ACE) to compare the mean expression of features between two groups. ACE is based on a factor analytic model that accounts for dependence among features and also incorporates heterogeneity of variances between groups, a common feature of high throughput data. Furthermore, ACE does not require the data to be normally distributed. It is scalable and free of any unknown tuning parameters. Extensive simulation studies indicate that it is more powerful than many existing methods while controlling the FDR. Application of ACE to a microRNA dataset, a neuroblastoma gene expression dataset, and a Huntington's disease dataset resulted in some novel findings that were missed by existing methods.
Title: Statistical analysis of correlated expression data from high throughput experiments. Journal: Genetics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12774819/pdf/
Pub Date : 2026-01-07 DOI: 10.1093/genetics/iyaf209
Alan M Moses, Jason E Stajich, Audrey P Gasch, David A Knowles
Gene expression patterns are determined to a large extent by transcription factor (TF) binding to noncoding regulatory regions in the genome. However, gene expression cannot yet be systematically predicted from genome sequences, in part because nonfunctional matches to the sequence patterns (motifs) recognized by TFs occur frequently throughout the genome. Large-scale functional genomics data for many TFs has enabled characterization of regulatory networks in experimentally accessible cells such as budding yeast. Beyond yeast, fungi are important industrial organisms and pathogens, but large-scale functional data is only sporadically available. Uncharacterized regulatory networks control key pathways and gene expression programs associated with fungal phenotypes. Here, we explore a sequence-only approach to inferring regulatory networks by leveraging the 100s of genomes now available for many clades of fungi. We use gene orthology as the learning signal to infer interpretable, TF motif-based representations of noncoding regulatory regions. Using these representations to identify conserved signals for motifs, comparative genomics can be scaled to evolutionary comparisons where sequence similarity cannot be detected. We show that similarity of these conserved motif signals predicts gene expression and regulation better than using experimental data, and that we can infer known and novel regulatory connections in diverse fungi. Our new predictions include a pathway for recombination in Candida albicans and pathways for mating and an RNAi immune response in Neurospora. Taken together, our results indicate that specific hypotheses about transcriptional regulation in fungi can be obtained for many genes from genome sequence analysis alone.
Title: Inferring fungal cis-regulatory networks from genome sequences via unsupervised and interpretable representation learning. Journal: Genetics. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12774827/pdf/
Pub Date : 2026-01-07 DOI: 10.1093/genetics/iyaf212
Meikun Zhou, Maddie E James, Jan Engelstädter, Daniel Ortiz-Barrientos
Despite transformative advances in genomic technologies, missing data remain a fundamental constraint that limits the full potential of genomic research across biological systems. Genotype imputation offers a remedy by inferring unobserved genotypes from observed data. However, conventional imputation methods typically rely on external reference panels constructed from complete genome sequences of hundreds of individuals, a costly approach largely inaccessible for nonmodel organisms. Moreover, these methods generally overlook novel genomic positions not captured in existing panels. To overcome these limitations, we developed Retriever, a method for constructing a chimeric reference panel that enables genotype imputation without the need for an external reference panel. Retriever constructs a chimeric reference panel directly from the target samples using a sliding window approach to identify and retrieve genomic partitions with complete data. By exploiting the complementary distribution of missing data across samples, Retriever assembles a panel that preserves local patterns of linkage disequilibrium and captures novel variants. When the Retriever-constructed panels are used with Beagle for genotype imputation, Retriever consistently achieves accuracy exceeding 95% across diverse datasets, including plants, animals, and fungi. By eliminating the need for costly external panels, Retriever provides an accessible and cost-effective solution that broadens the application of genomic analyses across various species.
{"title":"Chimeric reference panels for genomic imputation.","authors":"Meikun Zhou, Maddie E James, Jan Engelstädter, Daniel Ortiz-Barrientos","doi":"10.1093/genetics/iyaf212","DOIUrl":"10.1093/genetics/iyaf212","url":null,"PeriodicalId":48925,"journal":{"name":"Genetics","volume":" ","pages":""},"PeriodicalIF":5.1,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12774824/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145201436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
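The abstract above does not spell out Retriever's algorithm, so the following is only a toy sketch of the general idea it describes: scan windows of markers, keep the segments that have no missing calls, and stitch segments from different samples into panel rows. Everything here is an assumption made for illustration, including the function names `complete_segments` and `build_chimeric_panel`, the use of non-overlapping windows, unphased 0/1/2 genotype rows, and the round-robin choice of donor segments. It is not the published method.

```python
from typing import List, Optional

# A genotype call: 0/1/2 allele count, or None when the call is missing.
Genotype = Optional[int]


def complete_segments(samples: List[List[Genotype]], start: int, width: int) -> List[List[int]]:
    """Return the window slice of every sample that has no missing call in it."""
    segments = []
    for row in samples:
        window = row[start:start + width]
        if all(g is not None for g in window):
            segments.append(window)
    return segments


def build_chimeric_panel(samples: List[List[Genotype]], width: int, panel_size: int) -> List[List[int]]:
    """Stitch complete window segments from different samples into panel rows.

    Hypothetical illustration only: windows tile the markers without overlap,
    and donors are picked round-robin among the complete segments.
    """
    n_markers = len(samples[0])
    panel: List[List[int]] = [[] for _ in range(panel_size)]
    for start in range(0, n_markers, width):
        segments = complete_segments(samples, start, width)
        if not segments:
            raise ValueError(f"no sample is complete in the window starting at marker {start}")
        for k in range(panel_size):
            # Cycle through the available donors so every panel row gets a segment.
            panel[k].extend(segments[k % len(segments)])
    return panel


# Three samples whose missing calls fall in different places.
toy_samples = [
    [0, 1, None, 2],
    [None, 1, 2, 2],
    [1, 1, 0, None],
]
toy_panel = build_chimeric_panel(toy_samples, width=2, panel_size=2)
```

In this toy input no single sample is complete across all four markers, yet every panel row is, because the missing calls are complementary across samples. A panel built this way would then be handed to an imputation tool such as Beagle, as the abstract describes.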
Pub Date : 2026-01-07 DOI: 10.1093/genetics/iyaf230
Christi Sagariya, Václav Bittner, Torsten Pook, Amit Roy, Milan Lstibůrek
Genomic relationship matrices computed from single nucleotide polymorphism (SNP) data are now widely used to estimate narrow-sense heritability (h²), yet the impact of genotyping error on these estimates is not well understood. We used stochastic simulation and supporting algebra to examine this impact and its interplay with marker density. Starting from a diploid founder population with 300 additive quantitative trait loci, we simulated SNP panels with densities ranging from 6.25 to 50 SNPs per cM and traits with true h² of either 0.2 or 0.6. Genotypes were then altered at error rates ε=0–1 under three error kernels. For each of 100 simulation replicates, we calculated the genomic relationship matrix using VanRaden's method and estimated h² with restricted maximum likelihood (REML). In the absence of error, low-density marker panels underestimated h². Sparse panels were also the most tolerant up to ε≈0.1 yet still underestimated h². Conversely, the densest panel recovered the true h² when ε=0, but even a small error ε>0.01 caused an upward bias. The analysis reveals that all distortions are attributable to: (i) a shift in the mean off-diagonal elements of the genomic relationship matrix with magnitude (1-ε)² and (ii) a change in the ratio between the mean diagonal and mean off-diagonal elements of the genomic relationship matrix. When ε≳0.6, every kernel pushed h² toward zero. Thus, even modest genotyping error can inflate or deflate additive genetic variance estimates. SNP panels therefore require rigorous laboratory quality control, error-aware imputation, and statistical models that account for genotype uncertainty when estimating h².
{"title":"Impact of erroneous marker data on the accuracy of narrow-sense heritability.","authors":"Christi Sagariya, Václav Bittner, Torsten Pook, Amit Roy, Milan Lstibůrek","doi":"10.1093/genetics/iyaf230","DOIUrl":"10.1093/genetics/iyaf230","url":null,"PeriodicalId":48925,"journal":{"name":"Genetics","volume":" ","pages":""},"PeriodicalIF":5.1,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12774826/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145349483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
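The quantities discussed in the abstract above (the mean diagonal and mean off-diagonal elements of the genomic relationship matrix) can be explored with a small sketch. It assumes the commonly used first VanRaden formulation, G = ZZᵀ / (2 Σⱼ pⱼ(1 − pⱼ)), where Z holds 0/1/2 allele counts centred by twice the allele frequency pⱼ, with frequencies estimated from the sample itself. The error model in `add_uniform_error` (each call replaced by a uniform draw from 0/1/2 with probability ε) is a hypothetical kernel chosen for illustration; the abstract does not define the three kernels the study used. All function names are illustrative.

```python
import random
from typing import List, Tuple


def vanraden_grm(genotypes: List[List[int]]) -> List[List[float]]:
    """Genomic relationship matrix from an n x m matrix of 0/1/2 allele counts."""
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each marker, estimated from this sample.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Scaling constant: 2 * sum_j p_j (1 - p_j).
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    # Centre each allele count by twice the allele frequency.
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
        for i in range(n)
    ]


def add_uniform_error(genotypes: List[List[int]], eps: float, rng: random.Random) -> List[List[int]]:
    """Hypothetical error kernel: with probability eps, redraw a call uniformly from 0/1/2."""
    return [
        [rng.choice((0, 1, 2)) if rng.random() < eps else g for g in row]
        for row in genotypes
    ]


def mean_diag_offdiag(grm: List[List[float]]) -> Tuple[float, float]:
    """Mean of the diagonal and mean of the off-diagonal elements of a square matrix."""
    n = len(grm)
    diag = sum(grm[i][i] for i in range(n)) / n
    off = sum(grm[i][k] for i in range(n) for k in range(n) if i != k) / (n * (n - 1))
    return diag, off


# Two individuals with opposite genotypes at the first and third markers.
toy_genotypes = [[0, 1, 2], [2, 1, 0]]
toy_grm = vanraden_grm(toy_genotypes)
toy_diag, toy_off = mean_diag_offdiag(toy_grm)
```

To see how error moves the two means, one could pass a larger simulated genotype matrix through `add_uniform_error` at several values of `eps` and compare `mean_diag_offdiag` before and after. The sketch stops at the relationship matrix; estimating h² from it would additionally require a REML fit, which is not shown here.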