Marcel Tarbier, Sebastian D Mackowiak, Vaishnovi Sekar, Franziska Bonath, Etka Yapar, Bastian Fromm, Omid R Faridani, Inna Biryukova, Marc R Friedländer
microRNAs are small RNA molecules that can repress the expression of protein-coding genes post-transcriptionally. Previous studies have shown that microRNAs can also have alternative functions, including influencing target expression variation and covariation, but these observations have been limited to a few microRNAs. Here we systematically study microRNA alternative functions in mouse embryonic stem cells (mESCs) by genetically deleting Drosha, leading to global loss of microRNAs. We apply complementary single-cell RNA-seq methods to study the variation of the targets and the microRNAs themselves, and transcriptional inhibition to measure target half-lives. We find that microRNAs form four distinct coexpression groups across single cells. In particular, the mir-290 and the mir-182 genome clusters are abundantly, variably, and inversely expressed. Some cells have global biases toward specific miRNAs originating from either end of the hairpin precursor, suggesting the presence of unknown regulatory cofactors. We find that microRNAs generally increase variation and covariation of their targets at the RNA level, but we also find microRNAs such as miR-182 that appear to have opposite functions. In particular, microRNAs that are themselves variable in expression, such as miR-291a, are more likely to induce covariations. In summary, we apply genetic perturbation and multiomics to give the first global picture of microRNA dynamics at the single-cell level.
{"title":"Landscape of microRNA and target expression variation and covariation in single mouse embryonic stem cells.","authors":"Marcel Tarbier, Sebastian D Mackowiak, Vaishnovi Sekar, Franziska Bonath, Etka Yapar, Bastian Fromm, Omid R Faridani, Inna Biryukova, Marc R Friedländer","doi":"10.1101/gr.279914.124","DOIUrl":"10.1101/gr.279914.124","url":null,"abstract":"<p><p>microRNAs are small RNA molecules that can repress the expression of protein-coding genes post-transcriptionally. Previous studies have shown that microRNAs can also have alternative functions, including influencing target expression variation and covariation, but these observations have been limited to a few microRNAs. Here we systematically study microRNA alternative functions in mouse embryonic stem cells (mESCs) by genetically deleting <i>Drosha</i>, leading to global loss of microRNAs. We apply complementary single-cell RNA-seq methods to study the variation of the targets and the microRNAs themselves, and transcriptional inhibition to measure target half-lives. We find that microRNAs form four distinct coexpression groups across single cells. In particular, the <i>mir-290</i> and the <i>mir-182</i> genome clusters are abundantly, variably, and inversely expressed. Some cells have global biases toward specific miRNAs originating from either end of the hairpin precursor, suggesting the presence of unknown regulatory cofactors. We find that microRNAs generally increase variation and covariation of their targets at the RNA level, but we also find microRNAs such as miR-182 that appear to have opposite functions. In particular, microRNAs that are themselves variable in expression, such as miR-291a, are more likely to induce covariations. 
In summary, we apply genetic perturbation and multiomics to give the first global picture of microRNA dynamics at the single-cell level.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"291-302"},"PeriodicalIF":5.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863184/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145959287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wenhai Zhang, Yuansheng Liu, Guangyi Li, Jialu Xu, Enlian Chen, Alexander Schönhuth, Xiao Luo
Microbes are omnipresent, thriving in a range of habitats, from oceans to soils, and even within our gastrointestinal tracts. They play a vital role in maintaining ecological equilibrium and promoting the health of their hosts. Consequently, understanding the diversity in terms of strains in microbial communities is crucial, as variations between strains can lead to different phenotypic expressions or diverse biological functions. However, current methods for taxonomic classification from metagenomic sequencing data have several limitations, including their reliance solely on species resolution, support for either short or long reads, or their confinement to a given single species. Most notably, the majority of existing strain-level taxonomic classifiers rely on the sequence representation of multiple linear reference genomes, which fails to capture the sequence correlations among these genomes, potentially introducing ambiguity and biases in metagenomic profiling. Here, we present PanTax, a pangenome graph-based taxonomic profiler that overcomes the shortcomings of sequence-based approaches, because pangenome graphs can depict the full range of genetic variability present across multiple evolutionarily or environmentally related genomes. PanTax provides a comprehensive solution to taxonomic classification for strain resolution, compatibility with both short and long reads, and compatibility with single or multiple species. Extensive benchmarking results demonstrate that PanTax drastically outperforms state-of-the-art approaches, primarily evidenced by its significantly higher F1 score at the strain level, while maintaining comparable or better performance in other aspects across various data sets.
{"title":"Strain-level metagenomic profiling using pangenome graphs with PanTax.","authors":"Wenhai Zhang, Yuansheng Liu, Guangyi Li, Jialu Xu, Enlian Chen, Alexander Schönhuth, Xiao Luo","doi":"10.1101/gr.280858.125","DOIUrl":"10.1101/gr.280858.125","url":null,"abstract":"<p><p>Microbes are omnipresent, thriving in a range of habitats, from oceans to soils, and even within our gastrointestinal tracts. They play a vital role in maintaining ecological equilibrium and promoting the health of their hosts. Consequently, understanding the diversity in terms of strains in microbial communities is crucial, as variations between strains can lead to different phenotypic expressions or diverse biological functions. However, current methods for taxonomic classification from metagenomic sequencing data have several limitations, including their reliance solely on species resolution, support for either short or long reads, or their confinement to a given single species. Most notably, most existing strain-level taxonomic classifiers rely on the sequence representation of multiple linear reference genomes, which fails to capture the sequence correlations among these genomes, potentially introducing ambiguity and biases in metagenomic profiling. Here, we present PanTax, a pangenome graph-based taxonomic profiler that overcomes the shortcomings of sequence-based approaches, because pangenome graphs possess the capability to depict the full range of genetic variability present across multiple evolutionarily or environmentally related genomes. PanTax provides a comprehensive solution to taxonomic classification for strain resolution, compatibility with both short and long reads, and compatibility with single or multiple species. 
Extensive benchmarking results demonstrate that PanTax drastically outperforms state-of-the-art approaches, primarily evidenced by its significantly higher F1 score at the strain level, while maintaining comparable or better performance in other aspects across various data sets.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"405-420"},"PeriodicalIF":5.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863173/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145984796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Margarita Geleta, Daniel Mas Montserrat, Xavier Giro-I-Nieto, Alexander G Ioannidis
Modern biobanks are providing numerous high-resolution genomic sequences of diverse populations. To account for diverse and admixed populations, new algorithmic tools are needed to properly capture their genetic composition. Here, we explore deep learning techniques, namely, variational autoencoders (VAEs), to process genomic data from a population perspective. We show the power of VAEs for a variety of tasks relating to the interpretation, compression, classification, and simulation of genomic data with several worldwide whole genome data sets from both humans and canids, and evaluate the performance of the proposed applications with and without ancestry conditioning. The unsupervised setting of autoencoders allows for the detection and learning of granular population structure and inferring of informative latent factors. The learned latent spaces of VAEs are able to capture and represent differentiated Gaussian-like clusters of samples with similar genetic composition on a fine scale from single nucleotide polymorphisms (SNPs), enabling applications in dimensionality reduction and data simulation. These individual genotype sequences can then be decomposed into latent representations and reconstruction errors (residuals), which provide a sparse representation useful for lossless compression. We show that different populations have differentiated compression ratios and classification accuracies. Additionally, we analyze the entropy of the SNP data, its effect on compression across populations, and its relation to historical migrations, and we show how to introduce autoencoders into existing compression pipelines.
{"title":"Autoencoders for genomic variation analysis.","authors":"Margarita Geleta, Daniel Mas Montserrat, Xavier Giro-I-Nieto, Alexander G Ioannidis","doi":"10.1101/gr.280086.124","DOIUrl":"10.1101/gr.280086.124","url":null,"abstract":"<p><p>Modern biobanks are providing numerous high-resolution genomic sequences of diverse populations. In order to account for diverse and admixed populations, new algorithmic tools are needed in order to properly capture the genetic composition of populations. Here, we explore deep learning techniques, namely, variational autoencoders (VAEs), to process genomic data from a population perspective. We show the power of VAEs for a variety of tasks relating to the interpretation, compression, classification, and simulation of genomic data with several worldwide whole genome data sets from both humans and canids, and evaluate the performance of the proposed applications with and without ancestry conditioning. The unsupervised setting of autoencoders allows for the detection and learning of granular population structure and inferring of informative latent factors. The learned latent spaces of VAEs are able to capture and represent differentiated Gaussian-like clusters of samples with similar genetic composition on a fine scale from single nucleotide polymorphisms (SNPs), enabling applications in dimensionality reduction and data simulation. These individual genotype sequences can then be decomposed into latent representations and reconstruction errors (residuals), which provide a sparse representation useful for lossless compression. We show that different populations have differentiated compression ratios and classification accuracies. 
Additionally, we analyze the entropy of the SNP data, its effect on compression across populations, and its relation to historical migrations, and we show how to introduce autoencoders into existing compression pipelines.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"348-360"},"PeriodicalIF":5.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863191/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146010065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhitao Huang, Ruiqing Zheng, Pengzhen Jia, Xuhua Yan, Jinmiao Chen, Min Li
With the emergence of abundant single-cell multiomics data, labels are increasingly transferred from well-annotated scRNA-seq data to less-annotated omics data, such as scATAC-seq. This approach leverages the gene expression profiles available in scRNA-seq to help annotate common cell types and even novel cell types for other omics data. However, the heterogeneous features between scRNA-seq and scATAC-seq pose challenges for identifying different cell types, which hinders the discovery of novel types. In this study, we propose a new label transfer tool, scSHEFT, which simultaneously considers gene expression count data, peak count data, and Gene Activity Scores as inputs to bridge the gap between heterogeneous features. Specifically, we transform scATAC-seq data into Gene Activity Scores based on prior knowledge to harmonize heterogeneous features. As the feature transformation can result in information loss, we introduce the raw ATAC-seq embeddings to preserve the original information. To achieve a balance between interomics alignment and intraomics heterogeneity, we propose a dual alignment strategy. Specifically, scSHEFT employs an anchor-based approach to align interomics anchor pairs and a contrastive-based strategy to preserve cellular heterogeneity within each omics layer. Benchmarking scSHEFT against 11 state-of-the-art methods across seven data sets demonstrates its superiority in handling data sets of varying scales and technical noise.
{"title":"scSHEFT enables multiomics label transfer from scRNA-seq to scATAC-seq through dual alignment.","authors":"Zhitao Huang, Ruiqing Zheng, Pengzhen Jia, Xuhua Yan, Jinmiao Chen, Min Li","doi":"10.1101/gr.280410.125","DOIUrl":"10.1101/gr.280410.125","url":null,"abstract":"<p><p>Currently, with the emergence of abundant single-cell multiomics data, there is a trend where labels are transferred from well-annotated scRNA-seq data to less-annotated omics data, such as scATAC-seq. This approach leverages the gene expression profiles available in scRNA-seq to help annotate common cell types and even novel cell types for other omics data. However, the heterogeneous features between scRNA-seq and scATAC-seq pose challenges for identifying different cell types, which hinders the discovery of novel types. In this study, we propose a new label transfer tool scSHEFT, which simultaneously considers gene expression count data, peak count data, and Gene Activity Scores as inputs to bridge the gap of heterogeneous features. Specifically, we transform scATAC-seq data into Gene Activity Scores based on prior knowledge to harmonize heterogeneous features. As the feature transformation would result in information loss, we introduce the raw ATAC-seq embeddings to preserve the original information. To achieve a balance between interomics alignment and intraomics heterogeneity, we propose a dual alignment strategy. Specifically, scSHEFT employs an anchor-based approach to align interomics anchor pairs and a contrastive-based strategy to preserve cellular heterogeneity within each omics layer. 
Benchmarking scSHEFT against 11 state-of-the-art methods across seven data sets demonstrates its superiority in handling data sets of varying scales and technical noises.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"387-396"},"PeriodicalIF":5.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863186/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146010073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Milad Razavi-Mohseni, Weitai Huang, Yu A Guo, Dustin Shigaki, Shamaine Wei Ting Ho, Patrick Tan, Anders J Skanderup, Michael A Beer
{"title":"Corrigendum: Machine learning identifies activation of RUNX/AP-1 as drivers of mesenchymal and fibrotic regulatory programs in gastric cancer.","authors":"Milad Razavi-Mohseni, Weitai Huang, Yu A Guo, Dustin Shigaki, Shamaine Wei Ting Ho, Patrick Tan, Anders J Skanderup, Michael A Beer","doi":"10.1101/gr.281294.125","DOIUrl":"10.1101/gr.281294.125","url":null,"abstract":"","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"36 2","pages":"432"},"PeriodicalIF":5.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863181/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146112776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
David Peede, Mayra M Bañuelos, Jazeps Medina Tretmanis, Miriam Miyagi, Emilia Huerta-Sánchez
The exchange and subsequent incorporation of genetic material between distinct lineages, known as introgression, has emerged as a crucial concept in understanding human evolutionary history. With the advent of high-throughput sequencing technologies and the publication of the draft Neanderthal genome in 2010, Green and colleagues were able to demonstrate the presence of Neanderthal DNA in present-day Eurasians, a signature of past interbreeding events with archaic humans. This integration of genetic material from extinct human relatives, such as Neanderthals and Denisovans, into the genomes of modern humans due to historical gene flow events is known as archaic introgression. As new methods and data sets uncover a more complex intermingling between our ancestors and archaic humans than previously thought, the relevance of archaic introgression has only increased, opening exciting new avenues for studying human evolution. Here, we review recent methodological advances in the study of archaic introgression. We begin by providing an overview of the genealogical and genomic signatures left behind by introgression events before reviewing recent methods for studying archaic introgression by outlining their conceptual approaches, data requirements, and types of inferences they support. Finally, we provide recommendations for which methods are most appropriate given a research question and data set, discuss outstanding challenges, and suggest future lines of research to advance the study of archaic introgression.
{"title":"Recent advances in methods to characterize archaic introgression in modern humans.","authors":"David Peede, Mayra M Bañuelos, Jazeps Medina Tretmanis, Miriam Miyagi, Emilia Huerta-Sánchez","doi":"10.1101/gr.278993.124","DOIUrl":"10.1101/gr.278993.124","url":null,"abstract":"<p><p>The exchange and subsequent incorporation of genetic material between distinct lineages, known as introgression, has emerged as a crucial concept in understanding human evolutionary history. With the advent of high-throughput sequencing technologies and the publication of the draft Neanderthal genome in 2010, Green and colleagues were able to demonstrate the presence of Neanderthal DNA in present-day Eurasians, a signature of past interbreeding events with archaic humans. This integration of genetic material from extinct human relatives, such as Neanderthals and Denisovans, into the genomes of modern humans due to historical gene flow events is known as archaic introgression. As new methods and data sets uncover a more complex intermingling between our ancestors and archaic humans than previously thought, the relevance of archaic introgression has only increased, opening exciting new avenues for studying human evolution. Here, we review recent methodological advances in the study of archaic introgression. We begin by providing an overview of the genealogical and genomic signatures left behind by introgression events before reviewing recent methods for studying archaic introgression by outlining their conceptual approaches, data requirements, and types of inferences they support. 
Finally, we provide recommendations for which methods are most appropriate given a research question and data set, discuss outstanding challenges, and suggest future lines of research to advance the study of archaic introgression.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"239-256"},"PeriodicalIF":5.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863057/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145989146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wiesław Babik, Katarzyna Dudek, Gemma Palomar, Marzena Marszałek, Grzegorz Dubin, Maximina H Yun, Magdalena Migalska
Major Histocompatibility Complex (MHC) molecules are central to vertebrate adaptive immunity, and MHC genes serve as key models in evolutionary genomics, offering insight into birth-and-death evolution, gene duplication, and the maintenance of genetic diversity. However, the organization and evolution of the MHC in species with giant genomes, such as salamanders, remain poorly understood. Here, we use comparative genomics, expression across multiple ontogenetic stages and tissues, as well as polymorphism data to investigate MHC evolution in newts. Contrary to earlier suggestions of a massively expanded MHC in salamanders, we find that the core MHC region remains relatively compact, demonstrating that genome gigantism does not scale proportionally in this region. Our finding also challenges the model of coevolution between a single classical MHC-Ia gene and antigen processing genes (APGs), revealing instead several polymorphic and highly expressed putative MHC-Ia located at varying distances from the APGs. MHC-I genes exhibit lineage-specific duplications and signs of concerted evolution, resulting in poorly resolved phylogenies. In contrast, MHC-II genes are more conserved and exhibit extensive trans-species polymorphism. Expression and polymorphism patterns identify putative nonclassical MHC-Ib genes, likely repeatedly derived from MHC-Ia genes, paralleling patterns seen in mammals but contrasting with the situation in fish and Xenopus frogs. In all seven studied species, some MHC-Ib genes show high relative expression during the larval stage but not at adulthood, suggesting a role in larval immunity. Our results underscore the importance of salamanders for understanding the evolution of complex regions in giant genomes and the architecture of the tetrapod MHC.
{"title":"MHC in newts illuminates the evolutionary dynamics of complex regions in giant genomes.","authors":"Wiesław Babik, Katarzyna Dudek, Gemma Palomar, Marzena Marszałek, Grzegorz Dubin, Maximina H Yun, Magdalena Migalska","doi":"10.1101/gr.281127.125","DOIUrl":"10.1101/gr.281127.125","url":null,"abstract":"<p><p>Major Histocompatibility Complex (MHC) molecules are central to vertebrate adaptive immunity, and MHC genes serve as key models in evolutionary genomics, offering insight into birth-and-death evolution, gene duplication, and the maintenance of genetic diversity. However, the organization and evolution of the MHC in species with giant genomes, such as salamanders, remain poorly understood. Here, we use comparative genomics, expression across multiple ontogenetic stages and tissues, as well as polymorphism data to investigate MHC evolution in newts. Contrary to earlier suggestions of a massively expanded MHC in salamanders, we find that the core MHC region remains relatively compact, demonstrating that genome gigantism does not scale proportionally in this region. Our finding also challenges the model of coevolution between a single classical MHC-Ia gene and antigen processing genes (APGs), revealing instead several polymorphic and highly expressed putative MHC-Ia located at varying distances from the APGs. MHC-I genes exhibit lineage-specific duplications and signs of concerted evolution, resulting in poorly resolved phylogenies. In contrast, MHC-II genes are more conserved and exhibit extensive trans-species polymorphism. Expression and polymorphism patterns identify putative nonclassical MHC-Ib genes, likely repeatedly derived from MHC-Ia genes, paralleling patterns seen in mammals but contrasting with the situation in fish and <i>Xenopus</i> frogs. In all seven studied species, some MHC-Ib genes show high relative expression during the larval stage but not at adulthood, suggesting a role in larval immunity. 
Our results underscore the importance of salamanders for understanding the evolution of complex regions in giant genomes and the architecture of the tetrapod MHC.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"303-317"},"PeriodicalIF":5.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863176/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145959226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qinhu Zhang, Siguo Wang, Zhipeng Li, Wenzheng Bao, Wenjian Liu, De-Shuang Huang
Deciphering the relationships between cis-regulatory elements (CREs) and target gene expression has been a long-standing unsolved problem in molecular biology, and the dynamics of CREs in different cell types make this problem more challenging. To address this challenge, we propose a scalable computational framework for predicting gene expression (ScPGE) from discrete candidate CREs (cCREs). ScPGE assembles DNA sequences, transcription factor (TF) binding scores, and epigenomic tracks from discrete cCREs into three-dimensional tensors, and then models the relationships between cCREs and genes by combining convolutional neural networks with transformers. Compared with current state-of-the-art models, ScPGE exhibits superior performance in predicting gene expression and yields higher accuracy in identifying active enhancer-gene interactions through attention mechanisms. By comprehensively analyzing ScPGE's predictions, we find that, among true positives (TPs), the regulatory effect of cCREs on genes decreases with distance. Inspired by this pattern, we design two methods to enhance the ability to capture distal cCRE-gene interactions by incorporating chromatin loops into the ScPGE model. Furthermore, ScPGE accurately discovers some crucial TF motifs within prioritized cCREs and reveals the different regulatory types of these cCREs.
{"title":"A scalable computational framework for predicting gene expression from candidate <i>cis</i>-regulatory elements.","authors":"Qinhu Zhang, Siguo Wang, Zhipeng Li, Wenzheng Bao, Wenjian Liu, De-Shuang Huang","doi":"10.1101/gr.281219.125","DOIUrl":"10.1101/gr.281219.125","url":null,"abstract":"<p><p>Deciphering the relationships between <i>cis</i>-regulatory elements (CREs) and target gene expression has been a long-standing unsolved problem in molecular biology, and the dynamics of CREs in different cell types make this problem more challenging. To address this challenge, we propose a <u>sc</u>alable computational framework for <u>p</u>redicting <u>g</u>ene <u>e</u>xpression (ScPGE) from discrete candidate CREs (cCREs). ScPGE assembles DNA sequences, transcription factor (TF) binding scores, and epigenomic tracks from discrete cCREs into three-dimensional tensors, and then models the relationships between cCREs and genes by combining convolutional neural networks with transformers. Compared with current state-of-the-art models, ScPGE exhibits superior performance in predicting gene expression and yields higher accuracy in identifying active enhancer-gene interactions through attention mechanisms. By comprehensively analyzing ScPGE's predictions, we find a pattern in true positives (TPs) that the regulatory effect of cCREs on genes decreases with distance. Inspired by the pattern, we design two methods to enhance the ability to capture distal cCRE-gene interactions by incorporating chromatin loops into the ScPGE model. 
Furthermore, ScPGE accurately discovers some crucial TF motifs within prioritized cCREs and reveals the different regulatory types of these cCREs.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"361-374"},"PeriodicalIF":5.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863192/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145989135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cheng Wang, Chase D Brownstein, Wenjun Chen, Zufa Ding, Dan Yu, Yu Deng, Chenguang Feng, Thomas J Near, Shunping He, Liandong Yang
Genomic evolution can propel and restrict species diversification. Rapid molecular evolution and genomic rearrangement are often associated with increased species diversification, but whether genome structural evolution shows a slow tempo in long-lived, species-poor lineages remains unclear. Here, we present two chromosome-level genomes of gars, a lineage of seven living species of freshwater fishes that are nearly identical in anatomy to extinct species from tens of millions of years ago. Using the new genomes, we show that gars have the slowest rates of genomic structural and sequence evolution of all vertebrates. In species of the two living gar genera Atractosteus and Lepisosteus, 83.35% of the genomes remain identical even though they diverged over 100 million years ago. Genome size variation among gars is almost entirely attributable to single base pair insertions and deletions. Yet, we also detect inflated GC repeat numbers on Chromosomes 14 and 23 of Atractosteus spatula that are absent in Lepisosteus and show that gar microchromosomes and macrochromosomes display different rates of structural evolution. Our analyses suggest that the genomic stability of gars, which may explain the ability of deeply divergent gar species to hybridize and has contributed to their higher structural similarity to tetrapod genomes than those of the far more closely related teleost fishes, may result from very low rates of transposable element origination and high inactivity compared to other vertebrates. Beyond providing a reference point for comparative vertebrate genomic studies, the new gar genomes illuminate a structural component of slow genomic evolution in living fossils and molecular mechanisms that may underlie exceptional genome stability.
{"title":"Stable genome structures in living fossil fishes.","authors":"Cheng Wang, Chase D Brownstein, Wenjun Chen, Zufa Ding, Dan Yu, Yu Deng, Chenguang Feng, Thomas J Near, Shunping He, Liandong Yang","doi":"10.1101/gr.280800.125","DOIUrl":"10.1101/gr.280800.125","url":null,"abstract":"<p><p>Genomic evolution can propel and restrict species diversification. Rapid molecular evolution and genomic rearrangement is often associated with increased species diversification, but whether genome structural evolution shows a slow tempo in long-lived, species-poor lineages remains unclear. Here, we present two chromosome-level genomes of gars, a lineage of seven living species of freshwater fishes that are nearly identical in anatomy to extinct species from tens of millions of years ago. Using the new genomes, we show that gars have the slowest rates of genomic structural and sequence evolution of all vertebrates. In species of the two living gar genera <i>Atractosteus</i> and <i>Lepisosteus</i>, 83.35% of the genomes remain identical even though they diverged over 100 million years ago. Genome size variation among gars is almost entirely attributable to single base pair insertions and deletions. Yet, we also detect inflated GC repeat numbers on Chromosomes 14 and 23 of <i>Atractosteus spatula</i> that are absent in <i>Lepisosteus</i> and show that gar microchromosomes and macrochromosomes display different rates of structural evolution. Our analyses suggest that the genomic stability of gars, which may explain the ability of deeply divergent gar species to hybridize and has contributed to their higher structural similarity to tetrapod genomes than those of the far more closely related teleost fishes, may result from very low rates of transposable element origination and high inactivity compared to other vertebrates. 
Beyond providing a reference point for comparative vertebrate genomic studies, the new gar genomes illuminate a structural component of slow genomic evolution in living fossils and molecular mechanisms that may underlie exceptional genome stability.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":"318-329"},"PeriodicalIF":5.5,"publicationDate":"2026-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863190/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145965817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yunxiao Ren, Ming Hu, Yang E Li, Andrew A Pieper, Jeffrey Cummings, Feixiong Cheng
Alzheimer's disease (AD) is a complex and poorly understood neurodegenerative disorder that lacks sufficiently effective treatments. Computational and integrative analyses that leverage multiomics data provide a promising strategy to uncover disease mechanisms and identify therapeutic opportunities. Here, we develop a cell type-specific regulatory atlas of the human middle temporal gyrus by leveraging single-nucleus RNA-seq (1,197,032 nuclei) and ATAC-seq (740,875 nuclei) datasets from 84 donors across four stages of AD neuropathological change (ADNC). We observe differential gene expression for six major cell types intensified at severe ADNC. Integrating peak-to-gene linkages and motif enrichment analyses, we reconstruct transcription factor (TF)-target gene networks across six major brain cell types. By integrating genome-wide association study (GWAS) loci with cell type-specific cis-regulatory DNA elements (CREs), we pinpoint 141 ADNC-associated genes. Using gene set enrichment analysis (GSEA) and network proximity analysis, we further identify nine candidate repurposable drugs that are associated with these ADNC-related genes. In summary, if broadly applied, this cell type-specific multiomics atlas provides a comprehensive resource for mechanistic understanding, target prioritization, and therapeutic hypothesis generation in AD and AD-related dementia.
{"title":"Cell type-specific gene regulatory atlas prioritizes drug targets and repurposable medicines in Alzheimer's disease.","authors":"Yunxiao Ren, Ming Hu, Yang E Li, Andrew A Pieper, Jeffrey Cummings, Feixiong Cheng","doi":"10.1101/gr.280436.125","DOIUrl":"https://doi.org/10.1101/gr.280436.125","url":null,"abstract":"<p><p>Alzheimer's disease (AD) is a complex and poorly understood neurodegenerative disorder that lacks sufficiently effective treatments. Computational and integrative analyses that leverage multiomics data provide a promising strategy to uncover disease mechanisms and identify therapeutic opportunities. Here, we develop a cell type-specific regulatory atlas of the human middle temporal gyrus via leveraging single-nucleus RNA-seq (1,197,032 nuclei) and ATAC-seq (740,875 nuclei) datasets from 84 donors across four stages of AD neuropathological change (ADNC). We observe differential gene expression for six major cell types intensified at severe ADNC. Integrating peak-to-gene linkages and motif enrichment analyses, we reconstruct transcription factor (TF)-target gene networks across six major brain cell types. By integrating genome-wide association study (GWAS) loci with cell type-specific <i>cis</i>-regulatory DNA elements (CREs), we pinpoint 141 ADNC-associated genes. Using gene set enrichment analysis (GSEA) and network proximity analysis, we further identify nine candidate repurposable drugs that were associated with these ADNC-related genes. 
In summary, this cell type-specific multiomics atlas provides a comprehensive resource for mechanistic understanding, target prioritization, and therapeutic hypothesis generation in AD and AD-related dementia if broadly applied.</p>","PeriodicalId":12678,"journal":{"name":"Genome research","volume":" ","pages":""},"PeriodicalIF":5.5,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146018228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}