Genome research: latest articles (page 2)

Landscape of microRNA and target expression variation and covariation in single mouse embryonic stem cells.

IF 5.5 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2026-02-03 DOI: 10.1101/gr.279914.124

Marcel Tarbier, Sebastian D Mackowiak, Vaishnovi Sekar, Franziska Bonath, Etka Yapar, Bastian Fromm, Omid R Faridani, Inna Biryukova, Marc R Friedländer

microRNAs are small RNA molecules that can repress the expression of protein-coding genes post-transcriptionally. Previous studies have shown that microRNAs can also have alternative functions, including influencing target expression variation and covariation, but these observations have been limited to a few microRNAs. Here we systematically study microRNA alternative functions in mouse embryonic stem cells (mESCs) by genetically deleting Drosha, leading to global loss of microRNAs. We apply complementary single-cell RNA-seq methods to study the variation of the targets and the microRNAs themselves, and transcriptional inhibition to measure target half-lives. We find that microRNAs form four distinct coexpression groups across single cells. In particular, the mir-290 and the mir-182 genome clusters are abundantly, variably, and inversely expressed. Some cells have global biases toward specific miRNAs originating from either end of the hairpin precursor, suggesting the presence of unknown regulatory cofactors. We find that microRNAs generally increase variation and covariation of their targets at the RNA level, but we also find microRNAs such as miR-182 that appear to have opposite functions. In particular, microRNAs that are themselves variable in expression, such as miR-291a, are more likely to induce covariations. In summary, we apply genetic perturbation and multiomics to give the first global picture of microRNA dynamics at the single-cell level.
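
The two quantities the abstract centres on, expression variation and covariation of targets across single cells, can be made concrete with a small sketch. It assumes a cells-by-genes expression matrix and uses the coefficient of variation and the mean pairwise Pearson correlation as stand-ins; the abstract does not state which statistics the authors used, and the function names and toy data below are hypothetical.

```python
import numpy as np

def coefficient_of_variation(expr):
    """Per-gene coefficient of variation (std / mean) across cells.

    expr: 2-D array with cells as rows and genes as columns.
    Genes whose mean is zero get NaN.
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=0)
    std = expr.std(axis=0, ddof=1)
    cv = np.full(mean.shape, np.nan)
    np.divide(std, mean, out=cv, where=mean > 0)
    return cv

def mean_pairwise_correlation(expr):
    """Mean off-diagonal Pearson correlation between genes (columns)."""
    expr = np.asarray(expr, dtype=float)
    corr = np.corrcoef(expr, rowvar=False)
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    return float(corr[off_diagonal].mean())

# Toy data (hypothetical): 200 cells, 5 target genes.
# A shared per-cell factor stands in for a variable regulator.
rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 1))
with_shared_factor = 10 + 2 * shared + rng.normal(size=(200, 5))
without_shared_factor = 10 + rng.normal(size=(200, 5))

print(mean_pairwise_correlation(with_shared_factor))
print(mean_pairwise_correlation(without_shared_factor))
```

In this toy setting the genes driven by the shared factor covary strongly and the independent genes do not, which mirrors the kind of contrast a comparison between wild-type and Drosha-deleted cells is designed to expose.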

Pages : 291-302 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863184/pdf/

Citations: 0

Strain-level metagenomic profiling using pangenome graphs with PanTax.

IF 5.5 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2026-02-03 DOI: 10.1101/gr.280858.125

Wenhai Zhang, Yuansheng Liu, Guangyi Li, Jialu Xu, Enlian Chen, Alexander Schönhuth, Xiao Luo

Microbes are omnipresent, thriving in a range of habitats, from oceans to soils, and even within our gastrointestinal tracts. They play a vital role in maintaining ecological equilibrium and promoting the health of their hosts. Consequently, understanding the diversity in terms of strains in microbial communities is crucial, as variations between strains can lead to different phenotypic expressions or diverse biological functions. However, current methods for taxonomic classification from metagenomic sequencing data have several limitations, including their reliance solely on species resolution, support for either short or long reads, or their confinement to a given single species. Most notably, most existing strain-level taxonomic classifiers rely on the sequence representation of multiple linear reference genomes, which fails to capture the sequence correlations among these genomes, potentially introducing ambiguity and biases in metagenomic profiling. Here, we present PanTax, a pangenome graph-based taxonomic profiler that overcomes the shortcomings of sequence-based approaches, because pangenome graphs possess the capability to depict the full range of genetic variability present across multiple evolutionarily or environmentally related genomes. PanTax provides a comprehensive solution to taxonomic classification for strain resolution, compatibility with both short and long reads, and compatibility with single or multiple species. Extensive benchmarking results demonstrate that PanTax drastically outperforms state-of-the-art approaches, primarily evidenced by its significantly higher F1 score at the strain level, while maintaining comparable or better performance in other aspects across various data sets.
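
The headline metric in the abstract is the F1 score at the strain level. A minimal sketch of that metric follows, assuming it is computed on presence/absence of strains (predicted set versus ground-truth set). The abstract does not say how the benchmark defines it (for example, whether abundance thresholds apply), and the function name and strain labels are hypothetical.

```python
def strain_f1(predicted, truth):
    """F1 score for strain detection, treating both inputs as sets of strain IDs."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: 3 strains reported, 4 truly present, 2 in common.
score = strain_f1({"strain_A", "strain_B", "strain_C"},
                  {"strain_A", "strain_B", "strain_D", "strain_E"})
print(round(score, 3))  # 0.571
```

Precision is 2/3 and recall is 1/2 here, so the harmonic mean is 4/7.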

Pages : 405-420 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863173/pdf/

Citations: 0

Autoencoders for genomic variation analysis.

IF 5.5 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2026-02-03 DOI: 10.1101/gr.280086.124

Margarita Geleta, Daniel Mas Montserrat, Xavier Giro-I-Nieto, Alexander G Ioannidis

Modern biobanks are providing numerous high-resolution genomic sequences of diverse populations. In order to account for diverse and admixed populations, new algorithmic tools are needed in order to properly capture the genetic composition of populations. Here, we explore deep learning techniques, namely, variational autoencoders (VAEs), to process genomic data from a population perspective. We show the power of VAEs for a variety of tasks relating to the interpretation, compression, classification, and simulation of genomic data with several worldwide whole genome data sets from both humans and canids, and evaluate the performance of the proposed applications with and without ancestry conditioning. The unsupervised setting of autoencoders allows for the detection and learning of granular population structure and inferring of informative latent factors. The learned latent spaces of VAEs are able to capture and represent differentiated Gaussian-like clusters of samples with similar genetic composition on a fine scale from single nucleotide polymorphisms (SNPs), enabling applications in dimensionality reduction and data simulation. These individual genotype sequences can then be decomposed into latent representations and reconstruction errors (residuals), which provide a sparse representation useful for lossless compression. We show that different populations have differentiated compression ratios and classification accuracies. Additionally, we analyze the entropy of the SNP data, its effect on compression across populations, and its relation to historical migrations, and we show how to introduce autoencoders into existing compression pipelines.
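
The decomposition into a reconstruction plus residuals, and the role of SNP entropy, can be illustrated without training a model. The sketch below assumes binary-coded genotypes and takes the reconstruction as given (in the paper it would come from the VAE decoder; here it is a hand-written stand-in). The XOR residual is one simple way to make the scheme lossless and is an assumption, not necessarily the authors' encoding; all names are hypothetical.

```python
import numpy as np

def binary_entropy(p):
    """Shannon entropy in bits of a Bernoulli(p) variable, elementwise."""
    p = np.asarray(p, dtype=float)
    entropy = np.zeros_like(p)
    inside = (p > 0) & (p < 1)  # entropy is 0 when p is exactly 0 or 1
    q = p[inside]
    entropy[inside] = -(q * np.log2(q) + (1 - q) * np.log2(1 - q))
    return entropy

def split_residual(genotypes, reconstruction):
    """Residual marking positions where the reconstruction is wrong."""
    return np.bitwise_xor(genotypes, reconstruction)

def restore(reconstruction, residual):
    """Recover the original genotypes exactly from reconstruction and residual."""
    return np.bitwise_xor(reconstruction, residual)

# Toy data (hypothetical): 2 individuals, 4 binary SNPs.
genotypes = np.array([[1, 0, 1, 1],
                      [1, 0, 0, 1]], dtype=np.uint8)
# Stand-in for a decoder output; a real pipeline would use the trained model.
reconstruction = np.array([[1, 0, 1, 1],
                           [1, 0, 1, 1]], dtype=np.uint8)

residual = split_residual(genotypes, reconstruction)
recovered = restore(reconstruction, residual)
snp_entropy = binary_entropy(genotypes.mean(axis=0))
```

The better the reconstruction, the sparser the residual, and a sparse residual is cheap to store with a general-purpose compressor. SNPs with allele frequency near 0.5 carry the most entropy and are the hardest to compress.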

Pages : 348-360 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863191/pdf/

Citations: 0

scSHEFT enables multiomics label transfer from scRNA-seq to scATAC-seq through dual alignment.

IF 5.5 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2026-02-03 DOI: 10.1101/gr.280410.125

Zhitao Huang, Ruiqing Zheng, Pengzhen Jia, Xuhua Yan, Jinmiao Chen, Min Li

Currently, with the emergence of abundant single-cell multiomics data, there is a trend where labels are transferred from well-annotated scRNA-seq data to less-annotated omics data, such as scATAC-seq. This approach leverages the gene expression profiles available in scRNA-seq to help annotate common cell types and even novel cell types for other omics data. However, the heterogeneous features between scRNA-seq and scATAC-seq pose challenges for identifying different cell types, which hinders the discovery of novel types. In this study, we propose a new label transfer tool scSHEFT, which simultaneously considers gene expression count data, peak count data, and Gene Activity Scores as inputs to bridge the gap of heterogeneous features. Specifically, we transform scATAC-seq data into Gene Activity Scores based on prior knowledge to harmonize heterogeneous features. As the feature transformation would result in information loss, we introduce the raw ATAC-seq embeddings to preserve the original information. To achieve a balance between interomics alignment and intraomics heterogeneity, we propose a dual alignment strategy. Specifically, scSHEFT employs an anchor-based approach to align interomics anchor pairs and a contrastive-based strategy to preserve cellular heterogeneity within each omics layer. Benchmarking scSHEFT against 11 state-of-the-art methods across seven data sets demonstrates its superiority in handling data sets of varying scales and technical noises.
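
The step that turns scATAC-seq peaks into Gene Activity Scores can be sketched as follows. This assumes a common convention, summing the counts of peaks that overlap a gene body extended upstream of the transcription start site, with a 2,000 bp extension chosen for illustration. The abstract only says the transformation is "based on prior knowledge", so the rule, the window size, and every name below are assumptions rather than scSHEFT's implementation.

```python
import numpy as np

def gene_activity(peak_counts, peaks, genes, upstream=2000):
    """Sum peak counts overlapping each gene body plus an upstream window.

    peak_counts: array of shape (cells, peaks).
    peaks: list of (chrom, start, end), half-open coordinates.
    genes: dict mapping gene name to (chrom, start, end, strand).
    Returns (gene_names, scores) with scores of shape (cells, genes).
    """
    names = list(genes)
    scores = np.zeros((peak_counts.shape[0], len(names)), dtype=peak_counts.dtype)
    for j, name in enumerate(names):
        chrom, start, end, strand = genes[name]
        # Extend toward the promoter: before start on "+", after end on "-".
        if strand == "+":
            start = max(0, start - upstream)
        else:
            end = end + upstream
        overlapping = [i for i, (c, s, e) in enumerate(peaks)
                       if c == chrom and s < end and e > start]
        if overlapping:
            scores[:, j] = peak_counts[:, overlapping].sum(axis=1)
    return names, scores

# Toy data (hypothetical): 2 cells, 3 peaks, 2 genes.
peaks = [("chr1", 100, 200), ("chr1", 5000, 5100), ("chr2", 100, 200)]
genes = {"G1": ("chr1", 1500, 3000, "+"), "G2": ("chr1", 4000, 4900, "-")}
peak_counts = np.array([[1, 2, 3],
                        [0, 4, 5]])
gene_names, activity = gene_activity(peak_counts, peaks, genes)
```

Collapsing peaks to genes puts both modalities in the same feature space, at the cost of discarding peaks that map to no gene (the chr2 peak here). That loss is what the abstract's use of raw ATAC-seq embeddings is meant to offset.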

Pages : 387-396 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863186/pdf/

Citations: 0

Corrigendum: Machine learning identifies activation of RUNX/AP-1 as drivers of mesenchymal and fibrotic regulatory programs in gastric cancer.

IF 5.5 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2026-02-03 DOI: 10.1101/gr.281294.125

Milad Razavi-Mohseni, Weitai Huang, Yu A Guo, Dustin Shigaki, Shamaine Wei Ting Ho, Patrick Tan, Anders J Skanderup, Michael A Beer

Citations: 0

Recent advances in methods to characterize archaic introgression in modern humans.

IF 5.5 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2026-02-03 DOI: 10.1101/gr.278993.124

David Peede, Mayra M Bañuelos, Jazeps Medina Tretmanis, Miriam Miyagi, Emilia Huerta-Sánchez

The exchange and subsequent incorporation of genetic material between distinct lineages, known as introgression, has emerged as a crucial concept in understanding human evolutionary history. With the advent of high-throughput sequencing technologies and the publication of the draft Neanderthal genome in 2010, Green and colleagues were able to demonstrate the presence of Neanderthal DNA in present-day Eurasians, a signature of past interbreeding events with archaic humans. This integration of genetic material from extinct human relatives, such as Neanderthals and Denisovans, into the genomes of modern humans due to historical gene flow events is known as archaic introgression. As new methods and data sets uncover a more complex intermingling between our ancestors and archaic humans than previously thought, the relevance of archaic introgression has only increased, opening exciting new avenues for studying human evolution. Here, we review recent methodological advances in the study of archaic introgression. We begin by providing an overview of the genealogical and genomic signatures left behind by introgression events before reviewing recent methods for studying archaic introgression by outlining their conceptual approaches, data requirements, and types of inferences they support. Finally, we provide recommendations for which methods are most appropriate given a research question and data set, discuss outstanding challenges, and suggest future lines of research to advance the study of archaic introgression.
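
As one concrete example of a genomic signature of introgression, the sketch below computes Patterson's D (the ABBA-BABA statistic), which was introduced with the 2010 draft Neanderthal genome. The abstract does not name the methods the review covers, so this is an illustrative choice. The sketch assumes one haploid, aligned sequence per population and an outgroup that defines the ancestral allele; function and variable names are hypothetical.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Patterson's D from four aligned haploid sequences.

    ABBA sites: P2 and P3 share a derived allele, P1 matches the outgroup.
    BABA sites: P1 and P3 share a derived allele, P2 matches the outgroup.
    Returns NaN when no informative sites exist.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if a == o and b == c and b != o:
            abba += 1
        elif b == o and a == c and a != o:
            baba += 1
    if abba + baba == 0:
        return float("nan")
    return (abba - baba) / (abba + baba)

# Toy alignment (hypothetical): three ABBA sites, one BABA site.
d = d_statistic(p1="AAAGA", p2="GGGAA", p3="GGGGA", outgroup="AAAAA")
print(d)  # 0.5
```

Without gene flow, incomplete lineage sorting is expected to produce ABBA and BABA sites in equal numbers, so D is near zero; an excess of one pattern indicates that P3 shares more derived alleles with one of the two sister populations. Real analyses use allele frequencies and block-jackknife standard errors, which this sketch omits.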

Pages : 239-256 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863057/pdf/

Citations: 0

MHC in newts illuminates the evolutionary dynamics of complex regions in giant genomes.

IF 5.5 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2026-02-03 DOI: 10.1101/gr.281127.125

Wiesław Babik, Katarzyna Dudek, Gemma Palomar, Marzena Marszałek, Grzegorz Dubin, Maximina H Yun, Magdalena Migalska

Major Histocompatibility Complex (MHC) molecules are central to vertebrate adaptive immunity, and MHC genes serve as key models in evolutionary genomics, offering insight into birth-and-death evolution, gene duplication, and the maintenance of genetic diversity. However, the organization and evolution of the MHC in species with giant genomes, such as salamanders, remain poorly understood. Here, we use comparative genomics, expression across multiple ontogenetic stages and tissues, as well as polymorphism data to investigate MHC evolution in newts. Contrary to earlier suggestions of a massively expanded MHC in salamanders, we find that the core MHC region remains relatively compact, demonstrating that genome gigantism does not scale proportionally in this region. Our finding also challenges the model of coevolution between a single classical MHC-Ia gene and antigen processing genes (APGs), revealing instead several polymorphic and highly expressed putative MHC-Ia located at varying distances from the APGs. MHC-I genes exhibit lineage-specific duplications and signs of concerted evolution, resulting in poorly resolved phylogenies. In contrast, MHC-II genes are more conserved and exhibit extensive trans-species polymorphism. Expression and polymorphism patterns identify putative nonclassical MHC-Ib genes, likely repeatedly derived from MHC-Ia genes, paralleling patterns seen in mammals but contrasting with the situation in fish and Xenopus frogs. In all seven studied species, some MHC-Ib genes show high relative expression during the larval stage but not at adulthood, suggesting a role in larval immunity. Our results underscore the importance of salamanders for understanding the evolution of complex regions in giant genomes and the architecture of the tetrapod MHC.

Pages : 303-317 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12863176/pdf/

Citations: 0

A scalable computational framework for predicting gene expression from candidate cis-regulatory elements.

IF 5.5 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2026-02-03 DOI: 10.1101/gr.281219.125

Qinhu Zhang, Siguo Wang, Zhipeng Li, Wenzheng Bao, Wenjian Liu, De-Shuang Huang

Deciphering the relationships between cis-regulatory elements (CREs) and target gene expression has been a long-standing unsolved problem in molecular biology, and the dynamics of CREs in different cell types make this problem more challenging. To address this challenge, we propose a scalable computational framework for predicting gene expression (ScPGE) from discrete candidate CREs (cCREs). ScPGE assembles DNA sequences, transcription factor (TF) binding scores, and epigenomic tracks from discrete cCREs into three-dimensional tensors, and then models the relationships between cCREs and genes by combining convolutional neural networks with transformers. Compared with current state-of-the-art models, ScPGE exhibits superior performance in predicting gene expression and yields higher accuracy in identifying active enhancer-gene interactions through attention mechanisms. By comprehensively analyzing ScPGE's predictions, we find a pattern in true positives (TPs) that the regulatory effect of cCREs on genes decreases with distance. Inspired by the pattern, we design two methods to enhance the ability to capture distal cCRE-gene interactions by incorporating chromatin loops into the ScPGE model. Furthermore, ScPGE accurately discovers some crucial TF motifs within prioritized cCREs and reveals the different regulatory types of these cCREs.
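
The input assembly described in the abstract, stacking DNA sequence, TF binding scores, and epigenomic tracks from discrete cCREs into a three-dimensional tensor, might look roughly like the sketch below. The axis layout (cCRE, position, channel), the one-hot encoding, and all names are assumptions made for illustration; the abstract does not specify ScPGE's actual tensor format.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a DNA string as (length, 4); unknown bases stay all-zero."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded

def assemble_tensor(seqs, tf_scores, tracks):
    """Stack per-cCRE features into one (n_ccre, length, channels) tensor.

    seqs: list of equal-length DNA strings, one per cCRE.
    tf_scores: array (n_ccre, length, n_tf) of TF binding scores.
    tracks: array (n_ccre, length, n_tracks) of epigenomic signal.
    """
    dna = np.stack([one_hot(s) for s in seqs])
    return np.concatenate([dna, tf_scores, tracks], axis=-1)

# Toy data (hypothetical): 2 cCREs of length 4, 3 TFs, 2 epigenomic tracks.
seqs = ["ACGT", "NNAA"]
tf_scores = np.zeros((2, 4, 3))
tracks = np.ones((2, 4, 2))
tensor = assemble_tensor(seqs, tf_scores, tracks)
print(tensor.shape)  # (2, 4, 9)
```

Keeping cCREs as a separate axis means a downstream model can convolve within each element and then attend across elements, which fits the combination of convolutional networks and transformers the abstract describes.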

Citations: 0

Stable genome structures in living fossil fishes.

IF 5.5, Zone 2 Biology, Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2026-02-03 DOI: 10.1101/gr.280800.125

Cheng Wang, Chase D Brownstein, Wenjun Chen, Zufa Ding, Dan Yu, Yu Deng, Chenguang Feng, Thomas J Near, Shunping He, Liandong Yang

Genomic evolution can propel and restrict species diversification. Rapid molecular evolution and genomic rearrangement are often associated with increased species diversification, but whether genome structural evolution shows a slow tempo in long-lived, species-poor lineages remains unclear. Here, we present two chromosome-level genomes of gars, a lineage of seven living species of freshwater fishes that are nearly identical in anatomy to extinct species from tens of millions of years ago. Using the new genomes, we show that gars have the slowest rates of genomic structural and sequence evolution of all vertebrates. In species of the two living gar genera Atractosteus and Lepisosteus, 83.35% of the genomes remain identical even though they diverged over 100 million years ago. Genome size variation among gars is almost entirely attributable to single base pair insertions and deletions. Yet, we also detect inflated GC repeat numbers on Chromosomes 14 and 23 of Atractosteus spatula that are absent in Lepisosteus and show that gar microchromosomes and macrochromosomes display different rates of structural evolution. Our analyses suggest that the genomic stability of gars, which may explain the ability of deeply divergent gar species to hybridize and has contributed to their higher structural similarity to tetrapod genomes than those of the far more closely related teleost fishes, may result from very low rates of transposable element origination and high inactivity compared to other vertebrates. Beyond providing a reference point for comparative vertebrate genomic studies, the new gar genomes illuminate a structural component of slow genomic evolution in living fossils and molecular mechanisms that may underlie exceptional genome stability.

Citations: 0

Cell type-specific gene regulatory atlas prioritizes drug targets and repurposable medicines in Alzheimer's disease.

IF 5.5, Zone 2 Biology, Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2026-01-21 DOI: 10.1101/gr.280436.125

Yunxiao Ren, Ming Hu, Yang E Li, Andrew A Pieper, Jeffrey Cummings, Feixiong Cheng

Alzheimer's disease (AD) is a complex and poorly understood neurodegenerative disorder that lacks sufficiently effective treatments. Computational and integrative analyses that leverage multiomics data provide a promising strategy to uncover disease mechanisms and identify therapeutic opportunities. Here, we develop a cell type-specific regulatory atlas of the human middle temporal gyrus by leveraging single-nucleus RNA-seq (1,197,032 nuclei) and ATAC-seq (740,875 nuclei) datasets from 84 donors across four stages of AD neuropathological change (ADNC). We observe differential gene expression for six major cell types intensified at severe ADNC. Integrating peak-to-gene linkages and motif enrichment analyses, we reconstruct transcription factor (TF)-target gene networks across six major brain cell types. By integrating genome-wide association study (GWAS) loci with cell type-specific cis-regulatory DNA elements (CREs), we pinpoint 141 ADNC-associated genes. Using gene set enrichment analysis (GSEA) and network proximity analysis, we further identify nine candidate repurposable drugs that were associated with these ADNC-related genes. In summary, this cell type-specific multiomics atlas provides a comprehensive resource for mechanistic understanding, target prioritization, and therapeutic hypothesis generation in AD and AD-related dementia if broadly applied.
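The abstract does not say which network proximity measure was used. As a rough illustration of the general idea, the sketch below computes one commonly used variant, the "closest" distance: the mean, over a drug's targets, of the shortest-path length to the nearest disease gene in an interaction network. The toy graph, the node names, and the function names are hypothetical, and the comparison against randomized gene sets that such analyses normally include is omitted.

```python
from collections import deque
from typing import Dict, Iterable, List


def bfs_distances(graph: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Unweighted shortest-path lengths from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_distance(
    graph: Dict[str, List[str]],
    drug_targets: Iterable[str],
    disease_genes: Iterable[str],
) -> float:
    """Mean distance from each drug target to its nearest disease gene.

    Targets that cannot reach any disease gene are skipped; if none can,
    the result is infinity.
    """
    disease_genes = list(disease_genes)
    total, counted = 0, 0
    for target in drug_targets:
        dist = bfs_distances(graph, target)
        reachable = [dist[g] for g in disease_genes if g in dist]
        if reachable:
            total += min(reachable)
            counted += 1
    return total / counted if counted else float("inf")


# Toy undirected path network A-B-C-D-E (hypothetical node names).
toy_graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
score = closest_distance(toy_graph, ["A", "C"], ["D"])
print(score)  # 2.0
```

In the toy network, target A is three steps from D and target C is one step away, so the mean is 2.0. A smaller value would suggest that a drug's targets sit closer to the disease genes in the network.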

Citations: 0