Genome research: latest articles, page 7

Recovering gene regulatory networks in single-cell multiomics data with PRISM-GRN

IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2025-10-09 DOI: 10.1101/gr.280757.125

Wenhao Zhang, Lan Cao, Xiaoxuan Gu, Yongyu Long, Ying Wang

Understanding Gene Regulatory Networks (GRNs) is crucial for deciphering cellular heterogeneity and the mechanisms underlying development and disease. However, current GRN inference methods fail to utilize multiomics data and prior knowledge from a biologically-interpretable insight. Therefore, we propose PRISM-GRN, a Bayesian model that seamlessly incorporates known GRNs, along with scRNA-seq and scATAC-seq data, into a probabilistic framework to reconstruct cell type-specific GRNs. PRISM-GRN employs a biologically interpretable architecture firmly rooted in the established gene regulatory mechanism, which asserts that gene expression is influenced by TF expression levels and gene chromatin accessibility through GRNs. Accordingly, PRISM-GRN decomposes observable data into biologically meaningful latent variables through a mechanism-informed generation process and a prior-GRN-primed inference process, enabling precise and robust GRN reconstruction. We evaluate PRISM-GRN on four benchmarking datasets with paired scRNA-seq and scATAC-seq data, demonstrating its superior performance over seven baseline methods in GRN reconstruction, especially its higher precision under the inherently imbalanced scenario where the true regulatory interaction is sparse. Furthermore, benchmarking on directed GRNs highlights PRISM-GRN's ability to capture causality in gene regulation derived from the biologically-interpretable architecture. More importantly, PRISM-GRN performs well with unpaired omics data and limited prior GRN information, showcasing its flexibility and adaptability across various biological contexts. Finally, biological analyses on PBMC datasets demonstrate PRISM-GRN's potential to facilitate the identification of cell type-specific or context-specific GRNs across broader real-world biological research applications. Overall, PRISM-GRN provides a novel paradigm for precise, robust, and interpretable exploration of causal GRNs with prior knowledge and multiomics data.
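The abstract's point about precision when true regulatory interactions are sparse can be illustrated with a toy calculation. The sketch below is not PRISM-GRN; the edge sets and the helper `precision_recall` are hypothetical, and it only shows how precision and recall are computed for a predicted set of TF-to-gene edges against a reference set.

```python
def precision_recall(predicted, truth):
    """Precision and recall of predicted edges against reference edges."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy network: only 2 of the 12 possible TF-gene pairs are real.
truth = {("TF1", "G1"), ("TF2", "G3")}
predicted = {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3"), ("TF3", "G4")}

p, r = precision_recall(predicted, truth)
print(f"precision={p:.2f} recall={r:.2f}")
```

Because real edges are rare, a method can recover every true edge and still have half of its predictions be false positives, which is why precision is the more demanding metric in this setting.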



Citations: 0

Loss of multilevel 3D genome organization during breast cancer progression

IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2025-10-09 DOI: 10.1101/gr.280791.125

Roberto Rossini, Saleh Oshaghi, Maxim Nekrasov, Aurélie Bellanger, Renae Domaschenz, Yasmin Dijkwel, Mohamed Abdelhalim, Rahul Agrawal, Marit Ledsaak, Philippe Collas, Ragnhild Eskeland, David Tremethick, Jonas Paulsen

Breast cancer entails intricate alterations in genome organization and expression. However, how three-dimensional (3D) chromatin structure changes in the progression from a normal to a breast cancer malignant state remains unknown. To address this, we conducted an analysis combining Hi-C data with lamina-associated domains (LADs), epigenomic marks, and gene expression in an in vitro model of breast cancer progression. Our results reveal that while the fundamental properties of topologically associating domains (TADs) are overall maintained, significant changes occur in the organization of compartments and subcompartments. These changes are closely correlated with alterations in the expression of oncogenic genes. We also observe a restructuring of TAD-TAD interactions, coinciding with a loss of spatial compartmentalization and radial positioning of the 3D genome. Notably, we identify a previously unrecognized interchromosomal insertion event, wherein a locus on Chromosome 8 housing the MYC oncogene is inserted into a highly active subcompartment on Chromosome 10. This insertion is accompanied by the formation of de novo enhancer contacts and activation of MYC, illustrating how structural genomic variants can alter the 3D genome during oncogenesis. In summary, our findings provide evidence for the loss of genome organization at multiple scales during breast cancer progression revealing novel relationships between genome 3D structure and oncogenic processes.



Citations: 0

Integrative chromatin state annotation of 234 human ENCODE4 cell types using Segway

IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2025-10-06 DOI: 10.1101/gr.280633.125

Marjan Farahbod, Aboud Diab, Paul Sud, Meenakshi S. Kagda, Ian Whaling, Mehdi Foroozandeh, Ishan Goel, Habib Daneshpajouh, Benjamin Hitz, J. Michael Cherry, Maxwell W. Libbrecht

The fourth and final phase of the ENCODE consortium has newly profiled epigenetic activity in hundreds of human tissues. Chromatin state annotations created by segmentation and genome annotation (SAGA) methods such as Segway have emerged as the predominant integrative summary of such data sets. Here, we present the ENCODE4 Catalog of Segway Annotations, a set of sample-specific genome-wide chromatin state annotations of 234 human biosamples inferred from 1,794 genomics experiments. This catalog identifies genomic elements, accurately captures cell type-specific regulatory patterns, and facilitates discovery of elements involved in phenotype and disease.


Citations: 0

Corrigendum: A sheep pangenome reveals the spectrum of structural variations and their effects on tail phenotypes

IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2025-10-01 DOI: 10.1101/gr.281340.125

Ran Li, Mian Gong, Xinmiao Zhang, Fei Wang, Zhenyu Liu, Lei Zhang, Qimeng Yang, Yuan Xu, Mengsi Xu, Huanhuan Zhang, Yunfeng Zhang, Xuelei Dai, Yuanpeng Gao, Zhuangbiao Zhang, Wenwen Fang, Yuta Yang, Weiwei Fu, Chunna Cao, Peng Yang, Zeinab Amiri Ghanatsaman, Niloufar Jafarpour Negari, Hojjat Asadollahpour Nanaei, Xiangpeng Yue, Yuxuan Song, Xianyong Lan, Weidong Deng, Xihong Wang, Chuanying Pan, Ruidong Xiang, Eveline M. Ibeagha-Awemu, Pat (J.S.) Heslop-Harrison, Benjamin D. Rosen, Johannes A. Lenstra, Shangquan Gan, Yu Jiang

Genome Research 33: 463–477 (2023)


Citations: 0

Strong bias in long-read sequencing prevents assembly of Drosophila melanogaster Y-linked genes

IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2025-10-01 DOI: 10.1101/gr.280604.125

Antonio Bernardo Carvalho, Bernard Y Kim, Fabiana Uno

Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are generally considered free from sequence composition bias, a key factor - alongside read length - that explains their success in producing high-quality genome assemblies. Indeed, there had been very few reports of bias, the clearest one against GA-rich repeats in the human genome. However, our study reveals a systematic failure of both technologies to sequence and assemble specific exons of Drosophila melanogaster genes, indicating an overlooked limitation. Namely, multiple Y-linked exons are nearly or completely absent from raw reads produced by deep sequencing with state-of-the-art ONT (10.4 flow cells, 200× coverage) and PacBio (HiFi 50×). The same exons are accurately assembled using Illumina 67× coverage. We found that these missing exons are consistently located near simple satellite sequences, where sequencing fails at multiple levels: read initiation (very few reads start within satellite regions), read elongation (satellite-containing reads are shorter on average), and base-calling (quality scores drop as sequencing enters a satellite sequence). These findings challenge the assumption that long-read technologies are unbiased and reveal a critical barrier to assembling sequences near repetitive regions. As large-scale sequencing projects move towards telomere-to-telomere assemblies in a wide range of organisms, recognizing and addressing these biases will be important to achieving truly complete and accurate genomes. Additionally, the underrepresented Y-linked exons provide a valuable benchmark for refining those sequencing technologies while improving the assembly of the highly heterochromatic and often neglected Drosophila Y Chromosome.



Citations: 0

Highly accurate reference and method selection for universal cross-dataset cell type annotation with CAMUS

IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2025-10-01 DOI: 10.1101/gr.280821.125

Qunlun Shen, Shuqin Zhang, Shihua Zhang

Cell type annotation is a critical and essential task in single-cell data analysis. Various reference-based methods have provided rapid annotation for diverse single-cell data. However, how to select the optimal references and methods is often overlooked. To this end, we present a cross-dataset cell-type annotation methodology with a universal reference data and method selection strategy (CAMUS) to achieve highly accurate and efficient annotations. We demonstrate the advantages of CAMUS by conducting comprehensive analyses on 672 pairs of cross-species scRNA-seq datasets. The annotation results with references selected by CAMUS achieved substantial accuracy gains (25.0-124.7%) over random selection strategies across five reference-based methods. CAMUS achieved high accuracy in choosing the best reference-method pair among 3360 pairs (49.1%). Moreover, CAMUS showed high accuracy in selecting the best methods on the 80 scST datasets (82.5%) and five scATAC-seq datasets (100.0%), illustrating its universal applicability. In addition, we utilized the CAMUS score with other metrics to predict the annotation accuracy, providing direct guidance on whether to accept current annotation results.
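The reported accuracy gains (25.0-124.7%) exceed what an absolute difference in accuracy could reach, so they are presumably relative improvements over the random-selection baseline. That reading is an assumption, not something the abstract states. Under it, the arithmetic can be sketched as follows; the function name and the example accuracies are hypothetical.

```python
def relative_gain_percent(baseline_acc, selected_acc):
    """Relative improvement of selected_acc over baseline_acc, in percent."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (selected_acc - baseline_acc) / baseline_acc * 100.0


# Hypothetical accuracies, chosen only to illustrate the scale of the gains.
gain_low = relative_gain_percent(0.40, 0.50)   # roughly a 25% relative gain
gain_high = relative_gain_percent(0.40, 0.90)  # roughly a 125% relative gain
print(f"{gain_low:.1f}% {gain_high:.1f}%")
```

A relative gain above 100% simply means the selected reference more than doubles the baseline accuracy, which is only possible when the baseline is below 50%.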



Citations: 0

Adaptation of centromere to breakage through local genomic and epigenomic remodeling in wheat

IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2025-09-30 DOI: 10.1101/gr.280913.125

Jingwei Zhou, Yuhong Huang, Huan Ma, Yiqian Chen, Chuanye Chen, Fangpu Han, Handong Su

Centromeres, characterized by their unique chromatin attributes, are indispensable for safeguarding genomic stability. Due to their intricate and fragile nature, centromeres are susceptible to chromosomal rearrangements. However, the mechanisms preserving their functional integrity and supporting nucleus homeostasis following breakages remained enigmatic. In this study, we use wheat ditelosomic stocks, which arise from centromere breakage, to explore the genetic and epigenetic alterations in damaged centromeres. Our investigations unveil novel chromosome end structures marked by de novo addition of telomeres, as well as localized chromosomal shattering, including segment deletions and duplications near centromere breakpoints. We reveal that the damaged centromeres possess a remarkable capacity for self-regulation, through employing structural modifications such as expansion, contraction, and neocentromere formation to maintain their functional integrity. Centromere breakage triggers nucleosome remodeling and is accompanied by local transcription changes and chromatin reorganization, and subsequently may contribute to the stabilization of broken chromosomes. Our findings highlight the resilience and adaptability of plant chromosomes in response to centromere breakage, and provide valuable insights into the stability of centromeres, thereby offering promising prospects to manipulate centromeres for targeted chromosomal innovation and crop genetic improvement.



Citations: 0

Long-read reconstruction of many diverse haplotypes with devider

IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2025-09-23 DOI: 10.1101/gr.280510.125

Jim Shaw, Christina Boucher, Yun William Yu, Noelle Noyes, Heng Li

Reconstructing exact haplotypes is important when sequencing a mixture of similar sequences. Long-read sequencing can connect distant alleles to disentangle similar haplotypes, but handling sequencing errors requires specialized techniques. We present devider, an algorithm for haplotyping small sequences - such as viruses or genes - from long-read sequencing. devider uses a positional de Bruijn graph with sequence-to-graph alignment on an alphabet of informative alleles to provide a fast assembly-inspired approach compatible with various long-read sequencing technologies. On a synthetic Nanopore dataset containing seven HIV strains, devider recovered 97% of the haplotype content and had the most accurate abundance estimates while taking < 4 minutes and 1 GB of memory for > 8000× coverage. Benchmarking on synthetic mixtures of antimicrobial resistance (AMR) genes showed that devider recovered 83% of haplotypes, 23 percentage points higher than the next best method. On real PacBio and Nanopore datasets, devider recapitulates previously known results in seconds, disentangling a bacterial community with > 10 strains and an HIV-1 co-infection dataset. We used devider to investigate the within-host diversity of a long-read bovine gut metagenome enriched for AMR genes, discovering 13 distinct haplotypes for a tet(Q) tetracycline resistance gene with > 18,000× coverage and 6 haplotypes for a CfxA2 beta-lactamase gene. We found clear recombination blocks for these AMR gene haplotypes, showcasing devider's ability to unveil evolutionary signals for heterogeneous mixtures.
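The idea of working on "an alphabet of informative alleles" can be illustrated with a toy example. The sketch below is not devider's algorithm (it builds no de Bruijn graph and does no sequence-to-graph alignment); it only shows, under simplifying assumptions, how pre-aligned, equal-length reads might be reduced to the columns where a minor allele is seen often enough to be unlikely to be a sequencing error. The function names, the `min_minor` threshold, and the reads are all hypothetical.

```python
from collections import Counter


def informative_columns(reads, min_minor=2):
    """Columns whose second most common base occurs at least min_minor times."""
    columns = []
    for i in range(len(reads[0])):
        counts = Counter(read[i] for read in reads)
        if len(counts) > 1:
            minor_count = sorted(counts.values())[-2]
            if minor_count >= min_minor:
                columns.append(i)
    return columns


def allele_strings(reads, columns):
    """Count reads by their bases at the informative columns only."""
    return Counter("".join(read[i] for i in columns) for read in reads)


# Hypothetical aligned reads: two haplotypes plus one read with a lone error.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGTACGT",
    "ACCTACGA",
    "ACCTACGA",
    "ACGTATGT",  # isolated mismatch at column 5, treated as noise
]

cols = informative_columns(reads)
haplotypes = allele_strings(reads, cols)
print(cols, dict(haplotypes))
```

Restricting reads to informative columns shrinks the problem and discards most isolated errors, though a real tool must also handle indels, partial read overlap, and errors that fall on the informative columns themselves.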



Citations: 0

Deep structural clustering reveals hidden systematic biases in RNA sequencing data

IF 7 Zone 2 (Biology) Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2025-09-19 DOI: 10.1101/gr.280713.125

Qiang Su, Yi Long, Deming Gou, Junmin Quan, Xiaoming Zhou, Qizhou Lian

RNA sequencing (RNA-seq) is a pivotal tool for transcriptomic analysis, providing comprehensive exploration of gene expression across diverse biological contexts. However, RNA-seq data is susceptible to various biases that can significantly compromise the accuracy and reliability of transcript quantification. This study investigates the influence of high-dimensional RNA structures on local sequencing efficiency using an innovative unsupervised Variational Autoencoder-Gaussian Mixture Model (VAE-GMM). The VAE-GMM effectively captures intricate high-dimensional k-mer structural similarities by learning compact latent representations, which reduces dimensionality while meticulously preserving essential structural features crucial for bias identification. This sophisticated modeling allows precise tracking of local RNA-read conversion dynamics and the identification of complex, often overlooked, bias sources. We rigorously validate the VAE-GMM model's performance and robustness against conventional machine learning techniques, including Gaussian Mixture Models (GMM-only), Principal Component Analysis-based GMMs, k-means clustering, and Hierarchical Clustering. These validations, using an extensive and diverse array of datasets including synthetic RNA constructs, various human cell lines, and authentic tissue samples, consistently demonstrate the model's superior versatility and accuracy across different biological systems. Furthermore, in silico simulations of the sequencing process closely align with actual sequencing data, strongly reinforcing the critical role of high-dimensional RNA structures in determining sequencing efficiency and their impact on data quality. Our findings offer valuable insights into the underlying mechanisms of RNA structure-mediated sequencing bias. This deeper understanding enables more accurate and reliable RNA-seq analyses and is expected to improve the interpretation of transcriptomic data in future genomic studies.

Citations: 0

Recalibrating differential gene expression by genetic dosage variance prioritizes functionally relevant genes

IF 7 Zone 2 Biology Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research

Pub Date : 2025-09-17 DOI: 10.1101/gr.280360.124

Philipp Rentzsch, Aaron Kollotzek, Kaushik Ram Ganapathy, Pejman Mohammadi, Tuuli Lappalainen

Differential expression (DE) analysis is a widely used method for identifying genes that are functionally relevant for an observed phenotype or biological response. However, typical DE analysis includes selection of genes based on a threshold of fold change in expression under the implicit assumption that all genes are equally sensitive to dosage changes of their transcripts. This tends to favor highly variable genes over more constrained genes where even small changes in expression may be biologically relevant. To address this limitation, we have developed a method to recalibrate each gene's DE fold change based on genetic expression variance observed in the human population. The newly established metric ranks statistically differentially expressed genes, not by nominal change of expression, but by relative change in comparison to natural dosage variation for each gene. We apply our method to RNA sequencing data sets from in vitro stimulus response and neuropsychiatric disease experiments. Compared to the standard approach, our method adjusts the bias in discovery toward highly variable genes and enriches for pathways and biological processes related to metabolic and regulatory activity, indicating a prioritization of functionally relevant driver genes. Tissue-specific recalibration increases detection of known disease-relevant processes. Altogether, our method provides a novel view on DE and contributes toward bridging the existing gap between statistical and biological significance. We believe that this approach will simplify the identification of disease-causing molecular processes and enhance the discovery of therapeutic targets.
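
The recalibration idea in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the recalibrated score is the log2 fold change divided by a per-gene standard deviation of natural dosage variation; the abstract does not give the actual formula, and the function names, gene labels, and numbers below are hypothetical.

```python
def recalibrated_effect(log2_fc, dosage_sd):
    """Express a log2 fold change in units of the gene's natural dosage SD.

    Both inputs are hypothetical: log2_fc would come from a DE analysis,
    dosage_sd from a population-level estimate of expression variation.
    """
    if dosage_sd <= 0:
        raise ValueError("dosage_sd must be positive")
    return log2_fc / dosage_sd


def rank_genes(de_results):
    """Rank (gene, log2_fc, dosage_sd) records by absolute recalibrated effect."""
    scored = [(gene, recalibrated_effect(fc, sd)) for gene, fc, sd in de_results]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Made-up example values, not taken from the paper.
de_results = [
    ("GENE_A", 2.0, 1.0),   # highly variable gene with a large nominal fold change
    ("GENE_B", 0.5, 0.1),   # constrained gene with a small nominal fold change
    ("GENE_C", -1.5, 0.5),  # moderately constrained, downregulated gene
]
ranking = rank_genes(de_results)
for gene, score in ranking:
    print(gene, score)
```

Under this assumed scaling, the constrained gene with the smallest nominal fold change ranks first, which is the reordering the abstract describes: genes are ranked by change relative to their natural dosage variation rather than by nominal change.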

Citations: 0