Genome Biology: Latest Articles

Melon: metagenomic long-read-based taxonomic identification and quantification using marker genes
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2024-08-19 DOI: 10.1186/s13059-024-03363-y
Xi Chen, Xiaole Yin, Xianghui Shi, Weifu Yan, Yu Yang, Lei Liu, Tong Zhang
Long-read sequencing holds great potential for characterizing complex microbial communities, yet taxonomic profiling tools designed specifically for long reads remain lacking. We introduce Melon, a novel marker-based taxonomic profiler that capitalizes on the unique attributes of long reads. Melon employs a two-stage classification scheme to reduce computational time and is equipped with an expectation-maximization-based post-correction module to handle ambiguous reads. Melon achieves superior performance compared to existing tools in both mock and simulated samples. Using wastewater metagenomic samples, we demonstrate the applicability of Melon by showing it provides reliable estimates of overall genome copies, and species-level taxonomic profiles.
Citations: 0
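As an illustration of the expectation-maximization idea mentioned in the Melon abstract, the following is a minimal sketch and not Melon's actual implementation. It assumes a hypothetical input format in which each read is given as the list of species it is compatible with; the function name `em_abundance` and the toy reads are made up for this example.

```python
from collections import defaultdict


def em_abundance(read_candidates, n_iter=50):
    """Estimate relative abundances from reads that may match several species.

    read_candidates: list of lists; each inner list holds the species labels
    one read is compatible with (hypothetical input format).
    """
    species = sorted({s for cands in read_candidates for s in cands})
    # Start from a uniform abundance estimate.
    theta = {s: 1.0 / len(species) for s in species}
    n_reads = len(read_candidates)
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: split each read across its candidates in proportion to theta.
        for cands in read_candidates:
            total = sum(theta[s] for s in cands)
            for s in cands:
                counts[s] += theta[s] / total
        # M-step: renormalise the expected read counts into abundances.
        theta = {s: counts[s] / n_reads for s in species}
    return theta


# Toy data: three reads unique to A, one unique to B, two ambiguous.
reads = [["A"], ["A"], ["A"], ["B"], ["A", "B"], ["A", "B"]]
abundance = em_abundance(reads)
print(abundance)
```

In this toy case the ambiguous reads are shared out according to the evidence from the unambiguous ones, so the estimate settles near 0.75 for A and 0.25 for B.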
Overlooked poor-quality patient samples in sequencing data impair reproducibility of published clinically relevant datasets
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2024-08-16 DOI: 10.1186/s13059-024-03331-6
Maximilian Sprang, Jannik Möllmann, Miguel A. Andrade-Navarro, Jean-Fred Fontaine
Reproducibility is a major concern in biomedical studies, and existing publication guidelines do not solve the problem. Batch effects and quality imbalances between groups of biological samples are major factors hampering reproducibility. Yet, the latter is rarely considered in the scientific literature. Our analysis uses 40 clinically relevant RNA-seq datasets to quantify the impact of quality imbalance between groups of samples on the reproducibility of gene expression studies. High-quality imbalance is frequent (14 datasets; 35%), and hundreds of quality markers are present in more than 50% of the datasets. Enrichment analysis suggests common stress-driven effects among the low-quality samples and highlights a complementary role of transcription factors and miRNAs to regulate stress response. Preliminary ChIP-seq results show similar trends. Quality imbalance has an impact on the number of differential genes derived by comparing control to disease samples (the higher the imbalance, the higher the number of genes), on the proportion of quality markers in top differential genes (the higher the imbalance, the higher the proportion; up to 22%) and on the proportion of known disease genes in top differential genes (the higher the imbalance, the lower the proportion). We show that removing outliers based on their quality score improves the resulting downstream analysis. Thanks to a stringent selection of well-designed datasets, we demonstrate that quality imbalance between groups of samples can significantly reduce the relevance of differential genes, consequently reducing reproducibility between studies. Appropriate experimental design and analysis methods can substantially reduce the problem.
Citations: 0
Benchmarking computational methods for single-cell chromatin data analysis
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2024-08-16 DOI: 10.1186/s13059-024-03356-x
Siyuan Luo, Pierre-Luc Germain, Mark D. Robinson, Ferdinand von Meyenn
Single-cell chromatin accessibility assays, such as scATAC-seq, are increasingly employed in individual and joint multi-omic profiling of single cells. As the accumulation of scATAC-seq and multi-omics datasets continues, challenges in analyzing such sparse, noisy, and high-dimensional data become pressing. Specifically, one challenge relates to optimizing the processing of chromatin-level measurements and efficiently extracting information to discern cellular heterogeneity. This is of critical importance, since the identification of cell types is a fundamental step in current single-cell data analysis practices. We benchmark 8 feature engineering pipelines derived from 5 recent methods to assess their ability to discover and discriminate cell types. By using 10 metrics calculated at the cell embedding, shared nearest neighbor graph, or partition levels, we evaluate the performance of each method at different data processing stages. This comprehensive approach allows us to thoroughly understand the strengths and weaknesses of each method and the influence of parameter selection. Our analysis provides guidelines for choosing analysis methods for different datasets. Overall, feature aggregation, SnapATAC, and SnapATAC2 outperform latent semantic indexing-based methods. For datasets with complex cell-type structures, SnapATAC and SnapATAC2 are preferred. With large datasets, SnapATAC2 and ArchR are most scalable.
Citations: 0
StaVia: spatially and temporally aware cartography with higher-order random walks for cell atlases
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2024-08-16 DOI: 10.1186/s13059-024-03347-y
Shobana V. Stassen, Minato Kobashi, Edmund Y. Lam, Yuanhua Huang, Joshua W. K. Ho, Kevin K. Tsia
Single-cell atlases pose daunting computational challenges pertaining to the integration of spatial and temporal information and the visualization of trajectories across large atlases. We introduce StaVia, a computational framework that synergizes multi-faceted single-cell data with higher-order random walks that leverage the memory of cells’ past states, fused with a cartographic Atlas View that offers intuitive graph visualization. This spatially aware cartography captures relationships between cell populations based on their spatial location as well as their gene expression and developmental stage. We demonstrate this using zebrafish gastrulation data, underscoring its potential to dissect complex biological landscapes in both spatial and temporal contexts.
Citations: 0
scParser: sparse representation learning for scalable single-cell RNA sequencing data analysis
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2024-08-16 DOI: 10.1186/s13059-024-03345-0
Kai Zhao, Hon-Cheong So, Zhixiang Lin
The rapid rise in the availability and scale of scRNA-seq data needs scalable methods for integrative analysis. Though many methods for data integration have been developed, few focus on understanding the heterogeneous effects of biological conditions across different cell populations in integrative analysis. Our proposed scalable approach, scParser, models the heterogeneous effects from biological conditions, which unveils the key mechanisms by which gene expression contributes to phenotypes. Notably, the extended scParser pinpoints biological processes in cell subpopulations that contribute to disease pathogenesis. scParser achieves favorable performance in cell clustering compared to state-of-the-art methods and has a broad and diverse applicability.
Citations: 0
Associating transcription factors to single-cell trajectories with DREAMIT
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2024-08-14 DOI: 10.1186/s13059-024-03368-7
Nathan D. Maulding, Lucas Seninge, Joshua M. Stuart
Inferring gene regulatory networks from single-cell RNA-sequencing trajectories has been an active area of research, yet methods are still needed to identify regulators governing cell transitions. We developed DREAMIT (Dynamic Regulation of Expression Across Modules in Inferred Trajectories) to annotate transcription-factor activity along single-cell trajectory branches, using ensembles of relations to target genes. Using a benchmark representing several different tissues, as well as external validation with ATAC-Seq and Perturb-Seq data on hematopoietic cells, the method was found to have higher tissue-specific sensitivity and specificity than competing approaches.
Citations: 0
Comprehensive network modeling approaches unravel dynamic enhancer-promoter interactions across neural differentiation
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2024-08-14 DOI: 10.1186/s13059-024-03365-w
William DeGroat, Fumitaka Inoue, Tal Ashuach, Nir Yosef, Nadav Ahituv, Anat Kreimer
Increasing evidence suggests that a substantial proportion of disease-associated mutations occur in enhancers, regions of non-coding DNA essential to gene regulation. Understanding the structures and mechanisms of the regulatory programs this variation affects can shed light on the apparatuses of human diseases. We collect epigenetic and gene expression datasets from seven early time points during neural differentiation. Focusing on this model system, we construct networks of enhancer-promoter interactions, each at an individual stage of neural induction. These networks serve as the base for a rich series of analyses, through which we demonstrate their temporal dynamics and enrichment for various disease-associated variants. We apply the Girvan-Newman clustering algorithm to these networks to reveal biologically relevant substructures of regulation. Additionally, we demonstrate methods to validate predicted enhancer-promoter interactions using transcription factor overexpression and massively parallel reporter assays. Our findings suggest a generalizable framework for exploring gene regulatory programs and their dynamics across developmental processes; this includes a comprehensive approach to studying the effects of disease-associated variation on transcriptional networks. The techniques applied to our networks have been published alongside our findings as a computational tool, E-P-INAnalyzer. Our procedure can be utilized across different cellular contexts and disorders.
Citations: 0
The GC-content at the 5′ ends of human protein-coding genes is undergoing mutational decay
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2024-08-13 DOI: 10.1186/s13059-024-03364-x
Yi Qiu, Yoon Mo Kang, Christopher Korfmann, Fanny Pouyet, Andrew Eckford, Alexander F. Palazzo
In vertebrates, most protein-coding genes have a peak of GC-content near their 5′ transcriptional start site (TSS). This feature promotes both the efficient nuclear export and translation of mRNAs. Despite the importance of GC-content for RNA metabolism, its general features, origin, and maintenance remain mysterious. We investigate the evolutionary forces shaping GC-content at the TSS of genes through both comparative genomic analysis of nucleotide substitution rates between different species and by examining human de novo mutations. Our data suggest that GC-peaks at TSSs were present in the last common ancestor of amniotes, and likely that of vertebrates. We observe that in apes and rodents, where recombination is directed away from TSSs by PRDM9, GC-content at the 5′ end of protein-coding genes is currently undergoing mutational decay. In canids, which lack PRDM9 and perform recombination at TSSs, GC-content at the 5′ end of protein-coding genes is increasing. We show that these patterns extend into the 5′ end of the open reading frame, thus impacting synonymous codon position choices. Our results indicate that the dynamics of this GC-peak in amniotes is largely shaped by historic patterns of recombination. Since decay of GC-content towards the mutation rate equilibrium is the default state for non-functional DNA, the observed decrease in GC-content at TSSs in apes and rodents indicates that the GC-peak is not being maintained by selection on most protein-coding genes in those species.
Citations: 0
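To make the notion of a GC-content peak concrete, here is a minimal sketch of how GC content could be profiled in windows along a sequence. The function names, the window size, and the toy sequence are assumptions for illustration and are not taken from the paper.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_profile(seq, window, step):
    """GC fraction in consecutive windows along seq, starting at position 0."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: a GC-rich stretch at the 5' end followed by AT-richer sequence.
toy = "GCGCGCGCGG" + "ATGCATGCAT" + "ATATATATAT"
profile = gc_profile(toy, window=10, step=10)
print(profile)
```

On real data the sequence would be extracted around an annotated TSS, and the window and step sizes would need to be chosen to suit the analysis.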
SynGAP: a synteny-based toolkit for gene structure annotation polishing
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2024-08-13 DOI: 10.1186/s13059-024-03359-8
Fengqi Wu, Yingxiao Mai, Chengjie Chen, Rui Xia
Genome sequencing has become a routine task for biologists, but the challenge of gene structure annotation persists, impeding accurate genomic and genetic research. Here, we present a bioinformatics toolkit, SynGAP (Synteny-based Gene structure Annotation Polisher), which uses gene synteny information to accomplish precise and automated polishing of gene structure annotation of genomes. SynGAP offers exceptional capabilities in the improvement of gene structure annotation quality and the profiling of integrative gene synteny between species. Furthermore, an expression variation index is designed for comparative transcriptomics analysis to explore candidate genes responsible for the development of distinct traits observed in phylogenetically related species.
Citations: 0
Prevalence of and gene regulatory constraints on transcriptional adaptation in single cells
IF 12.3 Zone 1 (Biology) Q1 BIOTECHNOLOGY & APPLIED MICROBIOLOGY Pub Date: 2024-08-12 DOI: 10.1186/s13059-024-03351-2
Ian A. Mellis, Madeline E. Melzer, Nicholas Bodkin, Yogesh Goyal
Cells and tissues have a remarkable ability to adapt to genetic perturbations via a variety of molecular mechanisms. Nonsense-induced transcriptional compensation, a form of transcriptional adaptation, has recently emerged as one such mechanism, in which nonsense mutations in a gene trigger upregulation of related genes, possibly conferring robustness at cellular and organismal levels. However, beyond a handful of developmental contexts and curated sets of genes, no comprehensive genome-wide investigation of this behavior has been undertaken for mammalian cell types and conditions. How the regulatory-level effects of inherently stochastic compensatory gene networks contribute to phenotypic penetrance in single cells remains unclear. We analyze existing bulk and single-cell transcriptomic datasets to uncover the prevalence of transcriptional adaptation in mammalian systems across diverse contexts and cell types. We perform regulon gene expression analyses of transcription factor target sets in both bulk and pooled single-cell genetic perturbation datasets. Our results reveal greater robustness in expression of regulons of transcription factors exhibiting transcriptional adaptation compared to those of transcription factors that do not. Stochastic mathematical modeling of minimal compensatory gene networks qualitatively recapitulates several aspects of transcriptional adaptation, including paralog upregulation and robustness to mutation. Combined with machine learning analysis of network features of interest, our framework offers potential explanations for which regulatory steps are most important for transcriptional adaptation. Our integrative approach identifies several putative hits (genes demonstrating possible transcriptional adaptation) to follow up on experimentally and provides a formal quantitative framework to test and refine models of transcriptional adaptation.
Citations: 0