iModulonMiner and PyModulon: Software for unsupervised mining of gene expression compendia.

PLoS Computational Biology (IF 3.8; Zone 2, Biology; JCR Q1, Biochemical Research Methods). Pub Date: 2024-10-23. eCollection Date: 2024-10-01. DOI: 10.1371/journal.pcbi.1012546

Anand V Sastry, Yuan Yuan, Saugat Poudel, Kevin Rychel, Reo Yoo, Cameron R Lamoureux, Gaoyuan Li, Joshua T Burrows, Siddharth Chauhan, Zachary B Haiman, Tahani Al Bulushi, Yara Seif, Bernhard O Palsson, Daniel C Zielinski



Abstract

Public gene expression databases are a rapidly expanding resource of organism responses to diverse perturbations, presenting both an opportunity and a challenge for bioinformatics workflows to extract actionable knowledge of transcriptional regulatory network function. Here, we introduce a five-step computational pipeline, called iModulonMiner, to compile, process, curate, analyze, and characterize the totality of RNA-seq data for a given organism or cell type. This workflow is centered around the data-driven computation of co-regulated gene sets, called iModulons, using Independent Component Analysis; iModulons have been shown to have broad applications. As a demonstration, we applied this workflow to generate the iModulon structure of Bacillus subtilis using all high-quality, publicly available RNA-seq data. Using this structure, we predicted regulatory interactions for multiple transcription factors, identified groups of co-expressed genes that are putatively regulated by undiscovered transcription factors, and predicted properties of a recently discovered single-subunit phage RNA polymerase. We also present a Python package, PyModulon, with functions to characterize, visualize, and explore computed iModulons. The pipeline, available at https://github.com/SBRG/iModulonMiner, can be readily applied to diverse organisms to gain a rapid understanding of their transcriptional regulatory network structure and condition-specific activity.
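To make the central idea concrete, the following is a minimal sketch of how Independent Component Analysis can pull co-regulated gene sets out of an expression matrix. It is not the iModulonMiner or PyModulon implementation: it uses scikit-learn's `FastICA` on synthetic data, and the matrix sizes, the names `M` (gene weights) and `A` (component activities), and the three-standard-deviation cutoff are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_genes, n_samples, n_components = 200, 60, 3

# Synthetic data (hypothetical): each component strongly weights 10 genes.
true_M = np.zeros((n_genes, n_components))
for k in range(n_components):
    true_M[k * 10:(k + 1) * 10, k] = rng.normal(5.0, 1.0, size=10)

# Non-Gaussian activities of each component across samples, plus noise.
true_A = rng.laplace(size=(n_components, n_samples))
X = true_M @ true_A + 0.1 * rng.normal(size=(n_genes, n_samples))

# Genes are passed as rows, so the independent sources are gene-weight vectors.
ica = FastICA(n_components=n_components, random_state=0, max_iter=1000)
M = ica.fit_transform(X)   # genes x components
A = ica.mixing_.T          # components x samples

# Illustrative cutoff (an assumption, not the published thresholding method):
# a gene joins a set if its weight exceeds 3 standard deviations of the column.
gene_sets = [
    np.flatnonzero(np.abs(M[:, k]) > 3 * M[:, k].std())
    for k in range(n_components)
]

for k, genes in enumerate(gene_sets):
    print(f"component {k}: {len(genes)} genes selected")
```

The decomposition approximates the expression matrix as `M @ A` plus a per-sample mean, so each column of `M` lists the genes that move together and each row of `A` shows how active that gene set is in each sample.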


Source journal: PLoS Computational Biology (Biochemical Research Methods; Mathematical & Computational Biology)

CiteScore: 7.10

Self-citation rate: 4.70%

Articles published: 820

Review time: 2.5 months

Journal description: PLOS Computational Biology features works of exceptional significance that further our understanding of living systems at all scales, from molecules and cells to patient populations and ecosystems, through the application of computational methods. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery. Research articles must be declared as belonging to a relevant section. More information about the sections can be found in the submission guidelines. Research articles should model aspects of biological systems, demonstrate both methodological and scientific novelty, and provide profound new biological insights. Generally, reliability and significance of biological discovery through computation should be validated and enriched by experimental studies. Inclusion of experimental validation is not required for publication, but should be referenced where possible. Inclusion of experimental validation of a modest biological discovery through computation does not render a manuscript suitable for PLOS Computational Biology. Research articles specifically designated as Methods papers should describe outstanding methods of exceptional importance that have been shown, or have the promise, to provide new biological insights. The method must already be widely adopted, or have the promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities.