iModulonMiner and PyModulon: Software for unsupervised mining of gene expression compendia.

IF 3.8 | Zone 2 (Biology) | Q1 BIOCHEMICAL RESEARCH METHODS | PLoS Computational Biology | Pub Date: 2024-10-23 | eCollection Date: 2024-10-01 | DOI: 10.1371/journal.pcbi.1012546
Anand V Sastry, Yuan Yuan, Saugat Poudel, Kevin Rychel, Reo Yoo, Cameron R Lamoureux, Gaoyuan Li, Joshua T Burrows, Siddharth Chauhan, Zachary B Haiman, Tahani Al Bulushi, Yara Seif, Bernhard O Palsson, Daniel C Zielinski

Abstract

Public gene expression databases are a rapidly expanding resource of organism responses to diverse perturbations, presenting both an opportunity and a challenge for bioinformatics workflows to extract actionable knowledge of transcription regulatory network function. Here, we introduce a five-step computational pipeline, called iModulonMiner, to compile, process, curate, analyze, and characterize the totality of RNA-seq data for a given organism or cell type. This workflow is centered on the data-driven computation of co-regulated gene sets, called iModulons, using Independent Component Analysis; iModulons have been shown to have broad applications. As a demonstration, we applied this workflow to generate the iModulon structure of Bacillus subtilis using all high-quality, publicly available RNA-seq data. Using this structure, we predicted regulatory interactions for multiple transcription factors, identified groups of co-expressed genes that are putatively regulated by undiscovered transcription factors, and predicted properties of a recently discovered single-subunit phage RNA polymerase. We also present a Python package, PyModulon, with functions to characterize, visualize, and explore computed iModulons. The pipeline, available at https://github.com/SBRG/iModulonMiner, can be readily applied to diverse organisms to gain a rapid understanding of their transcriptional regulatory network structure and condition-specific activity.
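
The abstract does not spell out the decomposition, so the following is only a minimal sketch of the idea, assuming the usual linear model behind Independent Component Analysis: an expression matrix X (genes × samples) is approximated by the product of a gene-weight matrix M (genes × components) and an activity matrix A (components × samples). In practice ICA estimates M and A from X; here both are given as toy values so that the way a gene set is read from M is visible. The gene names, numbers, helper functions, and the fixed cutoff are all hypothetical and are not taken from the paper or from PyModulon.

```python
# Toy illustration of the linear model assumed by ICA-based gene sets.
# Every name, value, and the cutoff below is hypothetical.

genes = ["geneA", "geneB", "geneC", "geneD"]
samples = ["s1", "s2", "s3"]

# M: gene weights, one column per component (genes x components).
M = [
    [0.9, 0.0],
    [0.8, 0.1],
    [0.0, 0.7],
    [0.1, 0.0],
]

# A: component activities, one row per component (components x samples).
A = [
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
]


def matmul(left, right):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(l * r for l, r in zip(row, col)) for col in zip(*right)]
        for row in left
    ]


def gene_set(weights, names, component, threshold):
    """Return the genes whose absolute weight in one component exceeds threshold."""
    return [
        name
        for name, row in zip(names, weights)
        if abs(row[component]) > threshold
    ]


# Expression implied by the model: X = M . A (genes x samples).
X = matmul(M, A)

# Read one gene set per component using an assumed fixed cutoff of 0.5.
sets = {k: gene_set(M, genes, k, 0.5) for k in range(len(A))}

for k, members in sets.items():
    print(f"component {k}: genes={members}, activities={A[k]}")
```

Each column of M gives a candidate co-regulated gene set, and the matching row of A gives that set's condition-specific activity across samples. A real analysis would choose the cutoff per component from the data rather than use one fixed value.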

Source journal
PLoS Computational Biology (BIOCHEMICAL RESEARCH METHODS; MATHEMATICAL & COMPUTATIONAL BIOLOGY)
CiteScore: 7.10
Self-citation rate: 4.70%
Articles published: 820
Review time: 2.5 months
Journal description: PLOS Computational Biology features works of exceptional significance that further our understanding of living systems at all scales—from molecules and cells, to patient populations and ecosystems—through the application of computational methods. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery. Research articles must be declared as belonging to a relevant section. More information about the sections can be found in the submission guidelines. Research articles should model aspects of biological systems, demonstrate both methodological and scientific novelty, and provide profound new biological insights. Generally, reliability and significance of biological discovery through computation should be validated and enriched by experimental studies. Inclusion of experimental validation is not required for publication, but should be referenced where possible. Inclusion of experimental validation of a modest biological discovery through computation does not render a manuscript suitable for PLOS Computational Biology. Research articles specifically designated as Methods papers should describe outstanding methods of exceptional importance that have been shown, or have the promise, to provide new biological insights. The method must already be widely adopted, or have the promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities.