bootGSEA: a bootstrap and rank aggregation pipeline for multi-study and multi-omics enrichment analyses

Shamini Hemandhar Kumar, Ines Tapken, Daniela Kuhn, Peter Claus, Klaus Jung
Frontiers in Bioinformatics. Published 2024-04-03. DOI: 10.3389/fbinf.2024.1380928 (https://doi.org/10.3389/fbinf.2024.1380928)

Abstract

Introduction: Gene set enrichment analysis (GSEA) following differential expression analysis is a standard step in transcriptomics and proteomics data analysis. Although many tools for this step are available, the results are often difficult to reproduce because set annotations in the databases can change: new features can be added or existing features removed. Such changes in set composition can in turn affect biological interpretation.

Methods: We present bootGSEA, a novel computational pipeline for studying the robustness of GSEA. Repeating GSEA on bootstrap samples makes it possible to study the variability and robustness of the results. In our pipeline, not all genes or proteins are involved in each bootstrap replicate of the analysis. We then aggregate the ranks from the bootstrap replicates to obtain a score per gene set that shows whether the set gains or loses evidence compared to its ranking in the standard GSEA. Rank aggregation is also used to combine GSEA results from different omics levels or from multiple independent studies at the same omics level.

Results: By applying our approach to six independent cancer transcriptomics datasets, we showed that bootstrap GSEA can aid in the selection of more robust enriched gene sets. Additionally, we applied our approach to paired transcriptomics and proteomics data obtained from a mouse model of spinal muscular atrophy (SMA), a neurodegenerative and neurodevelopmental disease associated with multi-system involvement. After obtaining a robust ranking at each omics level, the two ranking lists were combined to aggregate the findings from the transcriptomics and proteomics results. Furthermore, we constructed the new R package "bootGSEA," which implements the proposed methods and provides graphical views of the findings. In the example datasets, bootstrap-based GSEA was able to identify gene or protein sets that were less robust when the set composition changed during bootstrap analysis.

Discussion: The rank aggregation step was useful for combining bootstrap results and making them comparable to the original findings at the single-omics level, as well as for combining findings from multiple omics levels.
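The bootstrap and rank aggregation idea described under Methods can be illustrated with a minimal Python sketch. This is not the bootGSEA R package or its API. All function names, the toy data, the stand-in enrichment statistic (mean score inside a set minus mean score outside it), and the mean-rank aggregation are assumptions made for illustration; the abstract does not specify which statistic or aggregation method the pipeline uses.

```python
import random
from statistics import mean


def enrichment_score(scores, members):
    """Stand-in statistic (assumption): mean score inside the set minus mean outside."""
    inside = [s for g, s in scores.items() if g in members]
    outside = [s for g, s in scores.items() if g not in members]
    if not inside or not outside:
        return float("-inf")  # set cannot be evaluated in this replicate
    return mean(inside) - mean(outside)


def rank_sets(scores, gene_sets):
    """Rank gene sets by descending enrichment score (rank 1 = strongest)."""
    es = {name: enrichment_score(scores, m) for name, m in gene_sets.items()}
    ordered = sorted(es, key=lambda name: (-es[name], name))
    return {name: i + 1 for i, name in enumerate(ordered)}


def bootstrap_ranks(scores, gene_sets, n_boot=100, seed=0):
    """Repeat the ranking on bootstrap samples of the genes."""
    rng = random.Random(seed)
    genes = sorted(scores)
    replicates = []
    for _ in range(n_boot):
        # Sampling with replacement leaves some genes out of each replicate,
        # so the effective set composition changes between replicates.
        drawn = set(rng.choices(genes, k=len(genes)))
        subset = {g: scores[g] for g in drawn}
        replicates.append(rank_sets(subset, gene_sets))
    return replicates


def aggregate_ranks(rank_lists):
    """Aggregate several rankings by mean rank (assumption), then re-rank."""
    names = list(rank_lists[0])
    mean_rank = {n: mean(r[n] for r in rank_lists) for n in names}
    ordered = sorted(mean_rank, key=lambda n: (mean_rank[n], n))
    return {n: i + 1 for i, n in enumerate(ordered)}


def rank_shift(original, aggregated):
    """Positive: the set moved up (gained evidence); negative: it lost evidence."""
    return {n: original[n] - aggregated[n] for n in original}


# Hypothetical toy data: 20 genes with differential-expression scores 1..20.
scores = {f"g{i}": float(i) for i in range(1, 21)}
gene_sets = {
    "SET_HIGH": {"g18", "g19", "g20"},
    "SET_MID": {"g9", "g10", "g11", "g12"},
    "SET_LOW": {"g1", "g2", "g3"},
}

original = rank_sets(scores, gene_sets)
replicates = bootstrap_ranks(scores, gene_sets, n_boot=100, seed=1)
aggregated = aggregate_ranks(replicates)
shift = rank_shift(original, aggregated)
print(original, aggregated, shift)
```

Under the same assumptions, `aggregate_ranks` could also be given one ranking per omics level or per study, which mirrors the second use of rank aggregation mentioned in the abstract.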