goSTAG: gene ontology subtrees to tag and annotate genes within a set.

Source Code for Biology and Medicine (Q2, Decision Sciences) | Pub Date: 2017-04-13 | eCollection Date: 2017-01-01 | DOI: 10.1186/s13029-017-0066-1

Brian D Bennett, Pierre R Bushel


Citations: 7

Abstract



goSTAG: gene ontology subtrees to tag and annotate genes within a set.

Background: Over-representation analysis (ORA) detects enrichment of genes within biological categories. Gene Ontology (GO) domains are commonly used for gene/gene-product annotation. When ORA is employed, there are often hundreds of statistically significant GO terms per gene set. Comparing enriched categories across a large number of analyses and identifying the term within the GO hierarchy with the most connections is challenging. Furthermore, inferring biological themes representative of the samples from the enriched categories can be highly subjective.
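To make the ORA idea concrete, the following is a minimal sketch of an enrichment test for a single category, assuming a one-sided hypergeometric test. The abstract does not state which significance test goSTAG uses, so the test choice, the function name `ora_p_value`, and the toy counts are illustrative assumptions only.

```python
from math import comb


def ora_p_value(n_universe, n_category, n_gene_set, n_overlap):
    """Upper-tail hypergeometric p-value, P(X >= n_overlap).

    n_universe: number of annotated genes in the background
    n_category: background genes annotated to the category
    n_gene_set: genes in the submitted list
    n_overlap:  genes in the list that are annotated to the category
    """
    max_overlap = min(n_category, n_gene_set)
    total = comb(n_universe, n_gene_set)
    # Sum the probability of every overlap at least as large as the one seen.
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_gene_set - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers (assumed): 20 background genes, 5 in the category,
# a gene list of 4 genes of which 3 fall in the category.
p = ora_p_value(n_universe=20, n_category=5, n_gene_set=4, n_overlap=3)
print(f"enrichment p-value: {p:.4f}")
```

Running a test like this for every GO term and every gene list is what produces the hundreds of significant terms per gene set that the Background describes.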

Results: We developed goSTAG for utilizing GO Subtrees to Tag and Annotate Genes that are part of a set. Given gene lists from microarray, RNA sequencing (RNA-Seq) or other genomic high-throughput technologies, goSTAG performs GO enrichment analysis and clusters the GO terms based on the p-values from the significance tests. GO subtrees are constructed for each cluster, and the term that has the most paths to the root within the subtree is used to tag and annotate the cluster as the biological theme. We tested goSTAG on a microarray gene expression data set of samples acquired from the bone marrow of rats exposed to cancer therapeutic drugs to determine whether the combination or the order of administration influenced bone marrow toxicity at the level of gene expression. Several clusters were labeled with GO biological processes (BPs) from the subtrees that are indicative of some of the prominent pathways modulated in bone marrow from animals treated with an oxaliplatin/topotecan combination. In particular, negative regulation of MAP kinase activity was the biological theme exclusively in the cluster associated with enrichment at 6 h after treatment with oxaliplatin followed by control. However, nucleoside triphosphate catabolic process was the GO BP labeled exclusively at 6 h after treatment with topotecan followed by control.
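The clustering step described above can be sketched as follows. This is a hedged illustration, not the package's implementation: it assumes each GO term has one p-value per gene list, transforms them with -log10, and groups terms by average-linkage agglomerative clustering on correlation distance with a cut-off. The distance measure, linkage, threshold, function names, and the placeholder term identifiers are all assumptions made for the example.

```python
from math import log10, sqrt


def neg_log10(p_values, floor=1e-300):
    """Transform p-values so that stronger enrichment gives larger numbers."""
    return [-log10(max(p, floor)) for p in p_values]


def correlation_distance(a, b):
    """1 - Pearson correlation; a constant vector gets distance 1.0."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    norm = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if norm == 0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(da, db)) / norm


def cluster_terms(profiles, threshold):
    """Merge the closest clusters until the closest pair exceeds threshold."""
    names = list(profiles)
    vectors = {t: neg_log10(profiles[t]) for t in names}
    dist = {
        (s, t): correlation_distance(vectors[s], vectors[t])
        for s in names
        for t in names
    }
    clusters = [[t] for t in names]

    def linkage(c1, c2):
        # Average pairwise distance between members of two clusters.
        return sum(dist[s, t] for s in c1 for t in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Placeholder terms (not real GO identifiers); one p-value per gene list.
example = {
    "GO:A": [1e-8, 0.5, 0.9],
    "GO:B": [1e-6, 0.4, 0.8],
    "GO:C": [0.7, 1e-5, 0.6],
    "GO:D": [0.9, 1e-7, 0.5],
}
print(cluster_terms(example, threshold=0.5))
```

Terms that are enriched in the same gene lists end up in the same cluster, and each cluster is then the unit that receives a single label.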

Conclusions: goSTAG converts gene lists from genomic analyses into biological themes by enriching biological categories and constructing GO subtrees from over-represented terms in the clusters. The terms with the most paths to the root in the subtree are used to represent the biological themes. goSTAG is developed in R as a Bioconductor package and is available at https://bioconductor.org/packages/goSTAG.
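The labeling rule, choosing the term with the most paths to the root, can be sketched on a toy ontology. The sketch assumes the subtree consists of the cluster's own terms plus the root, so a path counts only if every term on it belongs to the cluster; the abstract does not spell out how the subtree is built, and the function name `label_cluster`, the tie-breaking rule, and the toy graph are hypothetical.

```python
from functools import lru_cache


def label_cluster(cluster_terms, parents, root):
    """Return (best_term, path_counts) for one cluster.

    parents maps each term to the terms directly above it. The ontology
    is a directed acyclic graph, so a term may have several parents.
    """
    allowed = set(cluster_terms) | {root}

    @lru_cache(maxsize=None)
    def paths_to_root(term):
        if term == root:
            return 1
        # Only follow edges that stay inside the cluster's subtree.
        return sum(
            paths_to_root(p) for p in parents.get(term, ()) if p in allowed
        )

    counts = {t: paths_to_root(t) for t in cluster_terms}
    # Ties are broken alphabetically (an arbitrary choice for this sketch).
    best = max(sorted(counts), key=counts.get)
    return best, counts


# Toy ontology with placeholder names: R is the root.
toy_parents = {
    "A": ["R"],
    "B": ["R"],
    "C": ["A", "B"],
    "D": ["C", "A"],
}
best, counts = label_cluster(["A", "B", "C", "D"], toy_parents, root="R")
print(best, counts)
```

In this toy graph, D reaches the root through C (two ways) and through A (one way), so it has the most paths and becomes the cluster's label.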


Source journal

Source Code for Biology and Medicine (Decision Sciences: Information Systems and Management)

Self-citation rate: 0.00%

Journal description: Source Code for Biology and Medicine is a peer-reviewed, open access, online journal that publishes articles on source code employed over a wide range of applications in biology and medicine. The journal's aim is to publish source code for distribution and use in the public domain in order to advance biological and medical research. Through this dissemination, it may be possible to shorten the time required for solving certain computational problems for which there is limited source code availability or resources.
