Knowledge Driven Variable Selection (KDVS) - a new approach to enrichment analysis of gene signatures obtained from high-throughput data.

Source Code for Biology and Medicine (Q2, Decision Sciences); Pub Date: 2013-01-09; DOI: 10.1186/1751-0473-8-2

Grzegorz Zycinski, Annalisa Barla, Margherita Squillario, Tiziana Sanavia, Barbara Di Camillo, Alessandro Verri


Citations: 7

Abstract

Background: High-throughput (HT) technologies provide huge amounts of gene expression data that can be used to identify biomarkers useful in clinical practice. The most frequently used approaches first select a set of genes (i.e., a gene signature) able to characterize differences between two or more phenotypic conditions, and then provide a functional assessment of the selected genes with an a posteriori enrichment analysis based on biological knowledge. However, this approach has several drawbacks. First, the gene selection procedure often requires tunable parameters that affect the outcome, typically producing many false hits. Second, a posteriori enrichment analysis relies on a mapping between biological concepts and gene expression measurements, which is hard to compute because of constant changes in biological knowledge and genome analysis. Third, this mapping is typically used to assess the coverage of the gene signature by biological concepts, an assessment that is either score-based or itself requires tunable parameters, which limits its power.

Results: We present Knowledge Driven Variable Selection (KDVS), a framework that uses a priori biological knowledge in HT data analysis. The expression data matrix is transformed, according to prior knowledge, into smaller matrices that are easier to analyze and interpret from both computational and biological viewpoints. Therefore KDVS, unlike most approaches, does not exclude a priori any function or process potentially relevant to the biological question under investigation. Unlike the standard approach, in which gene selection and functional assessment are applied independently, KDVS embeds these two steps in a unified statistical framework, decreasing the variability arising from threshold-dependent selection, the mapping to biological concepts, and the signature coverage. We present three case studies to assess the usefulness of the method.
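The matrix transformation described in the Results can be illustrated with a minimal sketch. It is not the KDVS implementation: the abstract does not specify the knowledge source, the data layout, or the statistical method applied afterwards, so the function name `split_by_concept`, the gene and concept labels, and the genes-by-samples layout are all assumptions made for illustration. The sketch only shows how one expression matrix can be split into one smaller matrix per biological concept.

```python
import numpy as np


def split_by_concept(expr, gene_ids, concept_to_genes):
    """Split a genes-by-samples matrix into one sub-matrix per concept.

    Hypothetical helper, not part of KDVS. Returns a dict mapping each
    concept to (sub_matrix, kept_gene_ids); concepts with no measured
    genes are skipped.
    """
    row_of = {gene: i for i, gene in enumerate(gene_ids)}
    submatrices = {}
    for concept, genes in concept_to_genes.items():
        # Keep only the genes that were actually measured.
        kept = [g for g in genes if g in row_of]
        if not kept:
            continue
        rows = [row_of[g] for g in kept]
        submatrices[concept] = (expr[rows, :], kept)
    return submatrices


# Toy data (made up): 4 genes measured in 3 samples.
expr = np.arange(12, dtype=float).reshape(4, 3)
gene_ids = ["g1", "g2", "g3", "g4"]

# Assumed prior knowledge: concept -> annotated genes.
concept_to_genes = {
    "termA": ["g1", "g3"],
    "termB": ["g2", "g3", "g4", "g9"],  # g9 is not measured
    "termC": ["g9"],                    # no measured genes, so skipped
}

subs = split_by_concept(expr, gene_ids, concept_to_genes)
for concept, (matrix, kept) in subs.items():
    print(concept, kept, matrix.shape)
```

In this sketch a gene annotated to several concepts (here `g3`) appears in every corresponding sub-matrix, so no concept is discarded before the analysis; each sub-matrix could then be passed to whatever selection or classification step is chosen.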

Conclusions: We showed that KDVS enables not only the accurate selection of known biological functionalities but also the identification of new ones. An efficient implementation of KDVS was devised to obtain results quickly and robustly. Computing time is drastically reduced by the effective use of distributed resources. Finally, integrated visualization techniques immediately increase the interpretability of the results. Overall, the KDVS approach can be considered a viable alternative to enrichment-based approaches.




Source journal

Source Code for Biology and Medicine (Decision Sciences: Information Systems and Management)

Self-citation rate: 0.00%

About the journal: Source Code for Biology and Medicine is a peer-reviewed, open access, online journal that publishes articles on source code employed over a wide range of applications in biology and medicine. The journal's aim is to publish source code for distribution and use in the public domain in order to advance biological and medical research. Through this dissemination, it may be possible to shorten the time required for solving certain computational problems for which there is limited source code availability or resources.
