Weighted pooling high-throughput gene expression data sets to maximize the functional coherence of the top rank genes

2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW) Pub Date : 2011-11-12 DOI:10.1109/BIBMW.2011.6112550

Xiaodong Zhou, E. George


Abstract

In a typical gene expression study that uses a high-throughput technique such as microarray, a biologist usually focuses on the top genes ranked by P-value to establish gene functional relationships and networks, biological pathways, and the microbiological ramifications of the gene selection. As more datasets become publicly available, researchers pool data from independent experiments, typically by pooling P-values with equal weight assigned to each dataset, aiming to extract more biological information from the pooled data. However, dataset quality may vary substantially, so assigning equal weights may not give the optimal result. Applying the equal-weights approach to six independent datasets, we observe that the top-ranked genes of the pooled data have less functional coherence than those of the single dataset with the highest functional coherence. We propose a procedure based on enhanced simulated annealing (ESA) and literature semantic indexing cohesive (LSI-c) analysis that assigns optimal weights to datasets so as to maximize the functional coherence of the top-ranked genes ordered by their pooled P-values. We observe significantly more functional coherence in optimally pooled data than in any single dataset or in data pooled with equal weights. Identifying top-ranked genes through our optimal procedure should improve downstream analysis.
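The abstract does not give the pooling formula, the details of ESA, or the LSI-c score, so the following Python is only a minimal sketch of the general idea under stated assumptions. It assumes one-sided P-values, pools them with the weighted Z (Stouffer) method, and searches the weights with plain simulated annealing rather than the authors' enhanced variant. The names `pool_p_values`, `top_genes`, `anneal_weights`, and the `coherence` callable are hypothetical, and the toy coherence score (fraction of top genes in a made-up gene set) merely stands in for LSI-c.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for P-value <-> Z conversion


def pool_p_values(p_values, weights):
    """Pool one gene's one-sided P-values with the weighted Z method (assumed)."""
    eps = 1e-12  # keep P-values strictly inside (0, 1) so inv_cdf is defined
    z_scores = [STD_NORMAL.inv_cdf(1.0 - min(max(p, eps), 1.0 - eps))
                for p in p_values]
    weighted_sum = sum(w * z for w, z in zip(weights, z_scores))
    pooled_z = weighted_sum / math.sqrt(sum(w * w for w in weights))
    return 1.0 - STD_NORMAL.cdf(pooled_z)


def top_genes(p_table, weights, k):
    """Return the k genes with the smallest pooled P-values."""
    pooled = {gene: pool_p_values(ps, weights) for gene, ps in p_table.items()}
    return sorted(pooled, key=pooled.get)[:k]


def anneal_weights(p_table, coherence, k, steps=2000,
                   start_temp=1.0, cooling=0.995, seed=0):
    """Plain simulated annealing over dataset weights (not the paper's ESA)."""
    rng = random.Random(seed)
    n_datasets = len(next(iter(p_table.values())))
    current = [1.0 / n_datasets] * n_datasets  # start from equal weights
    current_score = coherence(top_genes(p_table, current, k))
    best, best_score = current[:], current_score
    temp = start_temp
    for _ in range(steps):
        # Perturb the weights, keep them positive, renormalize to sum to 1.
        proposal = [max(w + rng.gauss(0.0, 0.1), 1e-6) for w in current]
        total = sum(proposal)
        proposal = [w / total for w in proposal]
        score = coherence(top_genes(p_table, proposal, k))
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if (score >= current_score
                or rng.random() < math.exp((score - current_score) / temp)):
            current, current_score = proposal, score
            if score > best_score:
                best, best_score = proposal[:], score
        temp *= cooling  # geometric cooling schedule
    return best, best_score


# Toy usage with made-up numbers: three datasets, five genes.
toy_p_table = {
    "g1": [0.001, 0.002, 0.9],
    "g2": [0.002, 0.001, 0.8],
    "g3": [0.5, 0.6, 0.001],
    "g4": [0.6, 0.5, 0.002],
    "g5": [0.4, 0.7, 0.5],
}
toy_gene_set = {"g1", "g2"}  # hypothetical "functionally related" genes


def toy_coherence(genes):
    """Stand-in for LSI-c: fraction of the selected genes in toy_gene_set."""
    return sum(g in toy_gene_set for g in genes) / len(genes)


toy_weights, toy_score = anneal_weights(toy_p_table, toy_coherence, k=2, steps=200)
```

Because the search starts from equal weights and tracks the best state seen, the returned score can never fall below the equal-weights score. Any real coherence measure can be passed in as `coherence`, provided it maps a list of gene identifiers to a number where larger means more coherent.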


