Weighted pooling high-throughput gene expression data sets to maximize the functional coherence of the top rank genes

Xiaodong Zhou, E. George
2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), pp. 1033-1033. Published 2011-11-12. DOI: 10.1109/BIBMW.2011.6112550 (https://doi.org/10.1109/BIBMW.2011.6112550)

Abstract

In a typical gene expression study using a high-throughput technique such as microarray, a biologist usually focuses on the top genes ranked by P-value to establish gene functional relationships and networks, biological pathways, and the microbiological ramifications of the gene selection. With more datasets publicly available, researchers pool data from independent experiments, typically by pooling P-values with equal weight assigned to each dataset, aiming to extract more biological information from the pooled data. However, dataset quality may vary substantially, so assigning equal weights may not guarantee the optimal result. Applying the equal-weights approach to six independent datasets, we observe that the top-ranked genes of the pooled data have less functional coherence than those of the single dataset with the highest functional coherence. We propose a procedure based on enhanced simulated annealing (ESA) and literature semantic indexing cohesive (LSI-c) analysis that assigns optimal weights to datasets so as to maximize the functional coherence of the top-ranked genes ordered by their pooled P-values. We observe significantly more functional coherence in optimally pooled data than in any single dataset or in data pooled with equal weights. Identifying top-ranked genes through our optimal procedure should improve downstream analysis.
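
The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not state which P-value combination rule is used, how ESA differs from plain simulated annealing, or how LSI-c scores are computed, so everything below is an assumption made for illustration: pooling uses the weighted Z-method (Stouffer), the optimizer is a basic Metropolis-style annealer rather than ESA, and `coherence` is a hypothetical caller-supplied function standing in for an LSI-c score. The function names are likewise invented here.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def pool_p_values(p_values, weights):
    """Combine one gene's one-sided P-values with the weighted Z-method.

    Assumption: the paper's actual pooling rule is not given in the abstract.
    """
    eps = 1e-12  # clamp so that P-values of exactly 0 or 1 give finite z-scores
    z = [_STD_NORMAL.inv_cdf(1.0 - min(max(p, eps), 1.0 - eps)) for p in p_values]
    norm = math.sqrt(sum(w * w for w in weights))
    combined = sum(w * zi for w, zi in zip(weights, z)) / norm
    return 1.0 - _STD_NORMAL.cdf(combined)


def top_genes(p_matrix, weights, k):
    """Return the k genes with the smallest pooled P-values."""
    pooled = {gene: pool_p_values(ps, weights) for gene, ps in p_matrix.items()}
    return sorted(pooled, key=pooled.get)[:k]


def anneal_weights(p_matrix, coherence, k, n_steps=2000, t0=1.0,
                   cooling=0.995, seed=0):
    """Search dataset weights that maximize coherence of the top-k genes.

    `coherence` is a hypothetical stand-in for an LSI-c score: it takes a
    list of gene names and returns a number (higher means more coherent).
    This is plain simulated annealing, not the paper's ESA variant.
    """
    rng = random.Random(seed)
    n = len(next(iter(p_matrix.values())))
    current = [1.0 / n] * n  # start from equal weights
    current_score = coherence(top_genes(p_matrix, current, k))
    best, best_score = current[:], current_score
    temp = t0
    for _ in range(n_steps):
        # Perturb one weight, keep it non-negative, renormalize to sum to 1.
        cand = current[:]
        i = rng.randrange(n)
        cand[i] = max(0.0, cand[i] + rng.gauss(0.0, 0.1))
        total = sum(cand)
        if total == 0.0:
            continue
        cand = [w / total for w in cand]
        score = coherence(top_genes(p_matrix, cand, k))
        # Metropolis rule: always accept improvements, sometimes accept worse.
        delta = score - current_score
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = cand, score
            if score > best_score:
                best, best_score = cand[:], score
        temp *= cooling
    return best, best_score
```

Here `p_matrix` maps each gene name to its list of P-values, one per dataset. Because the search starts from equal weights and tracks the best state seen, the returned score is never lower than the equal-weight baseline under the same `coherence` function.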