Significance analysis of clustering high throughput biological data

H. Otu, Shakirahmed Kolia, Jon Jones, Osman Osman, T. Libermann, Beth Israel
2005 IEEE International Conference on Electro Information Technology, 2005-05-22. DOI: 10.1109/EIT.2005.1627001
Citations: 0

Abstract

In the post-genomic era, the availability of complete genome sequences has given rise to high throughput systems such as gene chips and protein arrays. These techniques have revolutionized our understanding of biology by probing thousands of biological entities simultaneously. Unsupervised classification and clustering have emerged as important methods of analysis, used to group samples with similar molecular profiles and/or molecules with similar expression profiles. However, techniques such as hierarchical clustering, k-means, and self-organizing maps (SOM) have been used extensively with little attention to the significance of their results. We propose a general method that uses the bootstrap technique to assign confidence levels to clustering results of high throughput biological data. We apply the proposed method to real genomics and proteomics data on renal cell cancer (RCC), the most common malignancy of the adult kidney. We use protein profiles, obtained by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI TOF-MS), from IL-2 treatment responders and non-responders among metastatic RCC patients. We also use gene expression data from Affymetrix HG-U133A chips for primary RCC tumors, examined with respect to the Union Internationale Contre le Cancer's (UICC) TNM classification
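The abstract does not spell out the resampling scheme, so the following is only a minimal sketch of one common way to attach bootstrap confidence to a clustering, not necessarily the authors' procedure. It assumes a samples-by-features matrix (e.g. tumors by genes, or patients by SELDI peaks), resamples the feature columns with replacement, re-runs k-means on every replicate, and scores each original cluster by how often its members land in the same cluster. The function names `kmeans`, `bootstrap_coclustering` and `cluster_confidence` are hypothetical, and the toy data is invented for illustration.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iters=100):
    """Plain Lloyd's k-means (hypothetical helper); returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def bootstrap_coclustering(data, k, n_boot=200, seed=0):
    """Fraction of bootstrap replicates in which each pair of samples co-clusters.

    Assumption: each replicate resamples the feature columns with replacement
    and keeps every sample, so every pair is observed in every replicate.
    """
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_boot):
        cols = [rng.randrange(m) for _ in range(m)]
        resampled = [[row[j] for j in cols] for row in data]
        labels = kmeans(resampled, k, rng)
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    return [[c / n_boot for c in row] for row in counts]


def cluster_confidence(labels, co):
    """Mean within-cluster co-clustering frequency (None for singletons)."""
    conf = {}
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        pairs = list(combinations(members, 2))
        conf[c] = sum(co[i][j] for i, j in pairs) / len(pairs) if pairs else None
    return conf


# Toy data (invented): two well-separated groups of three samples, four features.
data = [
    [0.1, 0.2, 0.0, 0.3],
    [0.2, 0.0, 0.1, 0.2],
    [0.0, 0.1, 0.3, 0.1],
    [5.1, 5.0, 4.9, 5.2],
    [4.8, 5.2, 5.1, 5.0],
    [5.0, 4.9, 5.3, 4.8],
]
labels = kmeans(data, 2, random.Random(0))
co = bootstrap_coclustering(data, 2, n_boot=50, seed=1)
print(cluster_confidence(labels, co))  # → {0: 1.0, 1: 1.0}
```

Scores close to 1 mean a cluster survives resampling; on real RCC data one would expect lower, more informative values. Using pairwise co-membership sidesteps the label-switching problem between replicates, and the same scoring works if hierarchical clustering or a SOM is substituted for k-means. Resampling samples instead of features is another reasonable variant.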