H. Otu, Shakirahmed Kolia, Jon Jones, Osman Osman, T. Libermann, Beth Israel
Published in: 2005 IEEE International Conference on Electro Information Technology (2005-05-22). DOI: https://doi.org/10.1109/EIT.2005.1627001
Significance analysis of clustering high throughput biological data
In the post-genomic era, the availability of complete genome sequences has given rise to high-throughput systems such as gene chips and protein arrays. These techniques are revolutionizing our understanding of biology by probing thousands of biological entities simultaneously. Unsupervised classification and clustering have emerged as important methods of analysis; they can be used to group samples with similar molecular profiles and/or molecules with similar expression profiles. However, techniques such as hierarchical clustering, k-means, and self-organizing maps (SOM) have been used extensively with little attention to the significance of their results. We propose a general method that uses the bootstrap technique to assign confidence levels to clustering results of high-throughput biological data. We apply the proposed method to real genomics and proteomics data on Renal Cell Cancer (RCC), the most common malignancy of the adult kidney. We use protein profiles, obtained with surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI TOF-MS), from IL-2 treatment responders and non-responders among metastatic RCC patients. We also use gene expression data from Affymetrix HG-U133A chips for primary RCC tumors, inquiring the Union Internationale Contre le Cancer's (UICC) TNM classification
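The abstract does not say what is resampled, which clustering algorithm is bootstrapped, or how confidence is scored, so the following is only a minimal sketch of one common way to attach bootstrap confidence to a clustering, not the authors' procedure. It assumes that features (columns) are resampled with replacement, that samples are grouped by average-linkage hierarchical clustering cut at k clusters, and that confidence is the fraction of bootstrap replicates in which two samples land in the same cluster. The function names, the toy data, and the parameters `k`, `n_boot`, and `seed` are all hypothetical.

```python
import random
from itertools import combinations


def distance(a, b, cols):
    """Euclidean distance between two samples over the chosen feature columns."""
    return sum((a[j] - b[j]) ** 2 for j in cols) ** 0.5


def average_linkage(data, k, cols):
    """Agglomerative clustering down to k clusters (average linkage).

    Returns a list of clusters, each a list of sample indices.
    """
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            total = sum(distance(data[i], data[j], cols)
                        for i in clusters[x] for j in clusters[y])
            d = total / (len(clusters[x]) * len(clusters[y]))
            if best is None or d < best[0]:
                best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]                          # y > x, so index x stays valid
    return clusters


def bootstrap_coclustering(data, k, n_boot=200, seed=0):
    """Fraction of bootstrap replicates in which each pair of samples co-clusters.

    Assumption: each replicate resamples the feature columns with replacement.
    """
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    counts = {pair: 0 for pair in combinations(range(n), 2)}
    for _ in range(n_boot):
        cols = [rng.randrange(m) for _ in range(m)]
        for cluster in average_linkage(data, k, cols):
            for pair in combinations(sorted(cluster), 2):
                counts[pair] += 1
    return {pair: c / n_boot for pair, c in counts.items()}


# Toy data (made up for illustration): samples 0-2 and 3-5 form two clear groups.
toy = [
    [0.0, 0.1, 0.2, 0.0],
    [0.1, 0.0, 0.1, 0.2],
    [0.2, 0.2, 0.0, 0.1],
    [5.0, 5.1, 5.2, 5.0],
    [5.1, 5.0, 5.1, 5.2],
    [5.2, 5.2, 5.0, 5.1],
]
conf = bootstrap_coclustering(toy, k=2, n_boot=50)
print(conf[(0, 1)], conf[(0, 3)])  # prints "1.0 0.0"
```

On real expression or mass-spectrometry profiles the pairwise values would typically fall between 0 and 1, and pairs or whole clusters with low values would be the ones to treat as unreliable. The naive average-linkage loop is kept for readability and is only practical for small sample counts.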