Natthakan Iam-on, Simon M. Garrett, C. Price, Tossapon Boongoen
2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), published 2010-12-01. DOI: 10.1109/BIBM.2010.5706631 (https://doi.org/10.1109/BIBM.2010.5706631)
Link-based cluster ensembles for heterogeneous biological data analysis
Clinical data have traditionally been the major factor in cancer prognosis. However, this classic approach may be ineffective for analyzing morphologically indistinguishable tumor subtypes, and microarray technology has emerged as a promising alternative. Despite a large number of microarray studies, the clinical application of gene expression data analysis remains limited by the complexity and noise level of the generated data. Recently, integrative cluster analysis of both clinical and gene expression data has been shown to be an effective way to overcome these problems. This paper presents a novel cluster ensemble method for the accurate analysis of heterogeneous biological data. It avoids the problem of selecting an appropriate clustering algorithm, or a parameter setting for any candidate algorithm, especially with a new set of data. Evaluation on real biological and benchmark datasets suggests that the quality of the proposed model is higher than that of many state-of-the-art cluster ensemble techniques and standard clustering algorithms. Its performance is also robust to parameter perturbation, making it a reliable and useful tool for data analysts and bioinformaticians. Online supplementary material is available at http://users.aber.ac.uk/nii07/bibm2010.
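The abstract does not spell out the link-based method itself, so the following is only a minimal sketch of the general cluster ensemble idea it builds on: several base clusterings of the same samples are combined into one consensus partition. It uses a plain co-association (pairwise agreement) matrix, which is a common baseline and not the authors' link-based similarity. The function names `co_association` and `consensus_clusters`, the `threshold` parameter, and the toy label lists are all hypothetical and chosen for illustration.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of base clusterings that put each pair of samples together."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[counts[i][j] / m for j in range(n)] for i in range(n)]


def consensus_clusters(partitions, threshold=0.5):
    """Link samples whose co-association exceeds `threshold`, then return
    the connected components as the consensus partition (assumed rule)."""
    matrix = co_association(partitions)
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Toy example: three base clusterings of six samples. In an integrative
# setting these might come from different algorithms, parameter settings,
# or data views (for instance clinical versus gene expression features).
base_clusterings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 2],
]
print(consensus_clusters(base_clusterings))
```

Because the consensus depends only on how often samples are grouped together, the cluster label values used by each base clustering do not need to match, which is what lets an ensemble sidestep the choice of a single algorithm or parameter setting.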