Segregation-based subspace clustering for huge dimensional data

2010 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS). Pub Date: 2010-11-01. DOI: 10.1109/GENSIPS.2010.5719667

Majid I. Alsagabi, A. Tewfik

Citations: 0

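Abstract

Clustering algorithms break down when the data points lie in huge-dimensional spaces. To tackle this problem, many subspace clustering methods have been proposed that build up a subspace in which the data points cluster efficiently. The bottom-up approach is widely used: it first selects a set of candidate features and then uses a portion of this set to build up the hidden subspace step by step. The complexity of such methods depends exponentially or cubically on the number of selected features. In this paper, we present SEGCLU, a SEGregation-based subspace CLUstering method that significantly reduces the size of the candidate feature set and has cubic complexity. The algorithm was applied to noise-free DNA copy number data from two groups of children, autistic and typically developing, to extract a potential biomarker for autism. 85% of the individuals were classified correctly in a 13-dimensional subspace.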
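The bottom-up idea described in the abstract (pick candidate features, then grow the subspace one feature at a time) can be illustrated with a small sketch. This is not SEGCLU: the abstract does not state the paper's segregation criterion or its candidate-selection rule, so the scoring function (`separation_score`), the nearest-centroid accuracy measure, and all function names below are assumptions chosen only to make the generic two-step structure concrete.

```python
from statistics import mean, pstdev


def separation_score(values_a, values_b):
    """Absolute difference of group means, scaled by the pooled spread.

    Hypothetical per-feature criterion, not the one used in the paper.
    """
    spread = pstdev(values_a + values_b)
    if spread == 0:
        return 0.0
    return abs(mean(values_a) - mean(values_b)) / spread


def nearest_centroid_accuracy(group_a, group_b, features):
    """Fraction of samples strictly closer to their own group's centroid."""
    cent_a = [mean(row[f] for row in group_a) for f in features]
    cent_b = [mean(row[f] for row in group_b) for f in features]

    def dist2(row, centroid):
        return sum((row[f] - c) ** 2 for f, c in zip(features, centroid))

    correct = sum(dist2(r, cent_a) < dist2(r, cent_b) for r in group_a)
    correct += sum(dist2(r, cent_b) < dist2(r, cent_a) for r in group_b)
    return correct / (len(group_a) + len(group_b))


def bottom_up_subspace(group_a, group_b, n_candidates):
    """Select candidate features, then grow a subspace greedily."""
    n_features = len(group_a[0])
    scores = [
        separation_score([r[f] for r in group_a], [r[f] for r in group_b])
        for f in range(n_features)
    ]
    # Step 1: keep only the best-separating features as candidates.
    candidates = sorted(range(n_features), key=lambda f: -scores[f])[:n_candidates]

    # Step 2: add the candidate that most improves accuracy; stop when none does.
    subspace, best_acc = [], 0.0
    while True:
        trials = [
            (nearest_centroid_accuracy(group_a, group_b, subspace + [f]), f)
            for f in candidates
            if f not in subspace
        ]
        if not trials:
            break
        acc, f = max(trials, key=lambda t: t[0])
        if acc <= best_acc:
            break
        subspace.append(f)
        best_acc = acc
    return subspace, best_acc
```

With two labeled groups given as lists of equal-length feature rows, `bottom_up_subspace(group_a, group_b, n_candidates)` returns the chosen feature indices and the accuracy reached in that subspace. Shrinking `n_candidates` is what keeps the step-by-step search cheap, which is the lever the abstract says SEGCLU targets.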
