Flexible Fault Tolerant Subspace Clustering for Data with Missing Values

2011 IEEE 11th International Conference on Data Mining
Pub Date: 2011-12-11
DOI: 10.1109/ICDM.2011.70

Stephan Günnemann, Emmanuel Müller, S. Raubach, T. Seidl


Citations: 22

Abstract

In today's applications, data analysis tasks are hindered by many attributes per object as well as by faulty data with missing values. Subspace clustering tackles the challenge of many attributes by cluster detection in any subspace projection of the data. However, it poses novel challenges for handling missing values of objects, which are part of multiple subspace clusters in different projections of the data. In this work, we propose a general fault tolerance definition enhancing subspace clustering models to handle missing values. We introduce a flexible notion of fault tolerance that adapts to the individual characteristics of subspace clusters and ensures a robust parameterization. Allowing missing values in our model increases the computational complexity of subspace clustering. Thus, we prove novel monotonicity properties for an efficient computation of fault tolerant subspace clusters. Experiments on real and synthetic data show that our fault tolerance model yields high quality results even in the presence of many missing values. For repeatability, we provide all datasets and executables on our website.
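The abstract does not state the formal fault tolerance definition, so the following is only an illustrative sketch of the general idea, not the authors' model. It assumes a simple interval-based cluster criterion and three hypothetical parameters (`width`, `max_missing_per_object`, `max_missing_per_dim`), and it encodes missing values as `None`. A group of objects is accepted in a subspace if the known values are compact in every relevant dimension and the number of missing values stays within the stated bounds.

```python
from typing import Optional, Sequence

Value = Optional[float]  # None marks a missing value (assumed encoding)


def is_fault_tolerant_cluster(
    objects: Sequence[Sequence[Value]],
    dims: Sequence[int],
    width: float,
    max_missing_per_object: int,
    max_missing_per_dim: int,
) -> bool:
    """Simplified, hypothetical check; not the definition from the paper."""
    # Bound the missing values each object may have inside the subspace.
    for obj in objects:
        if sum(obj[d] is None for d in dims) > max_missing_per_object:
            return False

    for d in dims:
        known = [obj[d] for obj in objects if obj[d] is not None]
        # Bound the missing values in each relevant dimension.
        if len(objects) - len(known) > max_missing_per_dim:
            return False
        # Known values must fit into an interval of the given width.
        if known and max(known) - min(known) > width:
            return False
    return True


data = [
    [1.0, 5.0, 0.2],
    [1.1, None, 9.7],  # value in dimension 1 is missing
    [0.9, 5.2, 4.4],
]

# Tolerating one missing value accepts the group in subspace {0, 1}.
print(is_fault_tolerant_cluster(data, [0, 1], 0.5, 1, 1))
# With no tolerance, the object with the missing value breaks the cluster.
print(is_fault_tolerant_cluster(data, [0, 1], 0.5, 0, 0))
```

In this simplified check, the number of missing values per object can only stay equal or grow as dimensions are added to the subspace, which is the kind of monotone behavior that permits pruning. Whether this corresponds to the monotonicity properties proved in the paper cannot be determined from the abstract alone.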
