不确定数据的代表性聚类

Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2014-08-24 DOI:10.1145/2623330.2623725

Andreas Züfle, Tobias Emrich, Klaus Arthur Schmid, N. Mamoulis, A. Zimek, M. Renz

{"title":"不确定数据的代表性聚类","authors":"Andreas Züfle, Tobias Emrich, Klaus Arthur Schmid, N. Mamoulis, A. Zimek, M. Renz","doi":"10.1145/2623330.2623725","DOIUrl":null,"url":null,"abstract":"This paper targets the problem of computing meaningful clusterings from uncertain data sets. Existing methods for clustering uncertain data compute a single clustering without any indication of its quality and reliability; thus, decisions based on their results are questionable. In this paper, we describe a framework, based on possible-worlds semantics; when applied on an uncertain dataset, it computes a set of representative clusterings, each of which has a probabilistic guarantee not to exceed some maximum distance to the ground truth clustering, i.e., the clustering of the actual (but unknown) data. Our framework can be combined with any existing clustering algorithm and it is the first to provide quality guarantees about its result. In addition, our experimental evaluation shows that our representative clusterings have a much smaller deviation from the ground truth clustering than existing approaches, thus reducing the effect of uncertainty.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"57 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"32","resultStr":"{\"title\":\"Representative clustering of uncertain data\",\"authors\":\"Andreas Züfle, Tobias Emrich, Klaus Arthur Schmid, N. Mamoulis, A. Zimek, M. Renz\",\"doi\":\"10.1145/2623330.2623725\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper targets the problem of computing meaningful clusterings from uncertain data sets. Existing methods for clustering uncertain data compute a single clustering without any indication of its quality and reliability; thus, decisions based on their results are questionable. In this paper, we describe a framework, based on possible-worlds semantics; when applied on an uncertain dataset, it computes a set of representative clusterings, each of which has a probabilistic guarantee not to exceed some maximum distance to the ground truth clustering, i.e., the clustering of the actual (but unknown) data. Our framework can be combined with any existing clustering algorithm and it is the first to provide quality guarantees about its result. In addition, our experimental evaluation shows that our representative clusterings have a much smaller deviation from the ground truth clustering than existing approaches, thus reducing the effect of uncertainty.\",\"PeriodicalId\":20536,\"journal\":{\"name\":\"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining\",\"volume\":\"57 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-08-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"32\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2623330.2623725\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2623330.2623725","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 32

摘要

本文研究了不确定数据集中有意义聚类的计算问题。现有的不确定数据聚类方法计算单个聚类，而不表明其质量和可靠性;因此，基于结果的决定是有问题的。在本文中，我们描述了一个基于可能世界语义的框架;当应用于不确定数据集时，它计算一组有代表性的聚类，每个聚类都有概率保证不超过到真实聚类的最大距离，即实际(但未知)数据的聚类。我们的框架可以与任何现有的聚类算法相结合，并且它是第一个为其结果提供质量保证的框架。此外，我们的实验评估表明，与现有方法相比，我们的代表性聚类与地面真实聚类的偏差要小得多，从而减少了不确定性的影响。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Representative clustering of uncertain data

This paper targets the problem of computing meaningful clusterings from uncertain data sets. Existing methods for clustering uncertain data compute a single clustering without any indication of its quality and reliability; thus, decisions based on their results are questionable. In this paper, we describe a framework, based on possible-worlds semantics; when applied on an uncertain dataset, it computes a set of representative clusterings, each of which has a probabilistic guarantee not to exceed some maximum distance to the ground truth clustering, i.e., the clustering of the actual (but unknown) data. Our framework can be combined with any existing clustering algorithm and it is the first to provide quality guarantees about its result. In addition, our experimental evaluation shows that our representative clusterings have a much smaller deviation from the ground truth clustering than existing approaches, thus reducing the effect of uncertainty.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining

自引率

0.00%

发文量