Andreas Züfle, Tobias Emrich, Klaus Arthur Schmid, N. Mamoulis, A. Zimek, M. Renz
{"title":"不确定数据的代表性聚类","authors":"Andreas Züfle, Tobias Emrich, Klaus Arthur Schmid, N. Mamoulis, A. Zimek, M. Renz","doi":"10.1145/2623330.2623725","DOIUrl":null,"url":null,"abstract":"This paper targets the problem of computing meaningful clusterings from uncertain data sets. Existing methods for clustering uncertain data compute a single clustering without any indication of its quality and reliability; thus, decisions based on their results are questionable. In this paper, we describe a framework, based on possible-worlds semantics; when applied on an uncertain dataset, it computes a set of representative clusterings, each of which has a probabilistic guarantee not to exceed some maximum distance to the ground truth clustering, i.e., the clustering of the actual (but unknown) data. Our framework can be combined with any existing clustering algorithm and it is the first to provide quality guarantees about its result. In addition, our experimental evaluation shows that our representative clusterings have a much smaller deviation from the ground truth clustering than existing approaches, thus reducing the effect of uncertainty.","PeriodicalId":20536,"journal":{"name":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"57 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"32","resultStr":"{\"title\":\"Representative clustering of uncertain data\",\"authors\":\"Andreas Züfle, Tobias Emrich, Klaus Arthur Schmid, N. Mamoulis, A. Zimek, M. Renz\",\"doi\":\"10.1145/2623330.2623725\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper targets the problem of computing meaningful clusterings from uncertain data sets. Existing methods for clustering uncertain data compute a single clustering without any indication of its quality and reliability; thus, decisions based on their results are questionable. In this paper, we describe a framework, based on possible-worlds semantics; when applied on an uncertain dataset, it computes a set of representative clusterings, each of which has a probabilistic guarantee not to exceed some maximum distance to the ground truth clustering, i.e., the clustering of the actual (but unknown) data. Our framework can be combined with any existing clustering algorithm and it is the first to provide quality guarantees about its result. In addition, our experimental evaluation shows that our representative clusterings have a much smaller deviation from the ground truth clustering than existing approaches, thus reducing the effect of uncertainty.\",\"PeriodicalId\":20536,\"journal\":{\"name\":\"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining\",\"volume\":\"57 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-08-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"32\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2623330.2623725\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2623330.2623725","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
This paper targets the problem of computing meaningful clusterings from uncertain data sets. Existing methods for clustering uncertain data compute a single clustering without any indication of its quality and reliability; thus, decisions based on their results are questionable. In this paper, we describe a framework, based on possible-worlds semantics; when applied on an uncertain dataset, it computes a set of representative clusterings, each of which has a probabilistic guarantee not to exceed some maximum distance to the ground truth clustering, i.e., the clustering of the actual (but unknown) data. Our framework can be combined with any existing clustering algorithm and it is the first to provide quality guarantees about its result. In addition, our experimental evaluation shows that our representative clusterings have a much smaller deviation from the ground truth clustering than existing approaches, thus reducing the effect of uncertainty.