一种用于发现非线性可选聚类的层次信息理论技术

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2010-07-25 DOI:10.1145/1835804.1835878

Xuan-Hong Dang, J. Bailey

{"title":"一种用于发现非线性可选聚类的层次信息理论技术","authors":"Xuan-Hong Dang, J. Bailey","doi":"10.1145/1835804.1835878","DOIUrl":null,"url":null,"abstract":"Discovery of alternative clusterings is an important method for exploring complex datasets. It provides the capability for the user to view clustering behaviour from different perspectives and thus explore new hypotheses. However, current algorithms for alternative clustering have focused mainly on linear scenarios and may not perform as desired for datasets containing clusters with non linear shapes. Our goal in this paper is to address this challenge of non linearity. In particular, we propose a novel algorithm to uncover an alternative clustering that is distinctively different from an existing, reference clustering. Our technique is information theory based and aims to ensure alternative clustering quality by maximizing the mutual information between clustering labels and data observations, whilst at the same time ensuring alternative clustering distinctiveness by minimizing the information sharing between the two clusterings. We perform experiments to assess our method against a large range of alternative clustering algorithms in the literature. We show our technique's performance is generally better for non-linear scenarios and furthermore, is highly competitive even for simpler, linear scenarios.","PeriodicalId":20529,"journal":{"name":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"7 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2010-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"51","resultStr":"{\"title\":\"A hierarchical information theoretic technique for the discovery of non linear alternative clusterings\",\"authors\":\"Xuan-Hong Dang, J. Bailey\",\"doi\":\"10.1145/1835804.1835878\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Discovery of alternative clusterings is an important method for exploring complex datasets. It provides the capability for the user to view clustering behaviour from different perspectives and thus explore new hypotheses. However, current algorithms for alternative clustering have focused mainly on linear scenarios and may not perform as desired for datasets containing clusters with non linear shapes. Our goal in this paper is to address this challenge of non linearity. In particular, we propose a novel algorithm to uncover an alternative clustering that is distinctively different from an existing, reference clustering. Our technique is information theory based and aims to ensure alternative clustering quality by maximizing the mutual information between clustering labels and data observations, whilst at the same time ensuring alternative clustering distinctiveness by minimizing the information sharing between the two clusterings. We perform experiments to assess our method against a large range of alternative clustering algorithms in the literature. We show our technique's performance is generally better for non-linear scenarios and furthermore, is highly competitive even for simpler, linear scenarios.\",\"PeriodicalId\":20529,\"journal\":{\"name\":\"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining\",\"volume\":\"7 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-07-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"51\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1835804.1835878\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1835804.1835878","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 51

摘要

发现替代聚类是探索复杂数据集的重要方法。它为用户提供了从不同角度观察聚类行为的能力，从而探索新的假设。然而，目前的替代聚类算法主要集中在线性场景上，对于包含非线性形状聚类的数据集可能不能像期望的那样执行。我们在本文中的目标是解决这一非线性的挑战。特别是，我们提出了一种新的算法来发现与现有的参考聚类明显不同的替代聚类。我们的技术是基于信息论的，旨在通过最大化聚类标签和数据观测之间的相互信息来确保备选聚类的质量，同时通过最小化两个聚类之间的信息共享来确保备选聚类的独特性。我们执行实验来评估我们的方法对大量的替代聚类算法在文献中。我们表明，我们的技术在非线性场景中的表现通常更好，而且，即使在更简单的线性场景中也具有很强的竞争力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A hierarchical information theoretic technique for the discovery of non linear alternative clusterings

Discovery of alternative clusterings is an important method for exploring complex datasets. It provides the capability for the user to view clustering behaviour from different perspectives and thus explore new hypotheses. However, current algorithms for alternative clustering have focused mainly on linear scenarios and may not perform as desired for datasets containing clusters with non linear shapes. Our goal in this paper is to address this challenge of non linearity. In particular, we propose a novel algorithm to uncover an alternative clustering that is distinctively different from an existing, reference clustering. Our technique is information theory based and aims to ensure alternative clustering quality by maximizing the mutual information between clustering labels and data observations, whilst at the same time ensuring alternative clustering distinctiveness by minimizing the information sharing between the two clusterings. We perform experiments to assess our method against a large range of alternative clustering algorithms in the literature. We show our technique's performance is generally better for non-linear scenarios and furthermore, is highly competitive even for simpler, linear scenarios.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

自引率

0.00%

发文量

期刊最新文献

Frequent regular itemset mining Suggesting friends using the implicit social graph Collusion-resistant privacy-preserving data mining Mining advisor-advisee relationships from research publication networks Session details: Research track 5: classification models and tools