{"title":"Scalable clustering with adaptive instance sampling","authors":"Jaekyung Yang, ByoungJin Yu, Myoungjin Choi","doi":"10.1109/IEEM.2013.6962622","DOIUrl":null,"url":null,"abstract":"Most of the clustering algorithms are affected by the number of attributes and instances with respect to the computation time. Thus, the data mining community has made efforts to enable induction of the clustering efficient. Hence, scalability is naturally a critical issue that the data mining community faces. A method to handle this issue is to use a subset of all instances. This paper suggests an algorithm that enables to perform clustering efficiently. This is done by using nested partitions method for solving the noisy performance problems, which arises when using a subset of instances and adjusting the sample rate properly at each iteration. This Adaptive NPCLUSTER algorithm had better similarity in small dataset and had worse similarity in large dataset than NPCLUSTER, but it had shorter computation time than NPCLUSTER.","PeriodicalId":6454,"journal":{"name":"2013 IEEE International Conference on Industrial Engineering and Engineering Management","volume":"105 1","pages":"1309-1313"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Conference on Industrial Engineering and Engineering Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IEEM.2013.6962622","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
The computation time of most clustering algorithms grows with the number of attributes and instances. The data mining community has therefore worked to make cluster induction efficient, and scalability is naturally a critical issue for it. One way to handle this issue is to cluster a subset of the instances rather than all of them. This paper proposes an algorithm that performs clustering efficiently in this way. It uses the nested partitions method to address the noisy performance problem that arises when only a subset of instances is used, and it adjusts the sample rate at each iteration. Compared with NPCLUSTER, the resulting Adaptive NPCLUSTER algorithm achieved better similarity on small datasets and worse similarity on large datasets, but required less computation time.
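The abstract does not describe the nested partitions procedure itself, so the following is not Adaptive NPCLUSTER. It is only a minimal sketch of the general idea of clustering on a subset of instances while adjusting the sample rate at each iteration. It substitutes plain k-means updates for the nested partitions search, and the function names, the parameters (`rate`, `growth`, `tol`, `init`) and the rule "sample more while the centroids are still moving" are all illustrative assumptions.

```python
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def update_centroids(sample, centroids):
    """One k-means step on the sample; an empty cluster keeps its old centroid."""
    groups = [[] for _ in centroids]
    for point in sample:
        groups[nearest(point, centroids)].append(point)
    return [
        tuple(sum(dim) / len(g) for dim in zip(*g)) if g else c
        for g, c in zip(groups, centroids)
    ]


def adaptive_sample_kmeans(data, k, rate=0.1, growth=1.5, tol=1e-3,
                           max_iter=50, seed=0, init=None):
    """Cluster on random subsets, raising the sample rate while centroids move.

    Hypothetical stand-in for the paper's method: the growth rule and every
    default value here are assumptions made for illustration.
    """
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(data, k)
    for _ in range(max_iter):
        # Draw a fresh subset whose size follows the current sample rate.
        size = min(len(data), max(k, int(rate * len(data))))
        sample = rng.sample(data, size)
        new_centroids = update_centroids(sample, centroids)
        # Largest distance any centroid moved in this iteration.
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new)) ** 0.5
            for old, new in zip(centroids, new_centroids)
        )
        centroids = new_centroids
        if shift < tol:
            break
        # Centroids still moving: treat the subset estimate as noisy, sample more.
        rate = min(1.0, rate * growth)
    return centroids, rate
```

A call such as `adaptive_sample_kmeans(points, k=3)` on a list of equal-length numeric tuples returns the final centroids together with the sample rate reached. Starting from a small rate keeps early iterations cheap, which is where the time saving reported in the abstract would come from. The cost is noisier centroid estimates, which is the noisy performance problem the paper addresses with nested partitions instead of this simple growth rule.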