A Novel Clustering Algorithm for Prefix-Coded Data Stream Based upon Median-Tree
Guangsheng Feng, Huiqiang Wang, Qian Zhao, Ying Liang
2008 International Conference on Internet Computing in Science and Engineering, published 2008-01-28
DOI: 10.1109/ICICSE.2008.103
Citations: 0
Abstract
Real-world data streams contain large amounts of prefix-coded data, which are common across many applications. Traditional clustering algorithms do not account for the special structure of prefix-coded data, which leads to poor performance and unsatisfactory clustering results. To address this problem, this paper proposes the concept of a median-tree, together with a method for calculating the coding distance. Building on these, a simple algorithm, dfCluster, is put forward that can cluster prefix-coded data streams efficiently. An in-depth analysis of the algorithm is also presented. Finally, experiments show that dfCluster clusters such data streams more efficiently than the naive algorithm, and that, unlike k-means, our algorithm's performance is not constrained by a user-specified value of k.
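To make the idea of a distance between prefix codes concrete, here is a minimal sketch under stated assumptions. The abstract does not define the paper's coding distance, median-tree, or dfCluster, so everything below is hypothetical: it assumes each record carries a string code whose leading substrings name ancestor categories (e.g. "1" is the parent of "12", which is the parent of "120"), and it measures distance as the number of edges between two codes in the tree implied by those prefixes. The names `common_prefix_len`, `code_distance`, and `nearest_center` are illustrative only.

```python
# Hypothetical illustration only: NOT the coding distance defined in the paper.
# Assumption: a code's leading substrings name its ancestors in a code tree,
# e.g. "1" -> "12" -> "120" -> "1203".

def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two codes (their deepest shared ancestor)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def code_distance(a: str, b: str) -> int:
    """Edge count between two codes in the implied tree: climb from each
    code up to the deepest shared ancestor and add the two path lengths."""
    lcp = common_prefix_len(a, b)
    return (len(a) - lcp) + (len(b) - lcp)


def nearest_center(code: str, centers: list[str]) -> str:
    """Assign a streamed code to the closest representative code
    (ties go to the first center in the list)."""
    return min(centers, key=lambda c: code_distance(code, c))


if __name__ == "__main__":
    print(code_distance("1203", "1207"))             # siblings under "120"
    print(nearest_center("1207", ["12", "15", "3"]))  # closest is ancestor "12"
```

Because the distance depends only on the shared prefix, codes in the same subtree are always closer to each other than to codes outside it, which is the kind of structure the abstract says generic clustering algorithms ignore. How the paper actually weights or defines this distance may differ.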