K-means Optimization Algorithm for Solving Clustering Problem

2009 Second International Workshop on Knowledge Discovery and Data Mining
Pub Date: 2009-01-23
DOI: 10.1109/WKDD.2009.85

Jinxin Dong, Min-yong Qi


Citations: 24

Abstract



Basic K-means is sensitive to the choice of initial centres and easily gets stuck in a local optimum. To address these problems, a new clustering algorithm based on simulated annealing is proposed. The algorithm treats clustering as an optimization problem: bisecting K-means first splits the dataset into k clusters, and simulated annealing is then run from that partition, using the sum of distances between each pattern and its cluster centre as the objective function. To mitigate the long computation time and low efficiency of simulated annealing, a new data structure called a sequence list is introduced. The experimental results show the feasibility and validity of the proposed algorithm.
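The abstract gives only the outline of the method, so the following is a minimal Python sketch of that outline under stated assumptions, not the authors' implementation. It assumes Euclidean distance, a bisecting step that always splits the largest cluster, a neighbourhood move that reassigns one point to another cluster, and geometric cooling. The function names and the parameters `t0`, `cooling`, and `steps` are hypothetical. The sequence list is not reproduced because the abstract does not describe it; the sketch simply recomputes the objective after every move.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def cost(points, labels, k):
    """Objective: sum of distances from each point to its cluster centre."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if members:  # an empty cluster contributes nothing
            centre = centroid(members)
            total += sum(dist(p, centre) for p in members)
    return total


def two_means(points, rng, iters=10):
    """Split one cluster in two with plain 2-means; returns a 0/1 label per point."""
    centres = rng.sample(points, 2)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if dist(p, centres[0]) <= dist(p, centres[1]) else 1
                  for p in points]
        for c in (0, 1):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centres[c] = centroid(members)
    return labels


def bisecting_kmeans(points, k, rng):
    """Bisect the largest cluster until k clusters exist (assumed split rule).

    Assumes k does not exceed the number of distinct points.
    """
    labels = [0] * len(points)
    for new_label in range(1, k):
        sizes = [labels.count(c) for c in range(new_label)]
        target = sizes.index(max(sizes))
        idx = [i for i, l in enumerate(labels) if l == target]
        halves = two_means([points[i] for i in idx], rng)
        for i, h in zip(idx, halves):
            if h == 1:
                labels[i] = new_label
    return labels


def anneal(points, labels, k, rng, t0=1.0, cooling=0.95, steps=2000):
    """Simulated annealing over assignments, starting from `labels` (needs k >= 2)."""
    current = list(labels)
    current_cost = cost(points, current, k)
    best, best_cost = list(current), current_cost
    t = t0
    for _ in range(steps):
        i = rng.randrange(len(points))
        old = current[i]
        current[i] = rng.choice([c for c in range(k) if c != old])
        new_cost = cost(points, current, k)
        delta = new_cost - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current_cost = new_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        else:
            current[i] = old  # reject the move
        t *= cooling  # geometric cooling schedule (assumed)
    return best, best_cost
```

A typical call would be `labels = bisecting_kmeans(points, k, random.Random(0))` followed by `anneal(points, labels, k, random.Random(0))`. Recomputing `cost` from scratch on every move takes time proportional to the number of points, which is the kind of overhead the abstract says the sequence list is meant to reduce.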
