An Improved K-Means Algorithm Based on Impact Index

2022 6th Asian Conference on Artificial Intelligence Technology (ACAIT). Publication date: 2022-12-09. DOI: 10.1109/ACAIT56212.2022.10137982

Shaobo Deng, Min Li, Xuegang Li, Lei Wang, Sujie Guan


Citations: 0

Abstract

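The k-means algorithm is a classical clustering algorithm that is widely used because of its efficiency and good performance. It measures the similarity between samples by Euclidean distance and iteratively updates a membership matrix to obtain the clustering result. However, some datasets contain samples whose intra-cluster distances exceed the inter-cluster distances. On such datasets, k-means often assigns boundary samples to the wrong cluster, which leads to unsatisfactory results. Moreover, k-means makes the intra-cluster distance as small as possible but does not try to maximize the inter-cluster distance, so it often converges only to a local optimum.

Unlike existing k-means-type algorithms, this paper proposes a similarity measure based on an impact factor. The measure decides each sample's assignment by comparing the sample's impact on each cluster. The method also builds on the k-means objective function by incorporating the inter-cluster distance, which addresses the local-optimum problem of k-means.

We analyze the proposed method theoretically, with proofs. We also compare its clustering results with those of k-means-type algorithms on real datasets, and confirm that the proposed algorithm effectively avoids the shortcomings of k-means-type algorithms.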
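To make the baseline concrete, the following is a minimal Python sketch of the standard k-means procedure the abstract describes. It uses Euclidean distance and an explicit hard membership matrix `U` that is updated until it stops changing. It also includes two diagnostics: the within-cluster sum of squared errors, which k-means minimizes, and the smallest distance between cluster centers, which k-means never optimizes.

This is not the paper's method. The abstract defines neither the impact-factor similarity nor the modified objective, so neither is implemented here. All names (`kmeans`, `within_cluster_sse`, `min_center_separation`, `seed`) are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, the measure standard k-means uses to compare samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0):
    """Lloyd-style k-means returning (centers, U); U is an n x k 0/1 membership matrix."""
    rng = random.Random(seed)
    # Hypothetical initialization: k distinct samples chosen at random as starting centers.
    centers = [points[i] for i in rng.sample(range(len(points)), k)]
    u: List[List[int]] = [[0] * k for _ in points]
    for _ in range(iters):
        # Assignment step: each sample joins its nearest center (exactly one 1 per row of U).
        new_u = [[0] * k for _ in points]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: euclidean(p, centers[c]))
            new_u[i][nearest] = 1
        if new_u == u:
            break  # U unchanged: converged, possibly only to a local optimum
        u = new_u
        # Update step: move each center to the mean of its current members.
        for j in range(k):
            members = [p for p, row in zip(points, u) if row[j]]
            if members:  # an emptied cluster keeps its previous center
                centers[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, u


def within_cluster_sse(points: List[Point], centers: List[Point], u: List[List[int]]) -> float:
    """The k-means objective: total squared distance from each sample to its own center."""
    return sum(euclidean(p, centers[row.index(1)]) ** 2 for p, row in zip(points, u))


def min_center_separation(centers: List[Point]) -> float:
    """Smallest distance between any two centers; plain k-means does not optimize this."""
    return min(
        euclidean(centers[a], centers[b])
        for a in range(len(centers))
        for b in range(a + 1, len(centers))
    )


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, u = kmeans(pts, k=2)
    print(centers, u)
    print(within_cluster_sse(pts, centers, u), min_center_separation(centers))
```

The loop stops as soon as `U` stops changing, so the result depends on the randomly chosen initial centers. This is the local-optimum behavior the abstract refers to. Of the two diagnostics, k-means only drives down the first, `within_cluster_sse`. The paper's modification additionally accounts for inter-cluster distance, but its exact form is not given in the abstract.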



