Improving Clustering Accuracy of K-Means and Random Swap by an Evolutionary Technique Based on Careful Seeding

Journal: Algorithms | Impact factor: 2.1 | JCR: Q3 (Computer Science, Artificial Intelligence) | Pub date: 2023-12-17 | DOI: 10.3390/a16120572
L. Nigro, F. Cicirelli

Abstract

K-Means is a de facto standard clustering algorithm owing to its simplicity and efficiency. However, it depends strongly on the initialization of the centroids (the seeding method) and often gets stuck in a sub-optimal local solution. In fact, K-Means mainly acts as a local refiner of the centroids and cannot move them across the data space. Random Swap was defined to go beyond K-Means: it integrates K-Means into a global centroid-management strategy that can often produce a clustering solution close to the global optimum. This paper proposes an approach that extends both K-Means and Random Swap and improves clustering accuracy through an evolutionary technique and careful seeding. Two new algorithms are proposed: Population-Based K-Means (PB-KM) and Population-Based Random Swap (PB-RS). Both consist of two steps: first, a population of J candidate solutions is built; then the candidate centroids are repeatedly recombined toward a final accurate solution. The paper motivates the design of PB-KM and PB-RS, outlines their current Java implementation based on parallel streams, and demonstrates the achievable clustering accuracy on both synthetic and real-world datasets.
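
The two-step scheme described in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' Java implementation. Several details are assumptions made only for illustration: K-Means++ style seeding stands in for "careful seeding", recombination is modelled as re-seeding from the pooled candidate centroids followed by a K-Means refinement, and the names `pb_km`, `careful_seeding`, `j`, and `rounds` are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def careful_seeding(candidates, k, rng):
    """K-Means++ style seeding (assumed): each next seed is drawn with
    probability proportional to its squared distance from the chosen seeds."""
    seeds = [rng.choice(candidates)]
    while len(seeds) < k:
        weights = [min(sq_dist(p, s) for s in seeds) for p in candidates]
        if sum(weights) == 0:  # every candidate coincides with a seed
            seeds.append(rng.choice(candidates))
        else:
            seeds.append(rng.choices(candidates, weights=weights)[0])
    return seeds

def k_means(points, centroids, max_iter=100):
    """Lloyd's iterations: local refinement of the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else c
                   for cl, c in zip(clusters, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

def pb_km(points, k, j=5, rounds=10, seed=0):
    """Population-based K-Means sketch (recombination rule is an assumption)."""
    rng = random.Random(seed)
    # Step 1: build a population of j candidate solutions.
    population = [k_means(points, careful_seeding(points, k, rng))
                  for _ in range(j)]
    best = min(population, key=lambda c: sse(points, c))
    pool = [c for solution in population for c in solution]
    # Step 2: repeatedly recombine the pooled centroids, keeping the best result.
    for _ in range(rounds):
        candidate = k_means(points, careful_seeding(pool, k, rng))
        if sse(points, candidate) < sse(points, best):
            best = candidate
    return best
```

Calling `pb_km(points, k)` on a list of coordinate tuples returns `k` centroids. Because the result is the lowest-SSE solution seen, it is never worse than the best member of the initial population.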
Source journal: Algorithms (Mathematics, Numerical Analysis)
CiteScore: 4.10
Self-citation rate: 4.30%
Articles published: 394
Review time: 11 weeks