All-pairwise squared distances lead to more balanced clustering

Mikko I. Malinen, P. Fränti
Journal: Applied Computing and Intelligence
DOI: 10.3934/aci.2023006 (https://doi.org/10.3934/aci.2023006)
Citation count: 1

Abstract

In clustering, the cost function that is commonly used involves calculating all-pairwise squared distances. In this paper, we formulate the cost function using mean squared error and show that this leads to more balanced clustering compared to centroid-based distance functions, like the sum of squared distances in $k$-means. The clustering method has been formulated as a cut-based approach, more intuitively called Squared cut (Scut). We introduce an algorithm for the problem which is faster than the existing one based on the Stirling approximation. Our algorithm is a sequential variant of a local search algorithm. We show by experiments that the proposed approach provides better overall optimization of both mean squared error and cluster balance compared to existing methods.
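The link between an all-pairwise cost and a centroid-based cost can be checked numerically. The sketch below is not the authors' algorithm; it only illustrates a standard identity: for a cluster of $n$ points with centroid $c$, the sum of squared distances over all unordered pairs equals $n \sum_i \lVert x_i - c \rVert^2$. Under that identity, an all-pairwise cost weights each cluster's squared error by its size, which is one way to see why it would penalize large clusters more than the plain $k$-means sum does. The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def pairwise_cost(cluster):
    """Sum of squared distances over all unordered pairs in one cluster."""
    return sum(sq_dist(p, q) for p, q in combinations(cluster, 2))


def centroid_sse(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    n = len(cluster)
    dim = len(cluster[0])
    centroid = [sum(p[d] for p in cluster) / n for d in range(dim)]
    return sum(sq_dist(p, centroid) for p in cluster)


def compare_costs(clusters):
    """Return (all-pairwise cost, k-means SSE, size-weighted SSE) for a partition."""
    pairwise = sum(pairwise_cost(c) for c in clusters)
    sse = sum(centroid_sse(c) for c in clusters)
    weighted = sum(len(c) * centroid_sse(c) for c in clusters)
    return pairwise, sse, weighted


# Hypothetical toy data: one cluster of four points and one of two points.
clusters = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)],
    [(10.0, 0.0), (11.0, 0.0)],
]
pairwise, sse, weighted = compare_costs(clusters)
print(pairwise, sse, weighted)
```

Here the first and third values coincide, while the plain SSE is smaller because it does not multiply each cluster's error by the cluster size.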