A Scalable k-Medoids Clustering via Whale Optimization Algorithm

arXiv - CS - Performance, Pub Date: 2024-08-30, DOI: arxiv-2408.16993

Huang Chenan, Narumasa Tsutsumida



Abstract

Unsupervised clustering has emerged as a critical tool for uncovering hidden patterns and insights in vast, unlabeled datasets. However, traditional methods such as Partitioning Around Medoids (PAM) struggle with scalability because of their quadratic computational complexity. To address this limitation, we introduce WOA-kMedoids, a novel unsupervised clustering method that incorporates the Whale Optimization Algorithm (WOA), a nature-inspired metaheuristic modeled on the hunting strategies of humpback whales. By optimizing medoid selection, WOA-kMedoids reduces the computational complexity of the k-medoids algorithm from quadratic to near-linear in the number of observations. This efficiency gain lets WOA-kMedoids scale to large datasets while maintaining high clustering accuracy. We evaluated WOA-kMedoids on 25 diverse time series datasets from the UCR archive. Our empirical results show that WOA-kMedoids maintains clustering accuracy similar to that of PAM. Although WOA-kMedoids ran slightly slower than PAM on small datasets (fewer than 300 observations), it outperformed PAM in computational efficiency on larger datasets. The scalability of WOA-kMedoids, combined with its consistently high accuracy, makes it a promising and practical choice for unsupervised clustering in big data applications, with implications for efficient knowledge discovery in massive, unlabeled datasets across various domains.
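To make the idea concrete, the following is a minimal illustrative sketch of how a WOA-style search could be combined with a k-medoids objective. It is not the authors' implementation: the abstract does not specify the encoding or update details, so everything here is an assumption, including the function names (`medoid_cost`, `decode`, `woa_kmedoids`), the choice to encode each whale as k continuous values rounded to observation indices, the Euclidean distance, and the parameters `n_whales`, `n_iter`, and `b`. The update rules follow the commonly published WOA formulation (encircling, random search, and spiral moves). Each cost evaluation only measures distances from the n observations to the k candidate medoids, so it avoids building the full n-by-n distance matrix that PAM relies on.

```python
import numpy as np


def medoid_cost(X, medoid_idx):
    """Sum of distances from every point to its nearest medoid, O(n * k)."""
    # Distances between all n points and the k candidate medoids only.
    d = np.linalg.norm(X[:, None, :] - X[medoid_idx][None, :, :], axis=2)
    return d.min(axis=1).sum()


def decode(position, n):
    """Round a continuous whale position to k valid observation indices.

    Assumed encoding; duplicate indices are possible and simply act as
    fewer distinct medoids.
    """
    return np.clip(np.rint(position), 0, n - 1).astype(int)


def woa_kmedoids(X, k, n_whales=20, n_iter=100, b=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    # Each whale is a vector of k continuous coordinates in [0, n - 1].
    whales = rng.uniform(0, n - 1, size=(n_whales, k))
    costs = np.array([medoid_cost(X, decode(w, n)) for w in whales])
    best = whales[costs.argmin()].copy()
    best_cost = costs.min()

    for t in range(n_iter):
        a = 2.0 * (1 - t / n_iter)  # shrinks linearly from 2 towards 0
        for i in range(n_whales):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale, or explore around a random one.
                if abs(A) < 1:
                    target = best
                else:
                    target = whales[rng.integers(n_whales)]
                new = target - A * np.abs(C * target - whales[i])
            else:
                # Spiral move towards the best whale.
                l = rng.uniform(-1, 1)
                dist = np.abs(best - whales[i])
                new = dist * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            whales[i] = np.clip(new, 0, n - 1)
            cost = medoid_cost(X, decode(whales[i], n))
            if cost < best_cost:
                best, best_cost = whales[i].copy(), cost

    medoids = decode(best, n)
    # Assign every observation to its nearest selected medoid.
    d = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=2)
    return medoids, d.argmin(axis=1), best_cost
```

Calling `woa_kmedoids(X, k=3)` on an `(n, d)` NumPy array returns the selected medoid indices, a cluster label per observation, and the final cost. For time series such as those in the UCR archive, the Euclidean distance used in this sketch would likely be replaced by a time-series dissimilarity; that substitution is left open here.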


