A Scalable k-Medoids Clustering via Whale Optimization Algorithm
Huang Chenan, Narumasa Tsutsumida
arXiv - CS - Performance, 2024-08-30
DOI: arxiv-2408.16993 (https://doi.org/arxiv-2408.16993)
Citation count: 0
Abstract
Unsupervised clustering has emerged as a critical tool for uncovering hidden
patterns and insights from vast, unlabeled datasets. However, traditional
methods like Partitioning Around Medoids (PAM) struggle with scalability due to
their quadratic computational complexity. To address this limitation, we
introduce WOA-kMedoids, a novel unsupervised clustering method that
incorporates the Whale Optimization Algorithm (WOA), a metaheuristic
modeled on the hunting strategies of humpback whales. By
optimizing medoid selection, WOA-kMedoids reduces the computational complexity of
the k-medoids algorithm from quadratic to near-linear with respect to the
number of observations. This efficiency gain lets WOA-kMedoids
scale to large datasets while maintaining high clustering accuracy. We
evaluated the performance of WOA-kMedoids on 25 diverse time series datasets
from the UCR archive. Our empirical results demonstrate that WOA-kMedoids
maintains clustering accuracy similar to PAM. While WOA-kMedoids exhibited
slightly longer runtimes than PAM on small datasets (fewer than 300
observations), it outperformed PAM in computational efficiency on larger
datasets. The scalability of WOA-kMedoids, combined with its consistently high
accuracy, positions it as a promising and practical choice for unsupervised
clustering in big data applications. WOA-kMedoids has implications for
efficient knowledge discovery in massive, unlabeled datasets across various
domains.
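
The abstract gives no implementation details, so the following is only a minimal sketch of how a WOA-style search over medoid candidates might look, not the authors' method. It uses the standard WOA position updates (encircling, random exploration, spiral) and the usual k-medoids objective. The encoding of a whale as k continuous values rounded to point indices, the Euclidean distance, the function names, and all parameter defaults are assumptions made for illustration. In this sketch each candidate costs O(n k) to evaluate, so a fixed search budget grows linearly with the number of observations n, whereas PAM evaluates swaps over pairs of points.

```python
import math
import random


def total_cost(points, medoid_idx):
    """k-medoids objective: sum of each point's distance to its nearest medoid."""
    return sum(
        min(math.dist(p, points[m]) for m in medoid_idx)
        for p in points
    )


def to_indices(position, n):
    """Round a continuous whale position to valid point indices (assumed encoding)."""
    return [min(n - 1, max(0, round(x))) for x in position]


def woa_kmedoids(points, k, n_whales=10, n_iter=50, seed=0):
    """Hypothetical WOA-style medoid search; returns (medoid indices, cost)."""
    rng = random.Random(seed)
    n = len(points)
    # Each whale is a vector of k continuous coordinates in [0, n - 1].
    whales = [[rng.uniform(0, n - 1) for _ in range(k)] for _ in range(n_whales)]
    best = min(whales, key=lambda w: total_cost(points, to_indices(w, n)))[:]
    best_cost = total_cost(points, to_indices(best, n))

    for t in range(n_iter):
        a = 2.0 * (1 - t / n_iter)  # decreases linearly from 2 toward 0
        for i, w in enumerate(whales):
            A = 2 * a * rng.random() - a
            C = 2 * rng.random()
            if rng.random() < 0.5:
                # Encircle the best whale (|A| < 1) or explore around a random one.
                ref = best if abs(A) < 1 else rng.choice(whales)
                new = [r - A * abs(C * r - x) for r, x in zip(ref, w)]
            else:
                # Spiral move toward the best whale (spiral constant b = 1).
                l = rng.uniform(-1, 1)
                new = [
                    abs(r - x) * math.exp(l) * math.cos(2 * math.pi * l) + r
                    for r, x in zip(best, w)
                ]
            # Clamp back into the valid index range before evaluating.
            whales[i] = [min(n - 1.0, max(0.0, x)) for x in new]
            cost = total_cost(points, to_indices(whales[i], n))
            if cost < best_cost:
                best, best_cost = whales[i][:], cost
    return sorted(set(to_indices(best, n))), best_cost


# Usage on a toy dataset with two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
medoids, cost = woa_kmedoids(points, k=2)
print(medoids, round(cost, 3))
```

Because the search is stochastic, the returned medoids depend on the seed and budget. Two coordinates of a whale can round to the same index, in which case fewer than k distinct medoids are returned; a fuller implementation would need to handle that case.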