An Efficient and Speedy approach for Hierarchical Clustering Using Complete Linkage method
P. Banerjee, A. Chakrabarti, T. K. Ballabh
2023 Fifth International Conference on Electrical, Computer and Communication Technologies (ICECCT), 2023-02-22
DOI: 10.1109/ICECCT56650.2023.10179708
Citation count: 0
Abstract
As datasets continue to grow, clustering has become a highly useful tool. It not only shrinks a dataset by grouping its points into clusters but also uncovers hidden information in unlabeled data. Complete Linkage is a widely favored distance-based hierarchical clustering algorithm that produces compact clusters, but it suffers from a long convergence time. Because it needs the entire dataset in advance to make any clustering decision, it is unsuitable for clustering data “on the fly”. This paper presents a two-stage, partially incremental Complete Linkage clustering algorithm that partially clusters data while it is still being collected. Without increasing space complexity, the proposed method eliminates many redundant distance computations and thereby substantially reduces the algorithm's runtime. The clustering result may deviate slightly from that of the original Complete Linkage algorithm. Even so, the resulting clusters always satisfy the characteristics of Complete Linkage clusters in all scenarios, for any given threshold. Experiments confirm the advantage of this algorithm over existing methods.
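The original Complete Linkage algorithm can be sketched as a naive agglomerative procedure. This is a minimal illustration under a few assumptions:

- Distance is Euclidean.
- Merging stops once the smallest complete-linkage distance between any two clusters exceeds a given threshold.

This is the textbook algorithm, not the authors' two-stage incremental method, which the abstract does not describe in detail. The names `complete_linkage` and `diameter` are hypothetical.

The sketch also checks the property that clusters cut at a threshold satisfy: no two points in the same cluster are farther apart than the threshold.

```python
import math
from itertools import combinations

Point = tuple[float, ...]


def complete_linkage(points: list[Point], threshold: float) -> list[list[int]]:
    """Naive agglomerative complete-linkage clustering, stopped at a distance threshold.

    Returns clusters as lists of point indices. Assumes Euclidean distance.
    """
    n = len(points)
    # All points must be available up front to build the pairwise distance matrix.
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        best, best_d = None, math.inf
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: the distance between two clusters is their
            # largest point-to-point distance. This naive loop recomputes it
            # for every cluster pair on every iteration (O(n^3) overall).
            d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
            if d < best_d:
                best_d, best = d, (a, b)
        if best_d > threshold:
            break  # any further merge would exceed the threshold
        a, b = best  # a < b, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def diameter(points: list[Point], cluster: list[int]) -> float:
    """Largest pairwise distance inside a cluster (0.0 for a singleton)."""
    return max(
        (math.dist(points[i], points[j]) for i, j in combinations(cluster, 2)),
        default=0.0,
    )


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
    result = complete_linkage(pts, threshold=2.0)
    print(sorted(sorted(c) for c in result))  # [[0, 1], [2, 3], [4]]
    # Every cluster respects the threshold: its diameter never exceeds it.
    assert all(diameter(pts, c) <= 2.0 for c in result)
```

The sketch precomputes a full distance matrix before making any merge decision. This illustrates why the original algorithm needs the whole dataset in advance. Its repeated recomputation of inter-cluster maxima is the kind of redundant distance work that faster variants aim to avoid.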