A-MKMC: An effective adaptive-based multilevel K-means clustering with optimal centroid selection using hybrid heuristic approach for handling the incomplete data

Data & Knowledge Engineering | Published: 2023-11-22 | DOI: 10.1016/j.datak.2023.102243 | IF 2.7 | Zone 3 (Computer Science) | JCR Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Hima Vijayan, Subramaniam M, Sathiyasekar K

Data & Knowledge Engineering, Volume 150, Article 102243.

Citations: 0

Abstract

Clustering partitions a set of objects into groups so that similar objects share a group and dissimilar objects are separated. It is widely used in applications such as pattern recognition, image processing, and data analysis. A dataset that contains missing values is termed incomplete. Incomplete data cannot be handled directly when the data are validated, and the missing values degrade data quality. Clustering mechanisms are therefore used to handle missing values. However, traditional clustering algorithms fail to address this problem because they are not designed for high-dimensional data. The problem is also caused by human-intervention errors or inaccurate outcomes. To address the challenge of incomplete data, a novel clustering algorithm is proposed. First, incomplete or mixed data are collected from five standard data sources. The collected data then undergo a pre-processing phase based on data normalization. Finally, the data are clustered by the new algorithm, termed Adaptive centroid based Multilevel K-Means Clustering (A-MKMC), in which the cluster centroids are optimized by Hybrid Border Collie Whale Optimization (HBCWO), a combination of two conventional algorithms, Border Collie Optimization (BCO) and the Whale Optimization Algorithm (WOA). The novel clustering model is validated using various measures and compared against traditional mechanisms. In the overall result analysis, the designed HBCWO-A-MKMC method attains 93% accuracy and 95% precision. Hence, the adaptive clustering process handles missing data with higher performance than the other conventional methods.
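The abstract names data normalization as the only pre-processing step but does not specify the method or how missing entries are treated. The sketch below is a minimal illustration that assumes min-max scaling to [0, 1], with missing values encoded as `None` and skipped when each feature's range is computed; `min_max_normalize` is a hypothetical helper, not the authors' code.

```python
from typing import List, Optional

# Hypothetical encoding: a missing value is stored as None.
Row = List[Optional[float]]


def min_max_normalize(data: List[Row]) -> List[Row]:
    """Scale each feature to [0, 1] using only its observed values.

    Missing entries (None) stay None so that the clustering step can
    decide how to treat them.
    """
    result = [list(row) for row in data]
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        if not observed:
            continue  # feature is missing everywhere: nothing to scale
        lo, hi = min(observed), max(observed)
        span = hi - lo
        for row in result:
            if row[j] is not None:
                # A constant feature maps to 0.0 to avoid division by zero
                row[j] = (row[j] - lo) / span if span > 0 else 0.0
    return result


print(min_max_normalize([[1.0, 10.0], [3.0, None], [5.0, 30.0]]))
# [[0.0, 0.0], [0.5, None], [1.0, 1.0]]
```

Leaving missing entries untouched, rather than imputing them here, keeps the decision about missing data inside the clustering step.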
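The abstract does not say how A-MKMC computes distances when features are missing. For illustration only, the sketch below substitutes a different, well-known technique, the partial distance strategy: squared distances are computed over the observed features and rescaled by the ratio of total to observed features, and centroids are averaged over observed values only. It is plain k-means, not the adaptive multilevel A-MKMC; `partial_sq_distance` and `kmeans_incomplete` are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple

Row = List[Optional[float]]  # None marks a missing value (assumption)


def partial_sq_distance(x: Row, c: Sequence[float]) -> float:
    """Squared Euclidean distance over the observed features of x,
    rescaled by (total features / observed features)."""
    diffs = [(xi - ci) ** 2 for xi, ci in zip(x, c) if xi is not None]
    if not diffs:
        raise ValueError("row has no observed features")
    return sum(diffs) * len(x) / len(diffs)


def kmeans_incomplete(
    data: List[Row], init: List[List[float]], n_iter: int = 100
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style k-means that never imputes: distances and centroid
    means use only the values that are actually observed."""
    centroids = [list(c) for c in init]
    k = len(centroids)
    labels: List[int] = [-1] * len(data)
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the partial distance
        new_labels = [
            min(range(k), key=lambda i: partial_sq_distance(x, centroids[i]))
            for x in data
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: per-feature mean of the observed member values;
        # a cluster or feature with no observed values keeps its old coordinate
        for i in range(k):
            members = [x for x, lab in zip(data, labels) if lab == i]
            for j in range(len(centroids[i])):
                vals = [x[j] for x in members if x[j] is not None]
                if vals:
                    centroids[i][j] = sum(vals) / len(vals)
    return labels, centroids


data = [[0.0, 0.0], [0.1, None], [None, 0.2], [5.0, 5.0], [5.2, None], [None, 4.8]]
labels, cents = kmeans_incomplete(data, init=[[0.0, 0.0], [5.0, 5.0]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Rescaling by total/observed features keeps distances for sparsely observed rows on the same scale as those for fully observed rows.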
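HBCWO is described only as a combination of BCO and WOA used to optimize cluster centroids. The sketch below illustrates the WOA half under stated assumptions: each whale encodes all k centroids as one flat vector, fitness is the within-cluster sum of squared errors, the data are complete and min-max normalized (so positions are clamped to [0, 1]), and the updates are the standard WOA encircling, random-search, and spiral moves. It does not implement BCO or the authors' hybridization; `sse` and `woa_centroids` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple


def sse(flat: Sequence[float], data: List[List[float]], k: int) -> float:
    """Within-cluster sum of squared errors; `flat` stores k centroids of
    dimension d end to end and each point is charged to its nearest one."""
    d = len(data[0])
    cents = [flat[i * d:(i + 1) * d] for i in range(k)]
    return sum(
        min(sum((xj - cj) ** 2 for xj, cj in zip(x, c)) for c in cents)
        for x in data
    )


def woa_centroids(
    data: List[List[float]], k: int, n_whales: int = 20, n_iter: int = 100, seed: int = 0
) -> Tuple[List[List[float]], float]:
    rng = random.Random(seed)
    d = len(data[0])
    # Each whale encodes all k centroids; positions stay in [0, 1]
    # (assumes the data were min-max normalized beforehand).
    whales = [[rng.random() for _ in range(k * d)] for _ in range(n_whales)]
    best = list(min(whales, key=lambda w: sse(w, data, k)))
    best_fit = sse(best, data, k)
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        for idx, w in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            p, ell = rng.random(), rng.uniform(-1.0, 1.0)
            if p < 0.5:
                # |A| < 1: encircle the best whale (exploitation);
                # otherwise move relative to a random whale (exploration)
                ref = best if abs(A) < 1 else whales[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - x) for r, x in zip(ref, w)]
            else:
                # Logarithmic spiral around the best whale (spiral constant b = 1)
                new = [abs(r - x) * math.exp(ell) * math.cos(2 * math.pi * ell) + r
                       for r, x in zip(best, w)]
            whales[idx] = [min(1.0, max(0.0, v)) for v in new]
        for w in whales:  # keep the best solution found so far
            f = sse(w, data, k)
            if f < best_fit:
                best, best_fit = list(w), f
    return [best[i * d:(i + 1) * d] for i in range(k)], best_fit


data = [[0.1, 0.1], [0.12, 0.08], [0.09, 0.11], [0.9, 0.9], [0.88, 0.92], [0.91, 0.89]]
cents, fit = woa_centroids(data, k=2)
print(cents, fit)
```

Because the stored best solution only ever improves, running more iterations with the same seed can never return a worse fitness than the initial population's best.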
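The abstract reports 93% accuracy and 95% precision without stating how clusters were matched to ground-truth classes. One common convention, assumed here, relabels each cluster with its majority class and then computes accuracy and macro-averaged precision; all function names below are hypothetical, and the toy labels are illustrative only.

```python
from collections import Counter
from typing import Dict, Hashable, List


def map_clusters_to_labels(clusters: List[int], truth: List[Hashable]) -> List[Hashable]:
    """Replace each cluster id with the most frequent true class among its members."""
    majority: Dict[int, Hashable] = {}
    for c in set(clusters):
        members = [t for cl, t in zip(clusters, truth) if cl == c]
        majority[c] = Counter(members).most_common(1)[0][0]
    return [majority[c] for c in clusters]


def accuracy(pred: List[Hashable], truth: List[Hashable]) -> float:
    """Fraction of points whose mapped label equals the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def macro_precision(pred: List[Hashable], truth: List[Hashable]) -> float:
    """Unweighted mean over predicted classes of TP / (TP + FP)."""
    scores = []
    for lab in set(pred):
        hits = [t == lab for p, t in zip(pred, truth) if p == lab]
        scores.append(sum(hits) / len(hits))
    return sum(scores) / len(scores)


truth = ["a", "a", "b", "b", "b", "b"]
pred = map_clusters_to_labels([0, 0, 0, 1, 1, 1], truth)
print(pred)  # ['a', 'a', 'a', 'b', 'b', 'b']
print(accuracy(pred, truth))  # 0.8333333333333334
```

Majority mapping can assign two clusters to the same class; iterating over `set(pred)` in `macro_precision` handles that case without double counting.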



Source journal

Data & Knowledge Engineering (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 5.00

Self-citation rate: 0.00%

Review time: 6 months

Journal description: Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between two related fields of interest, database systems and knowledge-base systems. DKE reaches a worldwide audience of researchers, designers, managers, and users. The major aim of the journal is to identify, investigate, and analyze the underlying principles in the design and effective use of these systems.
