kClusterHub: An AutoML-Driven Tool for Effortless Partition-Based Clustering over Varied Data Types

Future Internet | IF 3.6 | Q2, Computer Science, Information Systems | Pub Date: 2023-10-18 | DOI: 10.3390/fi15100341

Konstantinos Gratsos, Stefanos Ougiaroglou, Dionisis Margaris


Citations: 0

Abstract

Partition-based clustering is widely applied across diverse domains. Researchers and practitioners from various scientific disciplines work with partition-based algorithms through specialized software or programming libraries. To bridge the knowledge gap associated with these tools, this paper introduces kClusterHub, an AutoML-driven web tool that simplifies partition-based clustering over numerical, categorical, and mixed data types and uses the elbow method to help identify the optimal number of clusters. Through automatic feature analysis, kClusterHub selects the most appropriate algorithm among k-means, k-modes, and k-prototypes. After the user uploads a dataset and selects features, kClusterHub selects the algorithm, provides the elbow graph, recommends the optimal number of clusters, executes the clustering, and presents the cluster assignments through tabular representations and exploratory plots. kClusterHub thus reduces the need for specialized software and programming skills, making clustering more accessible to non-experts. To further enhance its utility, kClusterHub integrates a REST API that supports programmatic execution of cluster analysis. The paper concludes with an evaluation of kClusterHub’s usability via the System Usability Scale and with CPU performance experiments. The results indicate that kClusterHub is a streamlined, efficient, and user-friendly AutoML-inspired tool for cluster analysis.
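The two automated steps named in the abstract, choosing an algorithm from the feature types and reading the elbow off a cost curve, can be illustrated with a minimal Python sketch. It is not kClusterHub's implementation. The function names `choose_algorithm` and `elbow_k`, the dictionary input format, and the chord-distance elbow heuristic are all assumptions made for illustration; the abstract does not say how the tool detects the elbow.

```python
from numbers import Number


def choose_algorithm(columns):
    """Pick an algorithm from the types of the selected features.

    `columns` maps a feature name to its list of values
    (a hypothetical input format, not kClusterHub's).
    """
    if not columns:
        raise ValueError("at least one feature is required")

    def is_numeric(values):
        # bool is a subclass of int, so exclude it explicitly.
        return all(isinstance(v, Number) and not isinstance(v, bool)
                   for v in values)

    kinds = {is_numeric(values) for values in columns.values()}
    if kinds == {True}:
        return "k-means"       # only numerical features
    if kinds == {False}:
        return "k-modes"       # only categorical features
    return "k-prototypes"      # a mix of both


def elbow_k(costs):
    """Return the k whose cost lies farthest from the straight line
    joining the first and last points of the cost curve.

    `costs[i]` is the clustering cost obtained with k = i + 1.
    This is one common heuristic, assumed here for illustration.
    """
    n = len(costs)
    if n < 3:
        raise ValueError("need costs for at least three values of k")
    x1, y1 = 1, costs[0]
    x2, y2 = n, costs[-1]
    best_k, best_gap = 1, -1.0
    for i, cost in enumerate(costs):
        k = i + 1
        # Proportional to the distance between (k, cost) and the chord.
        gap = abs((y2 - y1) * k - (x2 - x1) * cost + x2 * y1 - y2 * x1)
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k
```

For example, `choose_algorithm({"age": [23, 41], "color": ["red", "blue"]})` selects k-prototypes because the features are mixed, and `elbow_k` applied to a cost curve that flattens after three clusters recommends k = 3.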

Source journal

Future Internet (Computer Science: Computer Networks and Communications)

CiteScore: 7.10
Self-citation rate: 5.90%
Articles published: 303
Review time: 11 weeks

About the journal: Future Internet is a scholarly open access journal which provides an advanced forum for science and research concerned with evolution of Internet technologies and related smart systems for “Net-Living” development. The general reference subject is therefore the evolution towards the future internet ecosystem, which is feeding a continuous, intensive, artificial transformation of the lived environment, for a widespread and significant improvement of well-being in all spheres of human life (private, public, professional). Included topics are:
• advanced communications network infrastructures
• evolution of internet basic services
• internet of things
• netted peripheral sensors
• industrial internet
• centralized and distributed data centers
• embedded computing
• cloud computing
• software defined network functions and network virtualization
• cloud-let and fog-computing
• big data, open data and analytical tools
• cyber-physical systems
• network and distributed operating systems
• web services
• semantic structures and related software tools
• artificial and augmented intelligence
• augmented reality
• system interoperability and flexible service composition
• smart mission-critical system architectures
• smart terminals and applications
• pro-sumer tools for application design and development
• cyber security compliance
• privacy compliance
• reliability compliance
• dependability compliance
• accountability compliance
• trust compliance
• technical quality of basic services.