kClusterHub: An AutoML-Driven Tool for Effortless Partition-Based Clustering over Varied Data Types

IF 2.8 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Future Internet Pub Date : 2023-10-18 DOI:10.3390/fi15100341
Konstantinos Gratsos , Stefanos Ougiaroglou , Dionisis Margaris 
{"title":"kClusterHub: An AutoML-Driven Tool for Effortless Partition-Based Clustering over Varied Data Types","authors":"Konstantinos Gratsos , Stefanos Ougiaroglou , Dionisis Margaris ","doi":"10.3390/fi15100341","DOIUrl":null,"url":null,"abstract":"Partition-based clustering is widely applied over diverse domains. Researchers and practitioners from various scientific disciplines engage with partition-based algorithms relying on specialized software or programming libraries. Addressing the need to bridge the knowledge gap associated with these tools, this paper introduces kClusterHub, an AutoML-driven web tool that simplifies the execution of partition-based clustering over numerical, categorical and mixed data types, while facilitating the identification of the optimal number of clusters, using the elbow method. Through automatic feature analysis, kClusterHub selects the most appropriate algorithm from the trio of k-means, k-modes, and k-prototypes. By empowering users to seamlessly upload datasets and select features, kClusterHub selects the algorithm, provides the elbow graph, recommends the optimal number of clusters, executes clustering, and presents the cluster assignment, through tabular representations and exploratory plots. Therefore, kClusterHub reduces the need for specialized software and programming skills, making clustering more accessible to non-experts. For further enhancing its utility, kClusterHub integrates a REST API to support the programmatic execution of cluster analysis. The paper concludes with an evaluation of kClusterHub’s usability via the System Usability Scale and CPU performance experiments. The results emerge that kClusterHub is a streamlined, efficient and user-friendly AutoML-inspired tool for cluster analysis.","PeriodicalId":37982,"journal":{"name":"Future Internet","volume":"74 1","pages":"0"},"PeriodicalIF":2.8000,"publicationDate":"2023-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Internet","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/fi15100341","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Partition-based clustering is widely applied over diverse domains. Researchers and practitioners from various scientific disciplines engage with partition-based algorithms relying on specialized software or programming libraries. Addressing the need to bridge the knowledge gap associated with these tools, this paper introduces kClusterHub, an AutoML-driven web tool that simplifies the execution of partition-based clustering over numerical, categorical and mixed data types, while facilitating the identification of the optimal number of clusters, using the elbow method. Through automatic feature analysis, kClusterHub selects the most appropriate algorithm from the trio of k-means, k-modes, and k-prototypes. By empowering users to seamlessly upload datasets and select features, kClusterHub selects the algorithm, provides the elbow graph, recommends the optimal number of clusters, executes clustering, and presents the cluster assignment, through tabular representations and exploratory plots. Therefore, kClusterHub reduces the need for specialized software and programming skills, making clustering more accessible to non-experts. For further enhancing its utility, kClusterHub integrates a REST API to support the programmatic execution of cluster analysis. The paper concludes with an evaluation of kClusterHub’s usability via the System Usability Scale and CPU performance experiments. The results emerge that kClusterHub is a streamlined, efficient and user-friendly AutoML-inspired tool for cluster analysis.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
kClusterHub:一个自动驱动的工具,用于在各种数据类型上轻松地基于分区进行聚类
基于分区的聚类广泛应用于各个领域。来自不同科学学科的研究人员和实践者依靠专门的软件或编程库从事基于分区的算法。为了解决与这些工具相关的知识鸿沟的需要,本文介绍了kClusterHub,这是一个自动驱动的web工具,它简化了基于分区的数字、分类和混合数据类型的聚类的执行,同时使用肘形方法促进了簇的最佳数量的识别。通过自动特征分析,kClusterHub从k-means、k-modes和k-prototype中选择最合适的算法。通过使用户能够无缝地上传数据集和选择特征,kClusterHub选择算法,提供肘形图,推荐最优集群数量,执行集群,并通过表格表示和探索图表示集群分配。因此,kClusterHub减少了对专门软件和编程技能的需求,使非专家更容易使用集群。为了进一步增强其实用性,kClusterHub集成了一个REST API来支持集群分析的程序化执行。本文最后通过系统可用性量表和CPU性能实验对kClusterHub的可用性进行了评估。结果表明,kClusterHub是一个简化的、高效的、用户友好的、受automl启发的聚类分析工具。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Future Internet
Future Internet Computer Science-Computer Networks and Communications
CiteScore
7.10
自引率
5.90%
发文量
303
审稿时长
11 weeks
期刊介绍: Future Internet is a scholarly open access journal which provides an advanced forum for science and research concerned with evolution of Internet technologies and related smart systems for “Net-Living” development. The general reference subject is therefore the evolution towards the future internet ecosystem, which is feeding a continuous, intensive, artificial transformation of the lived environment, for a widespread and significant improvement of well-being in all spheres of human life (private, public, professional). Included topics are: • advanced communications network infrastructures • evolution of internet basic services • internet of things • netted peripheral sensors • industrial internet • centralized and distributed data centers • embedded computing • cloud computing • software defined network functions and network virtualization • cloud-let and fog-computing • big data, open data and analytical tools • cyber-physical systems • network and distributed operating systems • web services • semantic structures and related software tools • artificial and augmented intelligence • augmented reality • system interoperability and flexible service composition • smart mission-critical system architectures • smart terminals and applications • pro-sumer tools for application design and development • cyber security compliance • privacy compliance • reliability compliance • dependability compliance • accountability compliance • trust compliance • technical quality of basic services.
期刊最新文献
Controllable Queuing System with Elastic Traffic and Signals for Resource Capacity Planning in 5G Network Slicing Internet-of-Things Traffic Analysis and Device Identification Based on Two-Stage Clustering in Smart Home Environments Resource Indexing and Querying in Large Connected Environments An Analysis of Methods and Metrics for Task Scheduling in Fog Computing Evaluating Embeddings from Pre-Trained Language Models and Knowledge Graphs for Educational Content Recommendation
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1