Multi-Density Datasets Clustering Using K-Nearest Neighbors and Chebyshev’s Inequality

IF 3.3 | Zone 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Informatica | Pub Date: 2023-10-06 | DOI: 10.31449/inf.v47i8.4719
Amira Bouchemal, Mohamed Tahar Kimour
Citations: 0

Abstract

Density-based clustering techniques are widely used in data mining across various fields. DBSCAN is one of the most popular density-based clustering algorithms, characterized by its ability to discover clusters of different shapes and sizes and to separate noise and outliers. However, it still has two fundamental limitations: it requires the Eps distance threshold as an input parameter, and it clusters datasets with varying densities poorly. To overcome these drawbacks, this work proposes a statistics-based technique. Specifically, the technique computes an appropriate k-nearest-neighbor density, sorts the dataset in ascending order of that density, and then uses Chebyshev’s inequality, which is suited to arbitrary distributions, to automatically determine different Eps values for clusters of different densities. Experiments on synthetic and real datasets demonstrate its efficiency and accuracy. The results indicate that it outperforms DBSCAN, DPC, and their recently proposed improvements.
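
The abstract does not give the algorithm's details, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the density proxy is the distance to the k-th nearest neighbor, and that a new density level starts whenever the next sorted distance exceeds mean + c·std of the current level. That cutoff is motivated by Chebyshev's inequality, P(|X − μ| ≥ cσ) ≤ 1/c², which holds for any distribution with finite variance. The function names and the parameters `k`, `c`, and `min_level_size` are hypothetical.

```python
import math
import statistics


def knn_distance(points, k):
    """Distance from each point to its k-th nearest neighbor (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def chebyshev_bound(values, c):
    """Return mean + c * std of the values.

    By Chebyshev's inequality, at most 1/c**2 of any distribution lies
    c or more standard deviations away from its mean.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return mu + c * sigma


def eps_per_density_level(points, k=4, c=3.0, min_level_size=5):
    """Assumed scheme: split sorted k-NN distances into density levels.

    Returns one Eps candidate per level (the largest k-NN distance in it).
    """
    kdist = sorted(knn_distance(points, k))  # ascending: densest points first
    levels, current = [], [kdist[0]]
    for d in kdist[1:]:
        # Open a new level when d breaks the Chebyshev bound of the current one.
        if len(current) >= min_level_size and d > chebyshev_bound(current, c):
            levels.append(current)
            current = [d]
        else:
            current.append(d)
    levels.append(current)
    return [max(level) for level in levels]


# Two groups of points with roughly a tenfold difference in spacing.
dense = [(0, 0), (1, 0), (2.1, 0), (3, 0), (4.2, 0), (5, 0)]
sparse = [(100, 0), (110, 0), (121, 0), (130, 0), (142, 0), (150, 0)]
eps_values = eps_per_density_level(dense + sparse, k=1)
print(eps_values)
```

Each returned value could then serve as the Eps of a separate DBSCAN pass over the points of the matching density level. How the paper actually assigns points to levels and merges the resulting clusters is not stated in the abstract.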
Source journal
Informatica (Engineering & Technology, Computer Science: Information Systems)
CiteScore: 5.90
Self-citation rate: 6.90%
Articles published: 19
Review time: 12 months
Journal description: The quarterly journal Informatica provides an international forum for high-quality original research and publishes papers on mathematical simulation and optimization, recognition and control, programming theory and systems, automation systems and elements. Informatica provides a multidisciplinary forum for scientists and engineers involved in research and design including experts who implement and manage information systems applications.