一种自动确定DBSCAN参数的新方法

IF 3.3 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Journal of Artificial Intelligence and Soft Computing Research Pub Date : 2020-05-23 DOI:10.2478/jaiscr-2020-0014
Artur Starczewski, P. Goetzen, M. Er
{"title":"一种自动确定DBSCAN参数的新方法","authors":"Artur Starczewski, P. Goetzen, M. Er","doi":"10.2478/jaiscr-2020-0014","DOIUrl":null,"url":null,"abstract":"Abstract Clustering is an attractive technique used in many fields in order to deal with large scale data. Many clustering algorithms have been proposed so far. The most popular algorithms include density-based approaches. These kinds of algorithms can identify clusters of arbitrary shapes in datasets. The most common of them is the Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The original DBSCAN algorithm has been widely applied in various applications and has many different modifications. However, there is a fundamental issue of the right choice of its two input parameters, i.e the eps radius and the MinPts density threshold. The choice of these parameters is especially difficult when the density variation within clusters is significant. In this paper, a new method that determines the right values of the parameters for different kinds of clusters is proposed. This method uses detection of sharp distance increases generated by a function which computes a distance between each element of a dataset and its k-th nearest neighbor. Experimental results have been obtained for several different datasets and they confirm a very good performance of the newly proposed method.","PeriodicalId":48494,"journal":{"name":"Journal of Artificial Intelligence and Soft Computing Research","volume":"10 1","pages":"209 - 221"},"PeriodicalIF":3.3000,"publicationDate":"2020-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"A New Method for Automatic Determining of the DBSCAN Parameters\",\"authors\":\"Artur Starczewski, P. Goetzen, M. Er\",\"doi\":\"10.2478/jaiscr-2020-0014\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Clustering is an attractive technique used in many fields in order to deal with large scale data. Many clustering algorithms have been proposed so far. The most popular algorithms include density-based approaches. These kinds of algorithms can identify clusters of arbitrary shapes in datasets. The most common of them is the Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The original DBSCAN algorithm has been widely applied in various applications and has many different modifications. However, there is a fundamental issue of the right choice of its two input parameters, i.e the eps radius and the MinPts density threshold. The choice of these parameters is especially difficult when the density variation within clusters is significant. In this paper, a new method that determines the right values of the parameters for different kinds of clusters is proposed. This method uses detection of sharp distance increases generated by a function which computes a distance between each element of a dataset and its k-th nearest neighbor. Experimental results have been obtained for several different datasets and they confirm a very good performance of the newly proposed method.\",\"PeriodicalId\":48494,\"journal\":{\"name\":\"Journal of Artificial Intelligence and Soft Computing Research\",\"volume\":\"10 1\",\"pages\":\"209 - 221\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2020-05-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Artificial Intelligence and Soft Computing Research\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.2478/jaiscr-2020-0014\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Artificial Intelligence and Soft Computing Research","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.2478/jaiscr-2020-0014","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 20

摘要

聚类是一种有吸引力的技术,用于许多领域,以处理大规模数据。目前已经提出了许多聚类算法。最流行的算法包括基于密度的方法。这类算法可以识别数据集中任意形状的簇。其中最常见的是基于密度的带噪声应用空间聚类(DBSCAN)。原始的DBSCAN算法在各种应用中得到了广泛的应用,并进行了许多不同的修改。然而,有一个基本的问题是正确选择它的两个输入参数,即eps半径和MinPts密度阈值。当集群内密度变化显著时,这些参数的选择尤其困难。本文提出了一种确定不同类型聚类的参数值的新方法。该方法使用由一个函数生成的急剧距离增加检测,该函数计算数据集的每个元素与其第k个最近邻居之间的距离。在多个不同的数据集上进行了实验,结果证实了该方法的良好性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
A New Method for Automatic Determining of the DBSCAN Parameters
Abstract Clustering is an attractive technique used in many fields in order to deal with large scale data. Many clustering algorithms have been proposed so far. The most popular algorithms include density-based approaches. These kinds of algorithms can identify clusters of arbitrary shapes in datasets. The most common of them is the Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The original DBSCAN algorithm has been widely applied in various applications and has many different modifications. However, there is a fundamental issue of the right choice of its two input parameters, i.e the eps radius and the MinPts density threshold. The choice of these parameters is especially difficult when the density variation within clusters is significant. In this paper, a new method that determines the right values of the parameters for different kinds of clusters is proposed. This method uses detection of sharp distance increases generated by a function which computes a distance between each element of a dataset and its k-th nearest neighbor. Experimental results have been obtained for several different datasets and they confirm a very good performance of the newly proposed method.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Artificial Intelligence and Soft Computing Research
Journal of Artificial Intelligence and Soft Computing Research COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-
CiteScore
7.00
自引率
25.00%
发文量
10
审稿时长
24 weeks
期刊介绍: Journal of Artificial Intelligence and Soft Computing Research (available also at Sciendo (De Gruyter)) is a dynamically developing international journal focused on the latest scientific results and methods constituting traditional artificial intelligence methods and soft computing techniques. Our goal is to bring together scientists representing both approaches and various research communities.
期刊最新文献
Bending Path Understanding Based on Angle Projections in Field Environments Self-Organized Operational Neural Networks for The Detection of Atrial Fibrillation Interpreting Convolutional Layers in DNN Model Based on Time–Frequency Representation of Emotional Speech A Few-Shot Learning Approach for Covid-19 Diagnosis Using Quasi-Configured Topological Spaces Metrics for Assessing Generalization of Deep Reinforcement Learning in Parameterized Environments
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1