{"title":"一种自动确定DBSCAN参数的新方法","authors":"Artur Starczewski, P. Goetzen, M. Er","doi":"10.2478/jaiscr-2020-0014","DOIUrl":null,"url":null,"abstract":"Abstract Clustering is an attractive technique used in many fields in order to deal with large scale data. Many clustering algorithms have been proposed so far. The most popular algorithms include density-based approaches. These kinds of algorithms can identify clusters of arbitrary shapes in datasets. The most common of them is the Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The original DBSCAN algorithm has been widely applied in various applications and has many different modifications. However, there is a fundamental issue of the right choice of its two input parameters, i.e the eps radius and the MinPts density threshold. The choice of these parameters is especially difficult when the density variation within clusters is significant. In this paper, a new method that determines the right values of the parameters for different kinds of clusters is proposed. This method uses detection of sharp distance increases generated by a function which computes a distance between each element of a dataset and its k-th nearest neighbor. Experimental results have been obtained for several different datasets and they confirm a very good performance of the newly proposed method.","PeriodicalId":48494,"journal":{"name":"Journal of Artificial Intelligence and Soft Computing Research","volume":"10 1","pages":"209 - 221"},"PeriodicalIF":3.3000,"publicationDate":"2020-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"A New Method for Automatic Determining of the DBSCAN Parameters\",\"authors\":\"Artur Starczewski, P. Goetzen, M. Er\",\"doi\":\"10.2478/jaiscr-2020-0014\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Clustering is an attractive technique used in many fields in order to deal with large scale data. Many clustering algorithms have been proposed so far. The most popular algorithms include density-based approaches. These kinds of algorithms can identify clusters of arbitrary shapes in datasets. The most common of them is the Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The original DBSCAN algorithm has been widely applied in various applications and has many different modifications. However, there is a fundamental issue of the right choice of its two input parameters, i.e the eps radius and the MinPts density threshold. The choice of these parameters is especially difficult when the density variation within clusters is significant. In this paper, a new method that determines the right values of the parameters for different kinds of clusters is proposed. This method uses detection of sharp distance increases generated by a function which computes a distance between each element of a dataset and its k-th nearest neighbor. Experimental results have been obtained for several different datasets and they confirm a very good performance of the newly proposed method.\",\"PeriodicalId\":48494,\"journal\":{\"name\":\"Journal of Artificial Intelligence and Soft Computing Research\",\"volume\":\"10 1\",\"pages\":\"209 - 221\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2020-05-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Artificial Intelligence and Soft Computing Research\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.2478/jaiscr-2020-0014\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Artificial Intelligence and Soft Computing Research","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.2478/jaiscr-2020-0014","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
A New Method for Automatic Determining of the DBSCAN Parameters
Abstract Clustering is an attractive technique used in many fields in order to deal with large scale data. Many clustering algorithms have been proposed so far. The most popular algorithms include density-based approaches. These kinds of algorithms can identify clusters of arbitrary shapes in datasets. The most common of them is the Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The original DBSCAN algorithm has been widely applied in various applications and has many different modifications. However, there is a fundamental issue of the right choice of its two input parameters, i.e the eps radius and the MinPts density threshold. The choice of these parameters is especially difficult when the density variation within clusters is significant. In this paper, a new method that determines the right values of the parameters for different kinds of clusters is proposed. This method uses detection of sharp distance increases generated by a function which computes a distance between each element of a dataset and its k-th nearest neighbor. Experimental results have been obtained for several different datasets and they confirm a very good performance of the newly proposed method.
期刊介绍:
Journal of Artificial Intelligence and Soft Computing Research (available also at Sciendo (De Gruyter)) is a dynamically developing international journal focused on the latest scientific results and methods constituting traditional artificial intelligence methods and soft computing techniques. Our goal is to bring together scientists representing both approaches and various research communities.