On a Distributed Approach for Density-Based Clustering
Nhien-An Le-Khac, Mohand Tahar Kechadi
2011 10th International Conference on Machine Learning and Applications and Workshops, 2011-12-18
DOI: 10.1109/ICMLA.2011.108 (https://doi.org/10.1109/ICMLA.2011.108)
Citations: 4
Abstract
Efficiently extracting useful knowledge from very large datasets remains a challenge, especially when the datasets are distributed, heterogeneous, and of varying quality across the nodes involved. To reduce communication overhead, most existing distributed clustering approaches generate global models by aggregating local results obtained on each individual node. The complexity and quality of the solutions depend heavily on the quality of the aggregation. In this respect, we propose a distributed density-based clustering approach that both reduces communication overhead and improves the quality of the global models by considering the shapes of local clusters. Preliminary results show that this algorithm is very promising.
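The general pattern the abstract describes (cluster locally, send only a compact description of each local cluster's shape, then aggregate) can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify how cluster shapes are represented or merged, so the sketch assumes a convex hull as a stand-in shape summary and merges two local clusters when any pair of their hull vertices lies within a hypothetical `merge_eps`. All function names and parameter values here are illustrative assumptions.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns clusters as lists of points (noise is dropped)."""
    labels = {}    # point index -> cluster id, or -1 for noise
    clusters = []

    def neighbors(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    for i in range(len(points)):
        if i in labels:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cid = len(clusters)
        clusters.append([points[i]])
        labels[i] = cid
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels.get(j, -1) != -1:
                continue            # already assigned to a cluster
            labels[j] = cid         # unvisited point or former noise
            clusters[cid].append(points[j])
            nb = neighbors(j)
            if len(nb) >= min_pts:  # only core points expand the cluster
                queue.extend(nb)
    return clusters


def convex_hull(points):
    """Andrew's monotone chain; an assumed stand-in for a cluster 'shape'."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


def merge_local_models(hulls, merge_eps):
    """Union-find over local clusters whose hull vertices come within merge_eps."""
    parent = list(range(len(hulls)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(len(hulls)):
        for b in range(a + 1, len(hulls)):
            if any(dist(p, q) <= merge_eps for p in hulls[a] for q in hulls[b]):
                parent[find(a)] = find(b)
    groups = {}
    for i in range(len(hulls)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def grid(x0, y0, n=4, step=0.5):
    """Synthetic n-by-n grid of points, used only for this illustration."""
    return [(x0 + i * step, y0 + j * step) for i in range(n) for j in range(n)]


# Two hypothetical nodes; node_b holds one cluster adjacent to node_a's and one far away.
node_a = grid(0.0, 0.0)
node_b = grid(2.0, 0.0) + grid(10.0, 10.0)

# Each node clusters locally and would transmit only the hull vertices.
local_hulls = []
for node_points in (node_a, node_b):
    for cluster in dbscan(node_points, eps=0.6, min_pts=3):
        local_hulls.append(convex_hull(cluster))

global_clusters = merge_local_models(local_hulls, merge_eps=0.6)
print(global_clusters)
```

In this toy setup each node would send 4 hull vertices per local cluster instead of 16 points, which is the kind of communication saving the abstract refers to. Comparing hull vertices alone is a crude merge test and a convex hull cannot capture non-convex clusters; a faithful shape representation would need the details from the full paper.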