Class noise elimination approach for large datasets based on a combination of classifiers
B. Zerhari
2016 2nd International Conference on Cloud Computing Technologies and Applications (CloudTech), 2016-05-01
DOI: 10.1109/CLOUDTECH.2016.7847688 (https://doi.org/10.1109/CLOUDTECH.2016.7847688)
Citations: 8
Abstract
Detecting and eliminating noise points, or class noise, has become increasingly important for handling large datasets. Eliminating noise in this setting helps reduce computing costs, especially when clustering algorithms are used. A wide variety of clustering algorithms now exist and produce good results. However, they often assume that the input data are free of noise or contain very little of it, which is rarely the case in real Big Data contexts. In this paper, we present a noise detection and elimination approach for large datasets. The approach relies on four steps: divide the data into subsets, extract the best rules, apply different classifiers to the subsets, and finally combine the classifiers' results.
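The general idea of filtering class noise with a combination of classifiers can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not say how the subsets are formed, which classifiers are used, or how their results are combined, so everything here is an assumption made for illustration. The sketch splits the data round-robin into subsets, trains three simple learners (nearest centroid, 1-NN, 3-NN) on the other subsets, and flags an instance when a majority of them disagree with its given label. The rule-extraction step is left out because the abstract gives no detail on it. All names (`flag_suspected_noise`, `nearest_centroid`, `k_nn`) and the toy data are hypothetical.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest_centroid(train):
    """Fit class means; predict the label of the closest mean."""
    sums, counts = {}, Counter()
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
    return lambda x: min(centroids, key=lambda y: sq_dist(x, centroids[y]))

def k_nn(k):
    """Build a k-nearest-neighbour learner (majority label of k neighbours)."""
    def fit(train):
        def predict(x):
            nearest = sorted(train, key=lambda p: sq_dist(x, p[0]))[:k]
            return Counter(y for _, y in nearest).most_common(1)[0][0]
        return predict
    return fit

def flag_suspected_noise(data, learners, n_subsets=3):
    """Return indices whose label is rejected by a majority of the learners.

    Assumption: subsets are formed round-robin by index, and each subset is
    judged by models trained on all the other subsets.
    """
    flagged = []
    for fold in range(n_subsets):
        train = [p for i, p in enumerate(data) if i % n_subsets != fold]
        models = [fit(train) for fit in learners]
        for i, (x, y) in enumerate(data):
            if i % n_subsets != fold:
                continue
            wrong = sum(model(x) != y for model in models)
            if wrong > len(models) / 2:  # assumed combination rule: majority vote
                flagged.append(i)
    return sorted(flagged)

# Toy data: two well-separated clusters plus one deliberately mislabeled point.
data = [
    ((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0),
    ((1.0, 1.0), 0), ((0.5, 0.5), 0), ((0.1, 0.9), 0),
    ((5.0, 5.0), 1), ((5.0, 6.0), 1), ((6.0, 5.0), 1),
    ((6.0, 6.0), 1), ((5.5, 5.5), 1), ((5.2, 5.8), 1),
    ((0.4, 0.6), 1),  # sits in the class-0 cluster but carries label 1
]

flagged = flag_suspected_noise(data, [nearest_centroid, k_nn(1), k_nn(3)])
cleaned = [p for i, p in enumerate(data) if i not in set(flagged)]
print(flagged)  # [12]
```

Requiring a majority rather than a single dissenting learner is a deliberate choice in this sketch: a clean instance that happens to lie next to a mislabeled one may be misjudged by 1-NN alone, and the vote keeps it from being discarded.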