Parallel clustering of large data set on Hadoop using data mining techniques
K. S. Chaturbhuj, Gauri Chaudhary
2016 World Conference on Futuristic Trends in Research and Innovation for Social Welfare (Startup Conclave), 2016-02-01
DOI: 10.1109/STARTUP.2016.7583955
Citations: 4
Abstract
Traditional data processing techniques cannot keep up with rapidly growing data, and Hadoop can be used to process data at this scale. K-means is a traditional clustering method that is simple, scalable, and easy to implement, but it converges to a local minimum determined by its starting position and is therefore sensitive to the choice of initial centers. K-means also requires the number of clusters to be specified in advance. Particle Swarm Optimization (PSO) is a behavior-mimicking algorithm used to introduce the connectivity principle into the centroid-based clustering algorithm, which gives optimum centroids and hence better clusters. We used PSO to find the initial centroids and K-means to find better clusters. Hadoop is used for fast, parallel processing of large datasets.
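The two-stage idea in the abstract (PSO to pick initial centroids, then K-means to refine them) can be illustrated with a minimal single-machine sketch. This is not the authors' Hadoop implementation: the function names, the PSO parameters (`w`, `c1`, `c2`, swarm size, iteration counts), and the use of the sum of squared errors as the fitness function are all assumptions made for illustration, since the abstract does not specify them.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, centroids):
    """Fitness (assumed): sum of squared distances to the nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def pso_initial_centroids(points, k, n_particles=10, n_iters=30,
                          w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for a good set of k starting centroids with a basic PSO."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle is one candidate set of k centroids, seeded from the data.
    pos = [[list(p) for p in rng.sample(points, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    best_pos = [[c[:] for c in particle] for particle in pos]
    best_cost = [sse(points, particle) for particle in pos]
    g = min(range(n_particles), key=best_cost.__getitem__)
    gbest, gbest_cost = [c[:] for c in best_pos[g]], best_cost[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard velocity update: inertia + personal + global pull.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (best_pos[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            cost = sse(points, pos[i])
            if cost < best_cost[i]:
                best_cost[i] = cost
                best_pos[i] = [c[:] for c in pos[i]]
                if cost < gbest_cost:
                    gbest_cost, gbest = cost, [c[:] for c in pos[i]]
    return gbest

def kmeans(points, centroids, n_iters=20):
    """Plain Lloyd iterations starting from the supplied centroids."""
    centroids = [list(c) for c in centroids]
    for _ in range(n_iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean; keep it if the cluster is empty.
        updated = [[sum(col) / len(members) for col in zip(*members)]
                   if members else centroids[j]
                   for j, members in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

# Toy data (hypothetical): two well-separated 2-D blobs.
data_rng = random.Random(42)
points = [[data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5)]
          for cx, cy in [(0, 0), (10, 10)] for _ in range(20)]
initial = pso_initial_centroids(points, k=2)
final = kmeans(points, initial)
print(sorted(final))
```

In a Hadoop setting, one plausible mapping (again an assumption, not taken from the paper) is to run the nearest-centroid assignment in the map phase and the per-cluster mean computation in the reduce phase, with one MapReduce job per K-means iteration.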