{"title":"基于Hadoop MapReduce改进K-Means聚类算法的MapReduce模型","authors":"N. Akthar, Mohd Vasim Ahamad, Shahbaaz Ahmad","doi":"10.1109/CICT.2016.46","DOIUrl":null,"url":null,"abstract":"In today's digital world scenario, digital data is coming in and going out faster than ever before. This data is of no use until we extract some useful content from it. But, it is impractical and inefficient to use traditional database management techniques on big data. That's why, big data technologies like Hadoop comes to existence. Hadoop is an open source framework, which can be used to process the huge amount of data in parallel. To extract useful information, data mining techniques can be used. Among many techniques of data mining, clustering is most popular technique. Clustering bind together the similar data in same group, whereas, dissimilar data is scattered in different groups. K Means clustering algorithm is one of the clustering technique. Traditional K Means clustering tries to assign n data objects to k clusters starting with random initial centers. Experiments show that data mining results are inefficient and unstable, if we use random initial centers. In this paper, we have modified traditional K Means clustering algorithm by using improved initial centers. We have proposed various methods to calculate the initial centers and compared their results.","PeriodicalId":118509,"journal":{"name":"2016 Second International Conference on Computational Intelligence & Communication Technology (CICT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"MapReduce Model of Improved K-Means Clustering Algorithm Using Hadoop MapReduce\",\"authors\":\"N. Akthar, Mohd Vasim Ahamad, Shahbaaz Ahmad\",\"doi\":\"10.1109/CICT.2016.46\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In today's digital world scenario, digital data is coming in and going out faster than ever before. This data is of no use until we extract some useful content from it. But, it is impractical and inefficient to use traditional database management techniques on big data. That's why, big data technologies like Hadoop comes to existence. Hadoop is an open source framework, which can be used to process the huge amount of data in parallel. To extract useful information, data mining techniques can be used. Among many techniques of data mining, clustering is most popular technique. Clustering bind together the similar data in same group, whereas, dissimilar data is scattered in different groups. K Means clustering algorithm is one of the clustering technique. Traditional K Means clustering tries to assign n data objects to k clusters starting with random initial centers. Experiments show that data mining results are inefficient and unstable, if we use random initial centers. In this paper, we have modified traditional K Means clustering algorithm by using improved initial centers. We have proposed various methods to calculate the initial centers and compared their results.\",\"PeriodicalId\":118509,\"journal\":{\"name\":\"2016 Second International Conference on Computational Intelligence & Communication Technology (CICT)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 Second International Conference on Computational Intelligence & Communication Technology (CICT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CICT.2016.46\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 Second International Conference on Computational Intelligence & Communication Technology (CICT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CICT.2016.46","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
MapReduce Model of Improved K-Means Clustering Algorithm Using Hadoop MapReduce
In today's digital world scenario, digital data is coming in and going out faster than ever before. This data is of no use until we extract some useful content from it. But, it is impractical and inefficient to use traditional database management techniques on big data. That's why, big data technologies like Hadoop comes to existence. Hadoop is an open source framework, which can be used to process the huge amount of data in parallel. To extract useful information, data mining techniques can be used. Among many techniques of data mining, clustering is most popular technique. Clustering bind together the similar data in same group, whereas, dissimilar data is scattered in different groups. K Means clustering algorithm is one of the clustering technique. Traditional K Means clustering tries to assign n data objects to k clusters starting with random initial centers. Experiments show that data mining results are inefficient and unstable, if we use random initial centers. In this paper, we have modified traditional K Means clustering algorithm by using improved initial centers. We have proposed various methods to calculate the initial centers and compared their results.