MapReduce Model of Improved K-Means Clustering Algorithm Using Hadoop MapReduce

N. Akthar, Mohd Vasim Ahamad, Shahbaaz Ahmad
DOI: https://doi.org/10.1109/CICT.2016.46
Published in: 2016 Second International Conference on Computational Intelligence & Communication Technology (CICT)
Publication date: 2016-02-01
Citations: 14

Abstract

In today's digital world, data flows in and out faster than ever before. This data is of no use until we extract useful content from it, but it is impractical and inefficient to apply traditional database management techniques to big data. That is why big data technologies such as Hadoop have come into existence. Hadoop is an open-source framework that can process huge amounts of data in parallel. Data mining techniques can be used to extract useful information, and among them clustering is the most popular. Clustering places similar data in the same group and dissimilar data in different groups. K-Means is one such clustering algorithm. Traditional K-Means clustering tries to assign n data objects to k clusters, starting from random initial centers. Experiments show that data mining results are inefficient and unstable when random initial centers are used. In this paper, we modify the traditional K-Means clustering algorithm by using improved initial centers. We propose several methods for calculating the initial centers and compare their results.
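To make the map/reduce framing of K-Means concrete, the following is a minimal single-process sketch in Python. It is not the authors' implementation and does not run on Hadoop. The abstract does not say how the improved initial centers are calculated, so the sketch takes the initial centers as a caller-supplied argument and shows only the traditional random choice as one option. All function names (`map_phase`, `reduce_phase`, `kmeans`, `random_initial_centers`) are hypothetical, and squared Euclidean distance is an assumption.

```python
import random
from collections import defaultdict


def nearest_center(point, centers):
    """Return the index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def map_phase(points, centers):
    """Map step: emit a (cluster index, point) pair for every input point."""
    for point in points:
        yield nearest_center(point, centers), point


def reduce_phase(pairs, centers):
    """Reduce step: average the points of each cluster to obtain new centers."""
    groups = defaultdict(list)
    for idx, point in pairs:
        groups[idx].append(point)
    new_centers = list(centers)  # a cluster with no points keeps its old center
    for idx, members in groups.items():
        new_centers[idx] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return new_centers


def kmeans(points, initial_centers, max_iterations=100):
    """Repeat the map and reduce steps until the centers stop changing."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iterations):
        updated = reduce_phase(map_phase(points, centers), centers)
        if updated == centers:
            break
        centers = updated
    return centers


def random_initial_centers(points, k, seed=None):
    """Traditional initialization (assumed here): pick k distinct input points at random."""
    return random.Random(seed).sample(points, k)


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
    print(kmeans(data, random_initial_centers(data, k=2, seed=0)))
```

Because `kmeans` accepts any list of starting centers, a different initialization strategy can be swapped in without touching the map or reduce steps, which is where a comparison of initial-center methods would plug in.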