Large-scale Data Mining Method based on Clustering Algorithm Combined with MAPREDUCE

Yulun Zhang, Chenxu Zhang, Lei Yang, Hongyang Li
DOI: 10.62051/8p9b3106 (https://doi.org/10.62051/8p9b3106)
Transactions on Computer Science and Intelligent Systems Research, published 2023-12-21 (Journal Article)
Citations: 0

Abstract

With the continuous development of information technology, both the diversity and the volume of data continue to grow. Effectively mining these text data to extract valuable content has become an urgent task in data research. This study combines the MapReduce distributed framework with the K-means clustering algorithm to address the challenges of large-scale data mining. The paper also uses a distributed caching mechanism so that multiple cooperating MapReduce jobs do not repeatedly request the same resources, which improves mining efficiency. Combining MapReduce's distributed computation with K-means clustering yields an efficient and scalable method for large-scale data mining. Experimental results, assessed with both internal and external clustering indicators, show that the combination fully exploits MapReduce's distributed and parallel computing characteristics and gives users an efficient and scalable data mining tool. Through this research, the paper provides new methods and insights for large-scale data mining and improves both its efficiency and its accuracy.
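The abstract does not include the authors' implementation. The following is only a minimal single-process Python sketch of the common "one MapReduce job per K-means iteration" pattern. It rests on three assumptions: each mapper runs an in-mapper combiner that emits per-cluster partial sums, a reducer averages those sums into new centroids, and the current centroids act as the read-only side data that a Hadoop job would ship to every mapper through its distributed cache. The function names (`map_partition`, `reduce_cluster`, `mapreduce_kmeans`) are hypothetical. No Hadoop API is used; partitions are plain Python lists.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Index of the closest centroid by squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_partition(partition, centroids):
    """Mapper with an in-mapper combiner (hypothetical name).

    `centroids` stands in for the read-only side data that a Hadoop job
    would ship to every mapper through the distributed cache (assumption).
    Emits at most one (cluster_id, (partial_sum, count)) pair per cluster.
    """
    partial = {}
    for point in partition:
        k = nearest(point, centroids)
        s, n = partial.get(k, ([0.0] * len(point), 0))
        partial[k] = ([a + b for a, b in zip(s, point)], n + 1)
    return list(partial.items())


def reduce_cluster(values):
    """Reducer (hypothetical name): merge partial sums into a new centroid."""
    total = [0.0] * len(values[0][0])
    count = 0
    for s, n in values:
        total = [a + b for a, b in zip(total, s)]
        count += n
    return tuple(x / count for x in total)


def mapreduce_kmeans(partitions, centroids, max_iter=20, tol=1e-9):
    """Driver: one map/shuffle/reduce round per K-means iteration."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        shuffled = defaultdict(list)  # shuffle: group map output by cluster id
        for part in partitions:
            for k, v in map_partition(part, centroids):
                shuffled[k].append(v)
        new = list(centroids)  # a cluster that received no points keeps its centroid
        for k, values in shuffled.items():
            new[k] = reduce_cluster(values)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:  # stop once no centroid moves noticeably
            break
    return centroids


if __name__ == "__main__":
    parts = [
        [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)],
        [(1.0, 0.0), (10.0, 11.0), (11.0, 10.0)],
    ]
    # Prints two centroids of approximately (0.33, 0.33) and (10.33, 10.33).
    print(mapreduce_kmeans(parts, [(0.0, 0.0), (10.0, 10.0)]))
```

The combiner inside the mapper is the key design choice in this sketch. It cuts the shuffle from one record per data point down to at most k records per partition. This matters when every iteration is a separate job that must re-read the centroids from the cache.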