Performance Challenges and Solutions in Big Data Platform Hadoop

Balraj Singh, H. Verma, Vishu Madaan
Journal: Recent Advances in Computer Science and Communications (JCR Q3, Computer Science)
Publication date: 2023-06-08
Publication type: Journal Article
DOI: 10.2174/2666255816666230608165146
URL: https://doi.org/10.2174/2666255816666230608165146
Citations: 0

Abstract

The present era demands continuous improvement in executing complex analytics on large-scale data and in working beyond the limits of traditional systems.

The need to process diverse data types and to deliver solutions for different industry domains is rising. These needs call for sophisticated techniques and methods that further enhance existing platforms and mechanisms, and they give the research community an opportunity to examine existing systems, find potential issues, and propose improvements. Hadoop is a popular choice for managing and processing big data. It is an open-source platform and a front-runner in the batch processing of large-scale jobs, and the cost of scaling a cluster is low compared with other platforms. However, this popularity by no means guarantees high performance in all scenarios. As data and industrial requirements continue to evolve, it is imperative to investigate new methods and techniques that advance the existing system.

This paper presents a systematic review of current progress in the field. Research publications from various sources are collected and analyzed. The performance of a cluster depends largely on the job processing mechanisms and policies associated with it.

Although extensive studies and solutions have been proposed, performance bottlenecks in load balancing, resource utilization, content management, and efficient processing persist. Few scheduling solutions address the trade-offs between different parameters; the process of content splitting and merging has not been explored to a large extent; and skew mitigation solutions focus mainly on the Reduce side of MapReduce, while the Map side is little used for load balancing.
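The skew problem mentioned at the end of the abstract can be illustrated with a small sketch. It is not taken from the paper: it only assumes a hash partitioner that assigns each map-output key to one reducer, and the record set, the function names (`partition`, `reducer_loads`, `skew_ratio`), and the reducer count are all hypothetical. It shows how a single dominant key overloads one reducer regardless of how evenly the remaining keys are spread.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Assign a key to a reducer with a stable hash (CRC32).

    CRC32 is used instead of Python's built-in hash(), which is salted
    per process for strings and would make runs non-reproducible.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(records, num_reducers: int):
    """Count how many map-output records each reducer would receive."""
    loads = [0] * num_reducers
    for key, _value in records:
        loads[partition(key, num_reducers)] += 1
    return loads


def skew_ratio(loads) -> float:
    """Maximum load divided by mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


# Hypothetical map output: one "hot" key accounts for 900 of 1000 records.
records = [("hot", 1)] * 900 + [(f"k{i}", 1) for i in range(100)]
loads = reducer_loads(records, num_reducers=4)
print("records per reducer:", loads)
print("skew ratio:", round(skew_ratio(loads), 2))
```

Because all records with the same key must go to the same reducer, the reducer that owns the hot key receives at least 900 records here, so the skew ratio cannot fall below 3.6 with four reducers. Rebalancing after partitioning does not help in this case, which is one reason the Map side is a natural place to look for load-balancing opportunities.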
Source journal
Recent Advances in Computer Science and Communications
Subject area: Computer Science, Computer Science (all)
CiteScore: 2.50
Self-citation rate: 0.00%
Articles published: 142