Towards Performance Optimization for Hadoop MapReduce Applications
Authors: Thandar Htay, S. Phyu
Published in: 2020 17th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON)
Publication date: 2020-06-01
DOI: 10.1109/ECTI-CON49241.2020.9158095
URL: https://doi.org/10.1109/ECTI-CON49241.2020.9158095
Citations: 0

Abstract

Apache Hadoop is a widely used open-source distributed platform for big data processing, and it provides a YARN-based distributed parallel processing framework on low-cost commodity machines. However, YARN adopts static resource management: the number of containers available per node and the size of each container are fixed by pre-configured default resource units called containers. This leads to poor performance across the variety of MapReduce applications. In addition, during the last wave of a job, many available resources frequently sit idle because YARN does not consider the wave behavior of tasks in MapReduce applications. To take advantage of these idle resources and improve performance, an important parameter, the number of map tasks, needs to be optimized based on the available resources; this number is governed by the split size. The parameter is therefore optimized by tuning the split size according to the available resources. To address the drawback of YARN's static resource management in Hadoop, the number of concurrent containers per machine is tuned to optimize node performance for each MapReduce application. According to the experimental results, the proposed system, which optimizes the selected parameter on top of the optimized number of concurrent containers, achieves performance gains for MapReduce applications while reducing optimization overhead.
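
The relationship between split size, map task count, and idle resources in the last wave can be illustrated with a small calculation. The sketch below is not the authors' algorithm, which the abstract does not spell out. It is a minimal illustration under stated assumptions: one map task per input split, a container count per node bounded by both memory and vcores, and a tuning rule (keep the number of waves, shrink the split until the last wave is full) chosen here purely for illustration. All function names and the cluster figures are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return (a + b - 1) // b


def containers_per_node(node_mem_mb: int, node_vcores: int,
                        container_mem_mb: int, container_vcores: int) -> int:
    """Concurrent containers one node can host (assumed: limited by the
    scarcer of memory and vcores)."""
    return min(node_mem_mb // container_mem_mb, node_vcores // container_vcores)


def num_map_tasks(input_bytes: int, split_bytes: int) -> int:
    """Assumed: one map task per input split."""
    return ceil_div(input_bytes, split_bytes)


def num_waves(tasks: int, slots: int) -> int:
    """Rounds of execution needed when only `slots` tasks run at once."""
    return ceil_div(tasks, slots)


def last_wave_idle_slots(tasks: int, slots: int) -> int:
    """Containers left idle while the final wave runs."""
    remainder = tasks % slots
    return 0 if remainder == 0 else slots - remainder


def tune_split_size(input_bytes: int, default_split_bytes: int, slots: int) -> int:
    """Illustrative rule (an assumption, not the paper's method): keep the
    wave count of the default split size, but pick a split size that yields
    exactly waves * slots tasks so the last wave is full."""
    tasks = num_map_tasks(input_bytes, default_split_bytes)
    target_tasks = num_waves(tasks, slots) * slots
    return ceil_div(input_bytes, target_tasks)


if __name__ == "__main__":
    MIB = 1024 * 1024
    input_bytes = 10 * 1024 * MIB      # hypothetical 10 GiB input
    default_split = 128 * MIB          # hypothetical default split size
    per_node = containers_per_node(16384, 8, 2048, 1)
    slots = 3 * per_node               # hypothetical 3-node cluster

    tasks = num_map_tasks(input_bytes, default_split)
    print("default:", tasks, "tasks,", num_waves(tasks, slots), "waves,",
          last_wave_idle_slots(tasks, slots), "idle slots in last wave")

    tuned = tune_split_size(input_bytes, default_split, slots)
    tuned_tasks = num_map_tasks(input_bytes, tuned)
    print("tuned:", tuned, "bytes per split,", tuned_tasks, "tasks,",
          last_wave_idle_slots(tuned_tasks, slots), "idle slots in last wave")
```

In a real job, a tuned value of this kind would typically be supplied through Hadoop's split-size settings such as `mapreduce.input.fileinputformat.split.maxsize`; whether the authors' system does so is not stated in the abstract.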