NS3 Based HDFS Data Placement Algorithm Evaluation Framework

Hindol Bhattacharya, Samiran Chattopadhyay, M. Chattopadhyay
Published in: 2017 International Conference on Computer, Electrical & Communication Engineering (ICCECE), 2017-12-01
DOI: https://doi.org/10.1109/ICCECE.2017.8526204
Citations: 1

Abstract

Data exploration and utilization based on big data analytics holds immense prospects for the future of businesses. However, as the name suggests, processing such a huge amount of data is challenging. Hadoop, with its parallel processing solutions, assists in processing big data in reasonable time. The heart of Hadoop is its distributed file system (HDFS), and how data is placed in the file system dictates the speed of data processing. Hence, over the years, efficient data placement algorithms have been one of the key research areas in big data analytics. Evaluating such algorithms traditionally requires deploying HDFS on a hardware cluster and implementing the data placement algorithm on it. It is often difficult for researchers to acquire the required hardware and build such a cluster. Even when such clusters are available, scalability becomes an issue. Moreover, a cluster resembling a real-life data center is not available to many researchers. Simulation provides a low-cost alternative for evaluating big data placement algorithms on HDFS. One of the key objectives of data placement algorithms is to minimize communication cost and latency, so a framework built on a network simulator fits the role well. NS3 is one of the most prominent network simulation tools available to researchers. However, full HDFS support for data placement research is not yet implemented. This work proposes to extend the NS3 simulation environment with HDFS support for eventual use in data placement algorithm evaluation.
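The abstract names communication cost as a key objective but does not state how the framework measures it. As a rough illustration only, the sketch below scores a block placement by the tree distance between each reader and its closest replica, using the two-level rack/host convention (0 for the same host, 2 for the same rack, 4 otherwise). The names `Node`, `distance`, `read_cost`, and `total_read_cost` are hypothetical and are not taken from the paper or from NS3.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    """A DataNode or client location in a two-level rack/host tree (assumed topology)."""
    rack: str
    host: str


def distance(a: Node, b: Node) -> int:
    """Tree distance: 0 on the same host, 2 within a rack, 4 across racks."""
    if a == b:
        return 0
    if a.rack == b.rack:
        return 2
    return 4


def read_cost(reader: Node, replicas: List[Node]) -> int:
    """Cost of one read: distance from the reader to its closest replica."""
    return min(distance(reader, r) for r in replicas)


def total_read_cost(
    accesses: List[Tuple[Node, str]],
    placement: Dict[str, List[Node]],
) -> int:
    """Sum the per-read cost over a trace of (reader, block_id) accesses."""
    return sum(read_cost(reader, placement[block]) for reader, block in accesses)


# Hypothetical three-node cluster spread over two racks.
n1 = Node("rack1", "host1")
n2 = Node("rack1", "host2")
n3 = Node("rack2", "host3")

# A candidate placement: block b0 has two replicas, block b1 has one.
placement = {"b0": [n1, n3], "b1": [n3]}
accesses = [(n1, "b0"), (n2, "b0"), (n2, "b1")]

print(total_read_cost(accesses, placement))
```

A network simulator would replace the fixed hop counts with measured transfer times, which is where link bandwidth, congestion, and latency enter the comparison between placement algorithms.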