FLOWPROPHET: Generic and Accurate Traffic Prediction for Data-Parallel Cluster Computing

Hao Wang, Li Chen, Kai Chen, Ziyang Li, Yiming Zhang, Haibing Guan, Zhengwei Qi, Dongsheng Li, Yanhui Geng
Published in: 2015 IEEE 35th International Conference on Distributed Computing Systems, 2015-07-23
DOI: 10.1109/ICDCS.2015.43 (https://doi.org/10.1109/ICDCS.2015.43)
Citations: 20

Abstract

Data-parallel computing frameworks (DCFs) such as MapReduce, Spark, and Dryad are widely used in big data and cloud computing, and they inject large numbers of flows into data center networks. In this paper, we design and implement FLOWPROPHET, a general framework for predicting traffic flows for DCFs. To this end, we analyze and summarize the common features of popular DCFs and gain a key insight: since application logic in DCFs is naturally expressed as a directed acyclic graph (DAG), the DAG contains the time and data dependencies necessary for accurate flow prediction. Based on this insight, FLOWPROPHET extracts DAGs from user applications and uses the time and data dependencies to calculate, ahead of time, a 4-tuple of flow information (source, destination, flow_size, establish_time) for every flow. We also provide a generic programming interface to FLOWPROPHET so that current and future DCFs can readily deploy it. We implement FLOWPROPHET on both Spark and Hadoop and perform extensive evaluations on a testbed with 37 physical servers. Our implementation and experiments demonstrate that FLOWPROPHET makes its predictions ahead of time and at minimal cost, and achieves almost 100% accuracy in source, destination, and flow size predictions. With accurate predictions from FLOWPROPHET and a simple network scheduler, the job completion time of a Hadoop TeraSort benchmark is reduced by 12.52% on our cluster.
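
To make the 4-tuple concrete, the following Python sketch shows one way such tuples could be derived from a stage DAG. It is illustrative only and is not FLOWPROPHET's implementation or interface: the `Flow` class, the `predict_flows` function, every parameter name, and the assumption that each stage's shuffle output is split evenly across all sender/receiver pairs are hypothetical, since the abstract does not specify how flow sizes or establish times are computed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Flow:
    """One predicted flow, mirroring the 4-tuple named in the abstract."""
    source: str
    destination: str
    flow_size: int         # bytes
    establish_time: float  # seconds since job start


def predict_flows(
    edges: List[Tuple[str, str]],
    placement: Dict[str, List[str]],
    output_bytes: Dict[str, int],
    finish_time: Dict[str, float],
) -> List[Flow]:
    """Derive flows from a stage DAG (hypothetical inputs).

    edges:        (upstream, downstream) data dependencies between stages.
    placement:    stage -> hosts that run its tasks.
    output_bytes: stage -> total bytes the stage hands to a downstream stage.
    finish_time:  stage -> estimated time at which its output is ready.
    """
    flows: List[Flow] = []
    for upstream, downstream in edges:
        senders = placement[upstream]
        receivers = placement[downstream]
        # Assumption: output is partitioned evenly over all sender/receiver pairs.
        per_flow = output_bytes[upstream] // (len(senders) * len(receivers))
        for src in senders:
            for dst in receivers:
                if src == dst:
                    continue  # data read locally never crosses the network
                # Time dependency: a flow can start once the upstream stage is done.
                flows.append(Flow(src, dst, per_flow, finish_time[upstream]))
    return flows


# Toy job: a map stage on h1 and h2 feeding a reduce stage on h2 and h3.
flows = predict_flows(
    edges=[("map", "reduce")],
    placement={"map": ["h1", "h2"], "reduce": ["h2", "h3"]},
    output_bytes={"map": 4000},
    finish_time={"map": 12.5},
)
for flow in flows:
    print(flow)
```

In this toy job the data dependency (which stage feeds which) determines source, destination, and size, while the time dependency (when the upstream stage finishes) determines the establish time. A real predictor would need actual partition sizes and task placements from the framework rather than the even split assumed here.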