I/O Performance Modeling for Big Data Applications over Cloud Infrastructures

Ioannis Mytilinis, Dimitrios Tsoumakos, Verena Kantere, Anastassios Nanos, N. Koziris
2015 IEEE International Conference on Cloud Engineering. Published 2015-03-09. DOI: https://doi.org/10.1109/IC2E.2015.29
Citations: 10

Abstract

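Big Data applications receive ever-increasing attention and have become a dominant class of applications deployed over virtualized environments. Cloud environments add considerable complexity to I/O performance. Big Data makes I/O management, characterization, and prediction harder still: as I/O operations become increasingly dominant in such applications, the intricacies of virtualization, different storage back ends, and deployment setups significantly hinder our ability to analyze and correctly predict I/O performance. To that end, this work proposes an end-to-end modeling technique to predict the performance of I/O-intensive Big Data applications running over cloud infrastructures. We develop a model tuned over application and infrastructure dimensions: primitive I/O operations, data access patterns, storage back ends, and deployment parameters. The trained model can be used to predict both I/O performance and general task performance. Our evaluation results show that for jobs dominated by I/O operations, such as I/O-bound MapReduce jobs, our model predicts execution time with an accuracy close to 90%; accuracy decreases as application processing becomes more complex.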
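The abstract does not spell out the model itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a job can be described as a list of primitive I/O phases, each keyed by operation, access pattern, and storage back end, and that per-key throughput is calibrated from micro-benchmarks. The names `SAMPLES`, `train`, `predict_seconds`, and `accuracy` are hypothetical, the numbers are invented for illustration, and deployment parameters are left out for brevity (they could be added to the key).

```python
from collections import defaultdict

# Hypothetical micro-benchmark samples:
# (operation, access pattern, back end, MB moved, seconds taken).
# These values are invented for illustration; they are NOT from the paper.
SAMPLES = [
    ("read", "sequential", "local", 100.0, 1.0),
    ("read", "sequential", "local", 200.0, 2.0),
    ("write", "sequential", "local", 100.0, 2.0),
    ("read", "random", "local", 50.0, 2.0),
]


def train(samples):
    """Estimate throughput in MB/s for each (operation, pattern, back end) key."""
    totals = defaultdict(lambda: [0.0, 0.0])  # key -> [total MB, total seconds]
    for op, pattern, backend, mb, seconds in samples:
        totals[(op, pattern, backend)][0] += mb
        totals[(op, pattern, backend)][1] += seconds
    return {key: mb / seconds for key, (mb, seconds) in totals.items()}


def predict_seconds(model, job):
    """Sum the estimated duration of every primitive I/O phase in a job."""
    return sum(mb / model[(op, pattern, backend)]
               for op, pattern, backend, mb in job)


def accuracy(predicted, actual):
    """Relative accuracy: 1 minus the relative error of the prediction."""
    return 1.0 - abs(predicted - actual) / actual


if __name__ == "__main__":
    model = train(SAMPLES)
    # A hypothetical I/O-bound job: read 400 MB sequentially, then write 100 MB.
    job = [
        ("read", "sequential", "local", 400.0),
        ("write", "sequential", "local", 100.0),
    ]
    print(predict_seconds(model, job))
```

A purely additive model like this ignores computation and overlap between phases, which is one plausible reason a prediction of this kind would degrade as application processing becomes more complex.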