CAVA: Exploring Memory Locality for Big Data Analytics in Virtualized Clusters

Eunji Hwang, Hyungoo Kim, Beomseok Nam, Young-ri Choi
DOI: 10.1145/3127479.3129253 (https://doi.org/10.1145/3127479.3129253)
Source: 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
Publication date: 2017-09-24
Citations: 1

Abstract

Running big data analytics frameworks in the cloud is becoming increasingly important, but the resource managers of these frameworks, in their current form, are not designed with virtualized environments in mind. In this work, we investigate various levels of data locality in a virtualized environment, ranging from rack locality to memory locality. By exploiting these extra fine-grained levels of data locality, our memory locality-aware scheduling algorithm effectively increases the cache hit ratio and thereby reduces network traffic and disk I/O. However, a high cache hit ratio does not necessarily imply a shorter job execution time in MapReduce applications. To resolve this issue, we develop the Cache-Affinity and Virtualization-Aware (CAVA) resource manager, which measures the cache affinity of MapReduce applications at runtime and efficiently manages distributed in-memory caches of a limited size by assigning high priority to applications that have high cache affinity. The proposed memory locality-aware scheduling algorithm is also integrated into the CAVA resource manager. Our extensive experimental study shows that CAVA performs well overall across various workloads composed of multiple big data analytics applications, both by considering the fine-grained data locality levels in virtualized clusters and by using scarce memory resources efficiently.
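
The idea of preferring finer-grained locality levels can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract names only rack locality and memory locality as the two ends of the range, so the intermediate levels (`VM`, `HOST`), the `Slot` and `Block` types, and the `pick_slot` helper below are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Locality(IntEnum):
    """Assumed locality levels, finest first (lower value = preferred)."""
    MEMORY = 0    # block is in the in-memory cache of the candidate VM
    VM = 1        # block is on the candidate VM's own disk
    HOST = 2      # block is on another VM of the same physical host
    RACK = 3      # block is on a different host in the same rack
    OFF_RACK = 4  # block must be fetched across racks


@dataclass(frozen=True)
class Slot:
    """A free task slot, or the location of a replica (hypothetical type)."""
    vm: str
    host: str
    rack: str


@dataclass(frozen=True)
class Block:
    """An input block with its cache and disk locations (hypothetical type)."""
    cached_on: frozenset  # names of VMs that hold the block in memory
    stored_on: frozenset  # Slot-like locations of the on-disk replicas


def locality(slot: Slot, block: Block) -> Locality:
    """Return the finest locality level this slot achieves for the block."""
    if slot.vm in block.cached_on:
        return Locality.MEMORY
    if any(r.vm == slot.vm for r in block.stored_on):
        return Locality.VM
    if any(r.host == slot.host for r in block.stored_on):
        return Locality.HOST
    if any(r.rack == slot.rack for r in block.stored_on):
        return Locality.RACK
    return Locality.OFF_RACK


def pick_slot(free_slots, block: Block) -> Slot:
    """Choose the free slot with the finest locality level for the block."""
    return min(free_slots, key=lambda s: locality(s, block))


if __name__ == "__main__":
    replica = Slot("vm1", "h1", "r1")
    block = Block(cached_on=frozenset({"vm2"}), stored_on=frozenset({replica}))
    slots = [Slot("vm1", "h1", "r1"), Slot("vm2", "h1", "r1"), Slot("vm3", "h2", "r1")]
    print(pick_slot(slots, block))  # the slot on vm2, where the block is cached
```

In this sketch a slot whose VM already caches the block wins over a slot that only has the block on local disk. As the abstract notes, a higher cache hit ratio alone does not guarantee shorter job execution time, which is why CAVA additionally prioritizes applications by measured cache affinity; that part is not modeled here.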