Adaptively Accelerating Map-Reduce/Spark with GPUs: A Case Study

K. R. Jayaram, Anshul Gandhi, Hongyi Xin, S. Tao
{"title":"Adaptively Accelerating Map-Reduce/Spark with GPUs: A Case Study","authors":"K. R. Jayaram, Anshul Gandhi, Hongyi Xin, S. Tao","doi":"10.1109/ICAC.2019.00022","DOIUrl":null,"url":null,"abstract":"In this paper, we propose and evaluate a simple mechanism to accelerate iterative machine learning algorithms implemented in Hadoop map-reduce (stock), and Apache Spark. In particular, we describe a technique that enables data parallel tasks in map-reduce and Spark to be dynamically and adaptively scheduled on CPU or GPU, based on availability and load. We examine the extent of performance improvements, and correlate them to various parameters of the algorithms studied. We focus on end-to-end performance impact, including overheads associated with transferring data into and out of the GPU, and conversion between data representations in the JVM and on GPU. We also present three optimizations that, in our analysis, can be generalized across many iterative machine learning applications. We present a case study where we accelerate four iterative machine learning applications – multinomial logistic regression, multiple linear regression, K-Means clustering and principal components analysis using singular value decomposition, implemented in three data analytics frameworks – Hadoop Map-Reduce (HMR), IBM Main-Memory Map-Reduce (M3R) and Spark. We observe that the use of GPGPUs decreases the execution time of these applications on HMR by up to 8X, M3R by up to 18X, and Spark by up to 25X. Through our empirical analysis, we offer several insights that can be helpful in designing middleware and cluster managers to accelerate map-reduce and Spark applications using GPUs.","PeriodicalId":442645,"journal":{"name":"2019 IEEE International Conference on Autonomic Computing (ICAC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Conference on Autonomic Computing (ICAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAC.2019.00022","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

In this paper, we propose and evaluate a simple mechanism to accelerate iterative machine learning algorithms implemented in stock Hadoop Map-Reduce and Apache Spark. In particular, we describe a technique that enables data-parallel tasks in Map-Reduce and Spark to be dynamically and adaptively scheduled on CPUs or GPUs, based on availability and load. We examine the extent of the performance improvements and correlate them with various parameters of the algorithms studied. We focus on end-to-end performance impact, including the overheads of transferring data into and out of the GPU and of converting between data representations in the JVM and on the GPU. We also present three optimizations that, in our analysis, generalize across many iterative machine learning applications. We present a case study in which we accelerate four iterative machine learning applications – multinomial logistic regression, multiple linear regression, K-Means clustering, and principal components analysis using singular value decomposition – implemented in three data analytics frameworks – Hadoop Map-Reduce (HMR), IBM Main-Memory Map-Reduce (M3R), and Spark. We observe that the use of GPGPUs decreases the execution time of these applications on HMR by up to 8X, on M3R by up to 18X, and on Spark by up to 25X. Through our empirical analysis, we offer several insights that can help in designing middleware and cluster managers to accelerate Map-Reduce and Spark applications using GPUs.
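For illustration only, the sketch below shows one way the per-task CPU/GPU dispatch described in the abstract could be expressed in Spark: each task inspects GPU availability at run time and routes its partition either to a GPU kernel or to a CPU fallback. This is not the authors' implementation; `gpuAvailable`, `gpuKernel`, and `cpuKernel` are hypothetical placeholders. A real system would launch a native kernel (e.g., via JNI/CUDA) and handle host-device transfers and JVM-to-GPU data conversion, which the paper identifies as a key source of overhead.

```scala
import org.apache.spark.sql.SparkSession

object AdaptiveDispatchSketch {
  // Hypothetical availability check: a real scheduler would query per-executor
  // GPU state and load; here it is stubbed with an environment flag.
  def gpuAvailable(): Boolean = sys.env.get("USE_GPU").contains("1")

  // CPU implementation of a per-partition kernel (e.g., one K-Means assignment step).
  def cpuKernel(points: Iterator[Array[Double]]): Iterator[Array[Double]] =
    points.map(identity)

  // Placeholder for a GPU kernel call: a real implementation would copy the
  // partition to device memory, launch the kernel, and copy results back.
  def gpuKernel(points: Iterator[Array[Double]]): Iterator[Array[Double]] =
    cpuKernel(points)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("adaptive-dispatch-sketch")
      .master("local[*]")
      .getOrCreate()

    val data = spark.sparkContext.parallelize(
      Seq.fill(1000)(Array(1.0, 2.0, 3.0)), numSlices = 8)

    // Per-partition dispatch: each task decides at run time whether its
    // partition runs on the GPU or falls back to the CPU.
    val result = data.mapPartitions { part =>
      if (gpuAvailable()) gpuKernel(part) else cpuKernel(part)
    }

    println(s"processed ${result.count()} points")
    spark.stop()
  }
}
```

Keeping the decision inside `mapPartitions` keeps it local to each task, so placement can adapt to GPU contention on a per-executor basis rather than being fixed at job submission time.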