FatMan vs. LittleBoy: Scaling Up Linear Algebraic Operations in Scale-Out Data Platforms

Luna Xu, Seung-Hwan Lim, A. Butt, S. Sukumar, R. Kannan
{"title":"FatMan vs. LittleBoy: Scaling Up Linear Algebraic Operations in Scale-Out Data Platforms","authors":"Luna Xu, Seung-Hwan Lim, A. Butt, S. Sukumar, R. Kannan","doi":"10.1109/PDSW-DISCS.2016.8","DOIUrl":null,"url":null,"abstract":"Linear algebraic operations such as matrix manipulations form the kernel of many machine learning and other crucial algorithms. Scaling up as well as scaling out such algorithms are highly desirable to enable efficient processing over millions of data points. To this end, we present a matrix manipulation approach to effectively scale-up each node in a scale-out data parallel platform such as Apache Spark. Specifically, we enable hardware acceleration for matrix multiplications in a distributed Spark setup without user intervention. Our approach supports both dense and sparse distributed matrices, and provides flexible control of acceleration by matrix density. We demonstrate the benefit of our approach for generalized matrix multiplication operations over large matrices with up to four billion elements. To connect the effectiveness of our approach with machine learning applications, we performed Gramian matrix computation via generalized matrix multiplications. Our experiments show that our approach achieves more than 2× performance speed-up, and up to 96.1% computation improvement, compared to a state of the art Spark MLlib for dense matrices.","PeriodicalId":375550,"journal":{"name":"2016 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems (PDSW-DISCS)","volume":"364 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems (PDSW-DISCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDSW-DISCS.2016.8","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Linear algebraic operations such as matrix manipulations form the kernel of many machine learning and other crucial algorithms. Scaling up as well as scaling out such algorithms are highly desirable to enable efficient processing over millions of data points. To this end, we present a matrix manipulation approach to effectively scale-up each node in a scale-out data parallel platform such as Apache Spark. Specifically, we enable hardware acceleration for matrix multiplications in a distributed Spark setup without user intervention. Our approach supports both dense and sparse distributed matrices, and provides flexible control of acceleration by matrix density. We demonstrate the benefit of our approach for generalized matrix multiplication operations over large matrices with up to four billion elements. To connect the effectiveness of our approach with machine learning applications, we performed Gramian matrix computation via generalized matrix multiplications. Our experiments show that our approach achieves more than 2× performance speed-up, and up to 96.1% computation improvement, compared to a state of the art Spark MLlib for dense matrices.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
胖子vs小男孩:扩展数据平台中线性代数运算的扩展
线性代数运算,如矩阵运算,构成了许多机器学习和其他关键算法的核心。要实现对数百万个数据点的高效处理,这类算法的向上和向外扩展都是非常必要的。为此,我们提出了一种矩阵操作方法,可以在横向扩展数据并行平台(如Apache Spark)中有效地扩展每个节点。具体来说,我们在没有用户干预的情况下为分布式Spark设置中的矩阵乘法启用硬件加速。我们的方法支持密集和稀疏分布矩阵,并通过矩阵密度提供灵活的加速度控制。我们证明了我们的方法对具有多达40亿个元素的大型矩阵的广义矩阵乘法运算的好处。为了将我们的方法与机器学习应用程序的有效性联系起来,我们通过广义矩阵乘法进行了Gramian矩阵计算。我们的实验表明,与目前最先进的Spark MLlib相比,我们的方法实现了2倍以上的性能加速,以及高达96.1%的计算改进。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Klimatic: A Virtual Data Lake for Harvesting and Distribution of Geospatial Data Towards Energy Efficient Data Management in HPC: The Open Ethernet Drive Approach FatMan vs. LittleBoy: Scaling Up Linear Algebraic Operations in Scale-Out Data Platforms A Bloom Filter Based Scalable Data Integrity Check Tool for Large-Scale Dataset Can Non-volatile Memory Benefit MapReduce Applications on HPC Clusters?
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1