FatMan vs. LittleBoy: Scaling Up Linear Algebraic Operations in Scale-Out Data Platforms

2016 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems (PDSW-DISCS). Pub Date: 2016-11-13. DOI: 10.1109/PDSW-DISCS.2016.8

Luna Xu, Seung-Hwan Lim, A. Butt, S. Sukumar, R. Kannan


Citations: 3

Abstract

Linear algebraic operations such as matrix manipulations form the kernel of many machine learning and other crucial algorithms. Scaling such algorithms up as well as out is highly desirable to enable efficient processing over millions of data points. To this end, we present a matrix manipulation approach to effectively scale up each node in a scale-out data-parallel platform such as Apache Spark. Specifically, we enable hardware acceleration for matrix multiplications in a distributed Spark setup without user intervention. Our approach supports both dense and sparse distributed matrices, and provides flexible control of acceleration by matrix density. We demonstrate the benefit of our approach for generalized matrix multiplication operations over large matrices with up to four billion elements. To connect the effectiveness of our approach with machine learning applications, we performed Gramian matrix computation via generalized matrix multiplications. Our experiments show that our approach achieves more than 2× performance speed-up, and up to 96.1% computation improvement, compared to the state-of-the-art Spark MLlib for dense matrices.
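To make the ideas in the abstract concrete, the following is a minimal single-process sketch, not the authors' implementation and not Spark code. It assumes a matrix A split into row blocks, computes the Gramian A^T A as the sum of per-block products A_i^T A_i, and picks a dense or sparse multiplication routine for each block by its density. All function names and the 0.3 density threshold are hypothetical choices made for illustration; in a real system the dense routine would be where a hardware-accelerated kernel is called.

```python
# Illustrative sketch only: every name and the 0.3 threshold are hypothetical,
# not taken from the paper or from Spark MLlib.

def density(block):
    """Fraction of non-zero entries in a block (a list of rows)."""
    total = sum(len(row) for row in block)
    nonzero = sum(1 for row in block for v in row if v != 0)
    return nonzero / total if total else 0.0

def dense_multiply(a, b):
    """Plain triple-loop multiply; stands in for an accelerated dense kernel."""
    k, m = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(m)]
            for row in a]

def sparse_multiply(a, b):
    """Multiply while skipping zero entries of a; stands in for a sparse kernel."""
    m = len(b[0])
    out = [[0] * m for _ in a]
    for i, row in enumerate(a):
        for p, v in enumerate(row):
            if v != 0:
                for j in range(m):
                    out[i][j] += v * b[p][j]
    return out

def multiply_block(a, b, threshold=0.3):
    """Pick a kernel from the density of the left operand."""
    if density(a) >= threshold:
        return "dense", dense_multiply(a, b)
    return "sparse", sparse_multiply(a, b)

def transpose(a):
    return [list(col) for col in zip(*a)]

def gramian(row_blocks, threshold=0.3):
    """Compute A^T A as the sum over row blocks A_i of A_i^T A_i."""
    n = len(row_blocks[0][0])
    total = [[0] * n for _ in range(n)]
    kernels = []
    for block in row_blocks:
        kernel, partial = multiply_block(transpose(block), block, threshold)
        kernels.append(kernel)
        for i in range(n):
            for j in range(n):
                total[i][j] += partial[i][j]
    return total, kernels

# A 4x2 matrix split into two 2x2 row blocks: one dense, one mostly zero.
blocks = [[[1, 2], [3, 4]], [[0, 0], [0, 5]]]
result, kernels = gramian(blocks)
print(result)   # [[10, 14], [14, 45]]
print(kernels)  # ['dense', 'sparse']
```

Summing per-block products is what makes the Gramian fit a data-parallel setting: each block can be processed independently and only the small n-by-n partial results need to be combined.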



