Efficient Data-Parallel Cumulative Aggregates for Large-Scale Machine Learning

Matthias Boehm, A. Evfimievski, B. Reinwald
{"title":"Efficient Data-Parallel Cumulative Aggregates for Large-Scale Machine Learning","authors":"Matthias Boehm, A. Evfimievski, B. Reinwald","doi":"10.18420/btw2019-17","DOIUrl":null,"url":null,"abstract":"Cumulative aggregates are often overlooked yet important operations in large-scale machine learning (ML) systems. Examples are prefix sums and more complex aggregates, but also preprocessing techniques such as the removal of empty rows or columns. These operations are challenging to parallelize over distributed, blocked matrices—as commonly used in ML systems—due to recursive data dependencies. However, computing prefix sums is a classic example of a presumably sequential operation that can be efficiently parallelized via aggregation trees. In this paper, we describe an efficient framework for data-parallel cumulative aggregates over distributed, blocked matrices. The basic idea is a self-similar operator composed of a forward cascade that reduces the data size by orders of magnitude per iteration until the data fits in local memory, a local cumulative aggregate over the partial aggregates, and a backward cascade to produce the final result. We also generalize this framework for complex cumulative aggregates of sum-product expressions, and characterize the class of supported operations. Finally, we describe the end-to-end compiler and runtime integration into SystemML, and the use of cumulative aggregates in other operations. Our experiments show that this framework achieves both high performance for moderate data sizes and good scalability.","PeriodicalId":421643,"journal":{"name":"Datenbanksysteme für Business, Technologie und Web","volume":"59 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Datenbanksysteme für Business, Technologie und Web","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18420/btw2019-17","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Cumulative aggregates are often overlooked yet important operations in large-scale machine learning (ML) systems. Examples include prefix sums and more complex aggregates, as well as preprocessing techniques such as the removal of empty rows or columns. These operations are challenging to parallelize over distributed, blocked matrices—as commonly used in ML systems—due to recursive data dependencies. However, computing prefix sums is a classic example of a presumably sequential operation that can be efficiently parallelized via aggregation trees. In this paper, we describe an efficient framework for data-parallel cumulative aggregates over distributed, blocked matrices. The basic idea is a self-similar operator composed of a forward cascade that reduces the data size by orders of magnitude per iteration until the data fits in local memory, a local cumulative aggregate over the partial aggregates, and a backward cascade to produce the final result. We also generalize this framework for complex cumulative aggregates of sum-product expressions, and characterize the class of supported operations. Finally, we describe the end-to-end compiler and runtime integration into SystemML, and the use of cumulative aggregates in other operations. Our experiments show that this framework achieves both high performance for moderate data sizes and good scalability.
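To make the forward-cascade / local-aggregate / backward-cascade structure concrete, the sketch below shows one cascade level of a cumulative sum along each column, computed over horizontal row-blocks of a matrix. This is a minimal single-machine illustration, not the paper's SystemML implementation: the NumPy block representation, the `blocked_cumsum` helper, and the single cascade level are assumptions for exposition, whereas the actual framework applies the cascade recursively over distributed blocks until the partial aggregates fit in local memory.

```python
import numpy as np

def blocked_cumsum(blocks):
    """One cascade level of a data-parallel cumulative sum (hypothetical
    sketch, not the SystemML operator). `blocks` is a list of row-blocks
    (2D arrays with equal column counts) of one matrix."""
    # Forward cascade: reduce every block to its column sums, shrinking
    # the data by a factor of the block size (one row per block).
    partials = np.stack([b.sum(axis=0) for b in blocks])

    # Local cumulative aggregate: an exclusive prefix sum over the
    # partial aggregates gives each block the offset contributed by
    # all preceding blocks.
    offsets = np.cumsum(partials, axis=0) - partials

    # Backward cascade: combine each block's local cumsum with its
    # offset to produce the final result.
    return [np.cumsum(b, axis=0) + off for b, off in zip(blocks, offsets)]

# Usage: the blocked result matches the sequential cumulative sum.
X = np.arange(12.0).reshape(6, 2)
out = np.vstack(blocked_cumsum([X[0:2], X[2:4], X[4:6]]))
assert np.allclose(out, np.cumsum(X, axis=0))
```

Note that both per-block steps (the column sums and the final combination) are embarrassingly parallel; only the small matrix of partial aggregates is processed sequentially, which is why the cascade scales to distributed, blocked matrices.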