集合虚拟机:多前端多后端数据分析的抽象

Ingo Müller, Renato Marroquín, D. Koutsoukos, Mike Wawrzoniak, G. Alonso, Sabir Akhadov
{"title":"集合虚拟机:多前端多后端数据分析的抽象","authors":"Ingo Müller, Renato Marroquín, D. Koutsoukos, Mike Wawrzoniak, G. Alonso, Sabir Akhadov","doi":"10.1145/3399666.3399911","DOIUrl":null,"url":null,"abstract":"Getting the best performance from the ever-increasing number of hardware platforms has been a recurring challenge for data processing systems. In recent years, the advent of data science with its increasingly numerous and complex types of analytics has made this challenge even more difficult. In practice, system designers are overwhelmed by the number of combinations and typically implement a single analytics type on one platform, leading to repeated implementation effort---and a plethora of semi-compatible tools for data scientists. In this paper, we propose the \"Collection Virtual Machine\" (or CVM)---an extensible compiler framework designed to keep the specialization process of data analytics systems tractable. It can capture at the same time the essence of a large span of low-level, hardware-specific implementation techniques as well as high-level operations of different types of analyses. At its core lies a language for defining nested, collection-oriented intermediate representations (IRs). Frontends produce programs in their IR flavors defined in that language, which get optimized through a series of rewritings (possibly changing the IR flavor multiple times) until the program is finally expressed in an IR of platform-specific operators. While reducing the overall implementation effort, this also improves the interoperability of both analyses and hardware platforms. We have used CVM successfully to build specialized backends for platforms as diverse as multi-core CPUs, RDMA clusters, and serverless computing infrastructure in the cloud and expect similar results for many more frontends and hardware platforms in the near future.","PeriodicalId":256784,"journal":{"name":"Proceedings of the 16th International Workshop on Data Management on New Hardware","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"The collection Virtual Machine: an abstraction for multi-frontend multi-backend data analysis\",\"authors\":\"Ingo Müller, Renato Marroquín, D. Koutsoukos, Mike Wawrzoniak, G. Alonso, Sabir Akhadov\",\"doi\":\"10.1145/3399666.3399911\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Getting the best performance from the ever-increasing number of hardware platforms has been a recurring challenge for data processing systems. In recent years, the advent of data science with its increasingly numerous and complex types of analytics has made this challenge even more difficult. In practice, system designers are overwhelmed by the number of combinations and typically implement a single analytics type on one platform, leading to repeated implementation effort---and a plethora of semi-compatible tools for data scientists. In this paper, we propose the \\\"Collection Virtual Machine\\\" (or CVM)---an extensible compiler framework designed to keep the specialization process of data analytics systems tractable. It can capture at the same time the essence of a large span of low-level, hardware-specific implementation techniques as well as high-level operations of different types of analyses. At its core lies a language for defining nested, collection-oriented intermediate representations (IRs). Frontends produce programs in their IR flavors defined in that language, which get optimized through a series of rewritings (possibly changing the IR flavor multiple times) until the program is finally expressed in an IR of platform-specific operators. While reducing the overall implementation effort, this also improves the interoperability of both analyses and hardware platforms. We have used CVM successfully to build specialized backends for platforms as diverse as multi-core CPUs, RDMA clusters, and serverless computing infrastructure in the cloud and expect similar results for many more frontends and hardware platforms in the near future.\",\"PeriodicalId\":256784,\"journal\":{\"name\":\"Proceedings of the 16th International Workshop on Data Management on New Hardware\",\"volume\":\"44 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-04-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 16th International Workshop on Data Management on New Hardware\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3399666.3399911\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 16th International Workshop on Data Management on New Hardware","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3399666.3399911","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

摘要

从数量不断增加的硬件平台中获得最佳性能一直是数据处理系统面临的一个反复出现的挑战。近年来,数据科学的出现及其越来越多和复杂的分析类型使这一挑战变得更加困难。在实践中,系统设计人员被大量的组合所淹没,并且通常在一个平台上实现单一的分析类型,导致重复的实现工作-以及数据科学家的大量半兼容工具。在本文中,我们提出了“集合虚拟机”(或CVM)——一个可扩展的编译器框架,旨在保持数据分析系统的专业化过程易于处理。它可以同时捕获大范围的低级、特定于硬件的实现技术的本质,以及不同类型分析的高级操作。其核心是用于定义嵌套的、面向集合的中间表示(ir)的语言。前端生成用该语言定义的IR风格的程序,这些程序通过一系列重写(可能多次更改IR风格)得到优化,直到程序最终用特定于平台操作符的IR表示。在减少总体实现工作量的同时,这也提高了分析和硬件平台的互操作性。我们已经成功地使用CVM为各种平台构建专门的后端,如多核cpu、RDMA集群和云中的无服务器计算基础设施,并期望在不久的将来为更多的前端和硬件平台提供类似的结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
The collection Virtual Machine: an abstraction for multi-frontend multi-backend data analysis
Getting the best performance from the ever-increasing number of hardware platforms has been a recurring challenge for data processing systems. In recent years, the advent of data science with its increasingly numerous and complex types of analytics has made this challenge even more difficult. In practice, system designers are overwhelmed by the number of combinations and typically implement a single analytics type on one platform, leading to repeated implementation effort---and a plethora of semi-compatible tools for data scientists. In this paper, we propose the "Collection Virtual Machine" (or CVM)---an extensible compiler framework designed to keep the specialization process of data analytics systems tractable. It can capture at the same time the essence of a large span of low-level, hardware-specific implementation techniques as well as high-level operations of different types of analyses. At its core lies a language for defining nested, collection-oriented intermediate representations (IRs). Frontends produce programs in their IR flavors defined in that language, which get optimized through a series of rewritings (possibly changing the IR flavor multiple times) until the program is finally expressed in an IR of platform-specific operators. While reducing the overall implementation effort, this also improves the interoperability of both analyses and hardware platforms. We have used CVM successfully to build specialized backends for platforms as diverse as multi-core CPUs, RDMA clusters, and serverless computing infrastructure in the cloud and expect similar results for many more frontends and hardware platforms in the near future.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Accelerating re-pair compression using FPGAs Scalable and robust latches for database systems Efficient generation of machine code for query compilers nKV Empirical evaluation across multiple GPU-accelerated DBMSes
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1