Incrementalization of Vertex-Centric Programs

Timothy A. K. Zakian, L. Capelli, Zhenjiang Hu
Published in: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019-05-20
DOI: 10.1109/IPDPS.2019.00109 (https://doi.org/10.1109/IPDPS.2019.00109)
Citations: 8

Abstract

As real-world graphs grow ever larger, so does the need for programmable, easy-to-use, and highly scalable graph processing. One popular graph processing model, the vertex-centric computational model, meets this need by distributing computation across the vertices of the graph being processed. Because the program is distributed to the vertices, the programmer "thinks like a vertex" when writing a graph computation: there is little or no shared memory, and almost all communication between on-vertex computations must be sent over the network. Given this inherent communication overhead, reducing the number of messages sent during a computation is central to any effort to optimize vertex-centric programs. Previous work has focused on reducing communication overhead by directly changing communication patterns, either by altering how the graph is partitioned and distributed or by altering the graph topology itself. In this paper we present a different optimization strategy, based on a family of complementary compile-time program transformations, that minimizes communication overhead by changing both the messaging and the computational structure of programs. In particular, we present and formalize a method by which a compiler can automatically incrementalize a vertex-centric program through a series of compile-time program transformations. These transformations modify the on-vertex computation and the messaging between vertices so that messages represent patches to be applied to the receiving vertex's local state. We empirically evaluate these transformations on a set of common vertex-centric algorithms and graphs and achieve an average reduction of 2.7X in total computation time and 2.9X in the number of messages sent across all programs in the benchmark suite. Furthermore, since these are purely compile-time program transformations, prior optimization strategies for vertex-centric programs can be applied to the resulting program just as they would be to a non-incrementalized one.
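
To make the idea of "messages as patches" concrete, here is a minimal hand-written sketch in Python. It is not the paper's compiler transformation, which the abstract does not spell out. It only contrasts a baseline vertex program that resends full values every superstep with a variant that sends only deltas. The problem (counting paths from a source in a DAG), the function names `run_naive` and `run_incremental`, and the `DIAMOND` test graph are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical example graph: a small DAG given as directed (u, v) edges.
DIAMOND = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]


def _out_edges(edges):
    """Build an adjacency list of out-neighbours."""
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
    return out


def run_naive(edges, source, n):
    """Baseline: each superstep, every vertex sends its full value on every
    out-edge, and every receiver recomputes its value from scratch.
    Assumes a DAG whose source vertex has no in-edges."""
    out = _out_edges(edges)
    value = [1 if v == source else 0 for v in range(n)]
    messages = 0
    while True:
        inbox = defaultdict(list)
        for u in range(n):
            for v in out[u]:
                inbox[v].append(value[u])  # message carries the full value
                messages += 1
        new_value = [(1 if v == source else 0) + sum(inbox[v]) for v in range(n)]
        if new_value == value:  # fixed point reached
            return value, messages
        value = new_value


def run_incremental(edges, source, n):
    """Delta variant: a vertex sends a message only when its value changed,
    and the message carries the change (a patch), which the receiver adds
    to its local state instead of recomputing it."""
    out = _out_edges(edges)
    value = [0] * n
    pending = [0] * n  # change in each vertex's value not yet propagated
    value[source] = 1
    pending[source] = 1
    messages = 0
    while any(pending):
        inbox = [0] * n
        for u in range(n):
            if pending[u] != 0:
                for v in out[u]:
                    inbox[v] += pending[u]  # message carries only the delta
                    messages += 1
        pending = inbox
        for v in range(n):
            value[v] += pending[v]  # apply the patch to local state
    return value, messages
```

Calling both functions as `run_naive(DIAMOND, 0, 5)` and `run_incremental(DIAMOND, 0, 5)` yields the same path counts, while the delta variant sends fewer messages because unchanged vertices stay silent. The sketch relies on the aggregation being a sum, so that a delta can be applied by addition. Other aggregations would need a different notion of patch.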