Hongkuan Zhou, Ajitesh Srivastava, R. Kannan, V. Prasanna
{"title":"Design and Implementation of Knowledge Base for Runtime Management of Software Defined Hardware","authors":"Hongkuan Zhou, Ajitesh Srivastava, R. Kannan, V. Prasanna","doi":"10.1109/HPEC.2019.8916328","DOIUrl":null,"url":null,"abstract":"PageRank is a fundamental graph algorithm to evaluate the importance of vertices in a graph. In this paper, we present an efficient parallel PageRank design based on an edge-centric scatter-gather model. To overcome the poor locality of PageRank and optimize the memory performance, we develop a fast and efficient partitioning technique. We first partition all the vertices into non-overlapping vertex sets such that the data of each vertex set can fit in the cache; then we sort the outgoing edges of each vertex set based on the destination vertices to minimize random memory writes. The partitioning technique significantly reduces random accesses to main memory and improves the sustained memory bandwidth by 3×. It also enables efficient parallel execution on multicore platforms; we use distinct cores to execute the computations of distinct vertex sets in parallel to achieve speedup. We implement our design on a 16-core Intel Xeon processor and use various large-scale real-life and synthetic datasets for evaluation. Compared with the PageRank Pipeline Benchmark, our design achieves 12× to 19× speedup for all the datasets.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC.2019.8916328","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 18
Abstract
PageRank is a fundamental graph algorithm to evaluate the importance of vertices in a graph. In this paper, we present an efficient parallel PageRank design based on an edge-centric scatter-gather model. To overcome the poor locality of PageRank and optimize the memory performance, we develop a fast and efficient partitioning technique. We first partition all the vertices into non-overlapping vertex sets such that the data of each vertex set can fit in the cache; then we sort the outgoing edges of each vertex set based on the destination vertices to minimize random memory writes. The partitioning technique significantly reduces random accesses to main memory and improves the sustained memory bandwidth by 3×. It also enables efficient parallel execution on multicore platforms; we use distinct cores to execute the computations of distinct vertex sets in parallel to achieve speedup. We implement our design on a 16-core Intel Xeon processor and use various large-scale real-life and synthetic datasets for evaluation. Compared with the PageRank Pipeline Benchmark, our design achieves 12× to 19× speedup for all the datasets.
运行时可重新配置的软件与可重新配置的硬件相结合是非常可取的,因为这是在不损害可编程性的情况下最大化运行时效率的一种手段。这类软件系统的编译器设计起来极其困难,因为它们必须在运行时利用不同类型的硬件。为了解决与动态可重构硬件相匹配的工作流的静态和动态编译器优化的需要,我们提出了一种针对软件定义硬件的动态软件编译器的中心组件的新设计。我们的综合设计不仅关注静态知识,还关注从程序执行中提取知识的半监督式提取,并开发其性能模型。具体来说,我们的新动态和可扩展知识库1)在工作流执行期间持续收集知识2)在最佳(可用)硬件配置上确定工作流的最佳实现。它在存储来自编译器的其他组件以及人工分析人员的信息并向其提供信息方面起着中心作用。通过丰富的三部分图表示,知识库捕获并学习了有关分解和将代码步骤映射到内核以及将内核映射到可用硬件配置的广泛信息。该知识库使用$ c++ $ Boost库实现,能够快速处理离线和在线查询和更新。我们展示了我们的知识库可以在$1 ms$内回答查询,而不管它存储的工作流的数量。据我们所知,这是支持高级语言编译以利用任意可重构平台的第一个动态和可扩展知识库的设计。