PSACS: Highly-Parallel Shuffle Accelerator on Computational Storage

2021 IEEE 39th International Conference on Computer Design (ICCD). Pub Date: 2021-10-01. DOI: 10.1109/iccd53106.2021.00080

Chen Zou, Hui Zhang, A. Chien, Y. Ki


Citations: 2

Abstract

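Shuffle is an indispensable process in distributed online analytical processing (OLAP) systems, enabling task-level parallelism to be exploited across multiple nodes. As a data-intensive data reorganization process, shuffle implemented on general-purpose CPUs not only incurs data traffic back and forth between the computing and storage resources, but also pollutes the cache hierarchy with almost zero data reuse. As a result, shuffle can easily become the bottleneck of distributed analysis pipelines.

Our PSACS approach attacks these bottlenecks with the rising computational storage paradigm. Shuffle is offloaded to the storage-side PSACS accelerator to avoid polluting the computing node's memory hierarchy and to enjoy the latency, bandwidth, and energy benefits of near-data computing. Further, the microarchitecture of PSACS exploits data-, subtask-, and task-level parallelism for high performance and uses a customized scratchpad for fast on-chip random access.

Compared to software baselines, PSACS achieves 4.6x to 5.7x shuffle throughput at the kernel level and up to 1.3x overall shuffle throughput, while using only one twentieth of the CPU utilization. These gains add up to an average end-to-end OLAP query speedup of 23%.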
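To make "shuffle as data reorganization" concrete, here is a minimal software sketch of a hash-partitioned shuffle of the kind a CPU baseline might run. It is not the PSACS design, which the abstract does not detail. The record layout (string key, integer value), the use of CRC-32 as the partitioning hash, and the names `partition_id`, `shuffle_write`, and `shuffle_read` are all illustrative assumptions. The point it shows is the access pattern: every record is read once and scattered to one of many output buckets, with essentially no reuse.

```python
import zlib
from typing import List, Tuple

# Hypothetical record layout for illustration: (key, value).
Record = Tuple[str, int]


def partition_id(key: str, num_partitions: int) -> int:
    """Map a key to a destination partition with a deterministic hash.

    zlib.crc32 is used instead of the built-in hash(), whose string hashing
    is randomized per process and could send the same key to different
    partitions on different nodes.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle_write(records: List[Record], num_partitions: int) -> List[List[Record]]:
    """Map side: scatter each record into its destination bucket.

    Each record is touched exactly once and the bucket appends jump
    between partitions, so CPU caches gain almost nothing from this loop.
    """
    buckets: List[List[Record]] = [[] for _ in range(num_partitions)]
    for key, value in records:
        buckets[partition_id(key, num_partitions)].append((key, value))
    return buckets


def shuffle_read(all_outputs: List[List[List[Record]]], partition: int) -> List[Record]:
    """Reduce side: gather one partition's records from every producer."""
    gathered: List[Record] = []
    for producer_buckets in all_outputs:
        gathered.extend(producer_buckets[partition])
    return gathered


if __name__ == "__main__":
    # Two hypothetical producer nodes shuffling into two partitions.
    node_a = shuffle_write([("apple", 1), ("pear", 2), ("fig", 3)], num_partitions=2)
    node_b = shuffle_write([("apple", 4), ("kiwi", 5)], num_partitions=2)
    for p in range(2):
        print(p, shuffle_read([node_a, node_b], p))
```

Whatever partition "apple" hashes to, both of its records end up together on the reduce side, which is what lets downstream tasks on different nodes work in parallel. The scattered bucket appends are the sort of random-access writes that a hardware design might buffer on chip; how PSACS's scratchpad actually handles them is not described in the abstract.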



