Reconfigurable Accelerator Compute Hierarchy: A Case Study using Content-Based Image Retrieval
Nazanin Farahpour, Y. Hao, Zhenman Fang, Glenn D. Reinman
2020 IEEE International Symposium on Workload Characterization (IISWC), October 2020
DOI: 10.1109/IISWC50251.2020.00034
Citations: 0
Abstract
The recent adoption of reconfigurable hardware accelerators in data centers has significantly improved their computational power and energy efficiency for compute-intensive applications. However, for common communication-bound analytics workloads, these benefits are limited by the efficiency of data movement in the I/O stack. For this reason, server architects are proposing a more data-centric acceleration scheme by moving the compute elements closer to the data. While prior studies focus on the benefits of Near Data Processing (NDP) at only one level of the memory hierarchy (cache, main memory, or storage), we focus on the collaboration of NDP accelerators at all levels and their collective benefits in accelerating an application pipeline. In this paper, we present a Reconfigurable Accelerator Compute Hierarchy (ReACH) that combines on-chip, near-memory, and near-storage accelerators. Each memory level has a reconfigurable accelerator chip attached to it, which provides distinct compute and memory capabilities and offers a broad spectrum of acceleration options. To enable effective acceleration of various application pipelines, we propose a holistic approach to coordinate between the compute levels, reducing inter-level data access interference and achieving asynchronous task flow control. To minimize the programming effort of using the compute hierarchy, a uniform programming interface is designed to decouple the ReACH configuration from the user application source code and allow runtime adjustments without modifying the deployed application. We experimentally deploy a billion-scale Content-Based Image Retrieval (CBIR) system on ReACH. Simulation results demonstrate that a proper application mapping eliminates unnecessary data movement, and ReACH achieves a 4.5x throughput gain while reducing energy consumption by 52% compared to conventional on-chip acceleration.
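To make the decoupling idea concrete, the sketch below shows how a task-to-level mapping supplied at deployment time could retarget pipeline stages without touching application source. This is a hypothetical illustration, not the paper's actual ReACH interface: the `Pipeline` and `Stage` classes, the `mapping` dictionary, and the stage names are all assumed for exposition only.

```python
# Hypothetical illustration (not the actual ReACH API): an application pipeline
# whose stages are written once, while a separate mapping decides which level of
# the compute hierarchy (on-chip, near-memory, near-storage) runs each stage.
from dataclasses import dataclass
from typing import Callable, Dict, List

LEVELS = {"on-chip", "near-memory", "near-storage"}


@dataclass
class Stage:
    name: str
    kernel: Callable[[bytes], bytes]  # compute kernel, written placement-agnostically


class Pipeline:
    def __init__(self, stages: List[Stage], mapping: Dict[str, str]):
        # In a real deployment the mapping would come from a configuration file
        # loaded at runtime, so re-targeting a stage to a different accelerator
        # level never requires editing or recompiling the application.
        for stage in stages:
            level = mapping.get(stage.name, "on-chip")
            if level not in LEVELS:
                raise ValueError(f"unknown compute level: {level}")
        self.stages = stages
        self.mapping = mapping

    def run(self, data: bytes) -> bytes:
        for stage in self.stages:
            level = self.mapping.get(stage.name, "on-chip")
            # A real runtime would offload the kernel to the accelerator at
            # `level`; here we just run it locally and log the dispatch.
            print(f"dispatching stage '{stage.name}' to the {level} accelerator")
            data = stage.kernel(data)
        return data


# Example: a CBIR-style pipeline where feature extraction runs near storage,
# candidate filtering near memory, and final ranking on-chip.
pipeline = Pipeline(
    stages=[
        Stage("extract_features", lambda d: d),
        Stage("filter_candidates", lambda d: d),
        Stage("rank_results", lambda d: d),
    ],
    mapping={
        "extract_features": "near-storage",
        "filter_candidates": "near-memory",
        "rank_results": "on-chip",
    },
)
result = pipeline.run(b"query image bytes")
```

The point of the sketch is the design choice the abstract describes: because the mapping lives outside the application code, operators can adjust which accelerator level handles each stage at deployment or run time without modifying the deployed application.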