Reconfigurable Accelerator Compute Hierarchy: A Case Study using Content-Based Image Retrieval

Nazanin Farahpour, Y. Hao, Zhenman Fang, Glenn D. Reinman
Published in: 2020 IEEE International Symposium on Workload Characterization (IISWC)
Publication date: 2020-10-01
DOI: 10.1109/IISWC50251.2020.00034
URL: https://doi.org/10.1109/IISWC50251.2020.00034

Abstract

The recent adoption of reconfigurable hardware accelerators in data centers has significantly improved their computational power and energy efficiency for compute-intensive applications. However, for common communication-bound analytics workloads, these benefits are limited by the efficiency of data movement in the IO stack. For this reason, server architects are proposing a more data-centric acceleration scheme by moving the compute elements closer to the data. While prior studies focus on the benefits of Near Data Processing (NDP) solely on one level of the memory hierarchy (one of cache, main memory or storage), we focus on the collaboration of NDP accelerators at all levels and their collective benefits in accelerating an application pipeline. In this paper, we present a Reconfigurable Accelerator Compute Hierarchy (ReACH) that combines on-chip, near-memory, and near-storage accelerators. Each memory level has a reconfigurable accelerator chip attached to it, which provides distinct compute and memory capabilities and offers a broad spectrum of acceleration options. To enable effective acceleration on various application pipelines, we propose a holistic approach to coordinate between the compute levels, reducing inter-level data access interference and achieving asynchronous task flow control. To minimize the programming efforts of using the compute hierarchy, a uniform programming interface is designed to decouple the ReACH configuration from the user application source code and allow runtime adjustments without modifying the deployed application. We experimentally deploy a billion-scale Content-Based Image Retrieval (CBIR) system on ReACH. Simulation results demonstrate that a proper application mapping eliminates unnecessary data movement, and ReACH achieves 4.5x throughput gain while reducing energy consumption by 52% compared to conventional on-chip acceleration.
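
The decoupling described in the abstract, where the mapping of pipeline stages to compute levels lives outside the application source, can be illustrated with a minimal sketch. This is not the ReACH programming interface, which the abstract does not specify. The names `Stage`, `build_plan`, `execute`, the stage names, and both mappings are hypothetical, and the "levels" here are only labels on ordinary Python functions. The point is that swapping the mapping changes where each stage is assigned without touching the pipeline definition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# The three compute levels named in the abstract (used here only as labels).
LEVELS = ("on_chip", "near_memory", "near_storage")


@dataclass(frozen=True)
class Stage:
    """Hypothetical pipeline stage: a name plus a plain Python function."""
    name: str
    run: Callable[[list], list]


def build_plan(stages: List[Stage], mapping: Dict[str, str]) -> List[Tuple[Stage, str]]:
    """Pair each stage with the level that the external mapping assigns to it."""
    plan = []
    for stage in stages:
        level = mapping[stage.name]
        if level not in LEVELS:
            raise ValueError(f"unknown level {level!r} for stage {stage.name!r}")
        plan.append((stage, level))
    return plan


def execute(plan: List[Tuple[Stage, str]], data: list) -> Tuple[list, List[str]]:
    """Run the stages in order and record which level each was assigned to."""
    trace = []
    for stage, level in plan:
        data = stage.run(data)  # a real system would dispatch to an accelerator
        trace.append(f"{stage.name}@{level}")
    return data, trace


# Application code: defined once, never edited when the mapping changes.
# The stage names are illustrative and are not taken from the paper.
PIPELINE = [
    Stage("filter", lambda xs: [x for x in xs if x % 2 == 0]),
    Stage("score", lambda xs: [x * x for x in xs]),
    Stage("top_k", lambda xs: sorted(xs, reverse=True)[:2]),
]

# Configuration: two hypothetical mappings that could be loaded at runtime.
MAPPING_A = {"filter": "near_storage", "score": "near_memory", "top_k": "on_chip"}
MAPPING_B = {"filter": "on_chip", "score": "on_chip", "top_k": "on_chip"}

result_a, trace_a = execute(build_plan(PIPELINE, MAPPING_A), list(range(10)))
result_b, trace_b = execute(build_plan(PIPELINE, MAPPING_B), list(range(10)))
```

Both mappings produce the same result because only the placement labels differ. In this sketch `MAPPING_B` stands in for the conventional on-chip baseline and `MAPPING_A` for a placement spread across the hierarchy.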