MITRACA: A Next-Gen Heterogeneous Architecture

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date: 2019-10-01 DOI: 10.1109/MCSoC.2019.00050

Riadh Ben Abdelhamid, Y. Yamaguchi, T. Boku


Citations: 3

Abstract

GPUs (Graphics Processing Units) and CPUs (Central Processing Units) offer sufficient raw performance for massively parallel applications such as AI, big data, and materials science. However, their real performance is far lower than their theoretical peak. The primary reason for this degradation is that they suffer from limited memory bandwidth and an interconnection topology that is not optimized for these types of applications. Thus, from the viewpoint of real computational performance, called computational efficiency, the FPGA (Field-Programmable Gate Array) is becoming an attractive chip for these massively parallel applications. An FPGA can efficiently provide optimized communication and bridge different computing accelerators as customized hardware. In other words, FPGA-based hardware accelerators offer a convenient solution for both high performance and high memory bandwidth. However, one serious concern is usability. For example, FPGA design using a hardware description language is a meticulous task that requires specialized skills and lengthens time to market. An overlay architecture is an appropriate candidate for resolving this issue because it offers a software layer that simplifies FPGA programmability by abstracting the fabric resources. Thus, this article proposes an overlay architecture based on a tightly connected, many-core CGRA (Coarse-Grained Reconfigurable Architecture). It will help software engineers implement their applications seamlessly. Our final goal is not the current fine-grained FPGAs but new medium-to-coarse-grained programmable chips. If an ASIC (Application-Specific Integrated Circuit) implementation were adopted, performance would be at least ten times higher than that of the current FPGA implementation because of the higher working frequency. In this article, the proposed overlay system provides a programmable interface that virtualizes FPGA resources and lets prospective users focus on high-level software programming.
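The abstract's claim that real performance falls far below the theoretical peak because of limited memory bandwidth can be illustrated with a simple roofline-style bound. This is a minimal sketch and not a model taken from the paper: the function names, the device figures (10,000 GFLOP/s peak, 500 GB/s bandwidth), and the kernel intensities are all hypothetical values chosen only for illustration.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Roofline bound: throughput is capped either by the compute peak
    or by how fast memory can feed the arithmetic units."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def computational_efficiency(peak_gflops: float, bandwidth_gbs: float,
                             flops_per_byte: float) -> float:
    """Fraction of the theoretical peak that is actually attainable."""
    return attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte) / peak_gflops


if __name__ == "__main__":
    # Hypothetical device: 10,000 GFLOP/s peak, 500 GB/s memory bandwidth.
    peak, bandwidth = 10_000.0, 500.0

    # Hypothetical memory-bound kernel: 0.25 FLOP per byte moved.
    print(attainable_gflops(peak, bandwidth, 0.25))         # 125.0
    print(computational_efficiency(peak, bandwidth, 0.25))  # 0.0125

    # Hypothetical compute-bound kernel: 40 FLOP per byte moved.
    print(attainable_gflops(peak, bandwidth, 40.0))         # 10000.0
```

Under these assumed numbers, the low-intensity kernel reaches only 1.25% of peak, which is the kind of gap between theoretical and real performance that motivates architectures with higher effective memory bandwidth.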
