{"title":"A FPGA Friendly Approximate Computing Framework with Hybrid Neural Networks: (Abstract Only)","authors":"Haiyue Song, Xiang Song, Tianjian Li, Hao Dong, Naifeng Jing, Xiaoyao Liang, Li Jiang","doi":"10.1145/3174243.3174965","DOIUrl":null,"url":null,"abstract":"Neural approximate computing is promising to gain energy-efficiency at the cost of tolerable quality loss. The architecture contains two neural networks: the approximate accelerator generates approximate results while the classifier determines whether input data can be safely approximated. However, they are not compatible to a heterogeneous computing platform, due to the large communication overhead between the approximate accelerator and accurate cores, and the large speed gap between them. This paper proposes a software-hardware co-design strategy. With deep exploration of data distributions in the feature space, we first propose a novel approximate computing architecture containing a multi-class classifier and multiple approximate accelerator; this architecture, derived by the existing iterative co-training methods, can shift more data from accurate computation (in CPU) to approximate accelerator (in FPGA); the increased invocation of the approximate accelerator thus can yield higher utilization of the FPGA-based accelerator, resulting in the enhanced the performance. Moreover, much less input data is redistributed, by the classifier (also in FPGA), back to CPU, which can minimize the CPU-FPGA communication. Second, we design a pipelined data-path with batched input/output for the proposed hybrid architecture to efficiently hide the communication latency. A mask technique is proposed to decouple the synchronization between CPU and FPGA, in order to minimize the frequency of communication.","PeriodicalId":164936,"journal":{"name":"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"156 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3174243.3174965","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Neural approximate computing promises energy efficiency at the cost of a tolerable loss in output quality. The architecture contains two neural networks: an approximate accelerator that generates approximate results, and a classifier that determines whether an input can be safely approximated. However, this scheme is a poor fit for a heterogeneous computing platform, owing to the large communication overhead between the approximate accelerator and the accurate cores, and the large speed gap between them. This paper proposes a software-hardware co-design strategy. First, after a deep exploration of the data distribution in the feature space, we propose a novel approximate computing architecture containing a multi-class classifier and multiple approximate accelerators. This architecture, derived from existing iterative co-training methods, shifts more data from accurate computation (on the CPU) to the approximate accelerators (on the FPGA); the increased invocation of the approximate accelerators yields higher utilization of the FPGA-based accelerator and therefore better performance. Moreover, far less input data is redistributed by the classifier (also on the FPGA) back to the CPU, which minimizes CPU-FPGA communication. Second, we design a pipelined data path with batched input/output for the proposed hybrid architecture to efficiently hide the communication latency, and we propose a mask technique that decouples CPU-FPGA synchronization in order to minimize the frequency of communication.
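The abstract gives no implementation details, but the dispatch-and-mask idea can be sketched in host-side code. Below is a minimal sketch, assuming a multi-class classifier with one extra "accurate" class per batch element; the names (classify, process_batch, approx_kernels, accurate_fn, NUM_ACCELERATORS) are hypothetical, and the random decision rule is a placeholder for the co-trained classifier network that, in the paper's design, runs on the FPGA alongside the accelerators.

```python
import numpy as np

NUM_ACCELERATORS = 4           # K approximate accelerators (illustrative)
CPU_CLASS = NUM_ACCELERATORS   # extra class: "must be computed accurately"

def classify(batch):
    """Stand-in for the FPGA-side multi-class classifier: returns an
    accelerator id per input, or CPU_CLASS for inputs that cannot be
    safely approximated. The real classifier is a trained network."""
    return np.random.randint(0, NUM_ACCELERATORS + 1, size=len(batch))

def process_batch(batch, approx_kernels, accurate_fn):
    """Batched dispatch with a mask. The FPGA fills the approximated
    slots of the output buffer; the CPU fills only the masked slots,
    so neither side has to synchronize on every element."""
    labels = classify(batch)
    out = np.empty(len(batch))
    for k in range(NUM_ACCELERATORS):        # FPGA side (simulated here)
        sel = labels == k
        out[sel] = approx_kernels[k](batch[sel])
    mask = labels == CPU_CLASS               # the "mask" of the abstract
    out[mask] = accurate_fn(batch[mask])     # CPU recomputes only these
    return out

if __name__ == "__main__":
    # Toy kernels: each "accelerator" rounds to a different precision.
    kernels = [lambda x, d=d: np.round(x, d) for d in range(NUM_ACCELERATORS)]
    print(process_batch(np.random.rand(8), kernels, lambda x: x))
```

The point of the mask is visible in the last two assignments: the approximate results land in the shared output buffer without waiting on the CPU, and the CPU patches only the masked slots, which is how batched input/output plus masking can hide communication latency and reduce the frequency of CPU-FPGA synchronization.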