{"title":"A FPGA Friendly Approximate Computing Framework with Hybrid Neural Networks: (Abstract Only)","authors":"Haiyue Song, Xiang Song, Tianjian Li, Hao Dong, Naifeng Jing, Xiaoyao Liang, Li Jiang","doi":"10.1145/3174243.3174965","DOIUrl":null,"url":null,"abstract":"Neural approximate computing is promising to gain energy-efficiency at the cost of tolerable quality loss. The architecture contains two neural networks: the approximate accelerator generates approximate results while the classifier determines whether input data can be safely approximated. However, they are not compatible to a heterogeneous computing platform, due to the large communication overhead between the approximate accelerator and accurate cores, and the large speed gap between them. This paper proposes a software-hardware co-design strategy. With deep exploration of data distributions in the feature space, we first propose a novel approximate computing architecture containing a multi-class classifier and multiple approximate accelerator; this architecture, derived by the existing iterative co-training methods, can shift more data from accurate computation (in CPU) to approximate accelerator (in FPGA); the increased invocation of the approximate accelerator thus can yield higher utilization of the FPGA-based accelerator, resulting in the enhanced the performance. Moreover, much less input data is redistributed, by the classifier (also in FPGA), back to CPU, which can minimize the CPU-FPGA communication. Second, we design a pipelined data-path with batched input/output for the proposed hybrid architecture to efficiently hide the communication latency. A mask technique is proposed to decouple the synchronization between CPU and FPGA, in order to minimize the frequency of communication.","PeriodicalId":164936,"journal":{"name":"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"156 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3174243.3174965","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Neural approximate computing promises energy efficiency at the cost of a tolerable loss in output quality. The architecture contains two neural networks: an approximate accelerator that generates approximate results, and a classifier that determines whether an input can be safely approximated. However, this scheme is a poor fit for a heterogeneous computing platform, owing to the large communication overhead between the approximate accelerator and the accurate cores, and the large speed gap between them. This paper proposes a software-hardware co-design strategy. First, after a deep exploration of the data distribution in the feature space, we propose a novel approximate computing architecture containing a multi-class classifier and multiple approximate accelerators. This architecture, derived from existing iterative co-training methods, shifts more data from accurate computation (on the CPU) to the approximate accelerators (on the FPGA); the increased invocation of the approximate accelerators yields higher utilization of the FPGA-based accelerator and therefore better performance. Moreover, far less input data is redistributed by the classifier (also on the FPGA) back to the CPU, which minimizes CPU-FPGA communication. Second, we design a pipelined data path with batched input/output for the proposed hybrid architecture to efficiently hide the communication latency, and we propose a mask technique that decouples CPU-FPGA synchronization in order to minimize the frequency of communication.
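The abstract gives no implementation details, but the dispatch-and-mask idea can be sketched in host-side code. Below is a minimal sketch, assuming a multi-class classifier with one extra "accurate" class per batch element; the names (classify, process_batch, approx_kernels, accurate_fn, NUM_ACCELERATORS) are hypothetical, and the random decision rule is a placeholder for the co-trained classifier network that, in the paper's design, runs on the FPGA alongside the accelerators.

```python
import numpy as np

NUM_ACCELERATORS = 4           # K approximate accelerators (illustrative)
CPU_CLASS = NUM_ACCELERATORS   # extra class: "must be computed accurately"

def classify(batch):
    """Stand-in for the FPGA-side multi-class classifier: returns an
    accelerator id per input, or CPU_CLASS for inputs that cannot be
    safely approximated. The real classifier is a trained network."""
    return np.random.randint(0, NUM_ACCELERATORS + 1, size=len(batch))

def process_batch(batch, approx_kernels, accurate_fn):
    """Batched dispatch with a mask. The FPGA fills the approximated
    slots of the output buffer; the CPU fills only the masked slots,
    so neither side has to synchronize on every element."""
    labels = classify(batch)
    out = np.empty(len(batch))
    for k in range(NUM_ACCELERATORS):        # FPGA side (simulated here)
        sel = labels == k
        out[sel] = approx_kernels[k](batch[sel])
    mask = labels == CPU_CLASS               # the "mask" of the abstract
    out[mask] = accurate_fn(batch[mask])     # CPU recomputes only these
    return out

if __name__ == "__main__":
    # Toy kernels: each "accelerator" rounds to a different precision.
    kernels = [lambda x, d=d: np.round(x, d) for d in range(NUM_ACCELERATORS)]
    print(process_batch(np.random.rand(8), kernels, lambda x: x))
```

The point of the mask is visible in the last two assignments: the approximate results land in the shared output buffer without waiting on the CPU, and the CPU patches only the masked slots, which is how batched input/output plus masking can hide communication latency and reduce the frequency of CPU-FPGA synchronization.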