Parallel implementation of finite state machines for reducing the latency of stochastic computing

2018 19th International Symposium on Quality Electronic Design (ISQED). Pub Date: 2018-03-13. DOI: 10.1109/ISQED.2018.8357309

Cong Ma, D. Lilja

Citations: 3

Abstract

Stochastic computing, which employs random bit streams for computation, has shown lower hardware cost and higher fault tolerance than computation using conventional binary encoding. Finite state machine (FSM) based stochastic computing elements can compute complex functions, such as exponentiation and the hyperbolic tangent, more efficiently than elements built from combinational logic. However, because an FSM is sequential logic, it cannot be parallelized directly the way combinational logic can, which makes it difficult to reduce the long latency of the calculation. Applications in relatively high-frequency domains would therefore require an extremely fast clock rate when using FSMs. This paper proposes a parallel implementation of the FSM that uses an estimator and a dispatcher to initialize the FSM directly to its steady state. Experimental results show that the outputs of four typical functions computed with the parallel implementation are very close to those of the serial version. The parallel FSM scheme also yields equivalent or better image quality than the serial implementation in two image processing applications, edge detection and frame difference.
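
The abstract combines two ideas: an FSM-based stochastic element, and a parallel scheme in which an estimator and a dispatcher start each FSM copy in its steady state. The Python sketch below illustrates both, under stated assumptions. It uses the saturating up/down counter commonly called Stanh as the FSM; for a bipolar input x and N states, its steady-state bipolar output is tanh((N/2)·artanh x), which is close to the usually quoted tanh((N/2)·x) for small x. The paper's actual estimator and dispatcher circuits are not described here, so `steady_state`, `parallel_stanh`, and the popcount-style estimator are hypothetical stand-ins rather than the authors' design, and each loop iteration only models a hardware lane that would run concurrently.

```python
import math
import random


def to_bipolar_stream(x, length, rng):
    """Encode x in [-1, 1] as a bipolar stochastic bit stream with P(bit = 1) = (x + 1) / 2."""
    p = (x + 1) / 2
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_bipolar_stream(bits):
    """Decode a bipolar stream back to a value: x = 2 * P(1) - 1."""
    return 2 * sum(bits) / len(bits) - 1


def stanh_run(bits, n_states, start_state):
    """Serial Stanh-style FSM: a saturating up/down counter over states 0..n_states-1.

    Each input 1 moves one state up, each 0 one state down; the output bit is 1
    whenever the current state lies in the upper half. Returns (outputs, final_state).
    """
    state = start_state
    out = []
    for b in bits:
        out.append(1 if state >= n_states // 2 else 0)  # Moore output from current state
        state = min(state + 1, n_states - 1) if b else max(state - 1, 0)
    return out, state


def steady_state(p, n_states):
    """Stationary distribution of the counter driven by i.i.d. bits with P(1) = p.

    Detailed balance gives pi[i + 1] / pi[i] = p / (1 - p); the weights
    p**i * (1 - p)**(n - 1 - i) have that ratio and never exceed 1, so they cannot overflow.
    """
    w = [p**i * (1 - p) ** (n_states - 1 - i) for i in range(n_states)]
    total = sum(w)
    return [wi / total for wi in w]


def parallel_stanh(bits, n_states, lanes, rng):
    """Hypothetical parallel version: split the stream across `lanes` FSM copies.

    estimator  (assumed): fraction of 1s in the input, i.e. a popcount.
    dispatcher (assumed): start each lane in a state drawn from steady_state(),
    so no lane spends its first cycles walking toward steady state.
    """
    p_hat = sum(bits) / len(bits)  # estimator
    pi = steady_state(p_hat, n_states)
    chunk = -(-len(bits) // lanes)  # ceiling division: bits per lane
    out = []
    for j in range(lanes):  # each iteration models one hardware lane
        start = rng.choices(range(n_states), weights=pi)[0]  # dispatcher
        lane_out, _ = stanh_run(bits[j * chunk:(j + 1) * chunk], n_states, start)
        out.extend(lane_out)
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    x, n = 0.5, 8
    bits = to_bipolar_stream(x, 100_000, rng)
    serial, _ = stanh_run(bits, n, n // 2)
    parallel = parallel_stanh(bits, n, 16, rng)
    print("serial  :", round(from_bipolar_stream(serial), 3))
    print("parallel:", round(from_bipolar_stream(parallel), 3))
    print("analytic:", round(math.tanh(n / 2 * math.atanh(x)), 3))
```

Run as a script, it prints the decoded serial and parallel results next to the analytic steady-state value; for a long stream both should land close to it. The point of the (assumed) dispatcher is that it only needs a probability estimate to choose start states, so the lanes never wait on one another's final states.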
