SC-SD: Towards Low Power Stochastic Computing Using Sigma Delta Streams
Authors: Patricia Gonzalez-Guerrero, Xinfei Guo, M. Stan
Published in: 2018 IEEE International Conference on Rebooting Computing (ICRC), 2018-11-01
DOI: 10.1109/ICRC.2018.8638611 (https://doi.org/10.1109/ICRC.2018.8638611)
Citations: 11
Abstract
Processing data using Stochastic Computing (SC) requires only $\sim$ 7% of the area and power of the typical binary approach. However, SC has two major drawbacks that eclipse any area and power savings. First, it takes $\sim$ 99% more time to finish a computation than the binary approach, since data is represented as streams of bits. Second, the Linear Feedback Shift Registers (LFSRs) required to generate the stochastic streams increase the power and area of the overall SC-LFSR system. These drawbacks result in similar or higher area, power, and energy numbers when compared with the binary counterpart. In this work, we address these drawbacks by applying SC directly on Pulse Density Modulated (PDM) streams. Most modern Systems on Chip (SoCs) already include Analog to Digital Converters (ADCs). The core of $\Sigma\Delta$-ADCs is the $\Sigma\Delta$-Modulator, whose output is a PDM stream. Our approach (SC-SD) simplifies the system hardware in two ways. First, we drop the filter stage at the ADC and, second, we replace the costly Stochastic Number Generators (SNGs) with $\Sigma\Delta$-Modulators. To further lower the system complexity, we adopt an Asynchronous $\Sigma\Delta$-Modulator ($\mathrm{A}\Sigma\Delta\mathrm{M}$) architecture. We design and simulate the $\mathrm{A}\Sigma\Delta\mathrm{M}$ using an industry-standard 1x FinFET technology with foundry models (in modern technologies the node number does not refer to any one feature in the process, and foundries use slightly different conventions; we use 1x to denote the 14/16nm FinFET nodes offered by the foundry). We achieve power savings of 81% in the SNG compared to the LFSR approach. To evaluate how these area and power savings scale to more complex applications, we implement Gamma Correction, a popular image processing algorithm. For this application, our simulations show that SC-SD can save 98%-11% in the total system latency and 50%-38% in power consumption when compared with the SC-LFSR approach or the binary counterpart.
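The two stream generators contrasted in the abstract can be illustrated with a minimal software sketch. This is not the authors' hardware: the function names, the 8-bit LFSR width, the seed, and the tap mask are assumptions chosen for illustration, and the modulator shown is a simple synchronous first-order $\Sigma\Delta$ loop rather than the asynchronous modulator the paper adopts. Both functions encode a value in [0, 1] as the density of ones in a bit stream.

```python
"""Illustrative sketch: two ways to encode a value in [0, 1] as a bit stream."""


def lfsr_stream(value, n_bits, seed=0x5A):
    """Conventional SNG (assumed form): compare an 8-bit LFSR state to a threshold."""
    threshold = round(value * 256)
    state = seed  # any non-zero 8-bit seed works
    bits = []
    for _ in range(n_bits):
        bits.append(1 if state < threshold else 0)
        # Galois LFSR step for x^8 + x^6 + x^5 + x^4 + 1 (toggle mask 0xB8).
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB8
    return bits


def sigma_delta_stream(value, n_bits):
    """Synchronous first-order sigma-delta modulator (a simplification;
    the paper uses an asynchronous modulator)."""
    acc = 0.0
    bits = []
    for _ in range(n_bits):
        acc += value          # integrate the input
        if acc >= 1.0:        # 1-bit quantizer
            bits.append(1)
            acc -= 1.0        # feed the output back into the integrator
        else:
            bits.append(0)
    return bits


def density(bits):
    """Recover the encoded value as the fraction of ones in the stream."""
    return sum(bits) / len(bits)


if __name__ == "__main__":
    x = 0.25
    print("LFSR stream density:       ", density(lfsr_stream(x, 255)))
    print("Sigma-delta stream density:", density(sigma_delta_stream(x, 256)))
```

In both cases the value is recovered by averaging the stream, which is why stream length drives SC latency. The sketch only covers encoding; it makes no claim about how SC-SD performs arithmetic on PDM streams.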