2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI): Latest Articles

Enabling Microarchitectural Randomization in Serialized AES Implementations to Mitigate Side Channel Susceptibility

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00064

S. Dhanuskodi, Daniel E. Holcomb

Highly serialized implementations of the AES block cipher are used in lightweight applications where low area and low power are the primary concerns. Security of these lightweight designs becomes increasingly critical on resource-constrained devices in the Internet of Things era. The AES algorithm does not have any significant known cryptanalytic weaknesses, but keys can often be extracted by attacking implementation weaknesses using side channel information leakage or fault injection. Highly serialized AES implementations compute on individual bytes/words of data in each cycle which leaves them especially sensitive to side channel key extraction because there is less overall power consumption to obscure side channel leakages. In this work, we present an efficient AES microarchitecture that randomizes sub-round operations and reduces susceptibility to power side channel attacks. The architecture we propose is compatible with, and complementary to, all existing circuit-level side channel countermeasures. We design an 8-bit AES architecture in a commercial 16nm FinFET technology and observe an order of magnitude improvement in side channel protection at a cost of 36% more area and 25% more energy per encryption. Testchip measurement shows 0.93pJ/bit energy consumption at 10MHz.
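
The randomization idea can be illustrated in software. The sketch below is not the authors' hardware design; it only shows why the byte-wise substitutions inside a round can be processed in a shuffled order without changing the result. The S-box is a stand-in permutation rather than the real AES table, and `shuffled_sub_bytes` is a hypothetical name.

```python
import random

# Stand-in 8-bit S-box: a fixed pseudo-random permutation of 0..255.
# Assumption: this is NOT the real AES S-box; any bijection on bytes is
# enough to show that per-byte substitutions are order-independent.
_table_rng = random.Random(2019)
SBOX = list(range(256))
_table_rng.shuffle(SBOX)


def sub_bytes_in_order(state):
    """Substitute the 16 state bytes in the fixed order 0..15."""
    return [SBOX[b] for b in state]


def shuffled_sub_bytes(state, rng):
    """Substitute the same bytes, one per iteration, in a random order."""
    order = list(range(len(state)))
    rng.shuffle(order)            # fresh processing order for each call
    out = [0] * len(state)
    for idx in order:             # one byte per step, as in a serialized datapath
        out[idx] = SBOX[state[idx]]
    return out, order


state = list(range(16))
result, order = shuffled_sub_bytes(state, random.Random())
assert result == sub_bytes_in_order(state)   # same output, different schedule
```

Because the output is identical for every order, an observer can no longer assume that a given clock cycle corresponds to a given key byte, which is the intuition behind shuffling-style countermeasures.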

Pages: 314-319

Citations: 6

Hardware Implementation of Improved Fast-SSC-Flip Decoder for Polar Codes

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00109

Jing Zeng, Yangcan Zhou, Jun Lin, Zhongfeng Wang

Polar codes have gained worldwide recognition due to their capacity-achieving property. The fast simplified successive cancellation flip (Fast-SSC-Flip) algorithm improves the frame error rate (FER) performance of successive cancellation (SC) decoders with extra metric computations. However, the metric includes exponential operations, which are inefficient to implement. In this work, one shift operation and one addition are employed to approximate the exponential operations with negligible performance loss. In addition, the metric computations of more constituent codes (Type-I to Type-V) are derived, so that the complexity of Fast-SSC-Flip is further reduced. Simulation results show that the FER performance of the improved Fast-SSC-Flip algorithm with list size λ = 8 outperforms that of the Fast-SSC algorithm by 0.7 dB. Moreover, based on the proposed algorithm, an efficient hardware architecture is developed for the first time. It uses fewer look-up tables and achieves a maximum throughput of 140.6 Mbps at high SNR on the Altera Stratix IV EP4SGX530KH40C2 FPGA. Compared with the state-of-the-art Fast-SSC implementation, the proposed architecture has a higher resource utilization efficiency (throughput per look-up table). In 28 nm CMOS technology, the throughput can reach up to 483.8 Mbps at a clock frequency of 704 MHz.
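
The abstract does not say which shift-and-add approximation the authors use. As a generic illustration only, the sketch below approximates 2^(-x) in fixed point with one subtraction and one variable right shift; the word format `FRAC_BITS` and the function names are assumptions made for this example.

```python
FRAC_BITS = 12                 # fixed-point fractional bits (assumed word format)
ONE = 1 << FRAC_BITS           # fixed-point representation of 1.0


def approx_pow2_neg(x_fx):
    """Approximate 2**(-x) for x >= 0, with x and the result in fixed point.

    Split x into an integer part n and a fraction f, use the linear
    approximation 2**(-f) ~= 1 - f/2 on [0, 1), then scale by 2**(-n).
    """
    n = x_fx >> FRAC_BITS              # integer part of x
    f = x_fx & (ONE - 1)               # fractional part of x
    mantissa = ONE - (f >> 1)          # the single subtraction (f/2 is just wiring)
    return mantissa >> n               # the single variable shift


def to_fx(x):
    """Convert a non-negative float to the assumed fixed-point format."""
    return int(round(x * ONE))


# Worst absolute error over a grid of inputs in [0, 8).
worst = max(
    abs(approx_pow2_neg(to_fx(i / 64)) / ONE - 2.0 ** (-(i / 64)))
    for i in range(64 * 8)
)
print(f"worst absolute error on [0, 8): {worst:.4f}")
```

The linear segment overestimates the true curve by at most about 0.043, which is the kind of small, bounded error that makes such approximations attractive when an exact exponential would need a large look-up table.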

Pages: 580-585

Citations: 11

Machine Learning-Based Processor Adaptability Targeting Energy, Performance, and Reliability

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00037

A. L. Sartor, P. H. E. Becker, Stephan Wong, R. Marculescu, A. C. S. Beck

Adaptive processors can dynamically change their hardware configuration by tuning several knobs that optimize a given metric, according to the current application. However, the complexity of choosing the best setup at runtime increases exponentially as more adaptive resources become available. Therefore, we propose a polymorphic VLIW processor coupled to a machine learning-based decision mechanism that quickly and accurately delivers the best trade-off in terms of energy, performance, and reliability. The proposed system predicts the best processor configuration in 97.37% of the test cases and achieves an efficiency that is close to an oracle (more than 93.30% on all benchmarks).
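
To make the exponential growth concrete, the sketch below enumerates a made-up configuration space. The knob names and their settings are hypothetical, since the abstract does not list the processor's actual adaptive resources.

```python
from itertools import product
from math import prod

# Hypothetical knobs and settings, chosen only for illustration.
KNOBS = {
    "issue_width": [2, 4, 8],
    "cache_kb": [8, 16, 32, 64],
    "fault_tolerance": ["off", "on"],
    "voltage_level": [0, 1, 2],
}


def config_space_size(knobs):
    """Number of distinct configurations: the product of the option counts."""
    return prod(len(options) for options in knobs.values())


def all_configs(knobs):
    """Yield every configuration as a dict, as an exhaustive search would."""
    names = list(knobs)
    for values in product(*(knobs[name] for name in names)):
        yield dict(zip(names, values))


print(config_space_size(KNOBS))                       # 3 * 4 * 2 * 3 = 72
bigger = dict(KNOBS, prefetcher=["off", "on"])
print(config_space_size(bigger))                      # one more binary knob doubles it
```

Every additional knob multiplies the number of candidates, so evaluating all of them at runtime quickly becomes impractical, which is the motivation for a learned predictor.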

Citations: 1

ASSET: Architectures for Smart Security of Non-Volatile Memories

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00070

S. Swami, K. Mohanram

Computing systems that integrate advanced non-volatile memories (NVMs) are vulnerable to several security attacks that threaten (i) data confidentiality, (ii) data availability, and (iii) data integrity. This paper proposes Architectures for Smart Security of NVMs (ASSET), which integrates five low-overhead, high-performance security solutions (SECRET [1], COVERT [2], ACME [3], ARSENAL [4], and STASH [5]) to thwart these attacks on NVM systems. SECRET is a low-cost security solution that employs counter mode encryption (CME) for data confidentiality in multi-/triple-level cell (i.e., MLC/TLC) NVMs. COVERT and ACME complement SECRET to improve system availability of CME. ARSENAL integrates CME and Bonsai Merkle Tree (BMT) authentication to thwart data confidentiality and integrity attacks, respectively, in NVMs and simultaneously enables instant data recovery (IDR) on power/system failures. Finally, STASH is the first comprehensive end-to-end security architecture for state-of-the-art smart hybrid memories (SHMs). STASH integrates (i) CME for data confidentiality, (ii) page-level MT authentication for data integrity, (iii) recovery-compatible MT updates to withstand power or system failures, and (iv) page-migration friendly security meta-data management. This paper thus addresses the core security challenges of next-generation NVM systems.
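
Counter mode encryption, which all five schemes build on, can be sketched in a few lines. This is a minimal illustration of the data flow, not any of the cited designs: SHA-256 stands in for the block cipher a real memory controller would use, and the line size, key, and function names are assumptions.

```python
import hashlib

LINE_BYTES = 64  # assumed memory-line size


def one_time_pad(key, address, counter):
    """Derive a per-write pad from (key, line address, write counter).

    Assumption: SHA-256 replaces the hardware block cipher (typically AES);
    only the structure of counter mode encryption is shown.
    """
    pad = b""
    block = 0
    while len(pad) < LINE_BYTES:
        msg = (key + address.to_bytes(8, "little")
               + counter.to_bytes(8, "little") + bytes([block]))
        pad += hashlib.sha256(msg).digest()
        block += 1
    return pad[:LINE_BYTES]


def xor_line(data, pad):
    """Encrypt or decrypt a memory line by XOR with the pad."""
    return bytes(d ^ p for d, p in zip(data, pad))


key = b"demo-key-not-secret"
plaintext = bytes(range(LINE_BYTES))

# Each write to a line increments its counter, so the pad never repeats.
ct_first = xor_line(plaintext, one_time_pad(key, address=0x1000, counter=1))
ct_second = xor_line(plaintext, one_time_pad(key, address=0x1000, counter=2))

assert ct_first != ct_second                                    # same data, new pad
assert xor_line(ct_first, one_time_pad(key, 0x1000, 1)) == plaintext
```

Since the pad depends only on the address and counter, it can be computed while the data is still being fetched, which is why counter mode is a common choice for memory encryption.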

Pages: 348-353

Citations: 0

Self Timed SRAM Array with Enhanced low Voltage Read and Write Capability

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00117

Prasad Vernekar, Nithin Kumar Yernad Balachandra, V. M. Harishchandra

Technology scaling has shrunk device sizes, but the scaled devices suffer from threshold variation. These variations impact the performance and stability of static random access memory (SRAM). The proposed work takes advantage of the conventional Replica-Bit-Line (RBL) technique to design a self-timed SRAM control circuit that keeps track of timing variations and generates control signals. A variable wordline scheme is proposed to ensure successful read and write operation of the SRAM at low voltages. The 2 Kb (2048 bits) SRAM, designed in 65 nm UMC technology, operates at clock frequencies of up to 1 GHz at the nominal supply voltage of 1.2 V and at 50 MHz with a supply voltage of 0.58 V. The variations in Sense Amplifier Enable (SAE) generation are reduced by up to 92% compared to the conventional RBL technique.

Citations: 1

Minimization of Flare in EUVL by Simultaneous Wire Segment Perturbation and Dummification

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00047

S. Paul, Pritha Banerjee, S. Sur-Kolay

Extreme Ultraviolet Lithography (EUVL) suffers from pattern distortion defects caused by flare, which is irregular reflection from the surface of the mask used. While techniques based on (a) dummification and (b) perturbation of wire segments have reduced flare notably, each has its own merits and demerits. Unlike earlier works where each method for flare reduction is applied independently, in this paper we extensively study the effects on flare and the amount of dummy fills required by simultaneous dummification and wire segment perturbation using an Integer Linear Programming (ILP) based formulation in a multilevel framework at the post-routing stage. Experimental results show an average reduction of maximum flare level by 29% compared to that in the initial routed layout. In addition to that, an average reduction of maximum flare by 19% is observed compared to the method of wire perturbation alone. Moreover, in our method 35% reduction in dummy requirement on average is achieved compared to the application of dummification technique alone for the reduction of flare.

Citations: 1

A Multi-phase Time-to-Digital Converter Differential Vernier Ring Oscillator

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00069

A. Annagrebah, E. Bechetoille, I. Laktineh, H. Chanal, P. Russo, H. Mathez

This paper reports the development of an adjustable Time-to-Digital Converter (TDC) based on two vernier Ring Oscillators (RO). The TDC aims to measure timing in the Resistive Plate Chamber (RPC) detector for the CMS experiment. Considering previous designs, the contributions of power supply noise and intrinsic transistor noise have been minimized with differential stages and proper transistor sizing. In order to reduce the dead time inherent to the Vernier TDC architecture, as many Phase Detectors (PD) as possible have been implemented. This makes it possible to choose between reducing the dead time and measuring the start-stop time difference redundantly for an improved resolution. The 81-PD matrix is the original feature of the design. Indeed, the start-stop time information is present at each inverter output with a predictable time offset. The inverter architecture introduces a constant delay at each stage of the chain, with phase inversion. Recording the counter value when a phase detection occurs at each inverter output makes it possible to recover the start-stop time. The prototype TDC, fabricated in a 130-nm technology, consumes 8.5 mW under a 1.2-V supply. Measurements of this chip show a timing accuracy of 5.48 ps at a timing resolution of 8 ps for the first data allowed by the first phase detection.
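
The vernier principle behind this kind of TDC can be modelled with two counters. The sketch below is a behavioural toy, not the fabricated circuit: the oscillator periods are illustrative assumptions chosen so that their difference equals the 8 ps resolution quoted in the abstract, and the single coincidence check stands in for the phase-detector matrix.

```python
def vernier_measure(delta_ps, t_slow_ps=1008, t_fast_ps=1000):
    """Return (cycles, estimate_ps) for a start-stop interval delta_ps >= 0.

    The slow oscillator starts on START and the fast one on STOP. The fast
    edges gain (t_slow_ps - t_fast_ps) on the slow edges every cycle; the
    cycle count at which they catch up encodes the interval.
    The period values are assumptions, picked to give 8 ps steps.
    """
    resolution = t_slow_ps - t_fast_ps
    cycles = 0
    # Edge n of the slow ring occurs at n * t_slow; edge n of the fast ring
    # occurs at delta + n * t_fast. Count until the fast edge is not behind.
    while delta_ps + cycles * t_fast_ps > cycles * t_slow_ps:
        cycles += 1
    return cycles, cycles * resolution


for delta in (0, 8, 100, 250):
    cycles, estimate = vernier_measure(delta)
    print(f"interval {delta:3d} ps -> {cycles:2d} cycles, estimate {estimate} ps")
```

The estimate is always within one resolution step of the true interval, but a long interval needs many cycles to resolve. That conversion time is the dead time the abstract refers to, and detecting coincidences at every inverter output is one way to shorten it.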

Pages: 344-347

Citations: 1

MRAM-Based Stochastic Oscillators for Adaptive Non-Uniform Sampling of Sparse Signals in IoT Applications

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00079

Soheil Salehi, Alireza Zaeemzadeh, Adrian Tatulian, N. Rahnavard, R. Demara

Recent advances to hardware integration and realization of highly-efficient Compressive Sensing (CS) approaches have inspired novel circuit and architectural-level approaches. These embrace the challenge to design more optimal nonuniform CS solutions that consider device-level constraints for IoT applications wherein lifetime energy, device area, and manufacturing costs are highly-constrained, but meanwhile, the sensing environment is rapidly changing. In this manuscript, we develop a novel adaptive hardware-based approach for non-uniform compressive sampling of sparse and time-varying signals. The proposed Adaptive Sampling of Sparse IoT signals via STochastic-oscillators (ASSIST) approach intelligently generates the CS measurement matrix by distributing the sensing energy among coefficients by considering the signal characteristics such as sparsity rate and noise level obtained in the previous time step. In our proposed approach, Magnetic Random Access Memory (MRAM)-based stochastic oscillators are utilized to generate the random bitstreams used in the CS measurement matrix. SPICE and MATLAB circuit-algorithm simulation results indicate that ASSIST efficiently achieves the desired non-uniform recovery of the original signals with varying sparsity rates and noise levels.
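
One way to picture a non-uniform measurement matrix built from random bitstreams is shown below. This is a loose software analogy, not the ASSIST algorithm: the per-coefficient weights, the scaling rule, and the function names are all assumptions, and Python's pseudo-random generator takes the place of the MRAM-based stochastic oscillators.

```python
import random


def sampling_probabilities(weights, p_max=0.5):
    """Scale non-negative per-coefficient weights into probabilities in [0, p_max]."""
    top = max(weights)
    if top == 0:
        return [0.0 for _ in weights]
    return [p_max * w / top for w in weights]


def measurement_matrix(num_rows, probs, rng):
    """Draw a num_rows x len(probs) binary matrix, one Bernoulli bitstream per column."""
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(num_rows)]


# Hypothetical importance estimates carried over from the previous time step.
weights = [4.0, 0.0, 1.0, 2.0]
probs = sampling_probabilities(weights)
phi = measurement_matrix(num_rows=200, probs=probs, rng=random.Random(0))

# Columns with larger weights receive more ones, i.e. more sensing energy.
column_ones = [sum(row[j] for row in phi) for j in range(len(probs))]
print(probs)
print(column_ones)
```

Coefficients judged more likely to carry signal are sampled more densely, while a coefficient with zero weight is never measured in this toy model.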

Pages: 403-408

Citations: 11

Distributed Pulse Rotary Traveling Wave VCO: Architecture and Design

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00051

Prashansa Mukim, Aditya Dalakoti, D. McCarthy, Brandon Pon, Carrie Segal, Merritt Miller, J. Buckwalter, F. Brewer

This paper describes the architecture and design of pulse rotary traveling wave voltage controlled oscillators that preserve wave shape, and thus wave harmonics, using non-linear amplification. These oscillators can provide multiple low duty-cycle clock phases, and architectural modifications can allow the same clock phase to be present at multiple physical locations. A design fabricated in GFUS 130nm (8RF) technology operates at 5.32 GHz with a 10 MHz offset phase noise of -128.15 dBc/Hz at 45.4 mW while generating 12 driven phase outputs with 15.66 ps phase resolution and less than 500 fs cycle-to-cycle jitter. It can be coarse or fine tuned within a frequency range of 4.35 GHz to 5.4 GHz with K_VCO of 1.7 GHz/V and 470 MHz/V respectively. The start-up mechanism of the oscillator minimizes transmission line reflections and allows maintenance of the traveling wave shape, yielding an average 3 dB figure of merit improvement over existing designs.
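
The quoted phase resolution follows from the oscillation frequency and the number of phase outputs, assuming the 12 phases are evenly spaced over one period. A quick check:

```python
FREQ_HZ = 5.32e9      # oscillation frequency from the abstract
NUM_PHASES = 12       # driven phase outputs from the abstract

period_ps = 1e12 / FREQ_HZ              # one oscillation period in picoseconds
phase_step_ps = period_ps / NUM_PHASES  # spacing, assuming evenly spaced phases

print(f"period     = {period_ps:.2f} ps")      # 187.97 ps
print(f"phase step = {phase_step_ps:.2f} ps")  # 15.66 ps
```

The result matches the 15.66 ps figure in the abstract, which supports reading "phase resolution" as the spacing between adjacent phase outputs.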

Citations: 0

A Framework for the Analysis of Throughput-Constraints of SNNs on Neuromorphic Hardware

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00043

Adarsha Balaji, Anup Das

Spiking neural networks (SNN) are efficient computation models to infer spatio-temporal pattern recognition applications on neuromorphic hardware. Neuromorphic hardware is typically designed using interconnected crossbars, with each crossbar containing a structure of fully connected neurons. In order to ensure application performance such as accuracy and system performance such as throughput and resource utilization, SNNs need to be efficiently mapped on neuromorphic hardware. To address this, we propose a design flow to partition and map SNN-based applications on neuromorphic hardware, with an aim to enhance application and system performance. The design flow operates in two steps: (1) a two-step clustering technique to partition trained SNNs into clusters of neurons and synapses, with an aim to minimize inter-cluster spike communication, and (2) mapping and scheduling the clusters onto crossbar-based architectures, modeled using Synchronous Data-flow Graphs (SDFGs). The SDFG model incorporates hardware constraints such as I/O bandwidth of crossbars and synaptic memory while analyzing the throughput of the modeled system. Our design flow integrates CARLsim, a GPU-accelerated application-level SNN simulator, with SDF3, a tool to map SDFGs on hardware. We evaluate the design flow using synthetic and realistic SNN-based applications. We show that, for throughput-constrained applications, we achieve a 21.74% and 15.03% reduction in memory usage and utilization of the time-multiplexed interconnect, respectively, compared to a state-of-the-art approach.
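
The partitioning objective in step (1) can be stated as a small cost function. The sketch below only evaluates a given partition; it does not reproduce the paper's two-step clustering, and the toy network, cluster labels, and crossbar capacity are invented for illustration.

```python
def inter_cluster_spikes(synapses, cluster_of):
    """Sum spike counts over synapses whose endpoints lie in different clusters.

    synapses:   iterable of (pre_neuron, post_neuron, spike_count)
    cluster_of: dict mapping neuron id -> cluster id
    """
    return sum(
        spikes
        for pre, post, spikes in synapses
        if cluster_of[pre] != cluster_of[post]
    )


def fits_crossbar(cluster_of, max_neurons):
    """Check that no cluster holds more neurons than one crossbar can host."""
    sizes = {}
    for cluster in cluster_of.values():
        sizes[cluster] = sizes.get(cluster, 0) + 1
    return all(size <= max_neurons for size in sizes.values())


# Toy network: (pre, post, spikes observed during simulation). Hypothetical data.
SYNAPSES = [(0, 1, 50), (1, 2, 5), (2, 3, 40), (0, 3, 2)]

partition_a = {0: "A", 1: "A", 2: "B", 3: "B"}   # keeps the busy synapses local
partition_b = {0: "A", 1: "B", 2: "A", 3: "B"}   # splits every synapse

print(inter_cluster_spikes(SYNAPSES, partition_a))   # 7
print(inter_cluster_spikes(SYNAPSES, partition_b))   # 97
```

Both partitions fit a two-neuron crossbar, yet the first one sends far fewer spikes over the shared interconnect, which is the quantity a clustering step of this kind tries to minimize.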

Pages: 193-196

Citations: 22
