Pub Date: 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983575
Ru Ding, Xuemei Tian, Guoqiang Bai, G. Su, Xingjun Wu
As an important feed-forward neural network in deep learning, the convolutional neural network (CNN) has been widely used in recent years for image classification, face recognition, natural language processing and document analysis. CNNs process large amounts of data and require many multiply-and-accumulate (MAC) operations. Because applications are diverse, the channel sizes and kernel sizes of CNNs vary widely, while most existing hardware platforms optimize for the average case, which wastes computing resources. In this paper, a dedicated configurable convolution computing array is designed. It contains 15 convolution units; each PE performs 6×6 MAC operations and can be configured to compute three kernel sizes: 5×5, 3×3 and 1×1. A pipeline structure synchronizes the convolution and pooling operations, which reduces the storage needed for intermediate results. The hardware structure is designed to optimize the DeepID network. Tested on an Altera Cyclone V FPGA at 50 MHz, the peak performance of each convolution layer is 27 GOPS and the average MAC utilization is 92%.
Title: Hardware Implementation of Convolutional Neural Network for Face Feature Extraction. Venue: 2019 IEEE 13th International Conference on ASIC (ASICON).
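The peak figure quoted in the abstract above can be cross-checked with simple arithmetic. The sketch below assumes, which the abstract does not state, that each of the 15 units holds 6×6 = 36 MAC units and that one MAC per clock cycle is counted as one operation. Under those assumptions the product matches the quoted 27 GOPS; the utilization-scaled figure is only a rough estimate.

```python
# Back-of-the-envelope check of the quoted peak throughput.
# Assumption (not stated in the abstract): one MAC per cycle counts as one operation.
NUM_UNITS = 15          # convolution units in the array
MACS_PER_UNIT = 6 * 6   # assumed MAC units per PE
CLOCK_HZ = 50e6         # 50 MHz clock


def peak_gops(units: int, macs_per_unit: int, clock_hz: float) -> float:
    """Peak throughput in giga-operations per second, one op per MAC per cycle."""
    return units * macs_per_unit * clock_hz / 1e9


peak = peak_gops(NUM_UNITS, MACS_PER_UNIT, CLOCK_HZ)
# Rough estimate: scale the peak by the reported 92 % average MAC utilization.
utilized = peak * 0.92
print(f"peak = {peak:.1f} GOPS, utilization-scaled = {utilized:.2f} GOPS")
```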
Pub Date: 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983642
Jintao Li, Yanhan Zeng, Hailong Wu, R. Li, Jun Zhang, Hongzhou Tan
This paper presents an application of differential evolution to parameter optimization of a low-dropout regulator (LDO). Manual parameter optimization of analog integrated circuits such as LDOs is laborious and time-consuming, and it is not guaranteed to find a good result. Here, differential evolution is used to optimize the parameters and find relatively good LDO performance. To improve the convergence speed and the optimization result, a new constraint solution and a fast weight-based non-dominated sorting method are proposed. Simulation results show that the gain-bandwidth product, load regulation and line regulation are improved by 206.5%, 58.1% and 87.6%, respectively, compared with the manual solution.
Title: Performance optimization for LDO regulator based on the differential evolution. Venue: 2019 IEEE 13th International Conference on ASIC (ASICON).
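For readers unfamiliar with differential evolution, the following is a minimal sketch of the classic DE/rand/1/bin scheme on a toy two-parameter cost function. It is not the authors' method: their constraint solution and weight-based non-dominated sorting are not reproduced, and `toy_cost`, the bounds and the control parameters `f` and `cr` are illustrative assumptions standing in for a real circuit simulation.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimise `objective` inside the box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    lo, hi = bounds[j]
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to the allowed range
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_cost = objective(trial)
            if trial_cost <= cost[i]:  # greedy one-to-one selection
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


def toy_cost(x):
    """Hypothetical stand-in for a circuit figure of merit (not an LDO model)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


BOUNDS = [(-5.0, 5.0), (-5.0, 5.0)]
best_x, best_cost = differential_evolution(toy_cost, BOUNDS)
print(best_x, best_cost)
```

In a circuit-sizing flow, `toy_cost` would be replaced by a call into a simulator that returns a scalar (or, for multi-objective work, a vector) of performance metrics.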
Pub Date: 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983678
Jingyi Wu, Hongyu Yu, Yang Jiang, Zeyu Wan, Siqi Lei, W. Cheng, Guangnan Zhou, R. Sokolovskij, Qing Wang, G. Xia
Digital etching is an effective method to reduce dry-etch damage in AlGaN/GaN HEMTs. This work systematically investigates O2-plasma-based digital etching of AlGaN and p-GaN. AlN layers were used as etch-stop layers in the AlGaN etch. Important process aspects, such as the use of the AlN layers, the RF power, the oxygen flow rate, the oxidation time and the resulting roughness, were studied. These are technically relevant for obtaining controllable, uniform etched surfaces with low surface damage for better HEMT performance.
Title: Oxygen-plasma-based digital etching for GaN/AlGaN high electron mobility transistors. Venue: 2019 IEEE 13th International Conference on ASIC (ASICON).
Pub Date: 2019-10-01 DOI: 10.1109/asicon47005.2019.8983623
Wenbin He, Fan Ye, Junyan Ren
This paper introduces a low-power NRZ transmitter with a 2-tap multiple-FFE operating at 40 Gb/s. The transmitter incorporates a 3-stage MUX, a tailless CML driver and a resistance calibration system. Simplifying the low-speed MUX, modifying the high-speed MUX structure and adopting a tailless CML driver save considerable power and improve signal quality. Simulation results show that the design, in 28 nm CMOS technology, operates at 40 Gb/s with a −14.8 dB RLGC channel. The simulated power consumption is 16.83 mW from a 1.05 V supply.
Title: A 40Gb/s Low Power Transmitter with 2-tap FFE and 40:1 MUX in 28nm CMOS Technology. Venue: 2019 IEEE 13th International Conference on ASIC (ASICON).
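The effect of a 2-tap feed-forward equalizer (FFE) on an NRZ stream, as mentioned in the abstract above, can be sketched in a few lines. This is a behavioural illustration only: the tap weights (a main tap and one post-cursor tap) and the ±1 symbol mapping are assumptions, and nothing here models the authors' MUX or CML driver.

```python
def ffe_2tap(bits, main=0.75, post=-0.25):
    """Apply a 2-tap FFE to an NRZ bit stream.

    Bits are mapped to +1/-1 symbols and the output is
    main * x[n] + post * x[n-1]. The tap weights are illustrative assumptions.
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    # Assume the line was already holding the first symbol before the stream starts.
    prev = symbols[0] if symbols else 0.0
    for s in symbols:
        out.append(main * s + post * prev)
        prev = s
    return out


levels = ffe_2tap([0, 0, 1, 1, 1, 0])
print(levels)
```

With these weights a bit that differs from its predecessor is sent at full swing (±1.0), while a repeated bit is de-emphasized to ±0.5, which pre-compensates for the high-frequency loss of the channel.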
Pub Date: 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983467
T. Sadi, O. Badami, V. Georgiev, J. Ding, A. Asenov
Resistive random-access memories (RRAMs) are overwhelmingly viewed as potential candidates for the next generation of non-volatile memory devices. Here, we discuss the advantages of the kinetic Monte Carlo (KMC) simulation framework for RRAMs. We use a robust KMC simulator to analyze transport in promising oxide structures. The simulator self-consistently couples charge transport and thermal effects in three-dimensional (3D) space, allowing a realistic reconstruction of the conductive filaments responsible for switching. By presenting insightful results, we argue that a 3D physical electro-thermal simulator is necessary for understanding RRAM operation and reliability.
Title: Advanced Simulation of RRAM Memory Cells. Venue: 2019 IEEE 13th International Conference on ASIC (ASICON).
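As background for the KMC framework named in the abstract above, the sketch below shows one step of a generic rejection-free kinetic Monte Carlo loop: an event is chosen with probability proportional to its rate, and time advances by an exponentially distributed increment. It is not the authors' 3D electro-thermal simulator; the two event rates are hypothetical placeholders for processes such as defect generation or ion hopping.

```python
import math
import random


def kmc_step(rates, rng):
    """One rejection-free kinetic Monte Carlo step.

    Returns (event_index, dt). The event is drawn with probability
    proportional to its rate; dt is exponentially distributed with
    mean 1 / sum(rates).
    """
    total = sum(rates)
    if total <= 0.0:
        raise ValueError("no event can occur")
    threshold = rng.random() * total
    cumulative = 0.0
    chosen = len(rates) - 1  # fallback in case of floating-point round-off
    for index, rate in enumerate(rates):
        cumulative += rate
        if threshold < cumulative:
            chosen = index
            break
    # 1 - u lies in (0, 1], so the logarithm is always finite.
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt


rng = random.Random(1)
rates = [1.0, 3.0]  # hypothetical event rates in 1/s
counts = [0, 0]
elapsed = 0.0
for _ in range(10000):
    event, dt = kmc_step(rates, rng)
    counts[event] += 1
    elapsed += dt
print(counts, elapsed)
```

In a device simulator the rate list would be rebuilt after every step, since each event changes the local field and temperature and therefore the rates of the remaining events.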
Pub Date: 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983499
Ziyun He, Shaoquan Liao, Zixin Wang, Jianping Guo
This paper proposes a low-dropout regulator (LDO) with a dynamic-biasing super buffer technique and a 2-stage error amplifier (EA). The 2-stage EA provides high loop gain and a wide output swing, yielding ultra-small load regulation and line regulation. The dynamic-biasing super buffer, which combines a super source follower, an ultra-low-output-impedance buffer and dynamic biasing, ensures the stability of the LDO with a 1 µF output capacitance and adjusts the quiescent current (Iq) of the LDO with the load current, so that an ultra-low Iq is achieved. Based on a GlobalFoundries 0.18 µm CMOS process, the circuit can drive up to 200 mA of load current at a 1.2 V output voltage from a 1.4 V to 2 V supply voltage (Vs). At Vs = 1.4 V, Iq is 26.3 µA at no load and 57.7 µA at full load. At Vs = 2 V, Iq is 26.4 µA at no load and 246.2 µA at full load. The load regulation is 1.12 µV/mA at Vs = 1.4 V and the line regulation is 0.50 mV/V at the full 200 mA load.
Title: A Power-Area-Efficient Low-Dropout Regulator With Enhanced Buffer Impedance Attenuation. Venue: 2019 IEEE 13th International Conference on ASIC (ASICON).
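The figures quoted in the abstract above can be turned into two derived quantities that are easier to compare across designs: the full-load current efficiency and the worst-case output deviation. The sketch below assumes the regulation figures are linear over the whole load and supply range, which the abstract does not state.

```python
def current_efficiency(i_load_a: float, i_q_a: float) -> float:
    """Fraction of the input current that reaches the load."""
    return i_load_a / (i_load_a + i_q_a)


I_LOAD_MAX = 200e-3  # A, full load

# Full-load quiescent currents quoted in the abstract.
eff_1v4 = current_efficiency(I_LOAD_MAX, 57.7e-6)   # Vs = 1.4 V
eff_2v0 = current_efficiency(I_LOAD_MAX, 246.2e-6)  # Vs = 2 V

# Output deviation, assuming linear regulation over the full range.
delta_v_load = 1.12e-6 * 200        # V, 1.12 uV/mA over a 0 to 200 mA load step
delta_v_line = 0.50e-3 * (2.0 - 1.4)  # V, 0.50 mV/V over a 1.4 V to 2 V supply

print(f"efficiency: {eff_1v4:.4%} at 1.4 V, {eff_2v0:.4%} at 2 V")
print(f"deviation: {delta_v_load * 1e6:.0f} uV (load), {delta_v_line * 1e6:.0f} uV (line)")
```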
Pub Date: 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983539
Zhao Tuo, Tao Chen, Wei Li, Danyang Yang
The reconfigurable very-long-instruction-word (VLIW) cipher processor needs a large number of instructions and a long execution time when processing elliptic curve cryptographic algorithms. Adding a modular operation acceleration unit to the cipher processor improves its energy efficiency on elliptic curve cryptography. After synthesis with a 130 nm standard cell library, the experimental results show that the modular operation acceleration unit occupies 30 Kgates, accounting for 6% of the processor area; the 256-bit binary-field point multiplication time is reduced from 22.8 ms to 5.8 ms, and the energy efficiency is improved by about 372%.
Title: Method for improving energy efficiency of elliptic curve cryptography algorithm on reconfigurable symmetric cipher processor. Venue: 2019 IEEE 13th International Conference on ASIC (ASICON).
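The abstract above does not describe the internals of the acceleration unit. As an illustration of the kind of modular operation that binary-field elliptic curve arithmetic relies on, the sketch below multiplies two elements of GF(2^m) with the shift-and-XOR method. The toy field GF(2^8) with the polynomial x^8 + x^4 + x^3 + x + 1 is an assumption chosen only because its products are easy to check by hand; the paper's 256-bit field and hardware datapath are not reproduced.

```python
def gf2m_mul(a: int, b: int, modulus: int, m: int) -> int:
    """Multiply a and b in GF(2^m) using shift-and-XOR with interleaved reduction.

    `modulus` is the irreducible polynomial including its x^m term,
    encoded as an integer bit mask. Both inputs must be below 2**m.
    """
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a        # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1                # multiply a by x
        if (a >> m) & 1:
            a ^= modulus       # reduce when the degree reaches m
    return result


TOY_MODULUS = 0x11B  # x^8 + x^4 + x^3 + x + 1 (toy field, not the paper's)
product = gf2m_mul(0x57, 0x83, TOY_MODULUS, 8)
print(hex(product))
```

A software loop like this costs several instructions per bit of the operand, which is why moving the operation into a dedicated unit can shorten point multiplication substantially on wide fields.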
Pub Date: 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983441
Yangyang Chang, G. Sobelman, Xiaofang Zhou
In order for deep learning applications to run efficiently on low-power edge devices, including mobile and internet-of-things systems, it is important to reduce their computational and memory requirements. Binarized neural networks have shown promise in this area, but these are typically designed using existing architectures based on floating-point number representations. A more promising approach is to apply network architecture search algorithms to find optimized binarized architectures. In this paper, encoding schemes for the genetic algorithm search of binarized networks are described. The simulation results demonstrate the effectiveness of the proposed method.
Title: Genetic Architecture Search for Binarized Neural Networks. Venue: 2019 IEEE 13th International Conference on ASIC (ASICON).
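The computational saving that motivates binarized networks in the abstract above comes from replacing multiply-accumulate with bitwise operations. The sketch below shows the standard XNOR-popcount identity for the dot product of two ±1 vectors. It illustrates binarized arithmetic only; the genetic encoding schemes the paper proposes are not described in the abstract and are not reproduced here.

```python
def pack_signs(values) -> int:
    """Pack a sequence of +1/-1 values into an int; bit i is 1 where values[i] is +1."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def binary_dot(x_bits: int, w_bits: int, n: int) -> int:
    """Dot product of two length-n +1/-1 vectors given as packed bit patterns."""
    mask = (1 << n) - 1
    agree = bin(~(x_bits ^ w_bits) & mask).count("1")  # XNOR, then popcount
    return 2 * agree - n  # agreements contribute +1, disagreements -1


x = [1, -1, 1, 1]
w = [1, 1, -1, 1]
dot = binary_dot(pack_signs(x), pack_signs(w), len(x))
print(dot)
```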
Pub Date: 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983457
Yue Shi, Jiawen Wang, Jianwen Cao, Ze-kun Zhou
A high-performance, low-power cycle-by-cycle current limiter is proposed in this paper, which protects a switching-mode power supply from damage under over-current or short-circuit conditions. With the clamping effect of a feedback loop built around an operational amplifier, the output current is proportionally sampled and converted to a ground-referenced voltage. To further reduce power consumption, the low-bias-current sampling loop is enabled only during the on-time of the high-side transistors. In addition, an auxiliary clamping circuit is presented to shorten the settling time in every cycle for high-speed applications. Furthermore, a source-input comparator with a voltage clamp is adopted to judge the status of the output current; it provides the reference level and performs the comparison at the same time, further improving speed at lower power. The proposed current limiter is implemented in a standard 0.18 µm CMOS process. Verification results in a 3 A, 2.2 MHz buck converter show a 39 ns response time with 8.6 MHz bandwidth, achieved with only 2 µA on-state and zero off-state current consumption.
Title: An Ultra-Low Power Cycle-by-Cycle Current Limiter Suitable for Switching-Mode Power Supply with 2.2 MHz Frequency. Venue: 2019 IEEE 13th International Conference on ASIC (ASICON).
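The behaviour of cycle-by-cycle limiting described in the abstract above can be sketched with a simple model of one switching cycle: the inductor current ramps linearly during the on-time, and the high-side switch is turned off a fixed response time after the current crosses the limit. Only the 39 ns response time comes from the abstract. The linear ramp, the 3 A threshold (the abstract gives 3 A as the converter rating, not as the limit), the slope and the requested on-time are assumptions for illustration.

```python
def limited_on_time(i_start, slope_a_per_s, requested_on_s, i_limit,
                    response_s=39e-9):
    """On-time actually applied in one switching cycle.

    If the linearly ramping current would exceed `i_limit`, the switch is
    turned off `response_s` after the crossing; otherwise the requested
    on-time is used unchanged.
    """
    if slope_a_per_s <= 0 or i_start + slope_a_per_s * requested_on_s <= i_limit:
        return requested_on_s
    t_cross = max(0.0, (i_limit - i_start) / slope_a_per_s)
    return min(requested_on_s, t_cross + response_s)


PERIOD_S = 1 / 2.2e6  # 2.2 MHz switching frequency, about 454.5 ns
SLOPE = 5e6           # hypothetical inductor current slope: 5 A/us
on_s = limited_on_time(i_start=2.5, slope_a_per_s=SLOPE,
                       requested_on_s=300e-9, i_limit=3.0)
peak_a = 2.5 + SLOPE * on_s
print(f"on-time = {on_s * 1e9:.0f} ns of {PERIOD_S * 1e9:.1f} ns, peak = {peak_a:.3f} A")
```

The example makes the role of the response time visible: the current overshoots the threshold by the slope multiplied by the response delay, so a faster limiter keeps the peak closer to the intended limit.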
Pub Date: 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983641
Xuemei Fan, Rujin Wang, Qinghui Zeng, Hao Liu, Shengli Lu
The performance and reliability of integrated circuits are susceptible to process, voltage, temperature and aging (PVTA) variations. Conventional designs reserve a timing margin for the worst case to avoid these side effects. Timing-resilient circuits can reduce this safety margin, but at the cost of excessive energy overhead and unstable operation at low voltage. In this study, we develop a simple, steady timing-resilient sample by extending previous works, saving considerable power overhead. The sample detects timing errors using delay data sense detection and is implemented on both latches and data strobe flip-flops, recovering errors with only four extra transistors. Its effectiveness and efficiency are evaluated in a systolic-array CNN accelerator designed in a 40 nm process. Simulation results demonstrate that the accelerator achieves stable performance without any accuracy loss with the supply voltage scaled down to 0.57 V.
Title: A Simple Steady Timing Resilient Sample Based on Delay Data Sense Detection. Venue: 2019 IEEE 13th International Conference on ASIC (ASICON).
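Timing-error detection of the general kind mentioned in the abstract above is commonly explained with a detection window that opens at the capturing clock edge: data arriving inside the window is late but can still be flagged and recovered. The sketch below is a generic behavioural model of that idea, not the authors' four-transistor circuit; the clock period and window length are hypothetical.

```python
def classify_arrival(arrival_s: float, clock_period_s: float, window_s: float) -> str:
    """Classify a data transition relative to the capturing clock edge.

    'on_time'  : arrives at or before the edge, so no action is needed
    'detected' : arrives late but inside the detection window, so an
                 error flag can trigger recovery
    'missed'   : arrives after the window has closed and goes unnoticed
    """
    if arrival_s <= clock_period_s:
        return "on_time"
    if arrival_s <= clock_period_s + window_s:
        return "detected"
    return "missed"


PERIOD = 2.0e-9  # hypothetical clock period
WINDOW = 0.4e-9  # hypothetical detection window
results = [classify_arrival(t, PERIOD, WINDOW) for t in (1.8e-9, 2.2e-9, 2.6e-9)]
print(results)
```

Lowering the supply voltage lengthens path delays and moves arrivals to the right in this model, so the usable voltage floor is set by the point where arrivals start to fall outside the window.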