2019 IEEE 13th International Conference on ASIC (ASICON): Latest Papers

Hardware Implementation of Convolutional Neural Network for Face Feature Extraction

Pub Date : 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983575

Ru Ding, Xuemei Tian, Guoqiang Bai, G. Su, Xingjun Wu

As an important feed-forward neural network in deep learning, the convolutional neural network (CNN) has been widely used in recent years for image classification, face recognition, natural language processing and document analysis. A CNN processes a large amount of data and performs many multiply-and-accumulate (MAC) operations. Because application fields are diverse, CNN channel sizes and kernel sizes vary widely, whereas most existing hardware platforms optimize for the average case, which wastes computing resources. This paper presents a dedicated configurable convolution computing array that contains 15 convolution units; each unit (PE) contains 6×6 MACs and can be configured for three kernel sizes: 5×5, 3×3 and 1×1. A pipeline structure synchronizes the convolution and pooling operations, which reduces the storage needed for intermediate results. The hardware structure is designed specifically to optimize the DeepID network. Tested on an Altera Cyclone V FPGA, the peak performance of each convolution layer at 50 MHz is 27 GOPS, and the average MAC utilization is 92%.
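
The peak-throughput figure can be sanity-checked from the numbers in the abstract. The sketch below is only a back-of-the-envelope check: it assumes that one MAC is counted as one operation (the abstract does not say how operations are counted), and the kernel tiling in `mapped_fraction` is a hypothetical mapping, not the authors' documented scheme.

```python
# Figures taken from the abstract.
NUM_PES = 15          # convolution units in the array
MACS_PER_PE = 6 * 6   # each PE contains a 6x6 MAC array
CLOCK_HZ = 50e6       # 50 MHz


def peak_gops(num_pes, macs_per_pe, clock_hz, ops_per_mac=1):
    """Peak throughput in GOPS if every MAC is busy on every cycle.

    ops_per_mac=1 is an assumption; counting multiply and add separately
    (ops_per_mac=2) would double the result.
    """
    return num_pes * macs_per_pe * clock_hz * ops_per_mac / 1e9


def mapped_fraction(kernel, pe_side=6):
    """Fraction of a PE's MACs covered when whole kernel x kernel windows are
    tiled onto the pe_side x pe_side array (hypothetical mapping)."""
    windows = (pe_side // kernel) ** 2
    return windows * kernel * kernel / (pe_side * pe_side)


print(peak_gops(NUM_PES, MACS_PER_PE, CLOCK_HZ))  # → 27.0
for k in (5, 3, 1):
    print(k, mapped_fraction(k))
```

Under the one-operation-per-MAC assumption, 15 × 36 × 50 MHz reproduces the quoted 27 GOPS. The tiling fractions only illustrate why a 6×6 array suits 3×3 and 1×1 kernels better than 5×5; they are not meant to reproduce the reported 92% average utilization.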

Citations: 2

Performance optimization for LDO regulator based on the differential evolution

Pub Date : 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983642

Jintao Li, Yanhan Zeng, Hailong Wu, R. Li, Jun Zhang, Hongzhou Tan

This paper presents an application of differential evolution to parameter optimization of a low-dropout (LDO) regulator. Manually optimizing the parameters of an analog integrated circuit such as an LDO is laborious and time-consuming, and it is not guaranteed to find a good result. Here, differential evolution is used to optimize the parameters and obtain relatively good LDO performance. To improve the convergence speed and the optimization result, a new constraint solution and a fast weight-based non-dominated sorting method are proposed. Simulation results show that the gain-bandwidth product, load regulation and line regulation are improved by 206.5%, 58.1% and 87.6%, respectively, compared with the manual solution.
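
For readers unfamiliar with differential evolution, the following is a minimal sketch of the textbook DE/rand/1/bin scheme on a toy single-objective problem. It is not the authors' method: their constraint solution and weight-based non-dominated sorting are not described in the abstract and are not reproduced here. `toy_objective`, the bounds and all parameter values are hypothetical stand-ins for a circuit simulation.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Textbook DE/rand/1/bin minimizer over box bounds (single objective)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to the box
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_cost = objective(trial)
            if trial_cost <= cost[i]:  # greedy one-to-one selection
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


def toy_objective(x):
    """Hypothetical stand-in for a circuit figure of merit; minimum 0 at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best_x, best_cost = differential_evolution(toy_objective, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_x, best_cost)
```

In a circuit setting the objective would wrap a simulator run that returns, for example, a weighted combination of gain-bandwidth product, load regulation and line regulation; a multi-objective variant would replace the greedy selection with a dominance-based ranking.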

Citations: 2

Oxygen-plasma-based digital etching for GaN/AlGaN high electron mobility transistors

Pub Date : 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983678

Jingyi Wu, Hongyu Yu, Yang Jiang, Zeyu Wan, Siqi Lei, W. Cheng, Guangnan Zhou, R. Sokolovskij, Qing Wang, G. Xia

Digital etching is an effective method for reducing dry-etch damage in AlGaN/GaN HEMTs. This work systematically investigates O2-plasma-based digital etching of AlGaN and p-GaN. AlN layers were used as etch-stop layers in the AlGaN etch. Important process aspects, such as the use of the AlN layers, the RF power, the oxygen flow rate, the oxidation time and the resulting roughness, were studied. These aspects are technically relevant to obtaining controllable, uniform etched surfaces with low surface damage for better HEMT performance.

Citations: 1

A 40Gb/s Low Power Transmitter with 2-tap FFE and 40:1 MUX in 28nm CMOS Technology

Pub Date : 2019-10-01 DOI: 10.1109/asicon47005.2019.8983623

Wenbin He, Fan Ye, Junyan Ren

This paper introduces a low-power NRZ transmitter with a 2-tap multiple-FFE operating at 40 Gb/s. The transmitter incorporates a 3-stage MUX, a tailless CML driver and a resistance calibration system. Simplifying the low-speed MUX, modifying the high-speed MUX structure and adopting the tailless CML driver save considerable power and improve signal quality. Simulation results show that the design, in 28 nm CMOS technology, works at 40 Gb/s with a −14.8 dB RLGC channel. The simulated power consumption is 16.83 mW under a 1.05 V supply.
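
A 2-tap feed-forward equalizer (FFE) can be illustrated behaviorally as a two-coefficient FIR filter on the NRZ symbol stream. The sketch below assumes a main cursor plus one post-cursor tap with de-emphasis; the tap weight `post_tap=0.25` and the zero initial state are hypothetical, since the abstract gives no coefficients.

```python
def nrz_levels(bits):
    """Map bits {0, 1} to NRZ symbols {-1.0, +1.0}."""
    return [1.0 if b else -1.0 for b in bits]


def two_tap_ffe(symbols, post_tap=0.25):
    """y[n] = (1 - post_tap) * x[n] - post_tap * x[n-1].

    The symbol before the first one is taken as 0.0 (assumed idle line).
    post_tap is a hypothetical de-emphasis weight.
    """
    main_tap = 1.0 - post_tap
    out = []
    prev = 0.0
    for x in symbols:
        out.append(main_tap * x - post_tap * prev)
        prev = x
    return out


bits = [0, 0, 1, 1, 1, 0]
print(two_tap_ffe(nrz_levels(bits)))  # → [-0.75, -0.5, 1.0, 0.5, 0.5, -1.0]
```

Symbols that follow a transition keep the full swing (±1.0) while repeated symbols are de-emphasized (±0.5), which boosts the high-frequency content that a lossy channel attenuates most.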

Citations: 0

Advanced Simulation of RRAM Memory Cells

Pub Date : 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983467

T. Sadi, O. Badami, V. Georgiev, J. Ding, A. Asenov

Resistive random-access memories (RRAMs) are widely viewed as promising candidates for the next generation of non-volatile memory devices. Here, we discuss the advantages of the kinetic Monte Carlo (KMC) simulation framework for RRAMs. We use a robust KMC simulator to analyze transport in promising oxide structures. The simulator self-consistently couples charge transport and thermal effects in three-dimensional (3D) space, allowing a realistic reconstruction of the conductive filaments responsible for switching. Based on the results presented, we argue that a 3D physical electro-thermal simulator is necessary for understanding RRAM operation and reliability.
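
The core of any kinetic Monte Carlo simulation is the event-selection step. The sketch below shows one generic rejection-free (Gillespie/BKL-style) step; it is not the authors' simulator, which additionally couples charge transport and heat flow in 3D. The event rates in the usage lines are hypothetical.

```python
import math
import random


def kmc_step(rates, rng):
    """One rejection-free KMC step.

    Picks event k with probability rates[k] / sum(rates) and draws an
    exponentially distributed waiting time with mean 1 / sum(rates).
    Returns (event_index, time_increment).
    """
    total = sum(rates)
    if total <= 0.0:
        raise ValueError("no event can occur")
    target = rng.random() * total
    cumulative = 0.0
    chosen = len(rates) - 1  # fallback for floating-point round-off
    for k, r in enumerate(rates):
        cumulative += r
        if target < cumulative:
            chosen = k
            break
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt


# Hypothetical rates (1/s) for two competing events, e.g. defect
# generation and defect recombination in an oxide layer.
rng = random.Random(0)
rates = [1.0, 3.0]
time = 0.0
counts = [0, 0]
for _ in range(10000):
    event, dt = kmc_step(rates, rng)
    counts[event] += 1
    time += dt
print(counts, time)
```

In a physical simulator the rate list is recomputed after every event, because each event changes the local field and temperature that the rates depend on.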

Citations: 0

A Power-Area-Efficient Low-Dropout Regulator With Enhanced Buffer Impedance Attenuation

Pub Date : 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983499

Ziyun He, Shaoquan Liao, Zixin Wang, Jianping Guo

This paper proposes a low-dropout (LDO) regulator with a dynamic-biasing super-buffer technique and a 2-stage error amplifier (EA). The 2-stage EA provides high loop gain and a wide output swing, yielding ultra-small load regulation and line regulation. The dynamic-biasing super buffer, which combines a super source follower, an ultra-low-output-impedance buffer and dynamic biasing, ensures the stability of the LDO with a 1 µF output capacitance and adjusts the quiescent current (Iq) of the LDO according to the load current, so that the proposed LDO structure achieves an ultra-low Iq. Implemented in a GlobalFoundries 0.18 µm CMOS process, the circuit can drive up to 200 mA of load current at a 1.2 V output voltage from a 1.4 V to 2 V supply voltage (Vs). At Vs = 1.4 V, Iq is 26.3 µA at no load and 57.7 µA at full load. At Vs = 2 V, Iq is 26.4 µA at no load and 246.2 µA at full load. The load regulation is 1.12 µV/mA at Vs = 1.4 V, and the line regulation is 0.50 mV/V at the full 200 mA load.
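
To put these figures in perspective, the short calculation below converts the quoted regulation slopes into output-voltage shifts and derives the full-load current efficiency. It assumes the regulation figures behave as constant slopes over the whole load and supply range, which the abstract does not state.

```python
# Figures taken from the abstract.
V_OUT = 1.2                  # output voltage (V)
VS_MIN, VS_MAX = 1.4, 2.0    # supply range (V)
I_LOAD_MAX = 200e-3          # full load current (A)
IQ_FULL_LOAD_1V4 = 57.7e-6   # quiescent current at Vs = 1.4 V, full load (A)
LOAD_REG = 1.12e-6 / 1e-3    # 1.12 uV/mA expressed in V/A
LINE_REG = 0.50e-3           # 0.50 mV/V expressed in V/V

# Output shift from no load to full load (assumes a constant slope).
delta_v_load = LOAD_REG * I_LOAD_MAX

# Output shift across the full supply range (assumes a constant slope).
delta_v_line = LINE_REG * (VS_MAX - VS_MIN)

# Current efficiency: share of the supply current that reaches the load.
current_efficiency = I_LOAD_MAX / (I_LOAD_MAX + IQ_FULL_LOAD_1V4)

# Minimum input-to-output headroom implied by the supply range.
min_headroom = VS_MIN - V_OUT

print(f"load step shift : {delta_v_load * 1e6:.0f} uV")
print(f"line range shift: {delta_v_line * 1e3:.2f} mV")
print(f"current eff.    : {current_efficiency * 100:.3f} %")
print(f"min headroom    : {min_headroom * 1e3:.0f} mV")
```

Under the constant-slope assumption, the output moves by roughly 0.2 mV over the full load range and 0.3 mV over the full supply range, while quiescent current costs less than 0.03% of the full-load supply current at Vs = 1.4 V.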

Citations: 0

Method for improving energy efficiency of elliptic curve cryptography algorithm on reconfigurable symmetric cipher processor

Pub Date : 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983539

Zhao Tuo, Tao Chen, Wei Li, Danyang Yang

The reconfigurable very-long-instruction-word (VLIW) cipher processor needs a large number of instructions and a long execution time when processing elliptic curve cryptography algorithms. Adding a modular-operation acceleration unit to the cipher processor improves its energy efficiency for elliptic curve cryptography. Synthesis results with a 130 nm standard-cell library show that the modular-operation acceleration unit occupies 30 Kgates, accounting for 6% of the processor area; the 256-bit binary-field point multiplication time is reduced from 22.8 ms to 5.8 ms, and the energy efficiency is improved by about 372%.
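
The abstract does not list which modular operations the acceleration unit implements. As an illustration of the kind of operation that dominates binary-field point multiplication, the sketch below shows a bit-serial modular multiplication in GF(2^m), with polynomials stored as integer bit masks. The function name and the small GF(2^8) test field are illustrative choices, not taken from the paper.

```python
def gf2m_mul(a, b, modulus):
    """Multiply a and b in GF(2^m) = GF(2)[x] / (modulus).

    Polynomials are encoded as ints (bit i = coefficient of x^i);
    a and b must have degree below m = deg(modulus).
    """
    m = modulus.bit_length() - 1   # field degree
    result = 0
    while b:
        if b & 1:
            result ^= a            # addition in GF(2) is XOR
        b >>= 1
        a <<= 1                    # multiply a by x
        if (a >> m) & 1:           # degree reached m: reduce
            a ^= modulus
    return result


# Small worked example in GF(2^8) with x^8 + x^4 + x^3 + x + 1 (0x11B),
# the multiplication example given in FIPS-197.
print(hex(gf2m_mul(0x57, 0x83, 0x11B)))  # → 0xc1
```

A software loop like this costs one iteration per operand bit, which suggests why a dedicated hardware unit can shorten a 256-bit point multiplication substantially; the quoted reduction from 22.8 ms to 5.8 ms corresponds to a speedup of about 3.9×.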

Citations: 0

Genetic Architecture Search for Binarized Neural Networks

Pub Date : 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983441

Yangyang Chang, G. Sobelman, Xiaofang Zhou

In order for deep learning applications to run efficiently on low-power edge devices, including mobile and internet-of-things systems, it is important to reduce their computational and memory requirements. Binarized neural networks have shown promise in this area, but these are typically designed using existing architectures based on floating-point number representations. A more promising approach is to apply network architecture search algorithms to find optimized binarized architectures. In this paper, encoding schemes for the genetic algorithm search of binarized networks are described. The simulation results demonstrate the effectiveness of the proposed method.
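
The computational saving of binarized networks can be seen in a single dot product: once weights and activations are restricted to ±1, the multiply-accumulate reduces to an XNOR followed by a population count. The sketch below shows this common formulation from the binarized-network literature; it is not described in the abstract, and the helper names are illustrative.

```python
def binarize(values):
    """Sign binarization: +1 for values >= 0, otherwise -1."""
    return [1 if v >= 0 else -1 for v in values]


def pack(signs):
    """Pack a +1/-1 vector into an int; bit i is set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_word, b_word, n):
    """Dot product of two packed +1/-1 vectors of length n.

    Matching bits contribute +1 and differing bits -1, so the result is
    2 * (number of matching bits) - n.
    """
    mask = (1 << n) - 1
    matches = bin(~(a_word ^ b_word) & mask).count("1")
    return 2 * matches - n


a = binarize([0.5, -1.2, 3.0, -0.1])   # [1, -1, 1, -1]
b = binarize([-0.3, -2.0, 1.0, 0.7])   # [-1, -1, 1, 1]
reference = sum(x * y for x, y in zip(a, b))
print(xnor_popcount_dot(pack(a), pack(b), len(a)), reference)  # → 0 0
```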

Citations: 3

An Ultra-Low Power Cycle-by-Cycle Current Limiter Suitable for Switching-Mode Power Supply with 2.2 MHz Frequency

Pub Date : 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983457

Yue Shi, Jiawen Wang, Jianwen Cao, Ze-kun Zhou

A high-performance, low-power cycle-by-cycle current limiter is proposed that protects a switching-mode power supply from damage under over-current or short-circuit conditions. Using the clamping effect of a feedback loop built around an operational amplifier, the output current is sampled proportionally and converted to a ground-referenced voltage. To further reduce power consumption, the sampling loop, which has a low bias current, is enabled only during the on-time of the high-side transistors. In addition, an auxiliary clamping circuit is introduced to shorten the settling time in every cycle for high-speed applications. Moreover, a source-input comparator with a voltage clamp is adopted to judge the status of the output current; it provides the reference level and performs the comparison at the same time, further improving speed with a lower power requirement. The proposed current limiter is implemented in a standard 0.18 µm CMOS process. Verification within a 3 A, 2.2 MHz buck converter shows a 39 ns response time and an 8.6 MHz bandwidth with only 2 µA on-state and zero off-state current consumption.
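
The effect of cycle-by-cycle limiting, and of the limiter's response time, can be illustrated with an idealized behavioral model of a buck stage. This sketch is not the proposed circuit: it assumes ideal switches and a constant output voltage, and the input voltage, output voltage, inductance, duty cycle and 3 A trip level are all hypothetical. Only the 2.2 MHz switching frequency and the 39 ns response time come from the abstract.

```python
def simulate_peak_currents(v_in, v_out, inductance, f_sw, duty, i_limit,
                           response_delay, cycles=50, steps_per_cycle=1000):
    """Peak inductor current per switching cycle for an idealized buck stage
    with a cycle-by-cycle current limit (behavioral model only)."""
    period = 1.0 / f_sw
    dt = period / steps_per_cycle
    i_l = 0.0
    peaks = []
    for _ in range(cycles):
        switch_on = True    # high-side switch turns on at the start of each cycle
        trip_time = None    # time at which the limit was first exceeded
        peak = i_l
        for step in range(steps_per_cycle):
            t = step * dt
            if switch_on:
                if trip_time is None and i_l >= i_limit:
                    trip_time = t
                tripped = trip_time is not None and t - trip_time >= response_delay
                # Turn off at the end of the duty cycle or once the limiter reacts.
                if t >= duty * period or tripped:
                    switch_on = False
            slope = (v_in - v_out) / inductance if switch_on else -v_out / inductance
            i_l = max(i_l + slope * dt, 0.0)  # inductor current cannot go negative here
            peak = max(peak, i_l)
        peaks.append(peak)
    return peaks


# Hypothetical overload scenario; F_SW and RESPONSE_DELAY are from the abstract.
V_IN, V_OUT, L_IND = 12.0, 2.0, 2.2e-6
F_SW, DUTY = 2.2e6, 0.5
I_LIMIT, RESPONSE_DELAY = 3.0, 39e-9

limited = simulate_peak_currents(V_IN, V_OUT, L_IND, F_SW, DUTY, I_LIMIT, RESPONSE_DELAY)
unlimited = simulate_peak_currents(V_IN, V_OUT, L_IND, F_SW, DUTY, float("inf"), RESPONSE_DELAY)
print(max(limited), max(unlimited))
```

In this model the peak current overshoots the trip level by roughly the on-state slope multiplied by the response delay, which is why a short response time matters at a 2.2 MHz switching frequency, where one period lasts only about 455 ns.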

Citations: 0

A Simple Steady Timing Resilient Sample Based on Delay Data Sense Detection

Pub Date : 2019-10-01 DOI: 10.1109/ASICON47005.2019.8983641

Xuemei Fan, Rujin Wang, Qinghui Zeng, Hao Liu, Shengli Lu

The performance and reliability of integrated circuits are susceptible to process, voltage, temperature and aging (PVTA) variations. Conventional designs reserve a timing margin for the worst case to avoid these effects. Timing-resilient circuits can reduce this safety margin, but at the cost of excessive energy overhead and an unsteady state at low voltage. In this study, we build on previous work to develop a simple, steady timing-resilient sample that saves considerable power overhead. The sample detects timing errors based on delay data sense detection and is implemented on both latches and data-strobe flip-flops, recovering errors with only four extra transistors. Its effectiveness and efficiency are evaluated in a systolic-array CNN accelerator designed in a 40 nm process. Simulation results demonstrate that the accelerator achieves stable performance without any accuracy loss with the voltage scaled down to 0.57 V.

Citations: 1
