2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI): Latest Papers, Page 10

IDE Development, Logic Synthesis and Buffer/Splitter Insertion Framework for Adiabatic Quantum-Flux-Parametron Superconducting Circuits

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00042

R. Cai, Xiaolong Ma, O. Chen, Ao Ren, Ning Liu, N. Yoshikawa, Yanzhi Wang

Josephson junction (JJ) based superconductor logic families have been proposed and implemented to process analog and digital signals [1] because of their low energy dissipation and ultrafast switching speed. Built from resistance-less wires and ultrafast switches, they can operate at clock frequencies of several tens of gigahertz and can be hundreds of thousands of times more energy efficient than their CMOS counterparts. They are regarded as an important candidate to replace state-of-the-art CMOS because of their superior potential in operation speed and energy efficiency, as recognized by the U.S. IARPA C3 and SuperTools Programs and the Japan MEXT-JSPS Project. The design and fabrication of superconducting circuits are already established [2]-[4]. In addition, a prototype superconducting microprocessor, "Core 1", was demonstrated in 2004 [3]; it executes instructions at a clock frequency of several tens of gigahertz with extremely low power dissipation. These achievements make superconducting electronics highly promising for future high-performance computing applications. One of the most mature superconducting technologies, Rapid-Single-Flux-Quantum (RSFQ), was proposed by K. Likharev, O. Mukhanov, and V. Semenov in 1985 [1]. Although it can operate at ultra-high speeds of hundreds of GHz while maintaining extremely low switching energy (10^-19 J), it suffers from increasing static power because of the on-chip resistors required to supply a constant DC bias to the main RSFQ circuit. Numerous methods have been proposed to resolve the static power dissipation problem of RSFQ, including low-voltage RSFQ (LV-RSFQ) [5], reciprocal quantum logic (RQL) [6], LR-biased RSFQ [7], and energy-efficient single-flux quantum (eSFQ) [8].
The Adiabatic Quantum-Flux-Parametron (AQFP) technology, on the other hand, uses AC bias/excitation currents as both the multiphase clock signal and the power supply [9], which avoids the power overhead of DC bias while operating at a frequency of a few GHz. Consequently, AQFP is remarkably energy efficient compared to RSFQ, albeit at a lower operating frequency. The energy-delay product (EDP) of AQFP circuits fabricated using processes such as the AIST standard process 2 (STP2) and the MIT-LL SFQ process [10], [11] is at least 200 times smaller than that of the other energy-efficient superconductor logics and is only three orders of magnitude larger than the quantum limit [9]. Physical testing of an AQFP 8-bit carry-look-ahead adder and of large-scale circuits consisting of up to 10,000 AQFP logic gates has shown AQFP to be a promising technology that is robust against circuit parameter variations [12]. Despite the high application potential of AQFP in VLSI circuits, a systematic, automatic synthesis framework for AQFP is still urgently needed. Two features of AQFP prevent conventional CMOS synthesis methods from being applied to it directly. First, whereas conventional CMOS circuits rely heavily on And-Or-Inverter (AOI) based representations, AQFP circuits favor majority gates. In fact, a two-input AND or OR gate in AQFP is itself built from a three-input majority gate with one input held constant. Second, because data propagation in AQFP is clock-synchronized, all inputs of any gate must have equal delay. To satisfy this balanced-timing requirement, splitters and buffers must be inserted into the circuit, and some circuits can double in size even when the optimal number of buffers and splitters is inserted. The insertion method therefore has a large impact on overall resource consumption, and as design complexity grows, an unoptimized method can add a large number of unnecessary buffers and splitters.
Besides a complete synthesis framework, an integrated development environment (IDE) for AQFP design is also lacking. An IDE that integrates a schematic and layout editor with simulation and analysis tools for AQFP is urgently needed to enable a better and more efficient AQFP design flow. In this paper, we propose a complete AQFP design tool comprising an IDE, a complete majority-based synthesis framework, and a buffer and splitter insertion framework. The majority synthesis framework converts any AOI netlist into a corresponding MAJ netlist by mapping all feasible three-input sub-networks to their MAJ-based implementations. In addition, we propose an automatic buffer and splitter insertion method that adds the optimal number of buffers and splitters to any given gate-level netlist; it finds the minimum number of buffers and splitters that must be inserted to achieve equal delay under any library constraint on splitter size.
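The two AQFP constraints described in the abstract can be illustrated with a minimal Python sketch. It shows AND and OR derived from a three-input majority gate with one constant input, and a naive count of the buffers needed so that every gate sees inputs of equal depth. The function names and the netlist encoding are assumptions made for illustration; the count ignores splitters, buffer sharing, and output balancing, so it is not the optimal insertion method the paper proposes.

```python
from itertools import product


def maj(a, b, c):
    """Three-input majority: 1 when at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)


def and2(a, b):
    """AND as a majority gate with one input tied to constant 0."""
    return maj(a, b, 0)


def or2(a, b):
    """OR as a majority gate with one input tied to constant 1."""
    return maj(a, b, 1)


def buffers_for_balance(fanins):
    """Naive buffer count for path balancing.

    fanins maps each gate name to the list of its input names.
    Names that are not keys are primary inputs at level 0.
    Each input that arrives k levels early is charged k buffers.
    """
    level = {}

    def depth(node):
        if node not in fanins:
            return 0
        if node not in level:
            level[node] = 1 + max(depth(i) for i in fanins[node])
        return level[node]

    total = 0
    for gate, inputs in fanins.items():
        gate_level = depth(gate)
        total += sum(gate_level - 1 - depth(i) for i in inputs)
    return total


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, and2(a, b), or2(a, b))
    # g2 is one level deeper than g1, so inputs d and e each need one buffer.
    netlist = {"g1": ["a", "b", "c"], "g2": ["g1", "d", "e"]}
    print(buffers_for_balance(netlist))
```

In the small netlist above, `d` and `e` feed a level-2 gate directly from level 0, so each is charged one buffer.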

Pages: 187-192

Citations: 5

A Low-Power Recurrence-Based Radix 4 Divider Using Signed-Digit Addition

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00077

Matthew Gaalswyk, James W. Stine

This paper presents a novel radix-4 division-by-recurrence architecture that uses a hierarchical signed-digit (SD) adder. Implementations are easily generated from the methodology, which is well suited to digital implementation. Results are reported for several designs using GlobalFoundries 45 nm SOI technology and ARM standard cells. They indicate that these architectures reduce power dissipation for division by recurrence because the area is significantly decreased.
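To make "division by recurrence" concrete, here is a minimal Python sketch of a radix-4 digit recurrence that retires two quotient bits per step. It is an assumption-laden illustration: it uses plain integers and the non-redundant digit set {0, 1, 2, 3} with full comparisons, whereas the paper's architecture relies on a signed-digit adder. The function name and the `n_bits` parameter are hypothetical.

```python
def radix4_divide(dividend, divisor, n_bits=16):
    """Unsigned radix-4 division by recurrence (two quotient bits per step).

    Each iteration computes w <- 4*w + next radix-4 dividend digit,
    selects a quotient digit q in {0, 1, 2, 3}, and subtracts q*divisor.
    """
    if divisor <= 0 or dividend < 0:
        raise ValueError("expects dividend >= 0 and divisor > 0")
    if n_bits % 2:
        n_bits += 1  # radix-4 consumes bits in pairs
    if dividend >> n_bits:
        raise ValueError("dividend does not fit in n_bits")

    remainder = 0
    quotient = 0
    for shift in range(n_bits - 2, -1, -2):
        # Bring down the next two dividend bits.
        remainder = (remainder << 2) | ((dividend >> shift) & 0b11)
        # Quotient digit selection by comparison against multiples of the divisor.
        digit = 0
        while digit < 3 and remainder >= (digit + 1) * divisor:
            digit += 1
        remainder -= digit * divisor
        quotient = (quotient << 2) | digit
    return quotient, remainder


if __name__ == "__main__":
    print(radix4_divide(1000, 7))
```

Because the partial remainder stays below the divisor after every step, the shifted remainder is always below four times the divisor, which is why a digit of at most 3 suffices.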

Citations: 0

Energy-efficient Analog Processing Architecture for Direction of Arrival with Microphone Array

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00097

Changlu Liu, T. Lan, Qin Li, Kaige Jia, Yidian Fan, Xing Wu, F. Qiao, W. Qi, Xinjun Liu, Huazhong Yang

Direction-of-arrival (DOA) estimation is a critical component of conventional smart acoustic systems for navigation, noise-canceling hearing aids, and similar applications. However, conventional DOA estimation faces power-consumption and processing-speed bottlenecks dominated by the analog-to-digital converter (ADC) and the fast Fourier transform (FFT). Especially in always-on applications, the power-hungry ADC and the time-consuming FFT account for most of the system's computation cost. We propose a novel processing architecture that performs DOA estimation in the analog domain. The whole DOA processing procedure is implemented in the analog domain, without an ADC or a frequency-domain transformation. To verify the performance of the architecture, we simulate a generic DOA algorithm. In a 0.18 µm CMOS process, the results show a 94.5% reduction in power consumption and a 4724× improvement in processing speed compared to a conventional digital realization. We simulate a simple task with a direction accuracy of 80.74%; the approach can be extended to more complex scenarios.
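As a rough illustration of DOA estimation without a frequency-domain transform, the Python sketch below finds the time difference of arrival between two microphones by time-domain cross-correlation and converts it to an angle. This is a generic textbook approach chosen for illustration, not the analog circuit from the paper; the sample rate, microphone spacing, and speed of sound are assumed values.

```python
import math


def estimate_lag(x, y, max_lag):
    """Return the lag (in samples) that maximizes sum_n x[n] * y[n + lag]."""
    best_lag, best_score = 0, float("-inf")
    n = len(x)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += x[i] * y[j]
        if score > best_score:
            best_score, best_lag = score, lag
    return best_lag


def doa_degrees(lag, sample_rate_hz, spacing_m, speed_of_sound=343.0):
    """Far-field angle from broadside for a two-microphone pair."""
    s = lag * speed_of_sound / (sample_rate_hz * spacing_m)
    s = max(-1.0, min(1.0, s))  # clamp against rounding past +/-1
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    # Hypothetical burst seen by mic 1, and the same burst 3 samples later at mic 2.
    mic1 = [0.0] * 20 + [1.0, -2.0, 3.0, -1.0, 0.5] + [0.0] * 20
    mic2 = [0.0] * 3 + mic1[:-3]
    lag = estimate_lag(mic1, mic2, max_lag=5)
    print(lag, doa_degrees(lag, sample_rate_hz=48000, spacing_m=0.1))
```

The sketch still operates on sampled data, so it only mirrors the "no FFT" half of the paper's claim; avoiding the ADC requires the analog implementation the authors describe.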

Pages: 507-512

Citations: 1

Routing Performance Optimization for Homogeneous Droplets on MEDA-based Digital Microfluidic Biochips

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00082

Sarit Chakraborty, Susanta Chakraborty

Digital microfluidic biochips (DMFBs) offer automation, reconfigurability, low operational cost, and accurate results. Such labs-on-chip (LoCs) are now used extensively in point-of-care diagnosis and other monitoring applications. Routing droplets of microlitre or nanolitre volume (10^-6 or 10^-9 litre) on such chips raises several critical challenges because of the blockages caused by microfluidic modules present on the chip. A Micro-Electrode Dot Array (MEDA) based DMFB architecture can facilitate cross-contamination-free routing and eliminate other routing issues of conventional DMF chips. This paper proposes a novel heuristic routing technique for MEDA-based DMFB architectures to tackle routing complexities caused by overlapping nets, interfering blockages, and deadlock zones formed by conflicting nets. In the first phase, we categorize the various region-based movements of a droplet on a MEDA chip and derive a metric named the Snooping Index (SIn) to improve the routing performance of the droplets. Next, an exhaustive search finds routing paths for the remaining nets under the constraints specific to the MEDA platform. Finally, we compute another measure, the Zone Compaction Factor (ZCF), to overcome blockage-heavy route paths. Experimental results on benchmark suites I and III show that the proposed technique significantly reduces latest arrival time, average assay execution time, and the number of used cells compared with earlier methods.
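The underlying routing problem can be sketched in Python as a shortest-path search on a grid with blocked cells. This is a plain breadth-first search used only as a baseline illustration; it is not the Snooping Index heuristic or the Zone Compaction Factor from the paper, and it assumes a droplet occupies a single cell, which simplifies the MEDA model. The grid encoding is hypothetical.

```python
from collections import deque


def route(grid, start, goal):
    """Shortest droplet path on a grid, or None if the goal is unreachable.

    grid is a list of equal-length strings; '#' marks a blocked cell
    (for example, one occupied by a microfluidic module).
    """
    rows, cols = len(grid), len(grid[0])
    previous = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in previous:
                previous[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


if __name__ == "__main__":
    chip = [
        ".....",
        ".###.",
        ".....",
    ]
    print(route(chip, (0, 0), (2, 4)))
```

Routing several droplets at once adds the interference and deadlock constraints the abstract mentions, which a single-source search like this does not handle.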

Pages: 419-424

Citations: 7

An Area Effective Programmable Front-end Amplifier for Neural Signal Acquisition

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00046

Gopabandhu Hota, Hardik Agrawal, M. Sharad

Acquisition and analysis of neural signals have greatly changed our understanding of the brain. Neural implants must be as small as possible so that they are minimally invasive to normal body functioning. The neural signal contains frequency components from 0.1 Hz to 10 kHz with amplitudes in the 10-100 µV range, which is very small and easily distorted by external noise sources. This demands a very area-efficient, low-noise front-end amplifier (FEA). A low supply voltage and low power dissipation are further critical requirements for safe implantation and prolonged battery life. With these requirements in mind, we propose a programmable, area-efficient, low-noise FEA design together with manual and SAR-based gain tuning and offset cancellation schemes that are robust to temperature and process variations. The designed FEA occupies a minimal area of 0.05 mm², which shows high area efficiency with respect to switched-capacitor based and closed-loop front-end amplifiers. In simulation, the maximum voltage gain is 87.6 dB, the input-referred noise density is 20 nV/√Hz, and the power consumption is 43.2 µW at a 1.8 V supply, with a noise efficiency factor (NEF) of 1.84. The proposed scheme can cancel offsets of up to 30 mV using a 7-bit transistor bank.
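A successive-approximation (SAR) offset search over a 7-bit code can be sketched in Python as follows. The model is an assumption for illustration only: it treats the transistor bank as an ideal linear correction with a 30 mV full scale, handles positive offsets only, and replaces the comparator with a simple inequality. The function name and parameters are hypothetical and do not describe the authors' circuit.

```python
def sar_offset_code(offset_mv, full_scale_mv=30.0, bits=7):
    """Binary-search the correction code that best cancels a positive offset.

    Returns (code, residual_mv). Each step tentatively sets one bit, from the
    most significant down, and keeps it if the correction does not overshoot.
    """
    lsb_mv = full_scale_mv / (2 ** bits - 1)
    code = 0
    for bit in range(bits - 1, -1, -1):
        trial = code | (1 << bit)
        # Idealized comparator decision: is the amplifier still under-corrected?
        if trial * lsb_mv <= offset_mv:
            code = trial
    return code, offset_mv - code * lsb_mv


if __name__ == "__main__":
    code, residual = sar_offset_code(12.3)
    print(code, residual)
```

With 7 bits the search settles in seven comparisons, and under these assumptions the residual offset is below one step of 30/127 mV, roughly 0.24 mV.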

Pages: 207-211

Citations: 0

Approximate Energy Recovery 4-2 Compressor for Low-Power Sub-GHz IoT Applications

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00081

H. Thapliyal, Zachary Kahleifeh

Approximate computing is a circuit design technique that reduces area and power dissipation at the cost of accuracy. In this paper, we investigate further reducing the power dissipation of approximate circuits while maintaining high speed, using a form of energy recovery (ER) computing known as Pulse Boost Logic (PBL). To demonstrate the power savings and speed capabilities, we construct an approximate 4-2 compressor circuit using PBL-based ER computing. Simulations were performed in Cadence Spectre using 45 nm technology. At 800 MHz, the PBL-based approximate 4-2 compressor shows an average power saving of 64% compared to its standard CMOS design. We also show that combining approximate and ER computing yields a power saving of 89% compared to a CMOS design of the accurate 4-2 compressor. Further, the proposed PBL-based approximate 4-2 compressor consumes 65% less energy than the CMOS-based approximate 4-2 compressor. We have verified the functionality of the proposed compressor up to 1 GHz to illustrate its suitability for low-power, low-energy sub-GHz IoT applications.
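The trade of accuracy for simpler logic can be seen in a small Python model of a 4-2 compressor. The exact version below uses the standard two-full-adder structure. The approximate version is a hypothetical simplification invented here for illustration (it drops the carry-in and carry-out and merges pairs with OR gates); it is not the approximate design evaluated in the paper.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) of three bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def exact_42(x1, x2, x3, x4, cin):
    """Exact 4-2 compressor: x1+x2+x3+x4+cin == sum + 2*(carry + cout)."""
    s1, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


def approx_42(x1, x2, x3, x4):
    """Hypothetical approximate compressor with only sum and carry outputs."""
    total = (x1 ^ x2) | (x3 ^ x4)
    carry = (x1 & x2) | (x3 & x4)
    return total, carry


def approx_error_count():
    """Number of the 16 input patterns on which approx_42 is wrong."""
    errors = 0
    for bits in product((0, 1), repeat=4):
        total, carry = approx_42(*bits)
        if total + 2 * carry != sum(bits):
            errors += 1
    return errors


if __name__ == "__main__":
    print(approx_error_count(), "of 16 input patterns differ from the exact sum")
```

Counting erroneous patterns exhaustively, as `approx_error_count` does, is one simple way to quantify what an approximate compressor gives up in exchange for fewer gates.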

Pages: 414-418

Citations: 3

Test Point Insertion Using Artificial Neural Networks

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00054

Yang Sun, S. Millican

This study presents a method for collecting training data for, training, and using artificial neural networks (ANNs) to evaluate test point (TP) quality for TP insertion (TPI). The TPI method analyzes a digital circuit and determines where to insert TPs to improve fault coverage under pseudo-random stimulus. In contrast to conventional TPI algorithms, which use heuristically calculated testability measures, the proposed method uses an ANN trained through fault simulation to evaluate a TP's quality. Feature extraction is shown to be significantly faster than heuristic-based TP evaluation, and the inserted TPs are shown to provide better stuck-at fault coverage than conventional heuristic-based testability analysis.
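For context on the "heuristically calculated testability measures" that the ANN replaces, the Python sketch below propagates signal probabilities through simple gates in the style of the COP controllability measure. It assumes independent inputs and is a generic illustration, not the feature set or network used in the paper; the 0.5 probability for the test-enable signal is a hypothetical value.

```python
def p_and(*probs):
    """Probability that an AND gate outputs 1, assuming independent inputs."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def p_or(*probs):
    """Probability that an OR gate outputs 1, assuming independent inputs."""
    none_high = 1.0
    for p in probs:
        none_high *= 1.0 - p
    return 1.0 - none_high


def p_not(p):
    """Probability that an inverter outputs 1."""
    return 1.0 - p


if __name__ == "__main__":
    # An 8-input AND is hard to set to 1 with pseudo-random patterns.
    hard_node = p_and(*[0.5] * 8)
    # A control-1 test point ORs the node with a test signal (assumed p = 0.5).
    with_test_point = p_or(hard_node, 0.5)
    print(hard_node, with_test_point)
```

The jump from 1/256 to about 0.5 shows why a control point at a random-pattern-resistant node can raise fault coverage under pseudo-random stimulus.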

Citations: 16

Real-Time Automatic Music Transcription (AMT) with Zync FPGA

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00075

Kevin Vaca, Archit Gajjar, Xiaokun Yang

A real-time automatic music transcription (AMT) system has great potential for applications and interactions between people and music, as with popular devices such as Amazon Echo and Google Home. This paper therefore presents a chord-recognition design on the Zync7000 field-programmable gate array (FPGA) that samples analog frequency signals through a microphone and, in real time, shows sheet music corresponding to the user's playing on a smartphone app. We demonstrate the design of audio sampling in the programmable logic and the implementation of the frequency transform and vector building on the processing system, an embedded ARM core on the Zync FPGA. Experimental results show that the logic design uses 574 look-up-table (LUT) slices and 792 flip-flop slices. Because the dynamic power consumption of the processing system (1399 mW) is significantly higher than that of the programmable logic (7 mW), future work on this platform is to design intellectual property (IP) blocks in a hardware description language (HDL) for the frequency transform, pitch class profile (PCP), and pattern-matching algorithms, so that the entire system-on-chip (SoC) can be taped out as an application-specific design for consumer electronics.
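The pitch class profile (PCP) and pattern-matching steps named in the abstract can be sketched in Python as below. The sketch assumes spectral peaks have already been extracted as (frequency, magnitude) pairs, uses equal temperament with A4 = 440 Hz, and matches against binary major and minor triad templates by dot product. These choices and all function names are assumptions for illustration, not the implementation from the paper.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to a pitch class 0..11 (C = 0), equal temperament."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / a4_hz))
    return (semitones_from_a4 + 9) % 12


def pitch_class_profile(peaks):
    """Fold (frequency, magnitude) peaks into a 12-bin PCP vector."""
    profile = [0.0] * 12
    for freq_hz, magnitude in peaks:
        profile[pitch_class(freq_hz)] += magnitude
    return profile


def chord_templates():
    """Binary templates for the 12 major and 12 minor triads."""
    templates = {}
    for root in range(12):
        for suffix, intervals in (("", (0, 4, 7)), ("m", (0, 3, 7))):
            vector = [0.0] * 12
            for interval in intervals:
                vector[(root + interval) % 12] = 1.0
            templates[NOTE_NAMES[root] + suffix] = vector
    return templates


def match_chord(profile):
    """Return the template name with the largest dot product against the PCP."""
    def score(item):
        return sum(a * b for a, b in zip(profile, item[1]))
    return max(chord_templates().items(), key=score)[0]


if __name__ == "__main__":
    c_major_peaks = [(261.63, 1.0), (329.63, 1.0), (392.00, 1.0)]
    print(match_chord(pitch_class_profile(c_major_peaks)))
```

Folding every octave into the same 12 bins is what makes the PCP compact enough for template matching on an embedded core.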

Pages: 378-384

Citations: 9

Post-Layout Simulation of Quasi-Adiabatic Logic Based Physical Unclonable Function

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00086

Yasuhiro Takahashi, Hiroki Koyasu, S. D. Kumar, H. Thapliyal

Silicon-based physical unclonable functions (PUFs) are a popular hardware security primitive for mitigating security vulnerabilities. The quasi-adiabatic logic based physical unclonable function (QUALPUF) was recently proposed by Kumar and Thapliyal. QUALPUF has ultra-low power dissipation; hence it is suitable for low-power portable electronic devices such as RFIDs and wireless sensor nodes. In this paper, we present post-layout simulation results for a 4-bit QUALPUF for low-power portable electronic devices. To evaluate uniqueness and reliability, the 4-bit QUALPUF is implemented in a 0.18 µm standard CMOS process with a 1.8 V supply voltage. The QUALPUF occupies a layout area of 58.7 × 15.7 µm². The post-layout simulation results show that the 4-bit QUALPUF has good uniqueness and reliability, with an energy consumption of 29.73 fJ/cycle/bit.
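Uniqueness and reliability are commonly computed from Hamming distances between PUF responses, and the Python sketch below shows those standard definitions. The response strings are made-up 4-bit examples for illustration, not simulation data from the paper, and the function names are hypothetical.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def uniqueness_percent(responses):
    """Average inter-chip Hamming distance as a percentage (ideal: 50%)."""
    n_bits = len(responses[0])
    pairs = list(combinations(responses, 2))
    total = sum(hamming_distance(a, b) for a, b in pairs)
    return 100.0 * total / (len(pairs) * n_bits)


def reliability_percent(reference, repeated):
    """100% minus the average intra-chip Hamming distance (ideal: 100%)."""
    n_bits = len(reference)
    total = sum(hamming_distance(reference, r) for r in repeated)
    return 100.0 * (1.0 - total / (len(repeated) * n_bits))


if __name__ == "__main__":
    # Hypothetical responses of three chips to the same challenge.
    chips = ["0101", "1100", "0011"]
    print(uniqueness_percent(chips))
    # Hypothetical repeated readings of one chip under varying conditions.
    print(reliability_percent("0101", ["0101", "0111"]))
```

Uniqueness compares different chips under the same challenge, while reliability compares one chip against itself across supply-voltage or temperature corners.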


Citations: 3

Machine Learning-based Prediction for Phase-Based Dynamic Architectural Specialization

2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Pub Date : 2019-07-01 DOI: 10.1109/ISVLSI.2019.00101

Ruben Vazquez, Islam Badreldin, Mohamad Hammam Alsafrjalani, A. Gordon-Ross

Embedded computing systems are becoming increasingly complex, now performing tasks that were once generally limited to desktop computing systems. However, embedded system designers must still adhere to stringent embedded design constraints (e.g., energy and area requirements) when designing such increasingly complex systems. To meet these constraints, configurable hardware components expose configurable parameters (e.g., CPU voltage and frequency, cache size, cache associativity, cache line size, pipeline depth/width, etc.) that can be tuned to specific values to meet different design constraints (e.g., area, energy, performance, etc.) and user demands (e.g., increased battery life, increased performance, or a desired trade-off), which translates to a better quality of user experience. However, determining these specific parameter values becomes increasingly difficult and time-consuming as the configurable parameter design space grows. This issue is further complicated by the fact that each application has a different set of optimal/best parameter values based on these demands and requirements. Furthermore, repetitious application behavior, known as phases, which occurs throughout an application's runtime, can be exploited by tracking each phase's unique optimal parameter values, which results in a multiplicative or exponential increase in the size of the configuration space. In this paper, we propose a machine learning-based methodology to significantly reduce the time required to find the optimal configurable parameter values for the instruction and data caches for each application phase. In our method, we use artificial neural networks (ANNs) to predict the optimal configuration for application phases. We collect execution statistics for use as features for an application phase and use feature reduction to significantly reduce the feature set size.
We show that ANNs exhibit high, stable accuracy over multiple training and testing iterations. We also show that applications exhibit low energy degradations (less than 1%) for both the instruction and data caches using our methodology.
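To make the growth of the configuration space concrete, the sketch below counts configurations for a hypothetical configurable cache. The parameter values are illustrative assumptions and are not taken from the paper. The two growth functions are one possible reading of the abstract's "multiplicative or exponential" wording: tuning each phase independently multiplies the search cost by the number of phases, while enumerating joint assignments of one configuration per phase grows exponentially.

```python
from itertools import product

# Hypothetical tunable values for a single cache (not from the paper).
CACHE_SIZES_KB = [2, 4, 8]
ASSOCIATIVITIES = [1, 2, 4]
LINE_SIZES_B = [16, 32, 64]


def cache_configs():
    """Every (size, associativity, line size) combination for one cache."""
    return list(product(CACHE_SIZES_KB, ASSOCIATIVITIES, LINE_SIZES_B))


def per_phase_search_cost(n_configs, n_phases):
    """Configurations to evaluate when each phase is tuned independently."""
    return n_configs * n_phases


def joint_assignments(n_configs, n_phases):
    """Distinct ways to assign one configuration to every phase."""
    return n_configs ** n_phases


n_one_cache = len(cache_configs())
print(f"configurations for one cache: {n_one_cache}")
for phases in (1, 2, 4, 8):
    print(
        f"phases={phases}: "
        f"independent search={per_phase_search_cost(n_one_cache, phases)}, "
        f"joint assignments={joint_assignments(n_one_cache, phases)}"
    )
```

Even with only three values per parameter, exhaustive evaluation quickly becomes expensive as phases are added, which is the cost a predictor such as the ANN described above aims to avoid.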



Citations: 0