Pub Date: 2018-03-13  DOI: 10.1109/ISQED.2018.8357309
Parallel implementation of finite state machines for reducing the latency of stochastic computing
Cong Ma, D. Lilja
Stochastic computing, which employs random bit streams for computation, offers low hardware cost and high fault tolerance compared to computation using a conventional binary encoding. Finite state machine (FSM) based stochastic computing elements can compute complex functions, such as the exponentiation and hyperbolic tangent functions, more efficiently than those built from combinational logic. However, an FSM is sequential logic and cannot be parallelized directly the way combinational logic can, so reducing the long latency of the computation is difficult; applications in relatively high-frequency domains would therefore require an extremely fast clock rate for the FSM. This paper proposes a parallel implementation of the FSM that uses an estimator and a dispatcher to initialize the FSM directly to its steady state. Experimental results show that the outputs of four typical functions computed with the parallel implementation are very close to those of the serial version. The parallel FSM scheme further shows equivalent or better image quality than the serial implementation in two image-processing applications, edge detection and frame difference.
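To make the parallelization idea concrete, below is a minimal software sketch (an illustration, not the authors' implementation) of a serial FSM-based stochastic tanh element and a parallel variant in which an estimated input probability initializes each FSM copy near its steady state; the state count, estimator, and dispatch policy are assumptions.

```python
import random

def fsm_stanh(bitstream, n_states=8, init_state=None):
    """Serial FSM-based stochastic tanh element: the state moves up on a 1
    and down on a 0; the output bit is 1 when the state is in the upper half."""
    state = n_states // 2 if init_state is None else init_state
    out = []
    for b in bitstream:
        state = min(state + 1, n_states - 1) if b else max(state - 1, 0)
        out.append(1 if state >= n_states // 2 else 0)
    return out

def parallel_fsm_stanh(bitstream, n_copies=4, n_states=8):
    """Split the stream across FSM copies; each copy starts from an estimate
    of the steady state (here: the state implied by the input probability),
    so no warm-up transient is wasted."""
    p = sum(bitstream) / len(bitstream)                        # estimator: input probability
    init = min(n_states - 1, int(round(p * (n_states - 1))))   # dispatcher's initial state
    chunk = len(bitstream) // n_copies
    out = []
    for i in range(n_copies):                                  # each chunk could run in parallel
        out += fsm_stanh(bitstream[i*chunk:(i+1)*chunk], n_states, init_state=init)
    return out

# Example: a 0.7-probability stream of 256 bits
stream = [1 if random.random() < 0.7 else 0 for _ in range(256)]
print(sum(fsm_stanh(stream)) / 256, sum(parallel_fsm_stanh(stream)) / 256)
```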
Pub Date: 2018-03-13  DOI: 10.1109/ISQED.2018.8357286
A modified method of logical effort for FinFET circuits considering impact of fin-extension effects
A. Pandey, Pitul Garg, Shobhit Tyagi, R. Ranjan, B. Anand
For the transistor sizing of multistage digital circuits with predictable delays, the relationship between the effective input capacitances of all stages and the stage size ratios must be known. The effective capacitances of FinFET logic gates depend strongly on the transition times at their input and output nodes and are therefore not directly proportional to stage size, unlike conventional planar transistors. As a result, the methods developed for transistor sizing of planar logic circuits are not valid for FinFET logic circuits. Although this effect is absent in FinFET devices with large gate-drain overlap, their performance is highly compromised (higher power consumption and larger delay). We propose a modification of the existing logical-effort-based delay model for FinFET inverter chains that accounts for these characteristics of FinFET devices. We also discuss branching loads and the transistor sizing of non-critical paths. We observe that our FinFET sizing scheme leads to a significant reduction in inverter-chain delay, and that the delay-estimation error incurred by ignoring the transition-time dependency in a two-stage FO4 inverter chain is 31.8% and 15.3% for the two stages, respectively (from mixed-mode TCAD simulations).
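For reference, the conventional logical-effort delay model that the paper modifies, in its standard textbook form (the FinFET-specific corrections are not reproduced here):

```latex
\[
  d_i = g_i h_i + p_i, \qquad
  D = \sum_{i=1}^{N}\bigl(g_i h_i + p_i\bigr), \qquad
  \hat{f} = (GBH)^{1/N}
\]
% g_i: logical effort, h_i = C_{\mathrm{out},i}/C_{\mathrm{in},i}: electrical effort,
% p_i: parasitic delay; G, B, H are the path logical, branching, and electrical efforts,
% and the path delay is minimized when every stage carries the stage effort \hat{f}.
% For FinFET gates, C_{\mathrm{in},i} (and hence h_i) additionally depends on the
% input/output transition times, which breaks the fixed-effort assumption above.
```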
Pub Date: 2018-03-13  DOI: 10.1109/ISQED.2018.8357276
Power and performance aware memory-controller voting mechanism
M. Vratonjic, H. Singh, G. Kumar, R. Mohamed, Ashish Bajaj, Ken Gainey
Modern systems-on-chip (SoCs) integrate a graphics unit (GPU) with many application processor cores (CPUs), communication cores (modem, WiFi), and device interfaces (USB, HDMI) on a single die. The primary memory system is fast becoming a major performance bottleneck as more and more of these units share this critical resource. An integrated memory controller (IMC) is responsible for buffering and servicing memory requests from the CPU cores, the GPU, and other processing blocks that require DDR memory access. Previous work [2] focused on appropriately prioritizing memory requests and increasing the IMC/DDR memory frequency to improve system performance, which came at the expense of higher power consumption. Recent work has addressed this problem with a demand-based approach: the IMC is made aware of the application characteristics and scales its frequency based on the memory-access demand [1], leading to lower IMC and DDR frequencies and thus lower power. The work presented here shows that, instead of lowering the frequency, greater total system power savings can be achieved by increasing the IMC frequency at the beginning of a use-case with moderate GPU utilization. The motivation is that the GPU, with its inherent ability to execute a large number of parallel threads, can then access memory faster and complete its portion of the execution pipeline sooner. This, in turn, allows the timing requirements imposed on the CPU portion of the pipeline and on subsequent cycles to be relaxed, saving total system power. An algorithm for this technique, along with silicon results on an SoC implemented in an industrial 28nm process, is presented in this paper.
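The abstract does not give the voting rule itself; the toy sketch below only illustrates the direction of the decision it describes (boost the IMC frequency at the start of a moderately GPU-loaded use-case, otherwise fall back to a demand-based vote). The utilization thresholds, frequency table, and function name are hypothetical.

```python
def imc_frequency_vote(gpu_util, current_freq_mhz, freq_table=(400, 800, 1200, 1600)):
    """Hypothetical voting rule in the spirit of the paper: at the start of a
    use-case with moderate GPU utilization, vote the integrated memory
    controller up so the GPU drains its memory requests quickly and the
    CPU-side timing can then be relaxed.  Thresholds and the frequency table
    are assumptions, not silicon values."""
    MODERATE_LOW, MODERATE_HIGH = 0.3, 0.7
    if MODERATE_LOW <= gpu_util <= MODERATE_HIGH:
        higher = [f for f in freq_table if f > current_freq_mhz]
        return higher[0] if higher else current_freq_mhz   # boost to the next operating point
    lower = [f for f in freq_table if f < current_freq_mhz]
    return lower[-1] if lower else current_freq_mhz        # otherwise demand-based (lower) vote

print(imc_frequency_vote(0.5, 800))   # moderate GPU load -> 1200
print(imc_frequency_vote(0.1, 800))   # light GPU load    -> 400
```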
Pub Date: 2018-03-13  DOI: 10.1109/ISQED.2018.8357271
Clock buffer and flip-flop co-optimization for reducing peak current noise
Joohan Kim, Taewhan Kim
In high-speed digital circuits, the activation of all flip-flops used to store data must be strictly synchronized by clock signals delivered through clock networks. However, because many clock pins of flip-flops switch simultaneously at high frequency, a high peak power/ground noise (i.e., voltage drop) is induced at the clock boundary. To mitigate this current noise, we employ four different types of hardware component, each implementing a set of flip-flops and their driving buffer as a single unit; such components were previously used to reduce clock power consumption. (The idea behind the four types of clock boundary component is that one of the two inverters in a driving buffer and one of the two inverters in each of its driven flip-flops can be nullified without altering the circuit functionality.) This gives us the flexibility of selecting (i.e., allocating) clock boundary components so as to reduce peak current under a timing constraint. We formulate the component allocation problem of minimizing peak current as a multi-objective shortest-path problem and solve it efficiently with an approximation algorithm. We have implemented the proposed approach and tested it on ISCAS benchmark circuits. The experimental results confirm that our approach reduces the peak current by 27.7%∼30.9% on average.
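As a rough illustration of the allocation problem only (not the authors' approximation algorithm), the sketch below enumerates Pareto-optimal (delay, peak-current) labels over a chain of flip-flop groups, each choosing one of four component types, and returns the minimum-peak choice that meets a timing budget; all cost numbers are invented.

```python
from itertools import product

def allocate_components(groups, timing_budget):
    """Toy multi-objective allocation sketch: keep Pareto-optimal
    (total_delay, peak_current) labels per group, then pick the minimum-peak
    label meeting the timing budget; an exhaustive stand-in for the paper's
    approximation algorithm.  Peak contributions are assumed additive
    (simultaneous switching)."""
    labels = {(0.0, 0.0)}
    for types in groups:                          # types: list of (delay, peak) per group
        new = set()
        for (d, p), (dt, pt) in product(labels, types):
            new.add((d + dt, p + pt))
        # prune dominated labels
        labels = {l for l in new
                  if not any(o != l and o[0] <= l[0] and o[1] <= l[1] for o in new)}
    feasible = [l for l in labels if l[0] <= timing_budget]
    return min(feasible, key=lambda l: l[1]) if feasible else None

# two groups, four component types each: (delay_ps, peak_mA)
types = [(10, 9.0), (12, 7.5), (14, 6.2), (16, 5.8)]
print(allocate_components([types, types], timing_budget=26))
```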
Pub Date: 2018-03-13  DOI: 10.1109/ISQED.2018.8357318
Deep neural network acceleration framework under hardware uncertainty
M. Imani, Pushen Wang, T. Simunic
Deep neural networks (DNNs) are known to be effective models for cognitive tasks. However, DNNs are computationally expensive in both training and inference modes, as they require the precision of floating-point operations. Although several prior works proposed approximate hardware to accelerate DNN inference, they did not consider the impact of training on accuracy. In this paper, we propose a general framework called FramNN, which adjusts the DNN training model to make it appropriate for the underlying hardware. To accelerate training, FramNN applies adaptive approximation, which dynamically changes the level of hardware approximation depending on the DNN error rate. We test the efficiency of the proposed design on six popular DNN applications. Our evaluation shows that in inference, our design can achieve a 1.9× energy-efficiency improvement and a 1.7× speedup while ensuring less than 1% quality loss. Similarly, in training mode FramNN can achieve a 5.0× energy-delay product improvement compared to a baseline AMD GPU.
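A minimal sketch of the adaptive-approximation idea as described in the abstract, not the FramNN code: the hardware approximation level is raised while the training error stays low and lowered when it grows. The thresholds, the toy model, and the `train_step` interface are hypothetical.

```python
class _ToyModel:
    """Stand-in model whose training error decays with the number of steps
    and grows slightly with the approximation level."""
    def __init__(self):
        self.steps = 0
    def train_step(self, batch, approx_level):
        self.steps += 1
        return max(0.02, 0.5 / self.steps) + 0.01 * approx_level

def adaptive_approx_training(model, batches, levels=(0, 1, 2, 3),
                             raise_thr=0.05, lower_thr=0.15):
    """Raise the hardware approximation level when the error is low,
    back off to more precise hardware when the error is high."""
    level = 0                                   # start exact
    for batch in batches:
        err = model.train_step(batch, approx_level=levels[level])
        if err < raise_thr and level < len(levels) - 1:
            level += 1                          # error low  -> approximate more
        elif err > lower_thr and level > 0:
            level -= 1                          # error high -> approximate less
    return level

print(adaptive_approx_training(_ToyModel(), batches=range(50)))
```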
Pub Date: 2018-03-13  DOI: 10.1109/ISQED.2018.8357292
An automated flow for design validation of switched mode power supply
P. Chawda, S. Srinivasan
Traditionally, SPICE simulations have been used extensively to validate switched-mode power supply designs. Each simulation type requires multiple iterations at a design corner, and many such simulations are required across several design corners to find and fix issues, which is tedious and error prone. This paper presents a novel methodology that uses detailed analytical circuit equations in addition to simulations for automated validation of power supply designs. We propose automatic detection and correction of design issues over the required design corners. This significantly reduces the number of simulations required for design signoff and therefore drastically reduces overall design cycle time.
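As an example of the kind of closed-form check that can sit alongside SPICE corner runs (textbook buck-converter ripple equations, not the paper's equation set):

```python
def buck_ripple_checks(v_in, v_out, f_sw, L, C_out, i_ripple_max, v_ripple_max):
    """Analytical sanity check for a buck converter operating point:
    compute inductor current ripple and output voltage ripple from the
    standard continuous-conduction-mode equations and compare against spec."""
    duty = v_out / v_in
    di_L = v_out * (1 - duty) / (L * f_sw)      # inductor current ripple (A)
    dv_out = di_L / (8 * C_out * f_sw)          # output voltage ripple (V)
    return {
        "duty": duty,
        "inductor_ripple_A": di_L,
        "output_ripple_V": dv_out,
        "pass": di_L <= i_ripple_max and dv_out <= v_ripple_max,
    }

# 12 V -> 1.8 V buck, 500 kHz, 4.7 uH, 100 uF, hypothetical ripple limits
print(buck_ripple_checks(12.0, 1.8, 500e3, 4.7e-6, 100e-6, 1.0, 0.02))
```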
Pub Date: 2018-03-13  DOI: 10.1109/ISQED.2018.8357272
Parasitic-aware gm/ID-based many-objective analog/RF circuit sizing
Tuotian Liao, Lihong Zhang
Accurate consideration of parasitics in analog/RF circuit synthesis is becoming more essential as layout-dependent effects grow more influential in advanced technologies. In this paper, a gm/ID-based circuit sizing method that takes into account both device intrinsic parasitics and interconnect parasitics is proposed as the first stage of a hybrid sizing optimization. In the second stage, a many-objective evolutionary algorithm is applied to refine the sizing solutions. The proposed methodology has been used to optimize multiple performance metrics of an analog dynamic differential comparator and an RF circuit in an advanced CMOS technology. The experimental results demonstrate the high efficacy of our parasitic-aware hybrid sizing methodology.
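The gm/ID step of such a flow is usually a table lookup against pre-simulated device data; the sketch below shows that step in its generic textbook form under assumed characterization data, not the paper's parasitic-aware formulation.

```python
import numpy as np

def size_from_gm_over_id(gm_target, gm_id_choice, lut_gm_id, lut_id_per_um):
    """Generic gm/ID sizing step: pick an inversion level via gm/ID, look up
    the current density ID/W from characterization data, then derive the
    drain current and device width."""
    id_per_um = np.interp(gm_id_choice, lut_gm_id[::-1], lut_id_per_um[::-1])
    i_d = gm_target / gm_id_choice        # drain current from gm = (gm/ID) * ID
    width_um = i_d / id_per_um            # device width from current density
    return i_d, width_um

# Hypothetical characterization data: gm/ID (1/V) vs. ID/W (A/um)
lut_gm_id     = np.array([25, 20, 15, 10, 5])
lut_id_per_um = np.array([0.5e-6, 2e-6, 8e-6, 25e-6, 80e-6])
print(size_from_gm_over_id(gm_target=2e-3, gm_id_choice=15,
                           lut_gm_id=lut_gm_id, lut_id_per_um=lut_id_per_um))
```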
Pub Date: 2018-03-13  DOI: 10.1109/ISQED.2018.8357257
A droop measurement built-in self-test circuit for digital low-dropout regulators
Aydin Dirican, Cagatay Ozmen, M. Margala
Today's highly integrated systems-on-chip (SoCs) employ several integrated voltage regulators to achieve higher power efficiency and smaller board area. Testing of these voltage regulators is essential to validate the final product. In this work, we present a droop-measurement built-in self-test (BIST) circuit for digital low-dropout regulators (DLDOs). The proposed BIST system can store transient droop information with less than 1.05% error for droop voltages ranging from 45 mV to 520 mV, for a nominal DLDO output voltage of 1.6 V with a 1.8 V supply. Additionally, a reuse-based 10-bit successive-approximation-register (SAR) analog-to-digital converter (ADC) is incorporated to generate a digital output corresponding to the stored droop information as the BIST measurement result. The on-chip DLDO decoupling capacitor (∼1 nF) is reconfigured as a charge-scaling array for ADC operation during testing to increase reusability. The proposed BIST circuit is designed in a 0.18 μm CMOS process in Cadence Virtuoso and verified with corner simulations.
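During test, the reused capacitor array performs an ordinary successive-approximation search; a generic software model of that conversion (the standard SAR algorithm, not the reconfigured-decap circuit itself) looks like this:

```python
def sar_convert(v_sample, v_ref, n_bits=10):
    """Plain successive-approximation search: each step tests one bit of the
    DAC code against the sampled droop voltage via a comparator decision."""
    code = 0
    for bit in range(n_bits - 1, -1, -1):
        trial = code | (1 << bit)
        if v_sample >= trial * v_ref / (1 << n_bits):   # comparator decision
            code = trial                                 # keep the bit
    return code

# A 300 mV stored droop against a 1.8 V reference, 10-bit resolution
code = sar_convert(0.300, 1.8)
print(code, code * 1.8 / 1024)     # digital code and reconstructed voltage
```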
Pub Date: 2018-03-13  DOI: 10.1109/ISQED.2018.8357287
Generic system-level modeling and optimization for beyond CMOS device applications
V. Huang, C. Pan, A. Naeemi
In this work, a fast, generic system-level design and optimization methodology is presented for emerging beyond-CMOS devices. The work evaluates the GaN heterojunction TFET, the WTe2 two-dimensional heterojunction interlayer TFET (ThinTFET), and the WTe2 transition-metal-dichalcogenide TFET (TMD TFET) in terms of performance and energy-delay product (EDP), and investigates the impact of device-level performance on system-level performance and power dissipation. The system-level methodology uses a generic model that relies on a stochastic wire distribution to estimate system performance. The optimum supply voltage and gate count for maximum throughput are determined for each device using an empirical CPI model under different power-budget constraints. Based on this study, the optimal design of each beyond-CMOS device technology is shown to improve EDP. The results delineate the optimal EDP over a range of power budgets and provide insightful trends in key design parameters, as well as optimal performance and power metrics, obtained from fast system-level optimization at the early design stage.
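A toy version of such a design-space sweep, with placeholder device constants and a crude delay/power model rather than the paper's empirical CPI model, might look as follows:

```python
import numpy as np

def best_design_point(power_budget_w, vdd_grid, gate_grid,
                      c_gate=1e-15, i_leak=1e-9, v_th=0.25, activity=0.1):
    """For every (Vdd, gate count) pair compute a rough throughput and
    dynamic+leakage power, keep the points under the power budget, and
    return the max-throughput one.  All constants are placeholders."""
    best = None
    for vdd, n_gates in ((v, g) for v in vdd_grid for g in gate_grid):
        delay = 2e-11 * vdd / (vdd - v_th) ** 2              # alpha-power-law style gate delay (s)
        freq = 1.0 / (delay * 30)                            # ~30 gate delays per cycle (assumed)
        power = activity * n_gates * c_gate * vdd**2 * freq + n_gates * i_leak * vdd
        if power <= power_budget_w:
            throughput = freq * n_gates                      # crude parallelism proxy
            if best is None or throughput > best[0]:
                best = (throughput, vdd, n_gates, power)
    return best

print(best_design_point(1.0, np.linspace(0.4, 1.0, 7), [1e6, 5e6, 1e7]))
```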
Pub Date: 2018-03-13  DOI: 10.1109/ISQED.2018.8357305
A deep learning based approach for analog hardware implementation of delayed feedback reservoir computing system
Jialing Li, Kangjun Bai, Lingjia Liu, Y. Yi
As the 2020 roadblock approaches, the need for breakthroughs in computing systems has directed researchers toward novel computing paradigms. The recently emerged reservoir computing model known as delayed feedback reservoir (DFR) computing utilizes only one nonlinear neuron along with a delay loop. It not only offers ease of hardware implementation but also achieves high performance owing to the inherent delay and its rich intrinsic dynamics. The field of deep learning has attracted worldwide attention because its hierarchical architecture allows more efficient performance than a shallow structure. Along with our analog hardware implementation of the DFR, we investigate the possibility of merging deep learning and DFR computing systems. In our evaluation, deep DFR models demonstrate 50%–81% better performance during training and a 39%–64% performance improvement during testing compared with a shallow leaky echo state network (ESN) model. Due to the difference in architecture, the training time of the MI (multiple-input) deep DFR model is approximately 21% longer than that of the deep DFR model. Our approach shows great potential for realizing analog hardware implementations of deep DFR systems.
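A minimal software model of a delayed feedback reservoir (illustrative only; the paper's contribution is the analog implementation), with one nonlinear node, a delay loop of virtual nodes, and a random input mask; the nonlinearity and coupling constants are arbitrary choices.

```python
import numpy as np

def dfr_states(inputs, n_virtual=50, eta=0.5, gamma=0.05, seed=0):
    """Single nonlinear neuron time-multiplexed over a delay loop: each
    virtual node is driven by its own delayed value plus the masked input.
    The returned state matrix would feed a linear readout layer."""
    rng = np.random.default_rng(seed)
    mask = rng.choice([-1.0, 1.0], size=n_virtual)   # time-multiplexing mask
    delay_line = np.zeros(n_virtual)                 # state held in the delay loop
    states = []
    for u in inputs:
        for i in range(n_virtual):
            delay_line[i] = np.tanh(eta * delay_line[i] + gamma * mask[i] * u)
        states.append(delay_line.copy())
    return np.array(states)

X = dfr_states(np.sin(np.linspace(0, 6 * np.pi, 200)))
print(X.shape)                                       # (200, 50)
```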