
2012 25th International Conference on VLSI Design: Latest Publications

Two Graph Based Circuit Simulator for PDE-Electrical Analogy
Pub Date : 2013-01-05 DOI: 10.1109/VLSID.2013.214
Yogesh Dilip Save, H. Narayanan, S. Patkar
The aim of the paper is to develop an efficient circuit simulator for circuits arising from an electrical analogy for partial differential equations (PDEs). This analogy arises when a PDE is solved through the finite element method (FEM). The paper also proposes an optimal method for simulating such circuits. We have built simulators based on Modified Nodal Analysis (MNA) and the Two Graph method for solving PDEs through the electrical analogy, and compared their timing performance with commercial simulators. For special PDE problems (such as convection-diffusion), the timing performance is improved by an efficient implementation of iterative Cholesky with the Two Graph method. The method is based on a graph representation of linear systems of equations. Such iterative methods would not be feasible with MNA. Using this method, we have been able to simulate circuits arising from the convection-diffusion problem with approximately 1.6 million nodes and 47 million edges in less than 8 minutes.
Citations: 2
Tutorial T8A: Designing Silicon-Photonic Communication Networks for Manycore Systems
Pub Date : 2012-03-12 DOI: 10.1109/VLSID.2012.36
A. Joshi
Summary form only given. The goal of this tutorial is to explain the limits and opportunities of using silicon-photonic link technology for inter-chip and intra-chip communication in manycore systems. Silicon-photonic links have larger bandwidth density and lower energy than equivalent electrical links. In this tutorial, I will first provide an overview of the silicon-photonic device technology and link transceiver/tuning circuits. Using three silicon-photonic network case studies (an on-chip tile-to-tile network, a processor-to-DRAM network and a DRAM memory channel), the various silicon-photonic network design issues at the physical, micro-architecture and architecture levels will be explained in detail. An iterative design process, in which we move between these three levels to meet the power-performance specifications under the silicon-photonic technology constraints, will also be presented. At the end of the tutorial, attendees will have a broad understanding of the capabilities of silicon-photonic technology, and they will be able to design and analyze silicon-photonic networks for manycore systems.
Citations: 0
Energy-Efficient Application Mapping in FPGA through Computation in Embedded Memory Blocks
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.108
A. Ghosh, Somnath Paul, S. Bhunia
FPGAs have emerged as the preferred prototyping and accelerator platform for diverse application domains such as digital signal processing (DSP), security and multimedia, which often impose real-time performance requirements. Most applications in these domains require efficient implementation of complex data paths or functions (e.g. transcendental functions), which are spatially mapped onto the configurable logic or embedded DSP blocks of an FPGA device. The elaborate computational resources required to realize these operations impose a major barrier to energy efficiency. In this paper, we propose to use embedded memory blocks in the FPGA for computing, to significantly improve the energy efficiency of applications that are dominated by complex data paths and/or functions. Complex operations are decomposed into large multi-input/output lookup tables (LUTs), mapped to embedded memory blocks, and evaluated through memory access over single or multiple cycles. Different parts of an application are selectively mapped into memory or logic/DSP blocks in a heterogeneous mapping framework to maximize energy efficiency. We explore the optimal energy configuration of embedded memory for mapping applications of varying input size and develop a complete mapping flow including decomposition, fusion and packing. The effectiveness of the proposed flow is evaluated using a commercial state-of-the-art FPGA system (Altera Stratix IV device). Finally, the proposed framework is used to drastically trade off energy against accuracy at run time for common signal processing applications.
Citations: 7
Way Sharing Set Associative Cache Architecture
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.79
C. J. Janraj, T. V. Kalyan, Tripti S. Warrier, M. Mutyam
In order to minimize the conflict miss rate, cache memories can be organized in a set-associative manner. The downside of increasing the associativity is an increase in the per-access energy consumption. In conventional n-way set-associative caches, each set has n cache ways at its disposal irrespective of the set-wise demand, but cache sets may exhibit non-uniform demand for these cache ways. Exploiting this property, we propose a novel cache architecture, called the way sharing cache, in which cache ways are shared among a pair of cache sets. With it we obtain dynamic energy savings as high as 41% in the DL1 cache with negligible performance penalty.
Citations: 10
A High Speed FIR Filter Architecture Based on Novel Higher Radix Algorithm
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.48
S. K. Sahoo, K. S. Reddy
Redundant binary (RB) number systems are becoming popular because of their unique carry-propagation-free addition property. A finite impulse response (FIR) filter computes its output using multiply-and-accumulate operations. In the present work, an FIR filter based on a novel higher-radix (radix-256) algorithm and RB arithmetic is implemented. The use of radix-256 Booth encoding reduces the number of partial product rows in any multiplication eightfold. In the present work, inputs and coefficients are taken to be 16 bits wide. Hence, only two partial product rows are obtained in RB form for each input-coefficient multiplication. These two partial product rows are added using carry-free RB addition. Finally, the RB output is converted back to natural binary (NB) form using an RB-to-NB converter. The performance of the proposed multiplier architecture for the FIR filter is compared with a computation sharing multiplier (CSHM) implementation in 90 nm technology. The proposed multiplication method is found to be approximately 42% faster than the CSHM implementation, at the cost of 0.5% and 11% increases in area and power, respectively.
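To make the partial-product count concrete, the following is a minimal sketch of generic radix-256 Booth recoding for a 16-bit two's-complement operand. It is not the authors' hardware design: the function names `booth_radix256_digits` and `booth_multiply` are hypothetical, the RB representation and the RB-to-NB conversion are not modeled, and the digit multiples that real hardware would have to generate are simply computed with Python integers.

```python
def booth_radix256_digits(y: int, width: int = 16) -> list[int]:
    """Recode a signed `width`-bit integer into radix-256 Booth digits.

    Digits are returned least-significant first, each in [-128, 128], and
    satisfy y == sum(d * 256**i for i, d in enumerate(digits)).
    """
    if width % 8 != 0:
        raise ValueError("width must be a multiple of 8")
    if not -(1 << (width - 1)) <= y < (1 << (width - 1)):
        raise ValueError("y does not fit in the given width")
    bits = y & ((1 << width) - 1)  # two's-complement bit pattern
    digits = []
    prev_msb = 0  # the implicit bit to the right of bit 0
    for i in range(width // 8):
        group = (bits >> (8 * i)) & 0xFF
        msb = group >> 7
        low7 = group & 0x7F
        # Top bit of the group has negative weight; the previous group's
        # top bit is carried in with weight +1.
        digits.append(-128 * msb + low7 + prev_msb)
        prev_msb = msb
    return digits


def booth_multiply(x: int, y: int, width: int = 16) -> int:
    """Multiply x by y using one partial product per radix-256 digit of y."""
    digits = booth_radix256_digits(y, width)
    partial_products = [(d * x) << (8 * i) for i, d in enumerate(digits)]
    return sum(partial_products)


if __name__ == "__main__":
    print(booth_radix256_digits(32767))   # two digits for a 16-bit operand
    print(booth_multiply(1234, -5678))
```

A 16-bit operand yields two digits, hence two partial product rows, compared with sixteen rows for a plain bit-serial (radix-2) scheme, which is the eightfold reduction the abstract refers to.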
Citations: 5
Pole-Zero Analysis of Low-Dropout (LDO) Regulators: A Tutorial Overview
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.38
A. Garimella, Punith R. Surkanti, P. Furth
Analyzing the poles and zeros of a circuit is often essential for (a) choosing the appropriate topology for given specifications, (b) understanding the frequency response of the circuit and (c) stabilizing the circuit by choosing appropriate frequency compensation techniques. Analyzing the poles and zeros of a low-dropout (LDO) voltage regulator is often intriguing because (a) the voltage/current control loop needs to be broken for small-signal analysis and (b) the locations of the poles move with output load current. The objective of this tutorial is to provide a step-by-step procedure for analyzing poles and zeros in LDO regulators. To this end, two recent state-of-the-art LDO regulators from the literature are analyzed, explaining several of the intricacies involved. During the process, several frequency compensation techniques are elucidated.
Citations: 6
Hybrid NEMS-CMOS DC-DC Converter for Improved Area and Power Efficiency
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.74
S. Manohar, R. Venkatasubramanian, P. Balsara
Nano-electromechanical (NEM) relays are a promising class of emerging devices that exhibit zero-leakage operation. Numerous end applications of NEM relay logic circuits have been proposed recently [1][2]. This work explores the use of NEM relays in on-chip DC-DC converters. As a feasibility study of using NEMS in integrated power electronics, a discontinuous conduction mode (DCM) buck regulator with specifications suitable for portable applications has been implemented in a NEMS-CMOS hybrid design, and the results are compared against a standard commercial 0.35 μm CMOS implementation. Ron of the NEM relay switch is constant and insensitive to the gate slew rate. This creates a paradigm shift in the design of power switches. Coupled with infinite Roff, it offers significant area and power advantages over CMOS. Accurate Verilog-A models were developed based on published fabrication results of NEM relays [1] operating at 1 V with a nominal air gap of 5-10 nm. This work shows that the NEMS-CMOS hybrid DC-DC converter has an area savings of 60% over CMOS and achieves 95% efficiency at the maximum load condition (50 mA).
Citations: 9
A Novel Encoding Scheme for Low Power in Network on Chip Links
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.80
Deepa N. Sarma, G. Lakshminarayanan, K. Chavali
Dynamic power dissipation in interconnects is a major contributor to power consumption in Networks on Chip (NoCs). This is mainly due to two factors: the self-switching activity of a particular link and the coupling switching activity among adjacent links. Two novel techniques are proposed to reduce power consumption due to switching transitions and crosstalk. The first technique reorders the data in such a way that the number of switching transitions is brought down. The second technique ensures that power consumption due to cross-coupling activity is reduced. An end-to-end encoding scheme that uses two-stage coding (TSC) to reduce power consumption in a wormhole-routed network on chip is designed using the proposed power reduction techniques. An encoder and decoder implementing the proposed scheme have been described at RTL level in Verilog HDL, synthesized and mapped onto the UMC 180 nm technology library. The proposed technique (TSC) is observed to offer an average reduction in dynamic power consumption of 17.34%. The proposed scheme was compared with existing techniques, and the observations show little degradation in area, speed and static power dissipation. Power reduction under different kinds of data streams was analyzed, and the results indicate that, unlike the existing techniques, the proposed scheme offers uniform power reduction irrespective of the nature of the data stream.
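The two activity measures named in the abstract can be illustrated with a small sketch. The abstract does not specify the authors' encoding, so none is reproduced here; the code only counts transitions under a common simplified model, and the function names and the 4-bit example words are hypothetical. Coupling cost between two adjacent wires is taken as 0 when both stay quiet or switch in the same direction, 1 when only one switches, and 2 when they switch in opposite directions.

```python
def self_transitions(words: list[int]) -> int:
    """Count bit flips between consecutive words sent over the same link."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def coupling_transitions(words: list[int], width: int) -> int:
    """Count coupling activity between adjacent wires (simplified model)."""
    total = 0
    for prev, cur in zip(words, words[1:]):
        # Per-wire change: +1 rising, -1 falling, 0 unchanged.
        deltas = [((cur >> i) & 1) - ((prev >> i) & 1) for i in range(width)]
        # Adjacent wires that move differently charge the coupling capacitor.
        total += sum(abs(a - b) for a, b in zip(deltas, deltas[1:]))
    return total


if __name__ == "__main__":
    original = [0b0000, 0b1111, 0b0001, 0b1110]
    reordered = [0b0000, 0b0001, 0b1111, 0b1110]  # same words, new order
    print(self_transitions(original), self_transitions(reordered))
    print(coupling_transitions(original, 4), coupling_transitions(reordered, 4))
```

In this toy stream, sending the same four words in a different order lowers the self-switching count from 11 to 5, which is the kind of effect a data-reordering stage aims for; a receiver would of course need side information to restore the original order.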
Citations: 7
Towards Thermal Profiling in CMOS/Memristor Hybrid RRAM Architectures
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.65
Cory E. Merkel, D. Kudithipudi
In this paper, we propose a hybrid temperature sensing resistive random access memory (TSRRAM) architecture composed of traditional CMOS components and emerging memristive switching devices. The architecture enables each RRAM switching element to be used both as a memory bit and a temperature sensor. The TSRRAM is integrated into an Alpha 21364 processor as an L2 cache. Its accuracy and performance were simulated using a customized simulation framework. SPEC2000 benchmarks were used to generate thermal profiles in the Alpha processor core. Active and passive sensing mechanisms are also introduced as means for DTM algorithms to determine the thermal profile of the RRAM switching layer. The proposed architecture yielded a 2.14 K mean absolute temperature error during passive sensing, which is well within the useful range of dynamic thermal management (DTM) algorithms. Furthermore, the proposed design is shown to have only an 8 cycle performance overhead.
Citations: 18
A Compact Temperature Sensor at 1.8µA per Hz Conversion Rate and 1.1 °C Accuracy for SOCs
Pub Date : 2012-01-07 DOI: 10.1109/VLSID.2012.67
S. Sen, D. Babitch, N. Dubash
A compact (0.1 mm² area) temperature-recording system that is suitable for easy integration into an SOC is described. It includes a PTAT sensor, a pre-amplifier, and a first-order sigma-delta modulator based ADC all operating at 1.2 V supply. The switched-capacitor pre-amplifier uses an auto-zeroing scheme based upon capacitive reset to avoid the need for shorting the op-amp outputs and inputs during reset. Errors due to transistor leakage are eliminated by selective use of thick-oxide transistors in the design. Another contribution of the paper is to illustrate a scheme that uses two reference voltages in the sigma-delta modulator ADC corresponding to the minimum and maximum temperatures measured to improve its effective resolution.
Citations: 0