
21st International Conference on VLSI Design (VLSID 2008): Latest Publications

Reconfiguring CMOS as Pseudo N/PMOS for Defect Tolerance in Nano-Scale CMOS
Pub Date : 2008-01-04 DOI: 10.1109/VLSI.2008.104
M. Ashouei, A. Singh, A. Chatterjee
End-of-the-roadmap nanoscale CMOS is expected to suffer from significant defectivity due to manufacturing defects, random process variations, and wear-out during normal operation. To ensure acceptable yield and reliable operation of the circuit during its lifetime, future circuits must be equipped with significant defect-tolerance capabilities. Traditional defect-tolerance approaches are too expensive to be applied to general-purpose circuits. In this paper, we propose a defect-tolerant CMOS logic gate architecture that exploits the inherent functional redundancy in static CMOS. This is accomplished by reconfiguring the CMOS logic gate to a pseudo-NMOS-like gate in the presence of a defect. The resulting defect-tolerant logic architecture incurs only a modest area overhead. The proposed gate design can tolerate defects in either the pull-up or pull-down network of the gate. The architecture can tolerate multiple defects across the logic gates of a CMOS logic circuit. The effectiveness of the proposed defect-tolerance technique and its impact on circuit delay and power are studied. It is shown that the technique imposes little delay overhead (less than 6%) but incurs power dissipation overhead (less than 20%) in the presence of defects.
Citations: 4
Memory Yield Improvement through Multiple Test Sequences and Application-Aware Fault Models
Pub Date : 2008-01-04 DOI: 10.1109/VLSI.2008.115
A. Kokrady, C. Ravikumar, N. Chandrachoodan
In this paper, we propose a way to improve the yield of memory products by selecting the appropriate test strategy for memory Built-In Self-Test (BIST). We argue that by testing the memory through a sequence of test algorithms which differ in their fault coverage, it is possible to sort the memory into multiple yield bins and increase the yield and product revenue. Further, the test strategy must take into consideration the usage model of the memory. For example, a number of video and audio buffers are used in sequential access mode, but are overtested by conventional memory test algorithms, which model a large number of defects that do not impact the operation of the buffers. We propose a binning strategy where memory test algorithms are applied in different orders of strictness such that each bin has a specific defect/fault grade. Depending on the application, some of these bins need not be discarded but can be sold at a lower price, since the application's memory usage would never expose the fault. We introduce the notion of a test map for the on-chip memories in a SoC and provide results of yield simulation on two specific test strategies called "Most Strict First" and "Least Strict First". Our simulations indicate that significant improvements in yield are possible through the adoption of the proposed technique. We show that the BIST controller area and run-time overheads are also reduced when information about the usage model of the memory, such as sequential access, is exploited.
Citations: 0
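The two test orderings named in the abstract above can be illustrated with a small sketch. This is not the authors' simulator: the test names, the fault classes, and the assumption that each stricter test detects a superset of the faults caught by the less strict ones are all hypothetical, chosen only to show how a memory could be graded into bins and how the ordering changes the number of tests run.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical test algorithms, ordered from least strict to most strict.
# Each entry lists the (made-up) fault classes that the test can detect.
# Assumption: coverage is nested, so a stricter test detects a superset.
TESTS: List[Tuple[str, FrozenSet[str]]] = [
    ("sequential_rw", frozenset({"stuck_at"})),
    ("march_lite", frozenset({"stuck_at", "transition"})),
    ("march_full", frozenset({"stuck_at", "transition", "coupling"})),
]


def passes(faults: FrozenSet[str], detects: FrozenSet[str]) -> bool:
    """A memory passes a test if the test detects none of its faults."""
    return not (faults & detects)


def least_strict_first(faults: FrozenSet[str]) -> Tuple[int, int]:
    """Run tests in increasing strictness and stop at the first failure.

    Returns (bin, tests_run); bin is the number of tests passed in a row,
    and bin 0 means the part is discarded.
    """
    grade = 0
    runs = 0
    for _, detects in TESTS:
        runs += 1
        if not passes(faults, detects):
            break
        grade += 1
    return grade, runs


def most_strict_first(faults: FrozenSet[str]) -> Tuple[int, int]:
    """Run tests in decreasing strictness and stop at the first pass."""
    runs = 0
    for grade in range(len(TESTS), 0, -1):
        runs += 1
        if passes(faults, TESTS[grade - 1][1]):
            return grade, runs
    return 0, runs


if __name__ == "__main__":
    for faults in (frozenset(), frozenset({"coupling"}), frozenset({"stuck_at"})):
        print(sorted(faults), least_strict_first(faults), most_strict_first(faults))
```

Under the nested-coverage assumption both orderings assign the same bin; they differ only in how many tests each part needs, which is why the better ordering depends on the expected mix of good and defective parts.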
GyroCompiler: A Soft IP Model Synthesis and Analysis Framework for Design of MEMS Based Gyroscopes
Pub Date : 2008-01-04 DOI: 10.1109/VLSI.2008.10
S. Jairam, N. Bhat
A model to create a simulation and synthesis framework for the design of gyroscopes is proposed. The main motivation is to have a framework for developing gyroscope models in the form of soft intellectual properties (IPs) for their subsequent integration into mainstream VLSI systems. Synthesis targeting different performance classes of gyros is based on a simple table look-up. The next level of model refinement, which involves optimizing different physical aspects of the gyro such as its shape, is based on statistical design of experiments (DoE). Both FEM and Simulink based models have been used to build a custom DoE framework to estimate the parameters related to a desired gyro structure. A simple gyroscope structure is modeled and analysed with both FEM and Simulink based models. It is shown that the DoE based framework can accurately capture the parameters of a gyroscope structure and that it can be easily integrated with system-level synthesis tools.
Citations: 2
Highly Linear Wide Dynamic Swing CMOS Transconductance Multiplier Using Source-Degeneration V-I Converters
Pub Date : 2008-01-04 DOI: 10.1109/VLSI.2008.91
S. Garimella
A novel compact four-quadrant CMOS transconductance analog multiplier with wide dynamic swing and wide gain-bandwidth product using source-degeneration V-I converters is proposed. The design consists of two stages. The first stage is a voltage adder and utilizes two V-I converters with a diode-connected load and a source-degeneration resistor, which can provide high bandwidth. The second stage consists of two cross-connected differential pairs with source-degeneration resistors, which act as current-steering elements performing V-to-I conversion with wide dynamic swing and continuously adjustable gain. Unlike conventional multipliers, in the proposed scheme all the significant intermediate terms generated are linear, reducing the non-linear term cancellation and making the circuit power efficient. SPICE simulation results in 0.5 μm CMOS AMI technology are presented which validate the proposed work.
Citations: 3
Single Chip Encryptor/Decryptor Core Implementation of AES Algorithm
Pub Date : 2008-01-04 DOI: 10.1109/VLSI.2008.82
Monjur Alam, Santosh K. Ghosh, D. R. Chowdhury, I. Sengupta
This paper presents a single-chip encryptor/decryptor core implementation of the Advanced Encryption Standard (AES-Rijndael) cryptosystem. The suggested architecture is capable of handling all possible combinations of standard bit lengths (128, 192, 256) of data and key. The fully rolled inner-pipelined architecture ensures lower hardware complexity. The architecture reuses precomputed blocks, in the sense that the same hardware is shared during encryption and decryption as much as possible. The design has been implemented on a Xilinx XCVe1000-8bg560 device. The performance of the architecture has been compared with existing results in the literature and has been found to be the most efficient (throughput/area) implementation of the AES algorithm.
Citations: 9
Voltage and Temperature Scalable Gate Delay and Slew Models Including Intra-Gate Variations
Pub Date : 2008-01-04 DOI: 10.1109/VLSI.2008.92
B. P. Das, Janakiraman Viraraghavan, B. Amrutur, H. S. Jamadagni, N. Arvind
We investigate the feasibility of developing comprehensive gate delay and slew models that incorporate output load, input edge slew, supply voltage, temperature, global process variations and local process variations all in the same model. We find that the standard polynomial models cannot handle such a large heterogeneous set of input variables. We instead use neural networks, which are well known for their ability to approximate any arbitrary continuous function. Our initial experiments with a small subset of standard cell gates of an industrial 65 nm library show promising results, with error in mean less than 1%, error in standard deviation less than 3% and maximum error less than 11% as compared to SPICE, for models covering 0.9-1.1 V of supply, -40 °C to 125 °C of temperature, load, slew, and global and local process parameters. Enhancing the conventional libraries to be voltage and temperature scalable with similar accuracy requires on average 4x more SPICE characterization runs.
Citations: 8
PTSMT: A Tool for Cross-Level Power, Performance, and Thermal Exploration of SMT Processors
Pub Date : 2008-01-04 DOI: 10.1109/VLSI.2008.84
D. Kannan, Aseem Gupta, Aviral Shrivastava, N. Dutt, F. Kurdahi
Simultaneous Multi-Threading (SMT) processors are becoming popular because they exploit both instruction-level and thread-level parallelism by issuing instructions from different threads in the same cycle. However, the issues of power and thermal management hinder SMT processors fabricated in nano-scale technologies. Power and thermal issues in SMT processors not only limit the achievable performance, but also have a direct impact on the cost and viability of these processors. While several simulation tools exist for exploring the performance of SMT processors early in their design phase, there is a lack of early power and performance evaluation tools for SMT processors. To this end, we have developed PTSMT: a tightly coupled power, performance and thermal exploration tool for SMT processors. In this paper, we demonstrate that PTSMT can automatically and effectively accomplish power, performance and thermal exploration of SMT processors at various levels of the design hierarchy: the application level, the microarchitecture level, and the physical level. Our experimental results show that, at the application level, the number of contexts into which an application is divided could affect performance by 2.2×, energy by 52%, and peak temperature by 35 °C; and at the microarchitecture level, context swapping during run time could reduce energy by 9% and improve performance by 8%. These observations indicate the size of the design space which can be explored using PTSMT.
Citations: 0
Exploiting Circuit Reconvergence through Static Learning in CNF SAT Solvers
Pub Date : 2008-01-04 DOI: 10.1109/VLSI.2008.90
Yinlei Yu, C. Brien, S. Malik
Most contemporary SAT solvers use a conjunctive-normal-form (CNF) representation for logic functions due to the availability of efficient algorithms for this form, such as deduction through unit propagation and conflict-driven learning using clause resolution. The use of CNF generally entails transformation to this form from other representations such as logic circuits (Tseitin, 1970). However, this transformation results in loss of information such as direction of signal flow and observability of signals at circuit outputs (Een, 2003; Fu, 2005). This has prompted the development of various circuit-based solvers (Ganai et al., 2002), hybrid CNF+circuit-based solvers (Fu, 2005), as well as augmented CNF solvers (Een, 2003). Having the circuit available provides additional capabilities at a cost, and thus requires careful analysis to determine the viability of each approach. This paper highlights one specific capability provided by a circuit: the ability to consider reconvergent paths in unit propagation. Unit propagation is the workhorse of contemporary SAT solvers, thus any improvement to it has significant practical potential. We first demonstrate that the Tseitin circuit-to-CNF transformation limits backward unit propagation, and show how additional implications can be derived when unit propagation across multiple paths is considered. Next, we show how these implications can be exploited by statically learning clauses during circuit pre-processing. The results of the practical implementation of these algorithms show that static learning can provide significant speed-up on several classes of benchmark circuits. Finally, we discuss how this work compares with other circuit-based approaches, especially those arising from the automatic-test-pattern-generation (ATPG) community (e.g. recursive learning) and circuit and non-circuit based pre-processors.
Citations: 2
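The limitation described in the abstract above, where unit propagation on a Tseitin-encoded circuit misses an implication that holds across reconvergent paths, can be reproduced on a toy circuit. The sketch below is an illustration only and does not reproduce the paper's learning algorithm: the circuit (`g1 = x OR a`, `g2 = NOT x OR a`, `z = g1 AND g2`), the variable numbering, and the hand-picked learned clause are all assumptions. Literals follow the DIMACS convention of signed integers.

```python
from itertools import product
from typing import List, Optional, Set

Clause = List[int]

# Hypothetical variable numbering for the toy circuit.
X, A, G1, G2, Z = 1, 2, 3, 4, 5


def tseitin_or(out: int, i1: int, i2: int) -> List[Clause]:
    """Clauses for out <-> (i1 OR i2)."""
    return [[-out, i1, i2], [out, -i1], [out, -i2]]


def tseitin_and(out: int, i1: int, i2: int) -> List[Clause]:
    """Clauses for out <-> (i1 AND i2)."""
    return [[out, -i1, -i2], [-out, i1], [-out, i2]]


def unit_propagate(clauses: List[Clause], assumptions: Set[int]) -> Optional[Set[int]]:
    """Naive unit propagation; returns the true literals, or None on conflict."""
    assigned = set(assumptions)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assigned for lit in clause):
                continue  # clause already satisfied
            free = [lit for lit in clause if -lit not in assigned]
            if not free:
                return None  # every literal is false: conflict
            if len(free) == 1:
                assigned.add(free[0])  # unit clause forces this literal
                changed = True
    return assigned


def is_implied(clauses: List[Clause], clause: Clause, num_vars: int) -> bool:
    """Brute-force check that every model of `clauses` satisfies `clause`."""
    for bits in product([False, True], repeat=num_vars):
        def value(lit: int) -> bool:
            return bits[abs(lit) - 1] == (lit > 0)
        if all(any(value(l) for l in c) for c in clauses):
            if not any(value(l) for l in clause):
                return False
    return True


# x fans out to both OR gates and reconverges at the AND gate, so z == a.
cnf = tseitin_or(G1, X, A) + tseitin_or(G2, -X, A) + tseitin_and(Z, G1, G2)

# Candidate learned clause (z -> a), chosen by hand for this illustration.
learned = [-Z, A]

if __name__ == "__main__":
    print("before learning:", sorted(unit_propagate(cnf, {Z})))
    print("clause is implied:", is_implied(cnf, learned, 5))
    print("after learning:", sorted(unit_propagate(cnf + [learned], {Z})))
```

Asserting `z` propagates backward to `g1` and `g2` but stops there, because each OR-gate clause still has two free literals. Once the implied clause is added, the same assumption also forces `a`, while `x` correctly stays unassigned.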
Delay and Energy Efficient Design of On-Chip Encoded Bus with Repeaters
Pub Date : 2008-01-04 DOI: 10.1109/VLSI.2008.21
Qingli Zhang, Jinxiang Wang, Y. Ye
In this paper, we propose a new spatial and temporal encoding approach for generic on-chip global buses with repeaters that enables higher performance while reducing peak energy and average energy. The proposed encoding approach exploits the benefits of temporal encoding circuit and spatial bus-invert coding techniques to simultaneously eliminate opposite transitions on adjacent wires and reduce the number of self-transitions and coupling-transitions. In the design process of applying encoding techniques for reduced bus delay and energy, we present a repeater insertion design methodology to determine the repeater size and inter-repeater bus length which minimizes the total bus energy dissipation while satisfying target delay and slew-rate constraints. This methodology can be employed to obtain optimal energy vs. delay trade-offs under slew-rate constraint for various encoding techniques.
Citations: 3
Dynamic Error Detection for Dependable Cache Coherency in Multicore Architectures
Pub Date : 2008-01-04 DOI: 10.1109/VLSI.2008.68
Hui Wang, Sandeep Baldawa, R. Sangireddy
In chip multiprocessor (CMP) systems, the various effects of technology scaling make on-chip components more susceptible to faults. Most earlier schemes that address fault tolerance in CMPs adopt redundant-thread techniques. These techniques are mostly effective, except that they fail to detect errors resulting from faults in on-chip hardware components that commonly serve multiple cores. The cache coherence controller (CC) logic, which ensures consistency of data shared among multiple threads, is a vital common component in CMPs. A fault in the CC logic of any processor may lead to errors in the data states of the entire CMP system. It is observed that up to 59.6% of memory references cause a change in cache state for SPLASH-2 applications. We propose a novel scheme with verification logic that can dynamically detect errors in the CC logic of multiple cores in a CMP system. The entire verification logic occupies a negligible area of 0.1372 mm² in a TSMC 0.18 µm 4-metal-layer process technology. Even at highly aggressive fault injection rates, the logic achieves an average error coverage of more than 95% (and almost 100% for some applications).
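The abstract does not describe how the verification logic works or which coherence protocol is used. As a rough illustration of the kind of cross-core check such logic might perform, the sketch below assumes a MESI protocol and tests the standard single-writer invariant for one cache line. The `State` enum and the `is_coherent` function are hypothetical names, and nothing here is taken from the paper's design.

```python
from enum import Enum


class State(Enum):
    """MESI cache-line states (assumed protocol, not stated in the abstract)."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def is_coherent(states) -> bool:
    """Check the single-writer invariant for one cache line.

    `states` holds the line's state in each core's private cache.
    The combination is legal if no core holds the line in M or E, or
    if exactly one core does and no other core holds it in S.
    """
    owners = sum(s in (State.MODIFIED, State.EXCLUSIVE) for s in states)
    sharers = sum(s is State.SHARED for s in states)
    return owners == 0 or (owners == 1 and sharers == 0)
```

A faulty controller that leaves one core in Modified while another still holds the line in Shared would fail this check, which is the class of system-wide state error the abstract is concerned with.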
Citations: 20