2012 IEEE Computer Society Annual Symposium on VLSI: Latest Papers

Analysis and Optimization of Thermal Effect on STT-RAM Based 3-D Stacked Cache Design

2012 IEEE Computer Society Annual Symposium on VLSI

Pub Date : 2012-08-19 DOI: 10.1109/ISVLSI.2012.56

Xiuyuan Bi, Hai Helen Li, Jae-Joon Kim

Spin-Transfer Torque Random Access Memory (STT-RAM) has proven to be a promising emerging nonvolatile memory technology suitable for many applications, such as CPU cache memory. Simulation results show that the switching time of the Magnetic Tunnel Junction (MTJ), the core element of the STT-RAM cell, varies with temperature. In this paper, we study the thermal effect on the switching time of the STT-RAM cell and show that when the temperature rises from 300 K to 375 K, the write pulse period required to achieve a bit error rate (BER) of 10^-8 increases from 10.02 ns to 15.04 ns in 45 nm technology. When STT-RAM is used as a 3-D stacked L3 cache, the required write pulse period ranges from 11.42 ns to 14.68 ns because of the temperature variation caused by the CPU core layer. If the thermal effect is not considered, the BER of the hottest region rises significantly, to 10^-4. Based on these observations, we propose an optimized design with Dynamic Temperature Aware Write Access that increases the efficiency of accessing a 3-D stacked STT-RAM cache while still achieving the target BER. Compared to a conventional design, the proposed scheme can improve CPU performance by 3.8% and reduce the write energy consumption of the STT-RAM cache by 4.8%.

Citations: 16

A Tuneable CMOS Pulse Generator for Detecting the Cracks in Concrete Walls

2012 IEEE Computer Society Annual Symposium on VLSI

Pub Date : 2012-08-19 DOI: 10.1109/ISVLSI.2012.20

T. Rao, A. Dutta, S. Singh, Arijit De, B. D. Sahoo

A CMOS impulse generator was designed as part of an Ultra-Wideband (UWB) wireless communication system. An input square wave signal is delayed using differential pairs and then XORed with the input signal to produce short-duration pulses. An RLC circuit acts as a Band Pass Filter (BPF) that generates a Gaussian monopulse from these short-duration pulses. The generator operates at a center frequency of 4.782 GHz with a -3 dB bandwidth of 20.36 GHz. The output signal has a peak-to-peak amplitude of 44.11 mV and a pulse duration of 300 picoseconds. The UWB pulse generator has been simulated in 0.18 μm CMOS technology.
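
The delay-and-XOR step described above can be illustrated with a small discrete-time sketch. This is only a logic-level model under assumed parameters: the function names, the sample counts, and the zero padding of the delayed copy are hypothetical choices for illustration, and the analog differential-pair delay and the RLC filter from the paper are not modeled.

```python
def square_wave(n_samples, period):
    """Return a sampled 0/1 square wave with a 50% duty cycle."""
    half = period // 2
    return [1 if (i % period) < half else 0 for i in range(n_samples)]


def delay(signal, d):
    """Delay a sampled signal by d samples, padding the start with 0."""
    return [0] * d + signal[:len(signal) - d]


def xor_pulses(signal, d):
    """XOR a signal with its delayed copy.

    Each edge of the input yields one pulse whose width equals the delay d.
    """
    return [a ^ b for a, b in zip(signal, delay(signal, d))]


if __name__ == "__main__":
    wave = square_wave(16, 8)
    print(xor_pulses(wave, 1))
```

In this model a shorter delay gives a narrower pulse, which is the property the circuit relies on to obtain short-duration pulses from a slow square wave.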

Citations: 3

A Survey of Microarchitecture Support for Embedded Processor Security

2012 IEEE Computer Society Annual Symposium on VLSI

Pub Date : 2012-08-19 DOI: 10.1109/ISVLSI.2012.64

A. Kanuparthi, R. Karri, Gaston Ormazabal, Sateesh Addepalli

The number of attacks on embedded processors is on the rise. Attackers exploit software vulnerabilities to launch new attacks and gain unauthorized access to sensitive information stored in these devices. Both academia and industry have proposed several solutions to protect the programs running on these embedded-processor-based computer systems. After describing the attacks that threaten a computer system, this paper surveys existing defenses against them, both software-based and hardware-based (watchdog checkers, integrity trees, memory encryption, and modification of the processor architecture). The paper also provides a comparative discussion of their advantages and disadvantages.

Citations: 6

A 3D-NoC Router Implementation Exploiting Vertically-Partially-Connected Topologies

2012 IEEE Computer Society Annual Symposium on VLSI

Pub Date : 2012-08-19 DOI: 10.1109/ISVLSI.2012.19

M. Bahmani, Abbas Sheibanyrad, F. Pétrot, Florentine Dubois, Paolo Durante

In this paper, we detail the design and implementation of a router for vertically-partially-connected 3D-NoCs based on stacked 2D meshes. This router implements the hardware needed to support a recently introduced routing algorithm called "Elevator-First", which targets topologies with irregularly placed vertical connections in a deadlock-free manner, using only two virtual channels in the plane. The micro-architectural design shows that the proposed router requires little additional hardware. Our studies of the practicality of the algorithm and its router implementation demonstrate that it has low overhead compared to a router for fully connected 3D-NoCs. In STMicroelectronics 65 nm CMOS technology, the Elevator-First router with 7 ports has a total area of 0.07 mm^2, an operating frequency of over 3 GHz, and a power consumption of around 3 mW.

Citations: 36

A Fast Head-Tail Expression Generator for TCAM -- Application to Packet Classification

2012 IEEE Computer Society Annual Symposium on VLSI

Pub Date : 2012-08-19 DOI: 10.1109/ISVLSI.2012.47

Infall Syafalni, Tsutomu Sasao

This paper presents a method to generate head-tail expressions for Ternary Content Addressable Memories (TCAMs). First, we derive head-tail expressions for interval functions. We introduce a fast prefix sum-of-products (PreSOP) generator (FP), which generates products using the bit patterns of the endpoints. Next, we propose a direct head-tail expression generator (DHT). Experimental results show that DHT generates much smaller TCAMs than FP. The proposed algorithm is useful as a simplified TCAM generator for packet classification.
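
To make the notion of a prefix sum-of-products for an interval concrete, here is a minimal sketch of the classic range-to-prefix expansion, in which an integer interval is covered by ternary patterns whose don't-care symbols appear only at the end. It is not the paper's FP or DHT algorithm; the function name `range_to_prefixes` and its signature are assumptions made for illustration.

```python
def range_to_prefixes(lo, hi, width):
    """Cover the integer interval [lo, hi] with prefix patterns.

    Each pattern is a string of `width` symbols: fixed bits followed by
    '*' (don't care). Every pattern corresponds to one TCAM word.
    """
    if not (0 <= lo <= hi < (1 << width)):
        raise ValueError("interval must lie within [0, 2**width - 1]")
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo (lo == 0 is aligned to all).
        size = (lo & -lo) if lo else (1 << width)
        # Shrink the block until it no longer runs past hi.
        while lo + size - 1 > hi:
            size >>= 1
        free_bits = size.bit_length() - 1
        fixed_bits = width - free_bits
        head = format(lo >> free_bits, "b").zfill(fixed_bits) if fixed_bits else ""
        prefixes.append(head + "*" * free_bits)
        lo += size
    return prefixes


if __name__ == "__main__":
    print(range_to_prefixes(1, 14, 4))
```

For the 4-bit interval [1, 14] this expansion needs six patterns, which illustrates why prefix-only representations can be costly and why a more compact form such as the head-tail expressions studied in the paper is of interest.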

Citations: 2

Binary Difference Based Test Data Compression for NoC Based SoCs

2012 IEEE Computer Society Annual Symposium on VLSI

Pub Date : 2012-08-19 DOI: 10.1109/ISVLSI.2012.26

Sanga Chaki, C. Giri, H. Rahaman

The scaling of microchip technologies has enabled large-scale and very complex systems-on-chip (SoCs). A high-performance, flexible, scalable, simple-to-design, and power-efficient interconnection network, called the Network-on-Chip (NoC), permits the system components to communicate effectively. This communication structure needs to be tested for correctness, which requires handling a huge volume of test data. Test data compression has therefore become essential to reducing test costs: it reduces test data volume, which in turn decreases testing time. This work presents a new test data compression method based on binary difference, together with the corresponding decompression architecture. The major advantages of this compression technique include a very high compression ratio and a low-cost on-chip decoder. The effectiveness of the proposed approach is demonstrated by applying it to the full-scan test data sets of the ISCAS'89 benchmark circuits.

Citations: 3

An Investigation of Concurrent Error Detection over Binary Galois Fields in CNTFET and QCA Technologies

2012 IEEE Computer Society Annual Symposium on VLSI

Pub Date : 2012-08-19 DOI: 10.1109/ISVLSI.2012.57

M. Poolakkaparambil, J. Mathew, A. Jabir, S. Mohanty

Permanent and transient faults are the main concern in modern very large scale integrated (VLSI) circuits. The main reason for the high vulnerability of modern integrated circuits is their high integration density. Miniaturization has scaled device properties along with device size, making devices susceptible to induced faults and permanent faults. As research progresses towards shrinking the technology even further, to 15 nm or below, with potential CMOS replacement strategies such as carbon nano-tube field effect transistors (CNTFETs) and quantum cellular automata (QCA) cells, fault susceptibility increases even further. Owing to these facts, this paper investigates the performance of a standard concurrent error detection (CED) scheme in CNTFET and QCA technologies, using a normal basis (NB) finite field multiplier circuit as a test bench. The results are then compared with their CMOS equivalents; to the best of the authors' knowledge, this is the first reported attempt at such a comparison. A detailed experimental analysis of the CMOS and CNTFET designs shows that the emerging technologies perform better than their CMOS equivalents for error-tolerant designs in terms of area, power, and delay.

Citations: 10

Uncertain Model and Algorithm for Hardware/Software Partitioning

2012 IEEE Computer Society Annual Symposium on VLSI

Pub Date : 2012-08-19 DOI: 10.1109/ISVLSI.2012.14

Yu Jiang, Hehua Zhang, Xun Jiao, Xiaoyu Song, W. Hung, M. Gu, Jiaguang Sun

Embedded systems are becoming increasingly popular due to their widespread applications. Hardware/software partitioning is becoming one of the most crucial steps in the design of embedded systems, because the cost and delay of the final design depend strongly on partitioning. In this paper, we propose an uncertain programming model for partitioning problems. The delay-related constraints and the cost-related objective are modeled by uncertain variables with uncertainty distributions. We convert the uncertain programming model to a deterministic model and solve the converted model with an efficient heuristic method. We propose a heuristic based on a genetic algorithm and simulated annealing to solve the problem near-optimally, even for quite large systems. Experimental results show that the proposed model and algorithm produce quality partitions.

Citations: 32

Building Blocks to Use in Innovative Non-volatile FPGA Architecture Based on MTJs

2012 IEEE Computer Society Annual Symposium on VLSI

Pub Date : 2012-08-19 DOI: 10.1109/ISVLSI.2012.21

L. Montesi, Z. Zilic, T. Hanyu, D. Suzuki

This paper addresses the need for a non-volatile reconfigurable FPGA that would allow many current applications to transition away from costly ASIC development. It is assumed that an architecture has been selected and needs to be filled with blocks designed at the transistor level. These blocks provide non-volatility by means of magnetic tunnel junction devices (MTJs). Circuit-level designs are presented, together with their successful simulations. The blocks are then assembled, and electrically sound simulations are presented for a fully functional FPGA of minimal size. Design and testing are carried out in Cadence Virtuoso and Spectre along with the IBM p13 toolkit. The typical parameters of a Tohoku University MTJ are used in a SPICE model developed by the University of Minnesota.

Citations: 5

Nano-PPUF: A Memristor-Based Security Primitive

2012 IEEE Computer Society Annual Symposium on VLSI

Pub Date : 2012-08-19 DOI: 10.1109/ISVLSI.2012.40

Jeyavijayan Rajendran, G. Rose, R. Karri, M. Potkonjak

CMOS devices have been used to build hardware security primitives such as physical unclonable functions. Since MOS devices are relatively easy to model and simulate, CMOS-based security primitives are increasingly prone to modeling attacks. We propose memristor-based Public Physical Unclonable Functions (nano-PPUFs), which have complex models that are difficult to simulate. We leverage sneak path currents, process variations, and computationally intensive SPICE models as features to build the nano-PPUF. With just a few hundred memristors, we construct a time-bounded authentication protocol that would take an attacker several years to compromise.

Citations: 103
