Spin-Transfer Torque Random Access Memory (STT-RAM) has been proved a promising emerging nonvolatile memory technology suitable for many applications such as cache memory of CPU. Simulation results show that the switching time of Magnetic Tunnel Junction (MTJ), which is the core element of the STT-RAM cell, varies when the temperature changes. In this paper, we study the thermal effect on switching time of STT-RAM cell, and it is showed that when temperature changes from 300K to 375K, the required write pulse period to achieve 10-8 bit error rate (BER) increases from 10.02ns to 15.04ns under 45nm technology. When STT-RAM is used as 3-D stacked L3 cache, the required write pulse period ranges from 11.42ns to 14.68ns due to temperature variation caused by the CPU core layer. If the thermal effect is not considered, the BER of the hottest region will significantly increase to 10-4. Based on these observations, an optimization design with Dynamic Temperature Aware Write Access is proposed, to increase the efficiency of accessing a 3-D stacked STT-RAM cache, as well as achieve the target BER. Compared to a conventional design, the proposed scheme can improve the CPU performance by 3.8% and reduce the write energy consumption of the STT-RAM cache by 4.8%.
{"title":"Analysis and Optimization of Thermal Effect on STT-RAM Based 3-D Stacked Cache Design","authors":"Xiuyuan Bi, Hai Helen Li, Jae-Joon Kim","doi":"10.1109/ISVLSI.2012.56","DOIUrl":"https://doi.org/10.1109/ISVLSI.2012.56","url":null,"abstract":"Spin-Transfer Torque Random Access Memory (STT-RAM) has been proved a promising emerging nonvolatile memory technology suitable for many applications such as cache memory of CPU. Simulation results show that the switching time of Magnetic Tunnel Junction (MTJ), which is the core element of the STT-RAM cell, varies when the temperature changes. In this paper, we study the thermal effect on switching time of STT-RAM cell, and it is showed that when temperature changes from 300K to 375K, the required write pulse period to achieve 10-8 bit error rate (BER) increases from 10.02ns to 15.04ns under 45nm technology. When STT-RAM is used as 3-D stacked L3 cache, the required write pulse period ranges from 11.42ns to 14.68ns due to temperature variation caused by the CPU core layer. If the thermal effect is not considered, the BER of the hottest region will significantly increase to 10-4. Based on these observations, an optimization design with Dynamic Temperature Aware Write Access is proposed, to increase the efficiency of accessing a 3-D stacked STT-RAM cache, as well as achieve the target BER. Compared to a conventional design, the proposed scheme can improve the CPU performance by 3.8% and reduce the write energy consumption of the STT-RAM cache by 4.8%.","PeriodicalId":398850,"journal":{"name":"2012 IEEE Computer Society Annual Symposium on VLSI","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115224201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Rao, A. Dutta, S. Singh, Arijit De, B. D. Sahoo
A CMOS impulse generator was designed as a part of Ultra Wide Band (UWB) wireless communication system. An input square wave signal is delayed by using differential pairs and then XORed with the input signal to produce short duration pulses. An RLC circuit works as a Band Pass Filter (BPF) used to generate Gaussian monopulse from the obtained short duration pulses. It operates with center frequency at 4.782 GHz and -3dB band width of 20.36 GHz. The output peak to peak amplitude of the signal is 44.11 mV with pulse duration of 300 picoseconds. The UWB pulse generator has been simulated in 0.18μm CMOS technology.
{"title":"A Tuneable CMOS Pulse Generator for Detecting the Cracks in Concrete Walls","authors":"T. Rao, A. Dutta, S. Singh, Arijit De, B. D. Sahoo","doi":"10.1109/ISVLSI.2012.20","DOIUrl":"https://doi.org/10.1109/ISVLSI.2012.20","url":null,"abstract":"A CMOS impulse generator was designed as a part of Ultra Wide Band (UWB) wireless communication system. An input square wave signal is delayed by using differential pairs and then XORed with the input signal to produce short duration pulses. An RLC circuit works as a Band Pass Filter (BPF) used to generate Gaussian monopulse from the obtained short duration pulses. It operates with center frequency at 4.782 GHz and -3dB band width of 20.36 GHz. The output peak to peak amplitude of the signal is 44.11 mV with pulse duration of 300 picoseconds. The UWB pulse generator has been simulated in 0.18μm CMOS technology.","PeriodicalId":398850,"journal":{"name":"2012 IEEE Computer Society Annual Symposium on VLSI","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124986351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Kanuparthi, R. Karri, Gaston Ormazabal, Sateesh Addepalli
The number of attacks on embedded processors is on the rise. Attackers exploit vulnerabilities in the software to launch new attacks and get unauthorized access to sensitive information stored in these devices. Several solutions have been proposed by both the academia and the industry to protect the programs running on these embedded-processor based computer systems. After a description of the several attacks that threaten a computer system, this paper surveys existing defenses - software-based and hardware-based (watchdog checkers, integrity trees, memory encryption, and modification of processor architecture), that protect against such attacks. This paper also provides a comparative discussion of their advantages and disadvantages.
{"title":"A Survey of Microarchitecture Support for Embedded Processor Security","authors":"A. Kanuparthi, R. Karri, Gaston Ormazabal, Sateesh Addepalli","doi":"10.1109/ISVLSI.2012.64","DOIUrl":"https://doi.org/10.1109/ISVLSI.2012.64","url":null,"abstract":"The number of attacks on embedded processors is on the rise. Attackers exploit vulnerabilities in the software to launch new attacks and get unauthorized access to sensitive information stored in these devices. Several solutions have been proposed by both the academia and the industry to protect the programs running on these embedded-processor based computer systems. After a description of the several attacks that threaten a computer system, this paper surveys existing defenses - software-based and hardware-based (watchdog checkers, integrity trees, memory encryption, and modification of processor architecture), that protect against such attacks. This paper also provides a comparative discussion of their advantages and disadvantages.","PeriodicalId":398850,"journal":{"name":"2012 IEEE Computer Society Annual Symposium on VLSI","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126177075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Bahmani, Abbas Sheibanyrad, F. Pétrot, Florentine Dubois, Paolo Durante
In this paper, we detail the design and implementation of a router for vertically-partially-connected 3D-NoCs based on stacked 2D-meshes. This router implements the necessary hardware to support a recently introduced routing algorithm called "Elevator-First", which targets topologies with irregularly placed vertical connections in a deadlock free manner, using only two virtual channels in the plane. The micro-architectural design shows that the proposed router requires few additional hardware. Our studies about the practicality of the algorithm and its router implementation demonstrate that it has low overhead compared to a router for fully connected 3D-NoCs. Using ST Microelectronics 65nm CMOS technology Elevator-First router with 7 ports has a total area of 0.07mm2, an Operating frequency of over 3GHz and a power consumption of around 3mW.
{"title":"A 3D-NoC Router Implementation Exploiting Vertically-Partially-Connected Topologies","authors":"M. Bahmani, Abbas Sheibanyrad, F. Pétrot, Florentine Dubois, Paolo Durante","doi":"10.1109/ISVLSI.2012.19","DOIUrl":"https://doi.org/10.1109/ISVLSI.2012.19","url":null,"abstract":"In this paper, we detail the design and implementation of a router for vertically-partially-connected 3D-NoCs based on stacked 2D-meshes. This router implements the necessary hardware to support a recently introduced routing algorithm called \"Elevator-First\", which targets topologies with irregularly placed vertical connections in a deadlock free manner, using only two virtual channels in the plane. The micro-architectural design shows that the proposed router requires few additional hardware. Our studies about the practicality of the algorithm and its router implementation demonstrate that it has low overhead compared to a router for fully connected 3D-NoCs. Using ST Microelectronics 65nm CMOS technology Elevator-First router with 7 ports has a total area of 0.07mm2, an Operating frequency of over 3GHz and a power consumption of around 3mW.","PeriodicalId":398850,"journal":{"name":"2012 IEEE Computer Society Annual Symposium on VLSI","volume":"205 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114406955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a method to generate head-tail expressions for Ternary Content Addressable Memories (TCAMs). First, we derive head-tail expressions for interval functions. We introduce a fast prefix sum-of-product (PreSOP) generator (FP) which generates products using the bit patterns of the endpoints. Next, we propose a direct head-tail expression generator (DHT). Experimental results show that DHT generates much smaller TCAM than FP. The proposed algorithm is useful for simplified TCAM generator for packet classification.
{"title":"A Fast Head-Tail Expression Generator for TCAM -- Application to Packet Classification","authors":"Infall Syafalni, Tsutomu Sasao","doi":"10.1109/ISVLSI.2012.47","DOIUrl":"https://doi.org/10.1109/ISVLSI.2012.47","url":null,"abstract":"This paper presents a method to generate head-tail expressions for Ternary Content Addressable Memories (TCAMs). First, we derive head-tail expressions for interval functions. We introduce a fast prefix sum-of-product (PreSOP) generator (FP) which generates products using the bit patterns of the endpoints. Next, we propose a direct head-tail expression generator (DHT). Experimental results show that DHT generates much smaller TCAM than FP. The proposed algorithm is useful for simplified TCAM generator for packet classification.","PeriodicalId":398850,"journal":{"name":"2012 IEEE Computer Society Annual Symposium on VLSI","volume":"271 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123365145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The scaling of microchip technologies has enabled large scale and very complex systems-on-chip (SoC). The high-performance, flexible, scalable, simple to design and power efficient interconnection network, called the Network-on-chip (NoC), permits the system components to communicate effectively. This communication structure needs to be tested for correctness, which requires handling huge volume of test data. Thus, test data compression has now become essential to reduce test costs. It reduces test data volume which in turn decreases testing time. This work presents a new test data compression method based on binary difference and the corresponding decompression architecture. The major advantages of this compression technique include very high compression ratio, and a low-cost on-chip decoder. The effectiveness of the proposed approach is demonstrated by applying it to the full scan test data set of ISCAS'89 benchmark circuits.
{"title":"Binary Difference Based Test Data Compression for NoC Based SoCs","authors":"Sanga Chaki, C. Giri, H. Rahaman","doi":"10.1109/ISVLSI.2012.26","DOIUrl":"https://doi.org/10.1109/ISVLSI.2012.26","url":null,"abstract":"The scaling of microchip technologies has enabled large scale and very complex systems-on-chip (SoC). The high-performance, flexible, scalable, simple to design and power efficient interconnection network, called the Network-on-chip (NoC), permits the system components to communicate effectively. This communication structure needs to be tested for correctness, which requires handling huge volume of test data. Thus, test data compression has now become essential to reduce test costs. It reduces test data volume which in turn decreases testing time. This work presents a new test data compression method based on binary difference and the corresponding decompression architecture. The major advantages of this compression technique include very high compression ratio, and a low-cost on-chip decoder. The effectiveness of the proposed approach is demonstrated by applying it to the full scan test data set of ISCAS'89 benchmark circuits.","PeriodicalId":398850,"journal":{"name":"2012 IEEE Computer Society Annual Symposium on VLSI","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114851495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Poolakkaparambil, J. Mathew, A. Jabir, S. Mohanty
Permanent and temporary transient faults are the main concern in modern very large scale integrated circuits (VLSI). The main reason for such high vulnerability of the modern integrated circuit is their high integration density. Miniaturization of devices resulted in scaling their properties along with their size and thus making them a subject to induced faults and permanent faults. As the research progresses towards shrinking the technology even further to 15nm or below with potential CMOS replacement strategies such as carbon nano-tube field effect transistors (CNTFET) and quantum cellular automata (QCA) cells, the notion of fault susceptibility increases even further. Owing to these facts, this paper investigates the performance of standard concurrent error detection (CED) scheme over CNTEFETs and QCA technologies using normal basis (NB) finite field multiplier circuit as a test bench. The results are then compared with their CMOS equivalents which are believed to be the first reported attempt to the best of the authors'knowledge. The detailed experimental analysis of CMOS with CNTFET design proves that the emerging technologies perform better for error tolerant designs in terms of area, power, and delay as compared to its CMOS equivalent.
{"title":"An Investigation of Concurrent Error Detection over Binary Galois Fields in CNTFET and QCA Technologies","authors":"M. Poolakkaparambil, J. Mathew, A. Jabir, S. Mohanty","doi":"10.1109/ISVLSI.2012.57","DOIUrl":"https://doi.org/10.1109/ISVLSI.2012.57","url":null,"abstract":"Permanent and temporary transient faults are the main concern in modern very large scale integrated circuits (VLSI). The main reason for such high vulnerability of the modern integrated circuit is their high integration density. Miniaturization of devices resulted in scaling their properties along with their size and thus making them a subject to induced faults and permanent faults. As the research progresses towards shrinking the technology even further to 15nm or below with potential CMOS replacement strategies such as carbon nano-tube field effect transistors (CNTFET) and quantum cellular automata (QCA) cells, the notion of fault susceptibility increases even further. Owing to these facts, this paper investigates the performance of standard concurrent error detection (CED) scheme over CNTEFETs and QCA technologies using normal basis (NB) finite field multiplier circuit as a test bench. The results are then compared with their CMOS equivalents which are believed to be the first reported attempt to the best of the authors'knowledge. The detailed experimental analysis of CMOS with CNTFET design proves that the emerging technologies perform better for error tolerant designs in terms of area, power, and delay as compared to its CMOS equivalent.","PeriodicalId":398850,"journal":{"name":"2012 IEEE Computer Society Annual Symposium on VLSI","volume":"51 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133880564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yu Jiang, Hehua Zhang, Xun Jiao, Xiaoyu Song, W. Hung, M. Gu, Jiaguang Sun
Embedded systems are becoming increasingly popular due to their widespread applications. Hardware/software partitioning is becoming one of the most crucial steps in the design of embedded systems. The costs and delays of the final results of a design will strongly depend on partitioning. In this paper, we propose an uncertain programming model for partitioning problems. The delay related constraints and the cost related objective are modeled by uncertain variables with uncertainty distributions. We convert the uncertain programming model to a deterministic model and solve the converted model by an efficient heuristic method. We propose a heuristic based on genetic algorithm and simulated annealing to solve the problem near-optimally, even for quite large systems. Experiment results show that the proposed model and algorithm produce quality partitions.
{"title":"Uncertain Model and Algorithm for Hardware/Software Partitioning","authors":"Yu Jiang, Hehua Zhang, Xun Jiao, Xiaoyu Song, W. Hung, M. Gu, Jiaguang Sun","doi":"10.1109/ISVLSI.2012.14","DOIUrl":"https://doi.org/10.1109/ISVLSI.2012.14","url":null,"abstract":"Embedded systems are becoming increasingly popular due to their widespread applications. Hardware/software partitioning is becoming one of the most crucial steps in the design of embedded systems. The costs and delays of the final results of a design will strongly depend on partitioning. In this paper, we propose an uncertain programming model for partitioning problems. The delay related constraints and the cost related objective are modeled by uncertain variables with uncertainty distributions. We convert the uncertain programming model to a deterministic model and solve the converted model by an efficient heuristic method. We propose a heuristic based on genetic algorithm and simulated annealing to solve the problem near-optimally, even for quite large systems. Experiment results show that the proposed model and algorithm produce quality partitions.","PeriodicalId":398850,"journal":{"name":"2012 IEEE Computer Society Annual Symposium on VLSI","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132607464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper addresses the need for a non-volatile reconfigurable FPGA in order to allow for many current applications to transition away from costly ASIC development. It is assumed that an architecture has been selected and needs to be filled with blocks designed at the transistor level. These are to allow for non-volatility by means of magnetic tunnel junction devices (MTJs). Circuit level designs are presented, together with their successful simulations. The blocks are therefore assembled together and electrically sound simulations are presented for a fully functional FPGA of minimal size. Design and testing is carried out in Cadance Virtuoso and Spectre along with the IBM p13 toolkit. The typical parameters of a University of Tohoku MTJ are used in a SPICE model developed by University of Minnesota.
{"title":"Building Blocks to Use in Innovative Non-volatile FPGA Architecture Based on MTJs","authors":"L. Montesi, Z. Zilic, T. Hanyu, D. Suzuki","doi":"10.1109/ISVLSI.2012.21","DOIUrl":"https://doi.org/10.1109/ISVLSI.2012.21","url":null,"abstract":"This paper addresses the need for a non-volatile reconfigurable FPGA in order to allow for many current applications to transition away from costly ASIC development. It is assumed that an architecture has been selected and needs to be filled with blocks designed at the transistor level. These are to allow for non-volatility by means of magnetic tunnel junction devices (MTJs). Circuit level designs are presented, together with their successful simulations. The blocks are therefore assembled together and electrically sound simulations are presented for a fully functional FPGA of minimal size. Design and testing is carried out in Cadance Virtuoso and Spectre along with the IBM p13 toolkit. The typical parameters of a University of Tohoku MTJ are used in a SPICE model developed by University of Minnesota.","PeriodicalId":398850,"journal":{"name":"2012 IEEE Computer Society Annual Symposium on VLSI","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133246212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jeyavijayan Rajendran, G. Rose, R. Karri, M. Potkonjak
CMOS devices have been used to build hardware security primitives such as physical unclonable functions. Since MOS devices are relatively easy to model and simulate, CMOS-based security primitives are increasingly prone to modeling attacks. We propose memristor-based Public Physical Unclonable Functions (nano-PPUFs), they have complex models that are difficult to simulate. We leverage sneak path currents, process variations, and computationally intensive SPICE models as features to build the nano-PPUF. With just a few hundreds of memristors, we construct a time-bounded authentication protocol that will take several years for an attacker to compromise.
{"title":"Nano-PPUF: A Memristor-Based Security Primitive","authors":"Jeyavijayan Rajendran, G. Rose, R. Karri, M. Potkonjak","doi":"10.1109/ISVLSI.2012.40","DOIUrl":"https://doi.org/10.1109/ISVLSI.2012.40","url":null,"abstract":"CMOS devices have been used to build hardware security primitives such as physical unclonable functions. Since MOS devices are relatively easy to model and simulate, CMOS-based security primitives are increasingly prone to modeling attacks. We propose memristor-based Public Physical Unclonable Functions (nano-PPUFs), they have complex models that are difficult to simulate. We leverage sneak path currents, process variations, and computationally intensive SPICE models as features to build the nano-PPUF. With just a few hundreds of memristors, we construct a time-bounded authentication protocol that will take several years for an attacker to compromise.","PeriodicalId":398850,"journal":{"name":"2012 IEEE Computer Society Annual Symposium on VLSI","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133760169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}