This paper proposes a latchless pipeline architecture for spintronic circuits and quantifies the impact of pipeline depth and width on the error rate caused by thermal noise. The paper focuses on concatenable spin logic (CSL), although the proposed architecture and error rate estimation approach can be applied to any spintronic logic that uses the magnetic moment of nanomagnets as the computational state variable. The latchless pipeline architecture takes advantage of the non-volatility of nanomagnets and eliminates the need for the extra switches that are necessary in CMOS circuits to latch data at the beginning and end of each pipeline stage. However, choosing a pipeline clock rate requires knowing the circuit delay of a single stage. It is shown that the delay of a magnet can best be represented as a gamma distribution, and thus, in order to achieve a 10⁻⁴ error rate with a single switch, the clock period will need to be approximately 120% greater than the average delay of a single device. This variation tax can be reduced to under 35% for a circuit with 10 switches connected in series, or it can exceed 145% if the switches are connected in parallel (depth=1).
{"title":"Pipeline design in spintronic circuits","authors":"N. Kani, A. Naeemi","doi":"10.1145/2770287.2770314","DOIUrl":"https://doi.org/10.1145/2770287.2770314","url":null,"abstract":"This paper proposes a latch-less pipeline architecture for spintronic circuits and quantifies the impact of pipeline depth and width on the error rate caused by thermal noise. This paper focuses on concatenable spin logic (CSL) even though the proposed architecture and error rate estimation approach can be applied to any spintronic logic that use magnetic moment of nanomagnets as the computational state variable. The latchless pipeline architecture takes advantage of the non-volatility of nanomagnets and eliminates the need for the extra switches that are necessary in CMOS circuits to latch data at the beginning and end of each pipeline stage. However, choosing a pipeline clock rate requires knowing the circuit delay of a single stage. It is shown that the delay of a magnet can best be represented as a gamma distribution, and thus, in order to achieve a 10-4 error rate with a single switch, the clock period will need to be approximately 120% greater the average delay of a single device. This variation tax can be reduced to under 35% for a circuit with 10 switches connected in series, or it can exceed 145% if the switches are connected in parallel (depth=1).","PeriodicalId":6519,"journal":{"name":"2014 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"32 1","pages":"110-115"},"PeriodicalIF":0.0,"publicationDate":"2014-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86816393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
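The variation tax described in the abstract above can be illustrated with a short calculation. The following is a minimal sketch, not the authors' method: it assumes each switch delay is gamma distributed with an integer shape (an Erlang distribution, so the tail probability has a closed form), that delays are independent, that a series chain's delay is the sum of its switch delays, and that a parallel stage finishes when its slowest switch does. The shape value, the width of 10, and the function names are hypothetical, so the printed percentages are not expected to reproduce the paper's figures.

```python
import math

def erlang_sf(x, shape, scale=1.0):
    """P(X > x) for a gamma distribution with integer shape (Erlang)."""
    if x <= 0:
        return 1.0
    z = x / scale
    term = math.exp(-z)          # i = 0 term of the Poisson-style sum
    total = term
    for i in range(1, shape):
        term *= z / i            # builds e^{-z} z^i / i! incrementally
        total += term
    return min(total, 1.0)

def clock_period(shape, scale, depth, width, target_error):
    """Smallest period T such that the probability that any of `width`
    parallel chains of `depth` series switches is late stays below target."""
    chain_shape = shape * depth  # sum of iid gammas is gamma with summed shape

    def error(t):
        on_time = 1.0 - erlang_sf(t, chain_shape, scale)
        return 1.0 - on_time ** width

    lo, hi = 0.0, chain_shape * scale
    while error(hi) > target_error:   # grow the bracket until it is safe
        hi *= 2.0
    for _ in range(200):              # bisection on the monotone error curve
        mid = 0.5 * (lo + hi)
        if error(mid) > target_error:
            lo = mid
        else:
            hi = mid
    return hi

def variation_tax(shape, scale, depth, width, target_error=1e-4):
    """Extra clock period relative to the mean delay of one chain."""
    mean_chain = shape * scale * depth
    return clock_period(shape, scale, depth, width, target_error) / mean_chain - 1.0

SHAPE = 10  # hypothetical gamma shape; the abstract does not state it
single = variation_tax(SHAPE, 1.0, depth=1, width=1)
series = variation_tax(SHAPE, 1.0, depth=10, width=1)
parallel = variation_tax(SHAPE, 1.0, depth=1, width=10)
print(f"single: {single:.0%}  series(10): {series:.0%}  parallel(10): {parallel:.0%}")
```

Under these assumptions the ordering matches the trend reported in the abstract: chaining switches in series averages out the delay variation and lowers the tax, while widening a stage raises it because the clock must wait for the slowest switch.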
This paper presents a Ternary Content Addressable Memory (TCAM) cell that employs memristors as storage elements. The TCAM cell requires two memristors in series to perform the traditional memory operations (read and write) as well as the search and matching operations for TCAM; this memory cell is analyzed with respect to different features of the memristors (such as memristance range and voltage threshold) to process ternary data quickly and efficiently. A comprehensive simulation-based assessment of this cell is carried out in HSPICE. Comparison with other memristor-based CAMs as well as CMOS-based TCAMs shows that the proposed cell offers significant advantages in terms of power dissipation, reduced transistor count and search/match operation performance.
{"title":"A memristor-based TCAM (Ternary Content Addressable Memory) cell","authors":"P. Junsangsri, F. Lombardi, Jie Han","doi":"10.1145/2770287.2770289","DOIUrl":"https://doi.org/10.1145/2770287.2770289","url":null,"abstract":"This paper presents a Ternary Content Addressable Memory (TCAM) cell that employs memristors as storage element. The TCAM cell requires two memristors in series to perform the traditional memory operations (read and write) as well as the search and matching operations for TCAM; this memory cell is analyzed with respect to different features (such as memristance range and voltage threshold) of the memristors to process fast and efficiently the ternary data. A comprehensive simulation based assessment of this cell is pursued by HSPICE. Comparison with other memristor-based CAMs as well as CMOS-based TCAMs shows that the proposed cell offers significant advantages in terms of power dissipation, reduced transistor count and search/match operation performance.","PeriodicalId":6519,"journal":{"name":"2014 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"2016 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2014-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86391105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
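For readers unfamiliar with the search/match operation mentioned in the abstract above, the following is a minimal behavioural sketch of ternary matching. It models only the logical function of a TCAM (each stored symbol is 0, 1, or a don't-care X) and says nothing about the memristor circuit itself; the function names and the example table are hypothetical.

```python
def tcam_match(stored: str, key: str) -> bool:
    """Return True if a binary search key matches a stored ternary word.

    `stored` uses '0', '1' and 'X' (don't care); `key` uses '0' and '1'.
    """
    if len(stored) != len(key):
        raise ValueError("stored word and key must have the same width")
    return all(s == "X" or s == k for s, k in zip(stored, key))

def tcam_search(table, key):
    """Compare the key against every entry and return the matching indices.

    A hardware TCAM performs these comparisons in parallel; the loop here
    only reproduces the result, not the timing.
    """
    return [i for i, entry in enumerate(table) if tcam_match(entry, key)]

TABLE = ["10X1", "1XX0", "0000"]  # hypothetical stored words
print(tcam_search(TABLE, "1011"))
print(tcam_search(TABLE, "1010"))
```

The don't-care symbol is what distinguishes a ternary CAM from a binary one: a single stored word can match several keys, which is why the cell must store three states rather than two.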
This paper introduces an analog-to-stochastic converter using a magnetic-tunnel junction (MTJ) device for stochastic computation. Stochastic computation has recently been exploited for area-efficient hardware implementation, such as low-density parity-check (LDPC) decoders and image processors. However, power- and area-hungry analog-to-digital and digital-to-stochastic converters are required for the analog-to-stochastic signal conversion. MTJ devices exhibit probabilistic switching behaviour between two resistance states. By exploiting this probabilistic behaviour, analog signals can be directly converted to stochastic signals to mitigate the signal-conversion overhead. The analog-to-stochastic signal conversion is mathematically described and the conversion circuit is designed based on a transistor/MTJ hybrid structure. The conversion characteristic is evaluated using device and circuit parameters to determine proper parameters for designing the analog-to-stochastic converter.
{"title":"Analog-to-stochastic converter using magnetic-tunnel junction devices","authors":"N. Onizawa, Daisaku Katagiri, W. Gross, T. Hanyu","doi":"10.1145/2770287.2770303","DOIUrl":"https://doi.org/10.1145/2770287.2770303","url":null,"abstract":"This paper introduces an analog-to-stochastic converter using a magnetic-tunnel junction (MTJ) device for stochastic computation. Stochastic computation has recently been exploited for area-efficient hardware implementation, such as low-density parity-check (LDPC) decoders and image processors. However, power-and-area hungry analog-to-digital and digital-to-stochastic converters are required for the analog to stochastic signal conversion. The MTJ devices exhibit probabilistic switching behaviour between two resistance states. Exploiting the probabilistic behaviour, analog signals can be directly converted to stochastic signals to mitigate the signal-conversion overhead. The analog-to-stochastic signal conversion is mathematically described and the conversion circuit is designed based on a transistor/MTJ hybrid structure. The conversion characteristic is evaluated using device and circuit parameters that determines proper parameters for designing the analog-to-stochastic converter.","PeriodicalId":6519,"journal":{"name":"2014 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"9 1","pages":"59-64"},"PeriodicalIF":0.0,"publicationDate":"2014-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82134521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
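The idea of turning an analog level directly into a stochastic bitstream can be sketched numerically. The abstract above does not give the conversion model, so the sketch below assumes the commonly used thermal-activation form for switching probability, P = 1 - exp(-t_pulse / tau) with tau = tau0 * exp(delta * (1 - I/Ic)), and assumes the device is reset before every bit so that bits are independent. The parameter values and function names are hypothetical and are not taken from the paper.

```python
import math
import random

def switching_probability(current_ratio, pulse_ns, delta=60.0, tau0_ns=1.0):
    """Probability that the MTJ switches during one write pulse.

    current_ratio is I/Ic (drive current over critical current); delta is the
    thermal stability factor and tau0_ns the attempt time. All assumed values.
    """
    tau_ns = tau0_ns * math.exp(delta * (1.0 - current_ratio))
    return 1.0 - math.exp(-pulse_ns / tau_ns)

def stochastic_bitstream(p, n_bits, rng):
    """Emit n_bits independent bits, each equal to 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n_bits)]

rng = random.Random(0)                      # fixed seed for repeatability
p = switching_probability(0.95, pulse_ns=10.0)
bits = stochastic_bitstream(p, 10_000, rng)
estimate = sum(bits) / len(bits)            # decoded value of the stream
print(f"target probability {p:.3f}, bitstream estimate {estimate:.3f}")
```

In this picture the analog input sets the drive current, the device's own randomness produces the bits, and the encoded value is recovered as the fraction of ones, which is the sense in which separate analog-to-digital and digital-to-stochastic stages become unnecessary.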
High power consumption and long global interconnect delay are two of the major limits to further scaling of process nodes in very-large-scale integrated (VLSI) systems. Therefore, new technologies and computer architectures are under focused development to reduce the power consumption and interconnection delay. The magnetic tunnel junction (MTJ) nanopillar, with its advantages of non-volatility, fast switching speed, and high density, promises new designs and architectures that significantly alleviate the power and delay issues. This paper presents new logic-in-memory designs of the basic logic gates based on MTJs, including INV, (N)AND, (N)OR and XOR. The MTJ sharing and timing demultiplexing techniques are used in the proposed non-volatile logic gates to greatly reduce the write power. The simulation results show that the write energy of the proposed non-volatile logic gates is as low as 285 fJ/bit. The basic logic gates can finish the read operation in less than 160 ps with 4.35 fJ read energy. Moreover, the proposed non-volatile logic gates may be reconfigured after fabrication, which makes the designs more flexible and robust.
{"title":"STT-MRAM based low power synchronous non-volatile logic with timing demultiplexing","authors":"Kejie Huang, Rong Zhao, Y. Lian","doi":"10.1145/2770287.2770295","DOIUrl":"https://doi.org/10.1145/2770287.2770295","url":null,"abstract":"The high power and long global interconnection delay are two of the major limits for further scaling down of the process nodes in the very large scale integrated (VLSI) systems. Therefore, new technologies and computer architectures are under focused development to reduce the power consumption and interconnection delay. Magnetic tunnel junction (MTJ) nanopillar with the advantages of non-volatility, fast switching speed, and high density promises new designs and architectures to significantly alleviate the power and delay issues. This paper presents new logic-in-memory designs of the basic logic gates based on MTJs, including INV, (N)AND, (N)OR and XOR. The MTJ sharing and timing demultiplexing techniques are used in the proposed non-volatile logic gates to greatly reduce the write power. The simulation results show that the write power of the proposed non-volatile logic gates is as low as 285fJ/bit. The basic logic gates can finish the read operation in less than 160ps with 4.35f J read energy. 
Moreover, the proposed non-volatile logic gates may be reconfigured after fabrication, which makes the designs more flexible and robust.","PeriodicalId":6519,"journal":{"name":"2014 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"41 1","pages":"31-36"},"PeriodicalIF":0.0,"publicationDate":"2014-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82290095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Miguel Diez-Garcia, A. Vincent, N. Izard, D. Querlioz
Carbon nanotube networks are compatible with silicon and can modulate, detect and emit light in the transparency band of silicon. This makes them excellent candidates as active material for silicon photonics. However, the ubiquitous presence of residual metallic nanotubes in nanotube networks is a serious obstacle to this vision. In this work, we perform Monte Carlo simulations of the electrical properties of nanotube networks by extracting and simulating an equivalent netlist of the networks. The results allow us to identify the appropriate densities of nanotubes not affected by the metallic nanotube issue, and to propose a first design rule for nanotube-based optoelectronics.
{"title":"Monte Carlo simulations of carbon nanotube networks for optoelectronic applications","authors":"Miguel Diez-Garcia, A. Vincent, N. Izard, D. Querlioz","doi":"10.1145/2770287.2770319","DOIUrl":"https://doi.org/10.1145/2770287.2770319","url":null,"abstract":"Carbon nanotube networks are compatible with silicon and possess features of light modulation, detection and emission in the transparency band of silicon. This makes them excellent candidates as active material for silicon photonics. However, the ubiquitous presence of residual metallic nanotubes in nanotube networks is a strong issue for this vision. In this work, we perform Monte Carlo simulations of the electrical properties of nanotube networks, by extracting and simulating an equivalent netlist of the networks. The results allow us to identify the appropriate densities of nanotubes not affected by the metallic nanotube issue, and to propose a first design rule for nanotube-based optoelectronics.","PeriodicalId":6519,"journal":{"name":"2014 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"34 1","pages":"135-136"},"PeriodicalIF":0.0,"publicationDate":"2014-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74670890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Digital circuits made with nano-electro-mechanical (NEM) relays offer energy-efficiency benefits over CMOS since they have zero leakage power and can offer circuit-level performance that competes with CMOS. In this paper we show how new relay circuit design techniques, combined with those we already demonstrated on smaller relay blocks, enable us to optimize the design of the most complex arithmetic unit, the floating-point unit (FPU). The energy, performance, and area trade-offs of FPU designs with NEM relays are examined and compared with those of state-of-the-art CMOS designs in an equivalent scaled process. Circuits that are critical-path bottlenecks specifically for the FPU, most notably the leading zero detector (LZD) and leading zero anticipator (LZA), are optimized with new relay-tailored circuit techniques. These optimizations reduce the NEM relay FPU latency from 71 mechanical delays in an optimal-CMOS-style implementation to 16 mechanical delays in a generalized custom NEM relay implementation. In a 90 nm process node, the FPU designed with NEM relays is projected to achieve 15× lower energy per operation compared to the FPU designed with CMOS.
{"title":"Floating-point unit design with nano-electro-mechanical (NEM) relays","authors":"S. Dutta, V. Stojanović","doi":"10.1145/2770287.2770323","DOIUrl":"https://doi.org/10.1145/2770287.2770323","url":null,"abstract":"Digital circuits made with nano-electro-mechanical (NEM) relays offer energy-efficiency benefits over CMOS since they have zero leakage power and can offer circuit level performance that competes with CMOS. In this paper we show how new relay circuit design techniques combined with those we already demonstrated on smaller relay blocks enable us to optimize the design of the most complex arithmetic unit, the floating-point unit (FPU). The energy, performance, and area trade-offs of FPU designs with NEM relays are examined and compared with those of state-of-the-art CMOS designs in an equivalent scaled process. Circuits that are critical path bottlenecks for the FPU specifically, most notably the leading zero detector (LZD) and leading zero anticipator (LZA), are optimized with new relay-tailored circuit techniques. These optimizations reduce the NEM relay FPU latency from 71 mechanical delays in an optimal-CMOS-style implementation to 16 mechanical delays in a generalized custom NEM relay implementation. 
In a 90 nm process node, the FPU designed with NEM relays is projected to achieve 15× lower energy per operation compared to the FPU designed with CMOS.","PeriodicalId":6519,"journal":{"name":"2014 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"300 1","pages":"145-150"},"PeriodicalIF":0.0,"publicationDate":"2014-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83086629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image classification with feed-forward neural networks typically assumes the application of input images as single column vectors, which leads to a large number of required input neurons as well as large synaptic arrays connecting individual neural layers. In this paper we show how a class of memristive devices can be used as non-linear, leaky integrators that extend regular feed-forward neural networks with short-term memory. By trading space for time, our novel architecture reduces the number of neurons by a factor of 3 and the number of synapses by up to 15 times on the MNIST data set compared to previously reported results. Furthermore, the results indicate that fewer neurons and synapses also lead to a reduced learning complexity. With memristive devices functioning as dynamic processing elements, our findings advocate for a diverse use of memristive devices that would allow building more area-efficient hardware by exploiting more than just their non-volatile memory property.
{"title":"Volatile memristive devices as short-term memory in a neuromorphic learning architecture","authors":"Jens Bürger, C. Teuscher","doi":"10.1145/2770287.2770313","DOIUrl":"https://doi.org/10.1145/2770287.2770313","url":null,"abstract":"Image classification with feed-forward neural networks typically assumes the application of input images as single column vectors, which leads to a large number of required input neurons as well as large synaptic arrays connecting individual neural layers. In this paper we show how a class of memristive devices can be used as non-linear, leaky integrators that extend regular feed-forward neural networks with short-term memory. By trading space for time, our novel architecture allows to reduce the number of neurons by a factor of 3 and the number of synapses up to 15 times on the MNIST data set compared to previously reported results. Furthermore, the results indicate that less neurons and synapses also leads to a reduced learning complexity. With memristive devices functioning as dynamic processing elements, our findings advocate for a diverse use of memristive devices that would allow to build more area-efficient hardware by exploiting more than just their non-volatile memory property.","PeriodicalId":6519,"journal":{"name":"2014 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"33 1","pages":"104-109"},"PeriodicalIF":0.0,"publicationDate":"2014-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89545294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
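The space-for-time trade mentioned in the abstract above can be sketched with a generic leaky integrator. This is an illustrative model only, not the device model or network from the paper: it assumes a discrete-time update state = (1 - leak) * state + tanh(gain * input), and it assumes the image is presented one column per time step so that only one input per row is needed. The leak and gain values and the function names are hypothetical.

```python
import math

def leaky_integrate(sequence, leak=0.3, gain=1.0):
    """Return the final state after feeding one input value per time step.

    The state decays by a factor (1 - leak) each step and accumulates a
    non-linear (tanh) function of the current input.
    """
    state = 0.0
    for x in sequence:
        state = (1.0 - leak) * state + math.tanh(gain * x)
    return state

def encode_image(image, leak=0.3):
    """Present the image column by column, with one integrator per row.

    `image` is a list of rows; an R x C image then needs R inputs over
    C time steps instead of R * C inputs at once.
    """
    return [leaky_integrate(row, leak) for row in image]

image = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
features = encode_image(image)
print([round(f, 3) for f in features])
```

Because the state decays, the same pixel value contributes less the earlier it was presented, so the final state carries a fading trace of the input order; this is the short-term memory that a purely static input layer lacks.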
A. Goudarzi, Matthew R. Lakin, D. Stefanovic, C. Teuscher
Reconfiguration has been used for both defect- and fault-tolerant nanoscale architectures with regular structure. Recent advances in self-assembled nanowires have opened the door to a new class of electronic devices with irregular structure. For such devices, reservoir computing has been shown to be a viable approach to implement computation. This approach exploits the dynamical properties of a system rather than specifics of its structure. Here, we extend a model of reservoir computing, called the echo state network, to reflect more realistic aspects of self-assembled nanowire networks. As a proof of concept, we use echo state networks to implement basic building blocks of digital computing: AND, OR, and XOR gates, and 2-bit adder and multiplier circuits. We show that the system can operate perfectly in the presence of variations five orders of magnitude higher than ITRS's 2005 target, 6%, and achieves success rates 6 times higher than related approaches at half the cost. We also describe an adaptive algorithm that can detect faults in the system and reconfigure it to restore perfect operating condition.
{"title":"A model for variation- and fault-tolerant digital logic using self-assembled nanowire architectures","authors":"A. Goudarzi, Matthew R. Lakin, D. Stefanovic, C. Teuscher","doi":"10.1145/2770287.2770315","DOIUrl":"https://doi.org/10.1145/2770287.2770315","url":null,"abstract":"Reconfiguration has been used for both defect- and fault-tolerant nanoscale architectures with regular structure. Recent advances in self-assembled nanowires have opened doors to a new class of electronic devices with irregular structure. For such devices, reservoir computing has been shown to be a viable approach to implement computation. This approach exploits the dynamical properties of a system rather than specifics of its structure. Here, we extend a model of reservoir computing, called the echo state network, to reflect more realistic aspects of self-assembled nanowire networks. As a proof of concept, we use echo state networks to implement basic building blocks of digital computing: AND, OR, and XOR gates, and 2-bit adder and multiplier circuits. We show that the system can operate perfectly in the presence of variations five orders of magnitude higher than ITRS's 2005 target, 6%, and achieves success rates 6 times higher than related approaches at half the cost. 
We also describe an adaptive algorithm that can detect faults in the system and reconfigure it to resume perfect operational condition.","PeriodicalId":6519,"journal":{"name":"2014 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"136 1","pages":"116-121"},"PeriodicalIF":0.0,"publicationDate":"2014-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77363517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The topological insulator (TI) is a recently discovered nano-device whose bulk acts as an insulator but whose surface behaves as a metal. As state information in a TI device is conducted by ordered spins, it draws tremendous interest for ultra-low power computing. This paper presents state-space modeling and design exploration of the TI device for non-volatile memory (NVM) design. The non-traditional electrical state in the TI is extracted and modeled in a SPICE-like simulator. The model is then employed for hybrid CMOS-TI NVM design explorations for both the memory cell and the memory array. The experimental results show that TI-based NVM exhibits fast write and read latencies as low as 20 ns. In addition, compared to other emerging NVM technologies, it exhibits several orders of magnitude lower operation energy.
{"title":"Design exploration of ultra-low power non-volatile memory based on topological insulator","authors":"Yuhao Wang, Hao Yu","doi":"10.1145/2765491.2765498","DOIUrl":"https://doi.org/10.1145/2765491.2765498","url":null,"abstract":"Topological insulator (TI) is recently discovered nano-device whose bulk acts as insulator but surface behaves as metal. As state information in a TI device is conducted by ordered spins, it draws tremendous interest for ultra-low power computing. This paper shows a state-space modeling and design exploration of TI device for non-volatile memory (NVM) design. The non-traditional electrical state in TI is extracted and modeled in a SPICE-like simulator. The model is the employed for hybrid CMOS-TI NVM design explorations for both memory cell and memory array. The experiment results show that TI based NVM exhibits a fast write and read latency as low as 20ns. In addition, compared to other emerging NVM technologies, it exhibits several orders of magnitude lower operation energy.","PeriodicalId":6519,"journal":{"name":"2014 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH)","volume":"1 1","pages":"30-35"},"PeriodicalIF":0.0,"publicationDate":"2012-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88884163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}