Non-volatile memory (NVM) devices such as resistive RAM (ReRAM) and spin transfer torque RAM (STT-RAM) promise a high-density, low-leakage alternative to SRAM for on-chip caches. However, the low write endurance of NVMs, together with the write variation introduced by existing cache management schemes, significantly limits the lifetime of NVM caches. We present LastingNVCache, a technique for improving cache lifetime by mitigating intra-set write variation. LastingNVCache works on the key idea that if a frequently written data item is periodically flushed, it can be made to load into a cold block in the set the next time it is accessed. Future writes to that data item are thereby redirected from a hot block to a cold block, which improves the cache lifetime. Microarchitectural simulations show that LastingNVCache provides 6.36X, 9.79X, and 10.94X improvement in lifetime for single-, dual-, and quad-core systems, respectively. Its implementation overhead is small, and it outperforms a recently proposed technique for improving the lifetime of NVM caches.
{"title":"LastingNVCache: A Technique for Improving the Lifetime of Non-volatile Caches","authors":"Sparsh Mittal, J. Vetter, Dong Li","doi":"10.1109/ISVLSI.2014.69","DOIUrl":"https://doi.org/10.1109/ISVLSI.2014.69","url":null,"abstract":"Use of non-volatile memory (NVM) devices such as resistive RAM (ReRAM) and spin transfer torque RAM (STT-RAM) for designing on-chip caches holds the promise of providing a high-density, low-leakage alternative to SRAM. However, low write endurance of NVMs, along with the write-variation introduced by existing cache management schemes significantly limits the lifetime of NVM caches. We present LastingNVCache, a technique for improving the cache lifetime by mitigating the intra-set write variation. LastingNVCache works on the key idea that by periodically flushing a frequently-written data-item, next time the block can be made to load into a cold block in the set. Through this, the future writes to that data-item can be redirected from a hot block to a cold block, which leads to improvement in the cache lifetime. Microarchitectural simulations have shown that LastingNVCache provides 6.36X, 9.79X, and 10.94X improvement in lifetime for single, dual and quad-core systems, respectively. Also, its implementation overhead is small and it outperforms a recently proposed technique for improving lifetime of NVM caches.","PeriodicalId":405755,"journal":{"name":"2014 IEEE Computer Society Annual Symposium on VLSI","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123963158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
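The flush-and-reload idea in the LastingNVCache abstract can be illustrated with a toy model of a single cache set. This is only a sketch under stated assumptions, not the authors' implementation: the function name `simulate_set`, the fixed `flush_threshold`, the choice of the least-written way as the "cold" block, and the counting of each reload as one extra array write are all hypothetical.

```python
def simulate_set(num_ways, num_writes, flush_threshold=None):
    """Return per-way write counts for one hot data item in one cache set.

    flush_threshold=None models a baseline in which the item never moves.
    Otherwise the item is flushed after that many writes and reloaded into
    the least-written way (an assumed definition of a "cold" block).
    """
    wear = [0] * num_ways   # lifetime writes absorbed by each way
    way = 0                 # way currently holding the hot item
    since_load = 0          # writes since the item was last loaded
    for _ in range(num_writes):
        if flush_threshold is not None and since_load >= flush_threshold:
            # Flush, then reload into the coldest way. The reload is itself
            # a write to the NVM array, so it is charged to that way.
            way = min(range(num_ways), key=lambda w: wear[w])
            wear[way] += 1
            since_load = 0
        wear[way] += 1
        since_load += 1
    return wear
```

In this model, `simulate_set(4, 1000)` leaves all 1000 writes on one way, while `simulate_set(4, 1000, flush_threshold=10)` spreads them across the four ways at the cost of some extra reload writes. Because lifetime is bounded by the most-written block, the lower maximum is what matters; the numbers from this toy say nothing about the improvements reported in the abstract.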
As supply voltages are lowered to reduce power, the signal-to-noise ratio of digital signals has degraded. Alternatively, a signal can be represented as a current while the supply voltage remains small, which gives rise to the field of current-mode signal processing circuits. In this work, we propose a current-mode analog Walsh-Hadamard processor whose control mechanism remains digital. The design is implemented in 0.35μm CMOS technology. The Walsh-Hadamard transform is a complete transform and finds significant applications in image processing, filter design, and multiplexing. To the best of our knowledge, no such implementation exists in the published literature.
{"title":"A New Walsh Hadamard Transform Architecture Using Current Mode Circuit","authors":"S. Bhattacharya, S. Talapatra","doi":"10.1109/ISVLSI.2014.71","DOIUrl":"https://doi.org/10.1109/ISVLSI.2014.71","url":null,"abstract":"With the reduction of supply voltage motivated bypower reduction, the signal to noise ratio of digital signals has reduced. Alternately, signal can be represented as current while the supply voltage still remaining small. This gives rise to the field of current mode signal processing circuits. In this work, we propose a current mode analog Walsh-Hadamard processor while the control mechanism remains digital. The design is implemented in 0.35μm CMOS technology. Walsh-Hadamard transform is a complete transform and finds significant applications in the field of image processing, filter design, multiplexing. To the best of our knowledge, no such implementation exists in the published literature.","PeriodicalId":405755,"journal":{"name":"2014 IEEE Computer Society Annual Symposium on VLSI","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125178315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
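For readers unfamiliar with the transform named in the abstract above, the following is a minimal software sketch of the fast Walsh-Hadamard transform in natural (Hadamard) order, without normalization. It is a reference model only and says nothing about the paper's current-mode circuit; the function name `fwht` is an illustrative choice.

```python
def fwht(values):
    """Fast Walsh-Hadamard transform (Hadamard order, unnormalized)."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly stage: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out
```

Each butterfly uses only additions and subtractions. Applying `fwht` twice returns the input scaled by its length, so the transform is invertible up to that scale factor.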
In typical NoC systems, most Routing Algorithms (RAs) abandon the interconnection between two adjacent routers if one traffic direction is broken, regardless of whether the other direction is still functional. In this paper, we propose a distributed logic based RA that can efficiently utilize the UnPaired Functional (UPF) links in such partially defected interconnects. The basic fault pattern tolerated by the proposed RA is a fault wall, which is composed of adjacent broken links with the same outgoing direction. Messages are routed around the fault walls along the misrouting contours of the broken links. The proposed RA requires at least 3 Virtual Channels (VCs) and dynamically reserves them for misrouted messages to avoid deadlock. Our experiments indicate that, for random and localized traffic patterns, we achieve an average saturation throughput 20% higher than the Solid Fault Region Tolerant (SFRT) RA, and 22% and 14% higher than the Ariadne routing table based RA, respectively. For the real applications sample and satell, our proposal requires an execution time that is at least 16% shorter than both SFRT and Ariadne. Synthesis results with Synopsys Design Compiler and TSMC 65nm technology indicate that embedding the proposed RA into a baseline router results in 11% area overhead, which is only 3% higher than that of SFRT. In contrast, the Ariadne area overhead is 15% for an 8 × 8 NoC and increases to 21% for a 10 × 10 NoC.
{"title":"Towards an Effective Utilization of Partially Defected Interconnections in 2D Mesh NoCs","authors":"Changlin Chen, S. Cotofana","doi":"10.1109/ISVLSI.2014.70","DOIUrl":"https://doi.org/10.1109/ISVLSI.2014.70","url":null,"abstract":"In typical NoC systems, most Routing Algorithms (RAs) abandon the interconnection between two adjacent routers if one traffic direction is broken, despite whether the other one is still functional or not. In this paper, we propose a distributed logic based RA, which can efficiently utilize the UnPaired Functional (UPF) links in such partially defected interconnects. The basic fault pattern tolerated by the proposed RA is a fault wall, which is composed of adjacent broken links with the same outgoing direction. Messages are routed around the fault walls along the misrouting contours of the broken links. The proposed RA requires at least 3 Virtual Channels (VCs) and dynamically reserve them to misrouted messages to avoid deadlock. Our experiments indicate that, for random and localized traffic patterns, we achieve an average saturation throughput 20% higher than the Solid Fault Region Tolerant (SFRT) RA, and 22% and 14% higher than the Ariadne routing table based RA, respectively. For the real applications, sample and satell, our proposal requires a routing execution time with at least 16% shorter than both SFRT and Ariadne. Synthesis results with Synopsis Design Compiler and TSMC 65nm technology indicate that, embedding the proposed RA into a baseline router results in 11% area overhead, which is only 3% higher than that of SFRT. In contrast, Ariadne area overhead is 15% for an 8 × 8 NoC and increases to 21% for a 10 × 10 NoC.","PeriodicalId":405755,"journal":{"name":"2014 IEEE Computer Society Annual Symposium on VLSI","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130132380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Programmable reversible logic is emerging as a prospective logic design style for implementation in modern nanotechnology and quantum computing with minimal impact on circuit heat generation. Adiabatic logic is a design methodology for reversible logic in CMOS where the current flow through the circuit is controlled such that the energy dissipation due to switching and capacitor dissipation is minimized. Production of cost-effective Secure Integrated Chips, such as Smart Cards, requires hardware designers to consider tradeoffs in size, security, and power consumption. To produce successful security-centric designs, the low-level hardware must contain built-in protection mechanisms that supplement cryptographic algorithms such as AES and Triple DES by preventing side channel attacks, such as Differential Power Analysis (DPA). Dynamic logic obfuscates the output waveforms and the circuit operation, reducing the effectiveness of the DPA attack. In this dissertation, I address theory, synthesis, and application of adiabatic and reversible logic circuits for security applications. First, we present a mathematical proof to demonstrate that reversible logic can be used to design sequential computing structures. Next, a novel algorithm for synthesis of adiabatic circuits in CMOS is presented. This approach is unique because it correlates the offsets in the permutation matrix to the transistors required for synthesis, instead of determining an equivalent circuit and substituting a previously synthesized circuit from a library. We optimize the synthesized circuit by applying the ESPRESSO heuristic for Boolean function minimization to each output node in parallel. It is demonstrated that the algorithm produces a 32.86% improvement over previously synthesized circuit benchmarks. For stronger mitigation of DPA attacks, we propose the implementation of Adiabatic Dynamic Differential Logic for applications in secure IC design. A Performance Adiabatic Dynamic Differential Logic (PADDL) is presented for implementation in high-frequency secure ICs. This method improves the differential power over previous dynamic and differential logic methods by up to 89.65%. Then, we present an adiabatic S-box which significantly reduces energy imbalance compared to previous benchmarks. The design is capable of forward encryption and reverse decryption with minimal overhead, allowing for efficient hardware reuse.
{"title":"Theory, Synthesis, and Application of Adiabatic and Reversible Logic Circuits for Security Applications","authors":"Matthew Morrison","doi":"10.1109/ISVLSI.2014.88","DOIUrl":"https://doi.org/10.1109/ISVLSI.2014.88","url":null,"abstract":"Programmable reversible logic is emerging as a prospective logic design style for implementation in modern nanotechnology and quantum computing with minimal impact on circuit heat generation. Adiabatic logic is a design methodology for reversible logic in CMOS where the current flow through the circuit is controlled such that the energy dissipation due to switching and capacitor dissipation is minimized. Production of cost-effective Secure Integrated Chips, such as Smart Cards, requires hardware designers to consider tradeoffs in size, security, and power consumption. In order to design successful security-centric designs, the low-level hardware must contain built-in protection mechanisms to supplement cryptographic algorithms such as AES and Triple DES by preventing side channel attacks, such as Differential Power Analysis (DPA). Dynamic logic obfuscates the output waveforms and the circuit operation, reducing the effectiveness of the DPA attack. In this dissertation, I address theory, synthesis, and application of adiabatic and reversible logic circuits for security applications. First, we present a mathematical proof to demonstrate that reversible logic can be used to design sequential computing structures. Next, a novel algorithm for synthesis of adiabatic circuits in CMOS is presented. This approach is unique because it correlates the offsets in the permutation matrix to the transistors required for synthesis, instead of determining an equivalent circuit and substituting a previously synthesized circuit from a library. Using the ESPRESSO heuristic for minimization of Boolean functions method on each output node in parallel, we optimize the synthesized circuit. It is demonstrated that the algorithm produces a 32.86% improvement over previously synthesized circuit benchmarks. For stronger mitigation of DPA attacks, we propose the implementation of Adiabatic Dynamic Differential Logic for applications in secure IC design. A Performance Adiabatic Dynamic Differential Logic (PADDL) is presented for an implementation in high frequency secure ICs. This method improves the differential power over previous dynamic and differential logic methods by up to 89.65. Then, we present an adiabatic S-box which significantly reduces energy imbalance compared to previous benchmarks. The design is capable of forward encryption and reverse decryption with minimal overhead, allowing for efficient hardware reuse.","PeriodicalId":405755,"journal":{"name":"2014 IEEE Computer Society Annual Symposium on VLSI","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126187333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With growing applications and increased integration of functionalities on multi-electrode biosensors, more attention is being paid to on-chip temperature measurement, both for monitoring the ambient temperature of bio-samples and for recording the heat generated by biosensor chips and its potential damage to bio-samples. This paper presents an integrated temperature sensor design intended to provide ambient temperature monitoring in a highly integrated biosensor system. Special attention was paid to improving power supply rejection (PSR) performance at the 1MHz clock frequency of the integrated biosensor system by using PSR-enhanced OTAs. The temperature sensor design was implemented using a commercial 0.18μm CMOS process. The temperature sensor achieves an inaccuracy of -0.34°C to 0.27°C from -30°C to 80°C. At 36°C, the PSR is around -50dB at 1MHz and -89.5dB at DC.
{"title":"A CMOS Temperature Sensor with -0.34°C to 0.27°C Inaccuracy from -30°C to 80°C","authors":"Hai Chi, Tom Chen","doi":"10.1109/ISVLSI.2014.30","DOIUrl":"https://doi.org/10.1109/ISVLSI.2014.30","url":null,"abstract":"With growing applications and increased integration of functionalities on multi-electrode biosensors, more attentions are paid to the need to include on-chip temperature measurement for providing ambient temperature monitoring of bio-samples and for recording heat generated by biosensor chips and their potential damage to bio-samples. This paper presents an integrated temperature sensor design which is intended to provide ambient temperature monitoring in a highly integrated biosensor system. Special attentions were paid to improve power supply rejection (PSR) performance at the clock frequency of 1MHz in the integrated biosensor system using PSR enhanced OTAs. The temperature sensor design was implemented using a commercial 0.18μm CMOS process. The temperature sensor achieves an inaccuracy of -0.34°C to 0.27°C from -30°C to 80°C. At 36°C, the PSR is around -50dB at 1MHz and -89.5dB at DC.","PeriodicalId":405755,"journal":{"name":"2014 IEEE Computer Society Annual Symposium on VLSI","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128559406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Juan Yi, Weichen Liu, Weiwen Jiang, Mingwen Qin, Lei Yang, Duo Liu, Chunming Xiao, Luelue Du, E. Sha
With the increasing power density and number of cores integrated into a single chip, thermal management is widely recognized as one of the essential issues in Multi-Processor Systems-on-Chip (MPSoCs). An uncontrolled temperature could significantly decrease system performance, lead to high cooling and packaging costs, and even cause serious damage. These issues have made temperature one of the major factors that must be addressed in MPSoC designs. Static scheduling of applications should take the thermal effects of task executions into consideration to keep the chip temperature under a safety threshold. However, inaccurate temperature estimation would cause processor overheating or system performance degradation. In this paper, we propose an improved thermal modeling technique that can be used to predict the chip temperature more accurately and efficiently at design time. We further develop a simulated annealing (SA)-based algorithm to address the static application mapping and scheduling problem based on the improved thermal model. The thermal condition is greatly improved and the total energy consumption is minimized. Experimental results show that the improved thermal modeling technique achieves, on average, over 99% temperature prediction accuracy compared with HotSpot simulation results. Based on this model, the SA-based algorithm reduces the chance that the temperature threshold is violated at runtime by 24.3%.
{"title":"An Improved Thermal Model for Static Optimization of Application Mapping and Scheduling in Multiprocessor System-on-Chip","authors":"Juan Yi, Weichen Liu, Weiwen Jiang, Mingwen Qin, Lei Yang, Duo Liu, Chunming Xiao, Luelue Du, E. Sha","doi":"10.1109/ISVLSI.2014.40","DOIUrl":"https://doi.org/10.1109/ISVLSI.2014.40","url":null,"abstract":"With the increasing power density and number of cores integrated into a single chip, thermal management is widely recognized as one of the essential issues in Multi-Processor Systems-on-Chip (MPSoCs). An uncontrolled temperature could significantly decrease system performance, lead to high cooling and packaging costs, and even cause serious damage. These issues have made temperature one of the major factors that must be addressed in MPSoC designs. Static scheduling of applications should take the thermal effects of task executions into consideration to keep the chip temperature under a safety threshold. However, inaccurate temperature estimation would cause processor overheating or system performance degradation. In this paper, we propose an improved thermal modeling technique that can be used to predict the chip temperature more accurately and efficiently at design time. We further develop a simulated annealing (SA)-based algorithm to address the static application mapping and scheduling problem based on the improved thermal model. The thermal condition is greatly improved and the total energy consumption is minimized. Experimental results show that the improved thermal modeling technique could provide an average of over 99% accuracy of temperature prediction when comparing with the results offered by Hotspot simulations. Based on it, the SA-based algorithm could reduce the chances that the temperature threshold to be violated at runtime by 24.3%.","PeriodicalId":405755,"journal":{"name":"2014 IEEE Computer Society Annual Symposium on VLSI","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128233846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A security bug in the OpenSSL library, codenamed Heartbleed, allowed attackers to read the contents of an affected server's memory, effectively revealing passwords, master keys, and users' session cookies. As long as server memory contents are in the clear, it is only a matter of time until the next bug or attack hands information over to attackers. In this paper, we investigate the applicability of privacy-preserving general-purpose computation, which would potentially render any leaked information indecipherable to attackers. Privacy is ensured by the use of homomorphically encrypted memory contents. To this end, we explore the boundaries of general-purpose computation constrained for user data privacy. Specifically, we explore the minimum amount of information required for general-purpose computation, which typically requires control flow and branches, and to what extent such information can be kept private from threats that have theoretically unlimited resources, including access to the internals of a target system.
{"title":"Trust No One: Thwarting \"heartbleed\" Attacks Using Privacy-Preserving Computation","authors":"N. G. Tsoutsos, M. Maniatakos","doi":"10.1109/ISVLSI.2014.86","DOIUrl":"https://doi.org/10.1109/ISVLSI.2014.86","url":null,"abstract":"A security bug in the OpenSSL library, codenamed Heartbleed, allowed attackers to read the contents of the corresponding server's memory, effectively revealing passwords, master keys, and users' session cookies. As long as the server memory contents are in the clear, it is a matter of time until the next bug/attack hands information over to attackers. In this paper, we investigate the applicability of privacy-preserving general-purpose computation, that would potentially render any information leaked indecipherable to attackers. Privacy is ensured by the use of homomorphically-encrypted memory contents. To this end, we explore the boundaries of general-purpose computation constrained for user data privacy. Specifically, we explore the minimum amount of information required for general purpose computation, which typically requires control flow and branches, and to what extent such information can be kept private from threats that have theoretically unlimited resources, including access to the internals of a target system.","PeriodicalId":405755,"journal":{"name":"2014 IEEE Computer Society Annual Symposium on VLSI","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126704072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we focus on the hypergraph bipartitioning problem and present a new multilevel hypergraph partitioning algorithm that is much faster than hMETIS while producing partitions of similar quality. In the coarsening phase, successively coarser hypergraphs are constructed using the MFCC (Modified First-Choice Coarsening) algorithm. Once the hypergraph contains only a small number of vertices, we use a randomized algorithm to obtain an initial partition and then apply an A-FM (Alternating Fiduccia-Mattheyses) refinement algorithm to optimize it. In the uncoarsening phase, we extract clusters level by level and apply A-FM repeatedly. Experiments on large benchmarks issued in the DAC 2012 Routability-Driven Placement Contest show that we can achieve similar or even better quality (1% improvement in minimum cut on average) and save 50% to 80% of the running time compared with the state-of-the-art partitioner hMETIS.
{"title":"A Fast Hypergraph Bipartitioning Algorithm","authors":"Wenzan Cai, Evangeline F. Y. Young","doi":"10.1109/ISVLSI.2014.58","DOIUrl":"https://doi.org/10.1109/ISVLSI.2014.58","url":null,"abstract":"In this paper, we focus on the hypergraph bipartitioning problem and present a new multilevel hypergraph partitioning algorithm that is much faster and of similar quality compared with hMETIS. In the coarsening phase, successive coarsened hypergraphs are constructed using the MFCC (Modified First-Choice Coarsening) algorithm. After getting a small hypergraph containing only a small number of vertices, we will use a randomized algorithm to obtain an initial partition and then apply an A-FM (Alternating Fiduccia-Mattheyses) refinement algorithm to optimize it. In the uncoarsening phase, we will extract clusters level by level and apply the A-FM repeatedly. Experiments on large benchmarks issued in the DAC 2012 Routability-Driven Placement Contest show that we can achieve similar or even better quality (1% improvement in minimum cut on average) and save 50% to 80% running time comparing with the state-of-the-art partitioner hMETIS.","PeriodicalId":405755,"journal":{"name":"2014 IEEE Computer Society Annual Symposium on VLSI","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116201708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
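The quantities that an FM-style refinement pass works with can be sketched as follows. This shows only the cut metric and the classic Fiduccia-Mattheyses move gain; it is not the A-FM or MFCC algorithm from the abstract above, whose details are not given there. The names `cut_size` and `move_gain` and the small example hypergraph are hypothetical.

```python
def cut_size(nets, side):
    """Number of nets that have vertices on both sides of the bipartition."""
    return sum(1 for net in nets if len({side[v] for v in net}) == 2)


def move_gain(nets, side, v):
    """Reduction in cut size if vertex v were moved to the other side."""
    gain = 0
    for net in nets:
        if v not in net:
            continue
        same = sum(1 for u in net if side[u] == side[v])
        other = len(net) - same
        if same == 1 and other > 0:
            gain += 1   # v is alone on its side, so the net becomes uncut
        elif other == 0 and len(net) > 1:
            gain -= 1   # net lies entirely on v's side, so moving v cuts it
    return gain


# Hypothetical 5-vertex hypergraph with one 3-pin net.
nets = [(0, 1), (1, 2, 3), (3, 4)]
side = [0, 0, 1, 1, 0]   # side[v] is the block (0 or 1) holding vertex v
```

A refinement pass repeatedly moves a high-gain vertex that keeps the partition balanced, then updates the gains of its neighbours. The sketch omits balance constraints and the gain-bucket data structure that make practical FM implementations fast.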
This paper presents a novel neuromemristive architecture for pattern classification based on extreme learning machines (ELMs). Specifically, we propose CMOS current-mode neuron circuits, memristor-based bipolar synapse circuits, and a stochastic, hardware-friendly training approach based on the least-mean-squares (LMS) learning algorithm. These components are integrated into a current-mode ELM architecture. We show that the current-mode design is especially efficient for implementing constant network weights between the ELM's input and hidden layers. The neuromemristive ELM was simulated in the Cadence AMS design environment. We used a memristor model based on experimental data from an HfOx device. The top-level design was validated by training a network with 10 hidden nodes to detect edges in binary patterns. Results indicate that the proposed architecture and learning approach are able to yield 100% classification accuracy.
{"title":"Neuromemristive Extreme Learning Machines for Pattern Classification","authors":"Cory E. Merkel, D. Kudithipudi","doi":"10.1109/ISVLSI.2014.67","DOIUrl":"https://doi.org/10.1109/ISVLSI.2014.67","url":null,"abstract":"This paper presents a novel neuromemristive architecture for pattern classification based on extreme learning machines (ELMs). Specifically, we propose CMOS current-mode neuron circuits, memristor-based bipolar synapse circuits, and a stochastic, hardware-friendly training approach based on the least-mean-squares (LMS) learning algorithm. These components are integrated into a current-mode ELM architecture. We show that the current-mode design is especially efficient for implementing constant network weights between the ELM's input and hidden layers. The neuromemristive ELM was simulated in the Cadence AMS design environment. We used an experimental memristor model based on experimental data from an HfO_{x} device. The top-level design was validated by training a 10 hidden-node network to detect edges in binary patterns. Results indicate that the proposed architecture and learning approach are able to yield 100% classification accuracy.","PeriodicalId":405755,"journal":{"name":"2014 IEEE Computer Society Annual Symposium on VLSI","volume":" 25","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113950113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
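The ELM structure described in the abstract above (constant random weights between input and hidden layers, with only the output weights trained) can be sketched in software as follows. This is a plain LMS model under stated assumptions, not the paper's stochastic hardware training or its memristor circuits. The toy task, the weight ranges, the step size, the added constant output bias, and all function names are hypothetical; only the hidden-layer size of 10 comes from the abstract.

```python
import math
import random


def make_elm(num_inputs, num_hidden, rng):
    """Draw fixed random input-to-hidden weights and biases (never trained)."""
    weights = [[rng.uniform(-1, 1) for _ in range(num_inputs)]
               for _ in range(num_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(num_hidden)]
    return weights, biases


def hidden(x, weights, biases):
    """Hidden-layer activations plus a constant 1 acting as an output bias."""
    acts = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]
    return acts + [1.0]


def train_lms(samples, weights, biases, step=0.02, epochs=300):
    """Train only the output weights with per-sample LMS updates."""
    out_w = [0.0] * (len(weights) + 1)
    for _ in range(epochs):
        for x, target in samples:
            h = hidden(x, weights, biases)
            error = target - sum(w * hi for w, hi in zip(out_w, h))
            out_w = [w + step * error * hi for w, hi in zip(out_w, h)]
    return out_w


def mse(samples, weights, biases, out_w):
    """Mean squared error of the linear output over all samples."""
    total = 0.0
    for x, target in samples:
        h = hidden(x, weights, biases)
        total += (target - sum(w * hi for w, hi in zip(out_w, h))) ** 2
    return total / len(samples)


# Toy task (an assumption): +1 if a 4-bit pattern contains a 0/1 transition.
patterns = [[(n >> k) & 1 for k in range(4)] for n in range(16)]
samples = [(p, 1.0 if any(a != b for a, b in zip(p, p[1:])) else -1.0)
           for p in patterns]

rng = random.Random(0)
weights, biases = make_elm(4, 10, rng)
out_w = train_lms(samples, weights, biases)
```

Because the hidden layer is never updated, training reduces to fitting a linear readout, which is why a simple rule such as LMS is sufficient. The sketch makes no claim about reaching the accuracy reported in the abstract.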
Relying on a recently developed gate-level information assurance scheme, we formally analyze the security of design-for-test (DFT) scan chains, the industry-standard testing method for fabricated chips, and, for the first time, formally prove that a circuit with an inserted scan chain can violate security properties. The same security assessment method is then applied to a built-in-self-test (BIST) structure, where it is shown that even BIST structures can cause security vulnerabilities. To balance trustworthiness and testability, a new design-for-security (DFS) methodology is proposed which, through modification of the scan chain structure, can achieve high security without compromising the testability of the inserted scan structure. To support the task of secure scan chain insertion, a method of scan chain reshuffling is introduced. Using an AES encryption core as the testing platform, we elaborate the security assessment procedure as well as the DFS technique for balancing security and testability of cryptographic circuits.
{"title":"Design-for-Security vs. Design-for-Testability: A Case Study on DFT Chain in Cryptographic Circuits","authors":"Yier Jin","doi":"10.1109/ISVLSI.2014.54","DOIUrl":"https://doi.org/10.1109/ISVLSI.2014.54","url":null,"abstract":"Relying on a recently developed gate-level information assurance scheme, we formally analyze the security of design-for-test (DFT) scan chains, the industrial standard testing methods for fabricated chips and, for the first time, formally prove that a circuit with scan chain inserted can violate security properties. The same security assessment method is then applied to a built-in-self-test (BIST) structure where it is shown that even BIST structures can cause security vulnerabilities. To balance trustworthiness and testability, a new design-for-security (DFS) methodology is proposed which, through the modification of scan chain structure, can achieve high security without compromising the testability of the inserted scan structure. To support the task of secure scan chain insertion, a method of scan chain reshuffling is introduced. Using an AES encryption core as the testing platform, we elaborated the security assessment procedure as well as the DFS technique in balancing security and testability of cryptographic circuits.","PeriodicalId":405755,"journal":{"name":"2014 IEEE Computer Society Annual Symposium on VLSI","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133821903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}