Pub Date: 2019-07-15 DOI: 10.1109/ISVLSI.2019.00049
Walter Lau Neto, Xifan Tang, Max Austin, L. Amarù, P. Gaillardon
The majority-inverter graph (MIG) is a recently introduced Boolean network that enables efficient logic manipulation. Recent work shows that MIGs can achieve significant improvements in area, delay, and power compared with current academic and commercial tools. However, current MIG optimizations are limited to combinational circuits and miss the sequential elements that are ubiquitous in practical implementations. This paper is the first to study sequential optimization opportunities using MIGs. The presented extension brings the efficiency of the MIG area- and depth-oriented rewriting algorithms for combinational circuits to sequential networks. Experimental results, averaged over the OpenCores benchmark suite, show that (1) in technology-independent evaluations, our MIG-based sequential optimization improves area by 9% and delay by 38% compared with a popular academic tool; and (2) in a standard optimization and technology-mapping flow for ASICs with a 7nm predictive standard cell library, the proposed sequential optimizer outperforms academic and commercial tools in energy-delay product (EDP) by 12% and 4%, respectively, and in area-delay product (ADP) by 13% and 7%, respectively.
Title: Improving Logic Optimization in Sequential Circuits using Majority-inverter Graphs. 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 224-229.
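As a rough illustration of the logic primitive behind MIGs, the sketch below defines the three-input majority function and exhaustively checks a few identities that are commonly associated with majority-based rewriting. The function names `maj` and `check_identities` are hypothetical, and this is not the authors' optimizer; it only shows why AND, OR, and inverter propagation fit naturally into a majority-only representation.

```python
from itertools import product


def maj(a: bool, b: bool, c: bool) -> bool:
    """Three-input majority: true when at least two inputs are true."""
    return (a and b) or (a and c) or (b and c)


def check_identities() -> bool:
    """Exhaustively check a few majority identities over all Boolean inputs."""
    for x, y, z, u, v in product([False, True], repeat=5):
        # AND and OR are majority nodes with one input tied to a constant.
        if maj(x, y, False) != (x and y):
            return False
        if maj(x, y, True) != (x or y):
            return False
        # Inverter propagation: majority is self-dual.
        if (not maj(x, y, z)) != maj(not x, not y, not z):
            return False
        # Distributivity: moves the nested node from one level to another.
        if maj(x, y, maj(u, v, z)) != maj(maj(x, y, u), maj(x, y, v), z):
            return False
    return True
```

Calling `check_identities()` confirms that the identities hold for every input combination, which is what allows a rewriting pass to apply them without changing circuit behavior.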
Pub Date: 2019-07-15 DOI: 10.1109/ISVLSI.2019.00030
Salma Hesham, D. Göhringer, M. A. E. Ghany
In this paper, we propose a dark-silicon-inspired hierarchical time-division-multiplexing (TDM) network-on-chip (NoC) with an online distributed setup scheme for slot allocation. In addition to the normal mesh routers, we propose hierarchical routers that use the dim silicon parts of the chip to hierarchically connect quad-router units. Normal routers operate at the full chip frequency with a 1.2 V supply, while hierarchical routers operate at half the chip frequency with a 0.8 V supply, double the data width, and half the slot size. Routers follow a proposed architecture that separates data-path and control-setup sub-routers. This allows data and control to use separate clocks and supplies, and keeps the control a single-slot-cycle design independent of the data slot size. The proposed NoC architecture and a state-of-the-art base NoC architecture are evaluated under uniform random traffic using Synopsys VCS and synthesized using Synopsys Design Compiler for the SAED90nm technology. Within the same power budget as the base NoC, the proposed architecture provides up to 74% better setup latency, 32% higher NoC saturation load, and 21% higher success rates. The proposed hierarchical quad leverages the dim silicon parts of the chip for an energy-efficient design. Although it consumes 1.78 times the area of the base quad, 56% of its area is under-clocked and operates at half the maximum chip frequency, which reduces the power density to 52% of that of the base NoC.
Title: Dark-Silicon Inspired Energy Efficient Hierarchical TDM NoC. 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 116-121.
Pub Date: 2019-07-15 DOI: 10.1109/ISVLSI.2019.00088
Jingyan Fu, Zhiheng Liao, Na Gong, Jinhui Wang
Memristors offer advantages as a hardware solution for neuromorphic computing; however, their nonlinearity makes weight updates difficult and reduces the accuracy of a neural network. This paper proposes a piecewise linear (PL) method that mitigates the nonlinear effect of memristors by calculating the weight update parameters along a piecewise line, which reduces errors in the weight update process. The method is simple but efficient: it mitigates nonlinearity without reading the current conductance of the memristor at each update, thereby avoiding complex peripheral circuits. PL methods with 2-segment, 3-segment, and 4-segment models are investigated under two split-point selection strategies. The results show that, across different degrees of nonlinearity, the PL method improves the recognition accuracy on MNIST handwritten digits to 87.87%-95.05%, compared with 10.77%-73.18% without the PL method. Finally, the paper concludes that the more segments the PL method uses, the smaller the weight deviation caused by the nonlinearity of the synapse device.
Title: Linear Optimization for Memristive Device in Neuromorphic Hardware. 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 453-458.
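The claim that more segments lead to less deviation can be illustrated with a small sketch. Everything here is an assumption made for illustration: the exponential conductance curve, the nonlinearity parameter `nu`, the evenly spaced split points, and the function names are hypothetical and are not taken from the paper.

```python
import math


def conductance(p: float, nu: float = 3.0) -> float:
    """Assumed normalized conductance after a fraction p in [0, 1] of the
    programming pulses; nu controls how nonlinear the curve is."""
    return (1.0 - math.exp(-nu * p)) / (1.0 - math.exp(-nu))


def piecewise_linear(p: float, segments: int, nu: float = 3.0) -> float:
    """Approximate conductance(p) by straight lines between evenly spaced split points."""
    width = 1.0 / segments
    k = min(int(p / width), segments - 1)  # index of the segment containing p
    p0 = k * width
    g0 = conductance(p0, nu)
    g1 = conductance(p0 + width, nu)
    return g0 + (g1 - g0) * (p - p0) / width


def max_deviation(segments: int, nu: float = 3.0, samples: int = 1000) -> float:
    """Largest sampled gap between the nonlinear curve and its piecewise linear model."""
    return max(
        abs(conductance(i / samples, nu) - piecewise_linear(i / samples, segments, nu))
        for i in range(samples + 1)
    )
```

Under this assumed model, `max_deviation(2)`, `max_deviation(3)`, and `max_deviation(4)` shrink in that order, which matches the qualitative trend reported in the abstract. Other split-point strategies would change the numbers but not the basic idea.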
Pub Date: 2019-07-15 DOI: 10.1109/ISVLSI.2019.00063
C. Bresch, D. Hély, Stéphanie Chollet, I. Parissis
With the emergence of the Internet of Things (IoT), embedded computing cores are increasingly used to handle critical applications. To avoid faulty scenarios on these devices, extra hardware support is needed against exploits of memory corruption bugs. To address this issue, this paper presents a new, efficient, fine-grained data flow integrity mechanism based on a translation lookaside buffer. The concept is validated by extending the RISC-V instruction set and implementing it on a Digilent Xilinx Arty-35T board. The results show that the contribution requires only a few extensions to the processor pipeline and the compiler, and does not induce any software overhead at run time.
Title: TrustFlow: A Trusted Memory Support for Data Flow Integrity. 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 308-313.
Pub Date: 2019-07-15 DOI: 10.1109/ISVLSI.2019.00093
Zhiming Zhang, Qiaoyan Yu
Three-dimensional (3D) integration makes it possible to integrate an increasing number of transistors into a single package. Despite the improved performance and power efficiency, integrating multiple dies into the same package potentially leads to new security threats, such as 3D hardware Trojans. In this work, we first provide a thorough survey of reported hardware Trojans in 3D integrated circuits and systems, and then propose comprehensive 3D hardware Trojan models. A case study verifies the implementation feasibility of a thermal-triggered 3D Trojan. The activation speed of the 3D Trojan is compared with that of its 2D counterpart to confirm that a 3D IC provides a better environment for hiding thermal Trojans.
Title: Modeling Hardware Trojans in 3D ICs. 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 483-488.
Pub Date: 2019-07-15 DOI: 10.1109/ISVLSI.2019.00025
Cheng-Wei Tai, Rung-Bin Lin
In this article, we present a concept called morphed layouts: layouts of the same standard cell whose pins have different footprints in each layout. We propose two approaches that exploit morphed layouts for pin length reduction. The first approach is performed after placement but before routing. It enables design space exploration to seek the best trade-off between total wire length and via count, and it can obtain better results than previous work on large circuits. The second approach is applied to a routed design and can always achieve pin length reduction without increasing the via count. On average, it can reduce total pin length by 12.1% and total wire length by 3.4%.
Title: Morphed Standard Cell Layouts for Pin Length Reduction. 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 94-99.
Pub Date: 2019-07-15 DOI: 10.1109/ISVLSI.2019.00092
Siavoosh Payandeh Azad, G. Jervan, Michael Tempelmeier, Martha Johanna Sepúlveda
Dynamic security zones in Multiprocessor Systems-on-Chip (MPSoCs) have been used to isolate sensitive applications from possible attackers. These physical wrappers are usually configured through programmable hardware firewalls. Previous work has shown the efficiency of this security mechanism against a wide variety of attacks. However, the security zone configuration is performed in an unprotected way, exposing the system to attacks caused by rogue firewall updates. In this work, we propose CAESAR-MPSoC, an enhanced MPSoC that ensures the protected configuration of the firewalls through encrypted and authenticated reconfiguration packets. To this end, we make the following contributions. First, we integrate two CAESAR (Competition for Authenticated Encryption: Security, Applicability, and Robustness) hardware IP cores, ASCON and AEGIS, into MPSoCs. Second, we develop a lightweight interface for plugging the different CAESAR cores into the MPSoC environment. Third, we show the protected configuration of security zones. Fourth, we evaluate the security, area, and cost of CAESAR-MPSoC. The results show that our solution is feasible and effective in enabling protected and efficient security zone configuration.
Title: CAESAR-MPSoC: Dynamic and Efficient MPSoC Security Zones. 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 477-482.
Pub Date: 2019-07-15 DOI: 10.1109/ISVLSI.2019.00095
Ali Ozdemir, Mshabab Alrizah, Kyusun Choi
The Threshold Inverter Quantization (TIQ) architecture for flash analog-to-digital converters (ADCs) uses inverters as voltage comparators. The TIQ approach has many advantages over differential voltage comparators, but creating and selecting its comparators is difficult. Precise selection of the gate switching voltage is crucial for flash ADCs. Differential non-linearity (DNL) and integral non-linearity (INL) error measurements are therefore used to assess how precisely the voltage comparators are selected, and different selection algorithms are used to make the selection as precise as possible. In this work, we present two new algorithms based on a dynamic programming approach, along with DNL and INL simulation results. Compared with state-of-the-art methods, the new approach improves DNL by 4 times for 6-bit and 5 times for 8-bit ADCs.
Title: Optimization of Comparator Selection Algorithm for TIQ Flash ADC Using Dynamic Programming Approach. 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 495-500.
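To make the DNL and INL metrics concrete, the sketch below computes both from a sorted list of comparator switching voltages. It assumes an endpoint fit, where the ideal step (1 LSB) is the average spacing between the first and last thresholds; the function name `dnl_inl` is hypothetical, and the paper's selection algorithms themselves are not reproduced here.

```python
def dnl_inl(thresholds: list[float]) -> tuple[list[float], list[float]]:
    """Return (DNL, INL) in LSB for sorted comparator switching voltages.

    Assumes an endpoint fit: the ideal step is the average step between
    the first and last thresholds.
    """
    if len(thresholds) < 2:
        raise ValueError("need at least two thresholds")
    lsb = (thresholds[-1] - thresholds[0]) / (len(thresholds) - 1)
    # DNL: how far each actual step deviates from the ideal step.
    dnl = [(hi - lo) / lsb - 1.0 for lo, hi in zip(thresholds, thresholds[1:])]
    # INL: running sum of DNL, the accumulated deviation at each code.
    inl, running = [], 0.0
    for d in dnl:
        running += d
        inl.append(running)
    return dnl, inl
```

Evenly spaced thresholds give DNL and INL of zero everywhere, so a comparator selection algorithm can be read as a search for the subset of available inverters whose switching voltages bring these values closest to zero.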
Pub Date: 2019-07-15 DOI: 10.1109/ISVLSI.2019.00023
Mehrnoosh Raoufi, Quan Deng, Youtao Zhang, Jun Yang
Page deduplication based on KSM (Kernel Samepage Merging) is an important Linux system service for reducing main memory consumption on cloud servers. However, it tends to incur large computation and memory bandwidth overheads. Recently proposed hardware-assisted KSM approaches effectively address the computation overhead but still consume a large amount of off-chip memory bandwidth. In this paper, we propose PageCmp, a PIM (Processing-In-Memory) based page deduplication approach, to achieve bandwidth efficiency on cloud servers. PageCmp exploits the bitwise operation capability inside the DRAM cell array to enable fast page comparison. By integrating a lightweight local comparator inside the output buffer of DRAM modules, PageCmp sends only the page comparison result back to the processor. Our experimental results show that, compared with the state of the art, PageCmp achieves a 4x reduction in memory bandwidth while introducing less than 1% hardware overhead.
Title: PageCmp: Bandwidth Efficient Page Deduplication through In-memory Page Comparison. 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 82-87.
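The comparison step described above can be modeled in software as a bitwise XOR followed by a reduction to a single result. The sketch below is only a functional analogy, not the DRAM-level mechanism; the 4096-byte page size and the name `pages_equal` are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # bytes; a common Linux page size, assumed here


def pages_equal(page_a: bytes, page_b: bytes) -> bool:
    """Compare two pages the way a bitwise XOR plus an OR-reduction would."""
    if len(page_a) != PAGE_SIZE or len(page_b) != PAGE_SIZE:
        raise ValueError("both pages must be exactly PAGE_SIZE bytes")
    # XOR leaves a set bit wherever the two pages differ.
    diff = int.from_bytes(page_a, "little") ^ int.from_bytes(page_b, "little")
    # Only this single yes/no answer would need to reach the processor.
    return diff == 0
```

The point of the analogy is the shape of the data flow: two full pages go in, and one bit comes out, which is why performing the comparison near the memory can save off-chip bandwidth.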
Pub Date: 2019-07-15 DOI: 10.1109/ISVLSI.2019.00053
Zheng Xu, J. Abraham
Machine Learning (ML) accelerators have recently risen to prominence, with significant power and performance efficiency improvements over CPUs and GPUs. In this paper, we develop an Algorithm Based Error Checker (ABEC) for Concurrent Error Detection (CED) on an industry-quality Convolutional Neural Network (CNN) accelerator, prioritizing a high safety Diagnostic Coverage (DC) requirement together with area and power efficiency. Furthermore, we develop an Algorithm Based Cluster Checker (ABCC) with coarse-grained error localization to improve run-time availability. Experimental results show that above 99% DC can be achieved with only 30% area and power overhead for a selected configuration.
Title: Design of a Safe Convolutional Neural Network Accelerator. 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 247-252.
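Algorithm-based error checking is often explained with the classic checksum test for matrix multiplication, to which convolution layers are frequently lowered. The sketch below shows that generic idea only; it is not the ABEC or ABCC design from the paper, and the helper names `matmul` and `checksum_ok` are hypothetical.

```python
def matmul(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Plain integer matrix product C = A x B."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def checksum_ok(a: list[list[int]], b: list[list[int]], c: list[list[int]]) -> bool:
    """Check C against A x B using column sums: (1^T A) B must equal 1^T C."""
    inner, cols = len(b), len(b[0])
    col_sum_a = [sum(row[k] for row in a) for k in range(inner)]
    expected = [sum(col_sum_a[k] * b[k][j] for k in range(inner)) for j in range(cols)]
    actual = [sum(row[j] for row in c) for j in range(cols)]
    return expected == actual
```

The check costs one extra vector-matrix product instead of a full recomputation, which is the general reason checksum-style checkers can detect errors concurrently at a fraction of the cost of duplication. A single corrupted output element changes one column sum and is therefore detected.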