Towards warp-scheduler friendly STT-RAM/SRAM hybrid GPGPU register file design
Quan Deng, Youtao Zhang, Minxuan Zhang, Jun Yang
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203850
Modern Graphics Processing Units (GPUs) widely adopt large SRAM-based register files (RFs) to enable fast context switching. A large SRAM RF may consume 20% to 40% of GPU power, which has become one of the major design challenges for GPUs. Recent studies mitigate the issue through hybrid RF designs that architect a large STT-RAM (Spin Transfer Torque magnetic RAM) RF and a small SRAM buffer. However, the long STT-RAM write latency throttles the data exchange between STT-RAM and SRAM, which penalizes warp schedulers with frequent context switches, e.g., round-robin schedulers. In this paper, we propose HC-RF, a warp-scheduler friendly hybrid RF design using a novel SRAM/STT-RAM hybrid cell (HC) structure. HC-RF exploits cell-level integration to improve the effective bandwidth between STT-RAM and SRAM. By enabling silent data transfer from SRAM to STT-RAM without blocking RF banks, HC-RF supports concurrent context switching and decouples its performance from the choice of warp scheduler. Our experimental results show that, on average, HC-RF achieves 50% performance improvement and 44% energy consumption reduction over a coarse-grained hybrid design when the LRR (Loose Round Robin) warp scheduler is adopted.
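The loose round-robin policy the evaluation refers to can be modeled in a few lines of Python. This is an illustrative sketch (warp IDs, the `is_ready` predicate, and the cycle loop are hypothetical), not the paper's simulator: in LRR, a warp that cannot issue loses its turn and the scheduler moves on instead of blocking.

```python
from collections import deque

def lrr_schedule(warps, is_ready, cycles):
    """Loose round robin: each cycle, issue from the next ready warp in
    cyclic order; a stalled warp is skipped rather than blocking."""
    order = deque(warps)
    issued = []
    for _ in range(cycles):
        for _ in range(len(order)):
            w = order[0]
            order.rotate(-1)          # advance the round-robin pointer
            if is_ready(w):
                issued.append(w)
                break
        else:
            issued.append(None)       # no warp was ready this cycle
    return issued
```

Because every ready warp gets a turn in order, register state for many warps is touched in quick succession, which is exactly the frequent-context-switch pattern that stresses a slow STT-RAM/SRAM exchange path.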
Dedicated synthesis for MZI-based optical circuits based on AND-inverter graphs
Arighna Deb, R. Wille, R. Drechsler
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203783
Optical circuits have received significant interest as a promising alternative to existing electronic systems. Consequently, the synthesis of optical circuits is also receiving increasing attention. However, initial solutions for the synthesis of optical circuits either rely on manual design or on rather straightforward mappings from established data structures such as BDDs and SoPs/ESoPs to the corresponding optical netlist. These approaches hardly utilize the full potential of the gate libraries available in this domain. In this paper, we propose an alternative synthesis solution based on AND-Inverter Graphs (AIGs) which is capable of utilizing this potential. That is, a scheme is presented which dedicatedly maps the given function representation to the desired circuit in a one-to-one fashion, yielding significantly smaller circuit sizes. Experimental evaluations confirm that the proposed solution generates optical circuits with up to 97% fewer gates than existing synthesis approaches.
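The AIG data structure this synthesis flow starts from can be illustrated with a minimal sketch. The AIGER-style literal encoding and structural hashing below are standard AIG conventions, not details taken from the paper:

```python
class AIG:
    """Minimal AND-Inverter Graph: nodes are 2-input ANDs, edges may be
    inverted. A literal is 2*node_id, +1 if inverted (AIGER convention)."""
    def __init__(self, n_inputs):
        self.n_inputs = n_inputs
        self.ands = []                 # list of (lit_a, lit_b) pairs
        self.strash = {}               # structural hashing table

    def input_lit(self, i):
        return 2 * (i + 1)             # node 0 is reserved for constant false

    def neg(self, lit):
        return lit ^ 1                 # toggle the inversion bit

    def and_gate(self, a, b):
        if a > b:
            a, b = b, a                # canonical operand order for hashing
        key = (a, b)
        if key in self.strash:         # reuse a structurally identical gate
            return self.strash[key]
        node = 1 + self.n_inputs + len(self.ands)
        self.ands.append(key)
        self.strash[key] = 2 * node
        return 2 * node

    def or_gate(self, a, b):           # De Morgan: a|b = !(!a & !b)
        return self.neg(self.and_gate(self.neg(a), self.neg(b)))
```

Since every Boolean function reduces to ANDs and inversions, a one-to-one mapping from AIG nodes to optical gates only needs a library realization of these two primitives; structural hashing is what keeps the node count, and hence the gate count, small.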
Leveraging value locality for efficient design of a hybrid cache in multicore processors
M. Arjomand, A. Jadidi, M. Kandemir, C. Das
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203753
Owing to negligible leakage current, high density, and superior scalability, Spin-Transfer Torque RAM (STT-RAM) technology has become one of the promising candidates for low-power, high-capacity on-chip caches in multicore systems. While STT-RAM read access latency is comparable to that of SRAM, write operations in STT-RAM are more challenging: writes are slow, consume substantial energy, and the lifetime of STT-RAM is limited by the number of write operations to each cell. To overcome these challenges in STT-RAM caches, this paper explores the potential of eliminating redundant writes using the phenomenon of frequent value locality (FVL). According to FVL, a few distinct values appear in a large fraction of memory transactions; this work focuses on cache memories. By leveraging frequent value locality, we propose a novel value-based hybrid (STT-RAM + SRAM) cache that has the benefits of both SRAM and STT-RAM technologies, i.e., it is high-performance, power-efficient, and scalable. Our evaluation results for an 8-core chip-multiprocessor with a 6MB last-level cache show that our proposed design reduces power consumption of an STT-RAM cache by up to 90% (an average of 82%), enhances its lifetime by up to 52% (29% on average), and improves system performance by up to 30% (11% on average), for a wide range of multi-threaded and multi-program workloads.
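The frequent-value-locality idea can be made concrete with a toy Python sketch. The trace, dictionary size, and index-vs-full encoding below are illustrative assumptions, not the paper's mechanism:

```python
from collections import Counter

def frequent_values(trace, k):
    """Return the k most frequent values in a write trace (a candidate
    FVL dictionary) and the fraction of writes they cover."""
    counts = Counter(trace)
    top = [v for v, _ in counts.most_common(k)]
    covered = sum(counts[v] for v in top) / len(trace)
    return top, covered

def encode_write(value, dictionary):
    """A write whose value is in the dictionary can be stored as a short
    index (e.g. in a small SRAM structure) instead of performing a full,
    slow, wear-inducing STT-RAM array write."""
    if value in dictionary:
        return ('index', dictionary.index(value))
    return ('full', value)
```

If a handful of values covers most writes, most STT-RAM array writes become cheap dictionary-index updates, which is where the power and lifetime savings come from.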
ORCHARD: Visual object recognition accelerator based on approximate in-memory processing
Yeseong Kim, M. Imani, T. Simunic
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203756
In recent years, machine learning for visual object recognition has been applied to various domains, e.g., autonomous vehicles, health diagnosis, and home automation. However, the recognition procedures still consume considerable processing energy and incur a high data-movement cost for memory accesses. In this paper, we propose a novel hardware accelerator design, called ORCHARD, which processes object recognition tasks inside memory. The proposed design accelerates both image feature extraction and the boosting-based learning algorithm, which are key subtasks of state-of-the-art image recognition approaches. We optimize the recognition procedures by leveraging approximate computing and emerging non-volatile memory (NVM) technology. The NVM-based in-memory processing allows the proposed design to mitigate the CMOS-based computation overhead, greatly improving system efficiency. In our evaluation, conducted with circuit- and device-level simulations, we show that ORCHARD successfully performs practical image recognition tasks, including text, face, pedestrian, and vehicle recognition, with only 0.3% accuracy loss from the computation approximation. In addition, our design improves performance and energy efficiency by up to 376x and 1896x, respectively, compared to an existing processor-based implementation.
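Boosting-based learning of the kind the accelerator targets can be illustrated with a tiny AdaBoost over threshold stumps. This is textbook AdaBoost on 1-D features, not ORCHARD's in-memory algorithm; the data and round count are hypothetical:

```python
import math

def train_adaboost(X, y, rounds):
    """Tiny AdaBoost with threshold stumps on 1-D features and labels
    in {-1, +1}. Each round picks the stump with lowest weighted error,
    then reweights the examples it got wrong."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for thr in sorted(set(X)):
            for pol in (1, -1):
                pred = [pol if x >= thr else -pol for x in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, thr, pol, pred)
        err, thr, pol, pred = best
        err = max(err, 1e-10)                       # avoid log(1/0)
        alpha = 0.5 * math.log((1 - err) / err)     # stump weight
        w = [wi * math.exp(-alpha * p * t) for wi, p, t in zip(w, pred, y)]
        s = sum(w)
        w = [wi / s for wi in w]
        model.append((alpha, thr, pol))
    return model

def predict(model, x):
    score = sum(a * (p if x >= t else -p) for a, t, p in model)
    return 1 if score >= 0 else -1
```

The appeal for in-memory acceleration is that both the stump evaluations and the weighted vote are simple comparisons and accumulations, which tolerate approximation well.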
ApproxLUT: A novel approximate lookup table-based accelerator
Ye Tian, Ting Wang, Qian Zhang, Q. Xu
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203810
Computing with memory, which stores function responses of some input patterns into lookup tables offline and retrieves their values when encountering similar patterns (instead of performing online calculation), is a promising energy-efficient computing technique. For a given lookup table size, the efficiency of this technique depends on which function responses are stored and how they are organized. In this paper, we propose a novel adaptive approximate lookup table-based accelerator, wherein we store function responses in a hierarchical manner with increasingly fine granularity and accuracy. In addition, the proposed accelerator provides lightweight compensation of output results at different precision levels according to input patterns and output quality requirements. Moreover, our accelerator conducts adaptive lookup table search by exploiting input locality. Experimental results on various computation kernels show significant energy savings of the proposed accelerator over prior solutions.
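A minimal sketch of hierarchical, adaptive-precision lookup, assuming two table levels and a crude neighbor-difference error estimate (both are assumptions; the paper's compensation and search are more elaborate):

```python
import math

class HierLUT:
    """Two-level approximate LUT for f on [lo, hi): a coarse table
    answers most lookups; a finer table is consulted only when the
    requested error bound demands it."""
    def __init__(self, f, lo, hi, coarse, fine):
        self.lo, self.hi = lo, hi
        self.levels = []
        for n in (coarse, fine):
            step = (hi - lo) / n
            # sample f at each interval midpoint
            table = [f(lo + (i + 0.5) * step) for i in range(n)]
            self.levels.append((step, table))

    def lookup(self, x, max_err):
        for li, (step, table) in enumerate(self.levels):
            i = min(int((x - self.lo) / step), len(table) - 1)
            j = min(i + 1, len(table) - 1)
            # crude local-variation estimate of the approximation error
            est_err = abs(table[j] - table[i])
            if est_err <= max_err or li == len(self.levels) - 1:
                return table[i]
```

Where the function varies slowly, the cheap coarse level answers immediately; only locally steep regions pay for the fine-grained table, which mirrors the adaptive-granularity idea.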
Approximate image storage with multi-level cell STT-MRAM main memory
Hengyu Zhao, Linuo Xue, Ping Chi, Jishen Zhao
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203788
Images consume significant storage space in both consumer devices and the cloud. As such, image processing applications incur high energy consumption in loading and accessing image data in memory. Fortunately, most image processing applications can tolerate approximate image data storage. In addition, multi-level cell spin-transfer torque MRAM (MLC STT-MRAM) offers unique design opportunities as image memory: the two bits in a memory cell require asymmetric write current, with the soft bit requiring much less write current than the hard bit. This paper proposes an approximate image processing scheme that improves system energy efficiency without violating applications' image quality requirements. Our design consists of (i) an approximate image storage mechanism that strives to write only the soft bits in MLC STT-MRAM main memory with small write current and (ii) a memory mode controller that determines the approximation of image data and coordinates across precise/approximate memory access modes. Our experimental results with various image processing functionalities demonstrate that our design reduces memory access energy consumption by 53% and 2.3x, with 100% user satisfaction, compared with traditional DRAM-based and MLC phase-change-memory-based main memory, respectively.
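The effect of restricting writes to the cheap bit positions can be approximated with a toy model that keeps only the top bits of each pixel and measures the resulting quality with PSNR. The bit-to-cell mapping here is purely illustrative; the paper's actual soft-bit encoding is more involved:

```python
import math

def approx_store(pixel, keep_bits=4):
    """Toy stand-in for soft-bit-only MLC writes: keep the top
    `keep_bits` of an 8-bit pixel exact and drop the bits that would
    require expensive hard-bit writes."""
    mask = ((1 << keep_bits) - 1) << (8 - keep_bits)
    return pixel & mask

def psnr(orig, approx):
    """Peak signal-to-noise ratio in dB for 8-bit pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, approx)) / len(orig)
    return float('inf') if mse == 0 else 10 * math.log10(255 ** 2 / mse)
```

A mode controller like the paper's item (ii) would compare such a quality estimate against the application's requirement and fall back to precise writes when the approximation is too lossy.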
Sequential engineering change order under retiming and resynthesis
Nian-Ze Lee, Victor N. Kravets, J. H. Jiang
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203767
Engineering change order (ECO) is pivotal in rectifying late design changes, which occur commonly due to ever-increasing system complexity. Existing functional ECO methods focus on combinational equivalence, assuming a known input correspondence between the old implementation and the new specification. They are inadequate for rectifying circuits under sequential transformations. This inadequacy hinders the utilization of powerful and effective sequential optimization methods based on retiming and resynthesis. As retiming and/or resynthesis gains increasing adoption in industry, incorporating sequential ECO techniques into the hardware design flow becomes essential. In this paper, we provide the first attempt to extend ECO to designs under retiming and resynthesis in an industrial flow by leveraging a conventional combinational ECO engine. Experimental results over industrial ECO benchmarks demonstrate the practicality of our methods.
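The combinational equivalence notion that such an ECO engine builds on can be sketched, for tiny circuits, as exhaustive simulation under a known input correspondence (production engines use SAT or BDDs rather than enumeration):

```python
from itertools import product

def comb_equiv(f, g, n_inputs):
    """Brute-force combinational equivalence check: f and g are
    functions of n_inputs 0/1 arguments; equivalent iff they agree on
    every input assignment."""
    return all(f(*xs) == g(*xs) for xs in product([0, 1], repeat=n_inputs))
```

The difficulty the paper addresses is that retiming moves registers across logic, so the old and new designs no longer share a register boundary at which such a purely combinational check can be applied directly.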
State retention for power gated design with non-uniform multi-bit retention latches
Guo-Gin Fan, Mark Po-Hung Lin
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203833
Retention registers/latches are commonly applied to power-gated circuits for state retention during sleep mode. Recent studies have shown that applying uniform multi-bit retention registers (MBRRs) can reduce the storage size, and hence save more chip area and leakage power, compared with single-bit retention registers. In this paper, a new problem formulation of power-gated circuit optimization with nonuniform MBRRs is studied to achieve even greater storage savings and higher storage utilization. An ILP-based approach is proposed to effectively explore different combinations of nonuniform MBRR replacement. Experimental results show that the proposed approach can reduce storage size by 36%, compared with the state-of-the-art uniform MBRR replacement, while achieving 100% storage utilization.
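The size-selection aspect of the problem can be illustrated with a greedy decomposition into the available MBRR sizes; the paper's ILP additionally handles placement and replacement constraints that this sketch ignores, and the size set is an assumption:

```python
def choose_mbrrs(n_bits, sizes=(8, 4, 2, 1)):
    """Greedily decompose n_bits of retained state into available
    multi-bit retention register sizes, largest first, so that every
    stored bit is used (100% storage utilization)."""
    plan = []
    for s in sizes:
        count, n_bits = divmod(n_bits, s)
        if count:
            plan.append((s, count))
    return plan
```

With a uniform scheme, 13 bits of state would need two 8-bit MBRRs (16 stored bits, 3 wasted); mixing sizes stores exactly 13, which is the utilization gain nonuniform MBRRs offer.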
Early SoC security validation by VP-based static information flow analysis
Muhammad Hassan, V. Herdt, H. M. Le, Daniel Große, R. Drechsler
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203805
Security is one of the most pressing issues in embedded system design today. The majority of strategies to secure embedded systems are implemented in software. However, a potential hardware backdoor that allows unprivileged software access to confidential data will render even perfectly secure software useless. As the underlying SoC cannot be patched after deployment, it is critical to detect and correct SoC hardware security issues in the design phase. To prevent costly fixes in later stages, security validation should start as early as possible. In this paper, we propose a novel approach to SoC security validation at the system level using Virtual Prototypes (VPs). At the heart of the approach is a scalable static information flow analysis that can detect potential security breaches such as data leakage and untrusted access, i.e., confidentiality and integrity issues, respectively. We demonstrate the applicability of the approach on real-world VPs.
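Static information flow analysis of this kind can be sketched as fixed-point taint propagation. The straight-line assignment form and the source/sink names below are illustrative assumptions, not the paper's VP-level analysis:

```python
def propagate_taint(assignments, sources, sinks):
    """Forward taint propagation over (dest, [operands]) assignments:
    a variable becomes tainted if any operand is tainted. Iterates to a
    fixed point, then reports which sinks are reached by confidential
    sources (a data-leakage / confidentiality check)."""
    tainted = set(sources)
    changed = True
    while changed:
        changed = False
        for dest, srcs in assignments:
            if dest not in tainted and tainted & set(srcs):
                tainted.add(dest)
                changed = True
    return sorted(tainted & set(sinks))
```

Running the dual analysis, with untrusted inputs as sources and protected state as sinks, gives the corresponding integrity check mentioned in the abstract.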
AEP: An error-bearing neural network accelerator for energy efficiency and model protection
Lei Zhao, Youtao Zhang, Jun Yang
Pub Date: 2017-11-13 | DOI: 10.1109/ICCAD.2017.8203854
Neural Networks (NNs) have recently gained popularity in a wide range of modern application domains due to their superior inference accuracy. With growing problem size and complexity, modern NNs, e.g., CNNs (Convolutional NNs) and DNNs (Deep NNs), contain a large number of weights, which require tremendous effort not only to prepare representative training datasets but also to train the network. There is an increasing demand to protect NN weight matrices, an emerging form of Intellectual Property (IP) in the NN field. Unfortunately, adopting conventional encryption methods incurs significant performance and energy consumption overheads. In this paper, we propose AEP, a DianNao-based NN accelerator design for IP protection. AEP aggressively reduces DRAM timing to generate a device-dependent error mask, i.e., a set of erroneous cells whose distribution is device-dependent due to process variations. AEP incorporates the error mask in the NN training process so that the trained weights are device-dependent, which effectively defeats IP piracy, as exporting the weights to other devices cannot produce satisfactory inference accuracy. In addition, AEP speeds up NN inference and achieves significant energy reduction, owing to the fact that main memory dominates the energy consumption of the DianNao accelerator. Our evaluation results show that by injecting 0.1% to 5% memory errors, AEP has negligible inference accuracy loss on the target device while exhibiting unacceptable accuracy degradation on other devices. In addition, AEP achieves an average of 72% performance improvement and 44% energy reduction over the DianNao baseline.
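The device-dependent error-mask idea can be sketched in a few lines of Python. Here a seeded RNG stands in for process variation, and the stuck-at-zero read model is an assumption; the real mask comes from reduced-timing DRAM behavior:

```python
import random

def make_error_mask(n_weights, error_rate, device_seed):
    """Device-dependent error mask: which weight cells misbehave under
    aggressive DRAM timing. The seed models per-device process
    variation, so each device gets its own reproducible pattern."""
    rng = random.Random(device_seed)
    return [rng.random() < error_rate for _ in range(n_weights)]

def apply_mask(weights, mask):
    """Toy read model: masked cells read back as 0. Training with this
    mask applied bakes the device's error pattern into the weights."""
    return [0.0 if m else w for w, m in zip(weights, mask)]
```

Weights trained under device A's mask keep their accuracy when read through that same mask, but a copy exported to device B is filtered through a different error pattern than the one training compensated for, which is what degrades pirated copies.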