Pub Date: 2017-07-01, DOI: 10.1109/ISLPED.2017.8009164
Kyungrak Choi, Woong Choi, Kyungho Shin, Jongsun Park
This paper presents a low-area, energy-efficient hardware accelerator for deep convolutional neural networks (CNNs). Building on a multiply-accumulate (MAC) based architecture, three design techniques are proposed to reduce the hardware cost of the convolutional computations. First, to reduce the computational bit-width of convolutions, an adaptive bit-width reduction scheme based on a differential input method is proposed. This approach reduces the operation bit-width by 37% with almost negligible CNN accuracy degradation. Second, adopting a bi-directional filtering window in the CNN accelerator is found to considerably reduce the energy spent on data movement, because it requires far fewer memory accesses. To expedite the bi-directional filtering operations, we also propose a bidirectional first-input-first-output buffer (bi-FIFO). Laid out in the manner of SRAM bit-cells, the proposed bi-FIFO enables fast data redistribution with good area and energy efficiency. To verify the effectiveness of the proposed techniques, an AlexNet accelerator has been designed. The numerical results show that the proposed adaptive bit-width reduction scheme achieves area and energy savings of 25.9% and 47.3%, respectively. The bi-FIFO based accelerator also achieves a 33% improvement in processing time.
Title: Bit-width reduction and customized register for low cost convolutional neural network accelerator
Venue: 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)
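The differential-input idea can be illustrated with a small sketch. It is not the paper's hardware scheme; it only relies on the linearity of convolution: the difference between two adjacent outputs equals the filter applied to the differences of adjacent inputs, and those differences usually need fewer bits than the raw samples. The function names, the sample row, and the weights are assumptions chosen for illustration.

```python
def signed_bits(value: int) -> int:
    """Minimum two's-complement width that can represent `value`."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


def max_width(values) -> int:
    """Widest two's-complement width needed by any element."""
    return max(signed_bits(v) for v in values)


def differences(x):
    """d[i] = x[i + 1] - x[i]."""
    return [b - a for a, b in zip(x, x[1:])]


def conv1d(x, w):
    """Direct 1-D sliding-window MAC (valid positions only)."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def conv1d_differential(x, w):
    """Same outputs, but every window after the first uses input differences."""
    k = len(w)
    d = differences(x)
    out = [sum(w[j] * x[j] for j in range(k))]  # first window at full width
    for i in range(len(x) - k):
        # out[i + 1] - out[i] == sum_j w[j] * (x[i + 1 + j] - x[i + j])
        out.append(out[-1] + sum(w[j] * d[i + j] for j in range(k)))
    return out


pixels = [120, 122, 121, 125, 124, 126]   # hypothetical neighbouring pixels
weights = [1, -2, 3]                      # hypothetical filter taps

assert conv1d_differential(pixels, weights) == conv1d(pixels, weights)
print(max_width(pixels), max_width(differences(pixels)))  # 8 4
```

For this made-up row the multiplier operands shrink from 8 bits to 4 bits after the first window. The 37% figure quoted in the abstract comes from the paper's own adaptive scheme and is not reproduced by this toy example.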
Pub Date: 2017-07-01, DOI: 10.1109/ISLPED.2017.8009172
Younghyun Kim, Setareh Behroozi, V. Raghunathan, A. Raghunathan
Mobile, wearable, and implantable devices integrate an increasing number and variety of sensors such as microphones, image sensors, and accelerometers. These devices spend substantial amounts of time reading their sensors, thereby incurring significant energy dissipation over off-chip serial interconnects. This paper proposes AXSERBUS, a quality-configurable approximate serial bus that exploits the locality of sensory data and the error resiliency of sensing applications to reduce energy dissipation. AXSERBUS significantly reduces signal transitions by encoding the differentials of sensory data in one of three encoding modes, chosen by the magnitude of the differential: very small differentials are zeroed out, incurring no energy dissipation; intermediate differentials are encoded using special low-transition-count patterns; and for large differentials, the absolute value of the data (not the differential) is transmitted. Compared to previous schemes, the proposed multi-level encoding results in more data being encoded as low-energy patterns. In addition, in the intermediate encoding mode, the differentials are encoded approximately, with approximation bounds proportional to the magnitude of the differentials. Since small differentials are more frequent than large differentials in sensory data, the proposed encoding scheme also minimizes quality degradation. We demonstrate that AXSERBUS achieves improved energy vs. quality tradeoffs compared to previous schemes. In the context of an optical character recognition (OCR) application, AXSERBUS achieves a 79.4% reduction in dynamic power dissipation while maintaining accuracy above 95%.
Title: AXSERBUS: A quality-configurable approximate serial bus for energy-efficient sensing
Venue: 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)
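The three-mode selection described in the abstract can be sketched in software as follows. This is a behavioural illustration only: the thresholds `small` and `large`, the power-of-two rounding used for the intermediate mode, and the choice to take differentials against the receiver's reconstructed value are all assumptions. The actual bit patterns AXSERBUS puts on the wire are not modelled.

```python
def approx(diff: int) -> int:
    """Round |diff| down to a power of two, so the error stays below |diff| / 2.
    Hypothetical stand-in for the paper's low-transition-count patterns."""
    if diff == 0:
        return 0
    magnitude = 1 << (abs(diff).bit_length() - 1)
    return magnitude if diff > 0 else -magnitude


def encode(samples, small=2, large=32):
    """Pick one of three modes per sample from the size of its differential."""
    recon = 0          # what the receiver currently holds (assumed to start at 0)
    symbols = []
    for s in samples:
        diff = s - recon
        if abs(diff) <= small:
            symbols.append(("ZERO", 0))        # differential zeroed out
        elif abs(diff) < large:
            q = approx(diff)
            symbols.append(("DIFF", q))        # approximate differential
            recon += q
        else:
            symbols.append(("ABS", s))         # absolute value sent instead
            recon = s
    return symbols


def decode(symbols):
    """Rebuild the sample stream on the receiver side."""
    recon, out = 0, []
    for mode, payload in symbols:
        if mode == "DIFF":
            recon += payload
        elif mode == "ABS":
            recon = payload
        out.append(recon)
    return out


samples = [100, 101, 99, 110, 111, 60]          # hypothetical sensor readings
symbols = encode(samples)
print([mode for mode, _ in symbols])
print(decode(symbols))                          # [100, 100, 100, 108, 110, 60]
```

Taking each differential against the reconstructed value rather than the previous true sample keeps approximation errors from accumulating. Whether AXSERBUS does the same is not stated in the abstract.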
Pub Date: 2017-07-01, DOI: 10.1109/ISLPED.2017.8009186
Monodeep Kar, Arvind Singh, S. Mathew, Anand Rajan, V. De, S. Mukhopadhyay
Power attacks are a critical challenge to the security of encryption engines. Countermeasures to side-channel attacks often come at a high power, area, or performance overhead. The design of side-channel-secure encryption engines is therefore a critical challenge for power- and resource-constrained platforms. This paper argues that although low-power requirements make side-channel security harder to achieve, circuit techniques traditionally developed for power management also present new opportunities for side-channel resistance. As a case study, we show the feasibility of using an integrated voltage regulator, normally used for efficient power management, to increase the side-channel resistance of AES engines.
Title: Invited paper: Low power requirements and side-channel protection of encryption engines: Challenges and opportunities
Venue: 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)
Pub Date: 2017-07-01, DOI: 10.1109/ISLPED.2017.8009155
Z. Azim, K. Roy
We propose a hybrid global interconnect that combines Spin-Torque (ST) sensors with differential amplifiers to greatly reduce the overall power consumption while minimizing the delay along the line. Recently proposed ST-sensor based interconnects show significant energy efficiency compared to conventional full-swing CMOS interconnects. However, the latency of ST-sensor interconnects can be rather high due to inefficient signal regeneration along the line. As a solution, we propose using differential amplifiers as repeaters, together with an ST-sensor as the receiver, to reduce the interconnect delay. Moreover, the introduction of differential signaling greatly increases the robustness of the design against noise and variations. Our simulation results indicate that for a 10 mm line in 45 nm CMOS technology, the energy consumption of the hybrid ST-sensor interconnect is ∼5× lower than that of a full-swing CMOS interconnect operating at similar speed. Moreover, the energy consumption is ∼2× lower than that of a low-swing CMOS interconnect, in addition to a significant improvement in latency.
Title: Spin-torque sensors with differential signaling for fast and energy efficient global interconnects
Venue: 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)
Pub Date: 2017-07-01, DOI: 10.1109/ISLPED.2017.8009206
Wenbin Xu, S. Sapatnekar, Jiang Hu
Approximate computing is a promising approach for low-power IC design and has recently received considerable research attention. To accommodate dynamic levels of approximation, a few accuracy-configurable adder designs have been developed in the past. However, these designs tend to incur large area overheads because they rely on either redundant computing or complicated carry prediction. Some of these designs include error detection and correction circuitry, which further increases area. In this work, we investigate a simple accuracy-configurable adder design that contains no redundancy or error detection/correction circuitry and uses very simple carry prediction. Simulation results show that our design dominates the most recent previous work in the accuracy-delay-power tradeoff while using 39% less area. Moreover, we propose a delay-adaptive self-configuration technique to further improve the accuracy-delay-power tradeoff.
Title: A simple yet efficient accuracy configurable adder design
Venue: 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)
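To make the notion of an accuracy-configurable adder with simple carry prediction concrete, here is a generic bit-level model. It is not the design from the paper: the block size, the `exact_boundaries` parameter, and the prediction rule (use only the generate signal of the lower block's most significant bit) are assumptions picked to keep the sketch short.

```python
def configurable_add(a: int, b: int, width: int = 16, block: int = 4,
                     exact_boundaries=frozenset()) -> int:
    """Add `a` and `b` modulo 2**width, block by block.

    Boundary i sits between block i and block i + 1. If i is in
    `exact_boundaries`, the true carry crosses it; otherwise the carry is
    predicted from the top bit pair of block i only (shorter critical path,
    occasionally wrong).
    """
    mask = (1 << block) - 1
    result, carry = 0, 0
    for idx in range(width // block):
        shift = idx * block
        xa = (a >> shift) & mask
        xb = (b >> shift) & mask
        total = xa + xb + carry
        result |= (total & mask) << shift
        if idx in exact_boundaries:
            carry = total >> block                       # exact carry-out
        else:
            carry = (xa >> (block - 1)) & (xb >> (block - 1)) & 1  # predicted
    return result


ALL_EXACT = frozenset({0, 1, 2})

print(hex(configurable_add(0x1234, 0x1111)))                  # 0x2345
print(hex(configurable_add(0x0FFF, 0x0001)))                  # 0xff0
print(hex(configurable_add(0x0FFF, 0x0001,
                           exact_boundaries=ALL_EXACT)))      # 0x1000
```

The second call shows the cost of the approximation: a long carry chain is cut at every block boundary. Marking boundaries as exact restores accuracy at the price of a longer carry path, which is the kind of accuracy-delay tradeoff the abstract refers to.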
Pub Date: 2017-07-01, DOI: 10.1109/ISLPED.2017.8009165
Sangyoung Park, Licong Zhang, S. Chakraborty
Recent advances in battery and drone technologies have opened up possibilities for the commercial use of drones. Private companies are looking into using drones for commercial deliveries from legal, technical, and economic perspectives. Nevertheless, the battery management perspective of such businesses has not yet been thoroughly investigated. In this paper, we show that battery management in such applications has a major impact on costs, and we formulate an optimization problem to reduce the aging of batteries. We identify two sub-problems, battery assignment and battery scheduling, and derive a solution that minimizes battery aging. We show that the formulation makes it possible to exploit the trade-off between packet waiting time and battery purchasing cost. The experimental results show that the proposed method reduces the electricity and battery purchasing cost by 25%, and the average packet waiting time by more than 50%.
Title: Battery assignment and scheduling for drone delivery businesses
Venue: 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)
Pub Date: 2017-07-01, DOI: 10.1109/ISLPED.2017.8009203
Sungju Ryu, Jongeun Koo, Jae-Joon Kim
The elastic clock scheme is a robust design methodology that ensures timing closure under PVT variation using locally generated clocks and a handshaking protocol. However, timing errors can still occur because of delay mismatch between the data-path and the delay replica. In this paper, we propose a low-design-overhead timing error correction scheme tailored to the elastic clock methodology. In the proposed scheme, a timing error can be corrected within a cycle using clock stretching. The proposed scheme shows 40.3× and 4.6× reductions in timing margin, with 9.1% and 9.0% area overhead, over the synchronous baseline and the elastic clock design, respectively.
Title: Low design overhead timing error correction scheme for elastic clock methodology
Venue: 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)
Pub Date: 2017-07-01, DOI: 10.1109/ISLPED.2017.8009207
Chengke Wang, Yao Guo, Peng Shen, Xiangqun Chen
Energy consumption is one of the most important aspects of mobile apps. During energy testing, it is important for developers to understand not only the energy consumption rate of an app, but also why energy is consumed. However, existing energy testing tools focus on the accuracy of energy estimation and typically do not explain why and how exactly the energy was consumed. This paper presents E-Spector, an online energy inspection method for Android apps, which not only visualizes the energy consumption of an app instantly and online, but also shows what happened behind each energy hotspot on the energy curve. E-Spector relies on static analysis and app instrumentation to collect the activities of an app execution in real time. It then presents the activities on an instant energy curve, so that the user can easily tell what happened behind each energy spike. Experimental results show that the energy estimation error of E-Spector is less than 10% and that its overhead on energy consumption is about 4%. We also present case studies that demonstrate the applicability and effectiveness of E-Spector in energy monitoring, analysis, and bug inspection.
Title: E-Spector: Online energy inspection for Android applications
Venue: 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)
Pub Date: 2017-07-01, DOI: 10.1109/ISLPED.2017.8009204
M. Imani, Saransh Gupta, Atl Arredondo, T. Simunic
Today's computing systems use a huge amount of energy and time to process basic database queries. A large part of this cost is spent on data movement between the memory and the processing cores, owing to the limited cache capacity and memory bandwidth of traditional computers. In this paper, we propose a non-volatile memory-based query accelerator, called NVQuery, which performs several basic query functions in memory, including aggregation, prediction, and bit-wise operations, as well as exact and nearest distance search queries. NVQuery is implemented on a content addressable memory (CAM) and exploits the analog characteristics of non-volatile memory to enable in-memory processing. To implement nearest distance search in memory, we introduce a novel bitline driving scheme that gives weights to the indices of the bits during the search operation. Our experimental evaluation shows that NVQuery can provide a 49.3× performance speedup and 32.9× energy savings compared to running the same query on a traditional processor. In addition, compared to state-of-the-art query accelerators, NVQuery can achieve a 26.2× energy-delay product improvement while providing similar accuracy.
Title: Efficient query processing in crossbar memory
Venue: 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)
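The effect of giving weights to bit indices during a nearest distance search can be mimicked in software. The sketch below is only an analogy for the bitline driving scheme: it assumes a weight of 2^i for a mismatch at bit i and compares the result with an unweighted Hamming search. The function names and the stored values are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of mismatching bit positions."""
    return bin(a ^ b).count("1")


def weighted_mismatch(a: int, b: int, width: int = 4) -> int:
    """Sum of 2**i over mismatching bit positions i (assumed weighting)."""
    diff = a ^ b
    return sum(1 << i for i in range(width) if (diff >> i) & 1)


def nearest(query: int, rows, distance) -> int:
    """Index of the stored row with the smallest distance; ties go to the first."""
    return min(range(len(rows)), key=lambda i: distance(query, rows[i]))


rows = [0b0000, 0b1001, 0b1011]   # hypothetical CAM contents
query = 0b1000

print(nearest(query, rows, hamming))            # 0: rows 0 and 1 tie at one mismatch
print(nearest(query, rows, weighted_mismatch))  # 1: a mismatch in the top bit costs more
```

With equal weights the search cannot tell a mismatch in the most significant bit from one in the least significant bit, so it returns 0b0000 for the query 0b1000. With index weights it prefers 0b1001, which is also numerically closer. A weighted mismatch is still not the same as numerical distance in general, so this should be read as an approximation.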
Pub Date: 2017-07-01, DOI: 10.1109/ISLPED.2017.8009198
Amir Mahdi Hosseini Monazzah, Majid Namaki-Shoushtari, S. Miremadi, A. Rahmani, N. Dutt
Emerging STT-MRAM memories are promising alternatives to SRAM that address SRAM's low density and high static power consumption, but they require high energy for reliable read/write operations. However, absolute data integrity is not required for many approximate computing applications, which allows energy savings with minimal quality loss. This paper proposes QuARK, a hardware/software approach for trading the reliability of STT-MRAM caches for energy savings in the on-chip memory hierarchy of multi- and many-core systems running approximate applications. In contrast to SRAM-based cache-way-level actuators, QuARK uses fine-grained cache-line-level actuation knobs with different levels of reliability for individual read and write accesses. These knobs are unique to STT-MRAM and suit systems running multiple applications with mixed accuracy sensitivity, thus avoiding inter-application actuation interference. Our experimental results with a set of recognition, mining and synthesis (RMS) benchmarks demonstrate up to 40% energy savings over a fully protected STT-MRAM cache, with negligible loss in the quality of the generated outputs.
Title: QuARK: Quality-configurable approximate STT-MRAM cache by fine-grained tuning of reliability-energy knobs
Venue: 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)