Leveraging torus topology with deadlock recovery for cost-efficient on-chip network
Published in: 2011 IEEE 29th International Conference on Computer Design (ICCD)
Pub Date: 2011-10-09 DOI: 10.1109/ICCD.2011.6081371
Minjeong Shin, John Kim
On-chip networks are becoming more important as the number of on-chip components continues to increase. The 2D mesh is a commonly assumed topology for on-chip networks, but in this work we argue that a 2D torus can provide a more cost-efficient on-chip network, since the on-chip network datapath is reduced by 2× while providing the same bisection bandwidth as a mesh network. Our results show that a 2D torus can achieve an improvement of up to 1.9× over a 2D mesh in the performance-per-watt metric. However, routing deadlock can occur in a torus network because of the wrap-around channels, and deadlock avoidance requires additional virtual channels. In this work, we propose deadlock recovery with tokens (DRT) for on-chip networks, which exploits the abundant wires available on chip while minimizing the need for additional buffers. As a result, deadlocks can be detected exactly, without relying on a timeout mechanism, and recovered from when needed. We show that DRT results in minimal loss in performance compared with deadlock avoidance using virtual channels, while reducing the on-chip network complexity.
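The bisection-bandwidth argument in the torus abstract above can be checked with a little arithmetic. The sketch below is not taken from the paper: it assumes a k × k network with even k, one flit per channel per cycle, and hypothetical channel widths of 128 and 64 bits. It only shows that a torus with half-width channels matches the bisection bandwidth of a mesh, because the wrap-around links double the number of channels crossing the cut.

```python
def bisection_channels(k: int, topology: str) -> int:
    """Count unidirectional channels crossing the bisection of a k x k network."""
    if k % 2 != 0:
        raise ValueError("k must be even for a clean bisection")
    if topology == "mesh":
        # Each of the k rows has one bidirectional link across the cut.
        return 2 * k
    if topology == "torus":
        # The wrap-around link adds a second bidirectional link per row.
        return 4 * k
    raise ValueError(f"unknown topology: {topology}")


def bisection_bandwidth(k: int, topology: str, channel_width_bits: int) -> int:
    """Bits per cycle across the bisection, assuming one flit per channel per cycle."""
    return bisection_channels(k, topology) * channel_width_bits


# Hypothetical widths: the torus uses channels half as wide as the mesh.
mesh_bw = bisection_bandwidth(8, "mesh", 128)
torus_bw = bisection_bandwidth(8, "torus", 64)
print(mesh_bw, torus_bw)
```

Both calls report the same bisection bandwidth, which is the sense in which the datapath can be halved without giving up bandwidth across the cut.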
Circumventing a ring oscillator approach to FPGA-based hardware Trojan detection
Pub Date: 2011-10-09 DOI: 10.1109/ICCD.2011.6081411
Justin Rilling, David Graziano, Jamin Hitchcock, Tim Meyer, Xinying Wang, Phillip H. Jones, Joseph Zambreno
Ring oscillators are commonly used as a locking mechanism that binds a hardware design to a specific area of silicon within an integrated circuit (IC). This locking mechanism can be used to detect malicious modifications to the hardware design, also known as hardware Trojans, in situations where such modifications result in a change to the physical placement of the design on the IC. However, careful consideration is needed when designing ring oscillators for such a scenario to guarantee the integrity of the locking mechanism. This paper presents a case study in which flaws discovered in a ring oscillator-based Trojan detection scheme allowed for the circumvention of the security mechanism and the implementation of a large and diverse set of hardware Trojans, limited only by hardware resources.
Comparative analysis of copper and CNT interconnects for H-tree clock distribution
Pub Date: 2011-10-09 DOI: 10.1109/ICCD.2011.6081443
Vish Ganti, H. Mahmoodi
The clock distribution network is an important part of digital integrated circuits. The clock signal carried by the distribution network has to reach every end node at the same time to ensure synchronized switching. Due to mismatches among different nodes of the H-tree, the clock transitions at the final nodes of the distribution tree show some time difference, the maximum of which is called clock skew. In modern CMOS technologies, copper interconnect is popular for high-level interconnects such as clock and power routing. Carbon nanotubes (CNTs) exhibit lower resistivity than copper, making them a better material for interconnect. This paper evaluates how replacing the traditional copper interconnects with carbon nanotube interconnects affects the clock skew of an H-tree clock distribution network. By applying temperature mismatch, threshold voltage mismatch, and process mismatch, our findings show that using carbon nanotube interconnects reduces the clock skew significantly compared to traditional copper interconnects.
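The definition of clock skew given in the H-tree abstract above (the maximum time difference between clock transitions at the final nodes) can be written down directly. This is a minimal sketch; the arrival times are hypothetical values in picoseconds chosen only for illustration and are not results from the paper.

```python
from itertools import combinations


def clock_skew(arrival_times_ps):
    """Return the largest arrival-time difference between any two leaf nodes."""
    if len(arrival_times_ps) < 2:
        raise ValueError("need at least two leaf nodes")
    return max(abs(a - b) for a, b in combinations(arrival_times_ps, 2))


# Hypothetical clock arrival times at four H-tree leaves, in picoseconds.
leaf_arrivals_ps = [102, 100, 101, 105]
print(clock_skew(leaf_arrivals_ps))
```

The pairwise maximum is the same as the latest arrival minus the earliest one, so `max(times) - min(times)` is an equivalent and cheaper way to compute it for large trees.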
Multi-level wordline driver for low power SRAMs in nano-scale CMOS technology
Pub Date: 2011-10-09 DOI: 10.1109/ICCD.2011.6081419
F. Moradi, G. Panagopoulos, G. Karakonstantis, D. Wisland, H. Mahmoodi, J. K. Madsen, K. Roy
In this paper, a multi-level wordline driver scheme is presented to improve SRAM read and write stability while lowering power consumption during hold operation. The proposed circuit applies a shaped wordline voltage pulse during read mode and a boosted wordline pulse during write mode. During read, the shaped pulse is applied at the nominal voltage for a short period of time, whereas for the remaining access time the wordline voltage is reduced to a lower level. This pulse improves the read noise margin without any degradation in access time, which is explained by examining the dynamic and nonlinear behavior of the SRAM cell. Furthermore, during hold mode, the wordline voltage starts from a negative value and reaches zero voltage, resulting in a lower leakage current compared to conventional SRAM. Our simulations using a TSMC 65nm process show that the proposed wordline driver results in a 2X improvement in static read noise margin, while the write margin is improved by 3X. In addition, the total leakage of the proposed SRAM is reduced by 10%, while the total power is improved by 12% in the worst-case scenario of a single SRAM cell. The total area penalty is 10% for a 128Kb standard SRAM array.
An energy- and performance-aware DRAM cache architecture for hybrid DRAM/PCM main memory systems
Pub Date: 2011-10-09 DOI: 10.1109/ICCD.2011.6081427
H. Lee, Seungcheol Baek, C. Nicopoulos, Jongman Kim
The last few years have witnessed the emergence of a promising new memory technology. Phase-Change Memory (PCM) is increasingly viewed as an attractive alternative for the memory sub-system of future microprocessor architectures, mainly because of its inherent ability to scale deeply into the nanoscale regime, and its low power consumption. However, PCM's write performance is its Achilles' heel, especially when compared to the prevalent DRAM technology. This weakness necessitates the deployment of hybridized solutions that fuse DRAM and PCM, in order to attain high overall system performance. In this paper, we set out to explore how various DRAM/PCM hybrid configurations affect system performance and energy consumption, and then proceed with the presentation of a novel architecture that maximizes performance without adversely affecting power efficiency. An energy-delay product improvement of 42.2%, on average, over conventional hybrid structures, is demonstrated.
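The hybrid DRAM/PCM abstract above reports its result as an energy-delay product (EDP) improvement. As a reminder of how that metric is formed, the sketch below computes EDP as energy multiplied by delay and expresses an improvement as the relative reduction against a baseline. The energy and delay values are hypothetical and are not the paper's measurements.

```python
def energy_delay_product(energy_j: float, delay_s: float) -> float:
    """EDP: energy consumed multiplied by execution time (lower is better)."""
    return energy_j * delay_s


def edp_improvement(baseline_edp: float, proposed_edp: float) -> float:
    """Relative EDP reduction of the proposed design versus the baseline."""
    return 1.0 - proposed_edp / baseline_edp


# Hypothetical numbers for illustration only.
baseline = energy_delay_product(energy_j=10.0, delay_s=10.0)
proposed = energy_delay_product(energy_j=8.0, delay_s=7.5)
print(f"{edp_improvement(baseline, proposed):.1%}")
```

Because EDP multiplies the two quantities, a design can improve it by trading a little energy for a larger delay reduction, or the reverse.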
A GALS Network-on-Chip based on rationally-related frequencies
Pub Date: 2011-10-09 DOI: 10.1109/ICCD.2011.6081369
Jean-Michel Chabloz, A. Hemani
GALS Networks-on-Chip (NoCs) in which the frequency of every switch can be set independently would enable per-node DVFS without requiring asynchronous switch design. However, traditional GALS interfaces introduce high latency penalties and are therefore ill-suited for inter-switch links in a NoC. In this paper we introduce and study a GALS Network-on-Chip based on the Globally-Ratiochronous, Locally-Synchronous (GRLS) paradigm. GRLS constrains all switch frequencies to be rationally related but enables the use of efficient interfaces that reduce the latency of the network by 60% compared to GALS solutions, and it obtains better throughput-per-power ratios than synchronous and mesochronous solutions.
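The constraint in the GRLS abstract above, that all switch frequencies are rationally related, has one useful consequence that is easy to show: the phase relationship between any two such clocks repeats after a finite interval. The sketch below illustrates only that property. How the GRLS interfaces exploit it is not described in the abstract, and the frequency ratios used here are hypothetical.

```python
from fractions import Fraction
from math import gcd


def hyperperiod(freq_a: Fraction, freq_b: Fraction) -> Fraction:
    """Shortest interval after which two rationally related clocks realign.

    This is the least common multiple of the two clock periods. For fractions
    in lowest terms, lcm(a/b, c/d) = lcm(a, c) / gcd(b, d).
    """
    period_a, period_b = 1 / freq_a, 1 / freq_b
    num_lcm = (period_a.numerator * period_b.numerator
               // gcd(period_a.numerator, period_b.numerator))
    den_gcd = gcd(period_a.denominator, period_b.denominator)
    return Fraction(num_lcm, den_gcd)


# Hypothetical switch frequencies, expressed relative to a reference clock.
freq_a = Fraction(1, 1)
freq_b = Fraction(2, 3)
span = hyperperiod(freq_a, freq_b)
cycles_a = span * freq_a
cycles_b = span * freq_b
print(span, cycles_a, cycles_b)
```

Within one such interval each clock completes a whole number of cycles, so the relative timing of their edges is periodic and known in advance, unlike two fully independent clocks.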
Low power, high throughput network-on-chip fabric for 3D multicore processors
Pub Date: 2011-10-09 DOI: 10.1109/ICCD.2011.6081458
V. Nandakumar, M. Marek-Sadowska
Long wires significantly degrade the performance of the network-on-chip (NoC) communication fabric in large multicore processors. A 3D network-on-chip architecture alleviates the problem of long wires, but practical limitations of CMOS technology restrict such structures to two active layers only. In this work, we study a heterogeneous 3D chip with processor cores and cache blocks implemented in CMOS and the NoC fabric in VeSFET technology. Such a 3D architecture shows significant improvements in all network parameters, including latency, power and energy consumption, compared to existing 3D NoCs.
Energy-efficient multi-level cell phase-change memory system with data encoding
Pub Date: 2011-10-09 DOI: 10.1109/ICCD.2011.6081394
Jue Wang, Xiangyu Dong, Guangyu Sun, Dimin Niu, Yuan Xie
Phase-change memory (PCM) is one of the most promising technologies among emerging non-volatile memories. Recently, multi-level cell (MLC) technology for PCM has been developed, and a high-capacity memory system can be implemented by storing multiple bits in a cell. However, programming MLC PCM involves a program-and-verify scheme. Thus, the energy of programming intermediate states in MLC PCM is considerably larger than that of single-level cell (SLC) PCM. To mitigate the MLC energy overhead, we propose an energy-efficient PCM architecture that encodes data on writes, based on the observation that there are significant value-dependent energy variations in programming MLC PCM. In addition, data comparison write (DCW) is adopted to enhance the effectiveness of the proposed data encoding architecture for MLC PCM. Simulation results show that this encoding architecture achieves 9.6% average energy saving (up to 19.8%) on the plain MLC PCM system, and 12.9% average energy saving (up to 26.7%) on the DCW-adopted MLC PCM system.
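The two ideas named in the MLC PCM abstract above, value-dependent programming energy and data comparison write (DCW), can be illustrated with a toy model. The sketch below is not the paper's encoding: it swaps in a simple frequency-based remapping of 2-bit symbols to cell states, and the per-state energies in `STATE_ENERGY` are hypothetical units chosen so that the intermediate states cost more than the extreme ones.

```python
from collections import Counter

# Hypothetical programming energy per 2-bit cell state (arbitrary units).
# Intermediate states 1 and 2 are assumed to cost more (program-and-verify).
STATE_ENERGY = {0: 2, 1: 6, 2: 5, 3: 3}


def build_mapping(symbols):
    """Map the most frequent symbols to the cheapest cell states."""
    counts = Counter(symbols)
    by_freq = sorted(range(4), key=lambda s: (-counts[s], s))
    by_cost = sorted(STATE_ENERGY, key=lambda st: (STATE_ENERGY[st], st))
    return dict(zip(by_freq, by_cost))


def write_energy(symbols, mapping, stored_states=None):
    """Total programming energy; with stored_states given, apply DCW."""
    total = 0
    for i, sym in enumerate(symbols):
        target = mapping[sym]
        if stored_states is not None and stored_states[i] == target:
            continue  # DCW: the cell already holds the target state, skip it
        total += STATE_ENERGY[target]
    return total


data = [1, 1, 2, 1, 0, 3, 1, 2]          # 2-bit symbols to be written
identity = {s: s for s in range(4)}      # no encoding
encoded = build_mapping(data)
old_cells = [0] * len(data)              # hypothetical previous cell contents

plain_energy = write_energy(data, identity)
encoded_energy = write_energy(data, encoded)
encoded_dcw_energy = write_energy(data, encoded, stored_states=old_cells)
print(plain_energy, encoded_energy, encoded_dcw_energy)
```

A real design would also have to record which mapping was used so the data can be decoded on reads; that bookkeeping is left out of this sketch.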
ARCc: A case for an architecturally redundant cache-coherence architecture for large multicores
Pub Date: 2011-10-09 DOI: 10.1109/ICCD.2011.6081431
O. Khan, H. Hoffmann, Mieszko Lis, Farrukh Hijaz, A. Agarwal, S. Devadas
This paper proposes an architecturally redundant cache-coherence architecture (ARCc) that combines the directory and shared-NUCA based coherence protocols to improve performance, energy and dependability. Both coherence mechanisms co-exist in the hardware and ARCc enables seamless transition between the two protocols. We present an online analytical model implemented in the hardware that predicts performance and triggers a transition between the two coherence protocols at application-level granularity. The ARCc architecture delivers up to 1.6× higher performance and up to 1.5× lower energy consumption compared to the directory-based counterpart. It does so by identifying applications which benefit from the large shared cache capacity of shared-NUCA because of lower off-chip accesses, or where remote-cache word accesses are efficient.
AURA: An application and user interaction aware middleware framework for energy optimization in mobile devices
Pub Date: 2011-10-09 DOI: 10.1109/ICCD.2011.6081393
Brad K. Donohoo, Chris Ohlsen, S. Pasricha
Mobile battery-operated devices are becoming an essential instrument for business, communication, and social interaction. In addition to the demand for an acceptable level of performance and a comprehensive set of features, users often desire extended battery lifetime. In fact, limited battery lifetime is one of the biggest obstacles facing the current utility and future growth of increasingly sophisticated “smart” mobile devices. This paper proposes a novel application-aware and user-interaction-aware energy optimization middleware framework (AURA) for pervasive mobile devices. AURA optimizes CPU and screen backlight energy consumption while maintaining a minimum acceptable level of performance. The proposed framework employs a novel Bayesian application classifier and management strategies based on Markov Decision Processes to achieve energy savings. Real-world user evaluation studies on a Google Android-based HTC Dream smartphone running the AURA framework demonstrate promising results, with up to 24% energy savings compared to the baseline device manager, and up to 5× savings over prior work on CPU and backlight energy co-optimization.