Pub Date: 2011-10-09; DOI: 10.1109/ICCD.2011.6081414
G. Becker, Ashwin Lakshminarasimhan, Lang Lin, Sudheendra K. Srivathsa, Vikram B. Suresh, W. Burleson
Hardware Trojans have become a growing concern in the design of secure integrated circuits. In this work, we present a set of novel hardware Trojans aimed at evading detection methods, designed as part of the CSAW Embedded System Challenge 2010. We introduce and implement unique Trojans based on side-channel analysis that leak the secret key of the reference encryption algorithm. These side-channel-based Trojans do not alter the functionality of the design, which minimizes the possibility of detection. We demonstrate a statistical analysis approach to attack such Trojans. In addition, we introduce Trojans that modify either the functional behavior or the electrical characteristics of the reference design. Novel techniques such as a Trojan that drains the battery of a device have no immediate impact and hence avoid detection, but they affect the long-term reliability of the system.
Title: Implementing hardware Trojans: Experiences from a hardware Trojan challenge. Published in: 2011 IEEE 29th International Conference on Computer Design (ICCD).
Pub Date: 2011-10-09; DOI: 10.1109/ICCD.2011.6081393
Brad K. Donohoo, Chris Ohlsen, S. Pasricha
Mobile battery-operated devices are becoming an essential instrument for business, communication, and social interaction. In addition to the demand for an acceptable level of performance and a comprehensive set of features, users often desire extended battery lifetime. In fact, limited battery lifetime is one of the biggest obstacles facing the current utility and future growth of increasingly sophisticated “smart” mobile devices. This paper proposes a novel application-aware and user-interaction-aware energy optimization middleware framework (AURA) for pervasive mobile devices. AURA optimizes CPU and screen backlight energy consumption while maintaining a minimum acceptable level of performance. The proposed framework employs a novel Bayesian application classifier and management strategies based on Markov Decision Processes to achieve energy savings. Real-world user evaluation studies on a Google Android-based HTC Dream smartphone running the AURA framework demonstrate promising results, with up to 24% energy savings compared to the baseline device manager, and up to 5× savings over prior work on CPU and backlight energy co-optimization.
Title: AURA: An application and user interaction aware middleware framework for energy optimization in mobile devices
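The AURA abstract mentions a Bayesian application classifier but gives no details of it. As a rough illustration of the general idea only, the sketch below implements a small categorical naive Bayes classifier with add-one smoothing. The feature names (discretized touch rate and CPU load), the class labels, and the training data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        # feature_counts[label][feature_index][value] -> occurrences in training
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        # feature_values[feature_index] -> set of values seen for that feature
        self.feature_values = defaultdict(set)

    def fit(self, samples, labels):
        for features, label in zip(samples, labels):
            self.class_counts[label] += 1
            for i, value in enumerate(features):
                self.feature_counts[label][i][value] += 1
                self.feature_values[i].add(value)

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total)
            for i, value in enumerate(features):
                seen = self.feature_counts[label][i][value]
                n_values = len(self.feature_values[i])
                score += math.log((seen + 1) / (count + n_values))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: (touch_rate, cpu_load) -> application class.
samples = [("high", "high"), ("high", "low"), ("high", "high"),
           ("low", "high"), ("low", "low"), ("low", "low")]
labels = ["interactive", "interactive", "interactive",
          "passive", "passive", "passive"]

classifier = NaiveBayes()
classifier.fit(samples, labels)
print(classifier.predict(("high", "low")))  # interactive
print(classifier.predict(("low", "low")))   # passive
```

In a framework of this kind, the predicted class would presumably select which energy management policy to apply; how AURA actually couples its classifier to the Markov Decision Process strategies is not stated in the abstract.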
Pub Date: 2011-10-09; DOI: 10.1109/ICCD.2011.6081368
D. Kaeli, David Akodes
Recently we have seen two exciting trends that have been flooding the market: 1) the movement of graphics processing units into mainstream general-purpose platforms, and 2) the movement of multi-core embedded systems into tablet computing and smartphone spaces. These trends are forcing application developers to rethink how they are going to best utilize these many-core and multi-core heterogeneous platforms to provide new levels of cost/performance/power in a range of emerging application domains.
Title: The convergence of HPC and embedded systems in our heterogeneous computing future
Pub Date: 2011-10-09; DOI: 10.1109/ICCD.2011.6081447
Domenic Forte, Ankur Srivastava
As the popularity of video streaming over the Internet grows, energy consumption in video server environments which store and retrieve video data increases as well. Previous work has shown that video quality delivered to clients can be scaled in order to serve more concurrent video requests and/or reduce energy consumption of server disks. We propose a data placement strategy for such quality scaling methods which distributes video data within a disk based on its priority/importance. Results show that in doing so the disk can retrieve data with greater efficiency and serve lower quality video to more clients than previously investigated strategies.
Title: Energy-aware and quality-scalable data placement and retrieval for disks in video server environments
Pub Date: 2011-10-09; DOI: 10.1109/ICCD.2011.6081417
Kyohei Yamaguchi, Yuya Kora, H. Ando
The tradeoff between complexity and attained instructions per cycle is often an important issue in microarchitectural design. In this design phase, quick quantification of the complexity (i.e., delay) of relevant structures is required. The issue queue is one such complex structure for which delay is difficult to estimate. In this paper, we evaluate the issue queue delay to aid microarchitectural design. Our study includes two features: a circuit design and an evaluation. First, we introduce banking of the tag RAM, which is one of the components of the issue queue, to reduce the delay. Unlike for normal RAM, banking the tag RAM is not straightforward because of its unique organization within the issue queue. Second, we explore and identify the correct critical path in the issue queue. A previous study summed the critical path delays of the individual components to obtain the delay of the issue queue, but this does not give the correct delay, because the critical paths of the components are not logically connected. In an evaluation assuming 32 nm LSI technology, we obtained the delays of issue queues with 8 to 128 entries. Banking the tag RAM and identifying the correct critical path reduces the delay by up to 20%, compared with not banking the tag RAM and simply summing the critical path delay of each component.
Title: Evaluation of issue queue delay: Banking tag RAM and identifying correct critical path
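The abstract's argument that summing per-component critical paths overestimates the delay can be illustrated with a toy timing model. The sketch below is not the authors' circuit model: the two series-connected components, their port names, and the picosecond delays are hypothetical values chosen only to show the effect.

```python
# Each component maps (input_port, output_port) -> delay in picoseconds.
# Components are assumed to be connected in series, in list order.
COMPONENTS = [
    {("in", "x"): 100, ("in", "y"): 60},   # slowest internal path ends at x
    {("x", "out"): 40, ("y", "out"): 70},  # slowest internal path starts at y
]


def summed_component_delay(components):
    """Pessimistic estimate: add each component's own worst-case delay."""
    return sum(max(component.values()) for component in components)


def connected_critical_path(components, source="in", sink="out"):
    """Longest delay along paths that are actually connected end to end."""
    arrival = {source: 0}  # latest signal arrival time at each reachable port
    for component in components:
        next_arrival = {}
        for (src, dst), delay in component.items():
            if src in arrival:
                candidate = arrival[src] + delay
                next_arrival[dst] = max(next_arrival.get(dst, 0), candidate)
        arrival = next_arrival
    return arrival[sink]


print(summed_component_delay(COMPONENTS))   # 170
print(connected_critical_path(COMPONENTS))  # 140
```

The two worst-case segments (100 ps and 70 ps) never lie on the same signal path, so the summed figure of 170 ps describes a path that does not exist, while the longest connected path takes 140 ps.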
Pub Date: 2011-10-09; DOI: 10.1109/ICCD.2011.6081431
O. Khan, H. Hoffmann, Mieszko Lis, Farrukh Hijaz, A. Agarwal, S. Devadas
This paper proposes an architecturally redundant cache-coherence architecture (ARCc) that combines the directory and shared-NUCA based coherence protocols to improve performance, energy and dependability. Both coherence mechanisms co-exist in the hardware and ARCc enables seamless transition between the two protocols. We present an online analytical model implemented in the hardware that predicts performance and triggers a transition between the two coherence protocols at application-level granularity. The ARCc architecture delivers up to 1.6× higher performance and up to 1.5× lower energy consumption compared to the directory-based counterpart. It does so by identifying applications which benefit from the large shared cache capacity of shared-NUCA because of lower off-chip accesses, or where remote-cache word accesses are efficient.
Title: ARCc: A case for an architecturally redundant cache-coherence architecture for large multicores
Pub Date: 2011-10-09; DOI: 10.1109/ICCD.2011.6081377
Jin-Tai Yan, Zhi-Wei Chen
Given a set of IO connections between IO buffers and bump balls in a re-distribution routing layer, an efficient router is proposed to route all the IO connections for pre-assignment RDL routing in an area-IO flip-chip design. Based on the simplification of net renumbering and the extraction of the maximal net sequence for all the IO connections, the connections are first divided into local and global connections. After the global wires of all the local connections are routed, the global wires of all the global connections are assigned under the capacity constraint for RDL global routing. Finally, the global wires of all the IO connections are routed for RDL detailed routing by assigning feasible crossing points and physical paths. The experimental results show that the proposed pre-assignment RDL router maintains 100% routability on seven tested industrial circuits. Compared with Yan's pre-assignment RDL router [4], the proposed approach saves 3.7% of total wirelength and 27.0% of CPU time on average.
Title: Pre-assignment RDL routing via extraction of maximal net sequence
Pub Date: 2011-10-09; DOI: 10.1109/ICCD.2011.6081373
Kai Ma, Xiaorui Wang, Yefu Wang
A key challenge in chip multiprocessor (CMP) design is to optimize the performance within a power budget limited by the CMP's cooling, packaging, and power supply capacities. Most existing solutions rely solely on DVFS to adapt the power consumption of CPU cores, without coordinating with the last-level on-chip (e.g., L2) cache. This paper proposes DPPC, a chip-level power partitioning and capping strategy that can dynamically and explicitly partition the chip-level power budget among different CPU cores and the shared last-level cache in a CMP based on the workload characteristics measured online. DPPC features a novel performance-power model and an online model estimator to quantitatively estimate the performance contributed by each core and the cache with their respective local power budgets. DPPC then re-partitions the chip-level power budget among them for optimized CMP performance. The partitioned local power budgets for the CPU cores and cache are precisely enforced by power capping algorithms designed rigorously based on feedback control theory. Our experimental results demonstrate that DPPC achieves better CMP performance, within a given power budget, than several state-of-the-art power capping solutions.
Title: DPPC: Dynamic power partitioning and capping in chip multiprocessors
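The DPPC abstract says that local power budgets are enforced by capping algorithms based on feedback control theory, without describing the controllers. As a minimal sketch of what feedback-based capping can look like in general, the code below uses a simple proportional controller that adjusts a core frequency until measured power tracks its budget. The linear power model, the gain, and the frequency limits are assumptions made for illustration and are not the paper's design.

```python
def toy_power_model(freq_ghz):
    """Hypothetical plant: 5 W idle power plus 10 W per GHz."""
    return 5.0 + 10.0 * freq_ghz


def cap_power(budget_w, measure_power, freq_ghz,
              gain=0.05, f_min=0.8, f_max=3.0, steps=20):
    """Proportional feedback loop: nudge frequency by gain * power error."""
    for _ in range(steps):
        error_w = budget_w - measure_power(freq_ghz)  # positive = headroom
        freq_ghz = freq_ghz + gain * error_w
        freq_ghz = min(f_max, max(f_min, freq_ghz))   # respect DVFS range
    return freq_ghz


# Start at the maximum frequency and enforce a 25 W local budget.
final_freq = cap_power(25.0, toy_power_model, freq_ghz=3.0)
print(round(final_freq, 3))                   # settles near 2.0 GHz
print(round(toy_power_model(final_freq), 3))  # settles near 25.0 W
```

With this plant and gain, the frequency error halves on every step, so the loop converges quickly. A real controller would have to be tuned against the measured power behavior of the hardware, which is the kind of analysis that control theory provides.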
Pub Date: 2011-10-09; DOI: 10.1109/ICCD.2011.6081379
Minoru Iizuka, N. Hamada, H. Saito, R. Yamaguchi, Minoru Yoshinaga
This paper proposes a tool set for the design of asynchronous circuits with bundled-data implementation. Using the proposed tool set together with commercial CAD tools, asynchronous circuits with bundled-data implementation can be designed easily. Through experiments, this paper evaluates circuits synthesized with the proposed tool set in terms of area, performance, power consumption, and energy consumption, compared with their synchronous counterparts.
Title: A tool set for the design of asynchronous circuits with bundled-data implementation
Pub Date: 2011-10-09; DOI: 10.1109/ICCD.2011.6081403
S. Rethinagiri, R. B. Atitallah, S. Niar, E. Senn, J. Dekeyser
This paper proposes an efficient Hybrid System Level (HSL) power estimation methodology for FPGA-based MPSoC. Within this methodology, Functional Level Power Analysis (FLPA) is extended to set up generic power models for the different parts of the system. Then, a simulation framework is developed at the transactional level to accurately evaluate the activities used in the related power models. The combination of these two parts leads to a hybrid power estimation that gives a better trade-off between accuracy and speed. The proposed methodology has several benefits: it considers the power consumption of the embedded system in its entirety and yields accurate estimates without costly and complex equipment. The proposed methodology is also scalable for exploring complex embedded architectures. The usefulness and effectiveness of our HSL methodology are validated through typical mono-processor and multiprocessor embedded systems designed around the Xilinx Virtex II Pro FPGA board. Our experiments performed on an explicit embedded platform show that the obtained power estimates have an error of less than 1.2% compared to real board measurements and are obtained faster than with other power estimation tools.
Title: Hybrid system level power consumption estimation for FPGA-based MPSoC
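The two-part structure described in the abstract, generic power models per component plus activity values supplied by simulation, can be sketched as follows. This is only an illustration under simple assumptions: each model is taken to be linear in its activity parameters, and the component names, activity names, and coefficients are hypothetical rather than values from the paper.

```python
from dataclasses import dataclass


@dataclass
class PowerModel:
    """Linear power model: static power plus a weighted sum of activities."""
    static_mw: float
    coeff_mw: dict  # activity name -> milliwatts per unit of activity

    def estimate(self, activities):
        dynamic = sum(coeff * activities.get(name, 0.0)
                      for name, coeff in self.coeff_mw.items())
        return self.static_mw + dynamic


def system_power(models, activities_by_component):
    """Sum per-component estimates to obtain the whole-system power."""
    return sum(model.estimate(activities_by_component.get(name, {}))
               for name, model in models.items())


# Hypothetical models, as if fitted beforehand from board measurements.
models = {
    "cpu": PowerModel(100.0, {"ipc": 50.0, "miss_rate": 200.0}),
    "bus": PowerModel(20.0, {"utilization": 30.0}),
}

# Hypothetical activity values, as if reported by a transaction-level simulation.
activities = {
    "cpu": {"ipc": 0.8, "miss_rate": 0.05},
    "bus": {"utilization": 0.5},
}

print(f"{system_power(models, activities):.1f} mW")
```

Splitting the work this way means the models are characterized once per platform, while each new application only requires a fast simulation run to collect its activity values.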