
Latest publications from the 2011 IEEE 29th International Conference on Computer Design (ICCD)

Implementing hardware Trojans: Experiences from a hardware Trojan challenge
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081414
G. Becker, Ashwin Lakshminarasimhan, Lang Lin, Sudheendra K. Srivathsa, Vikram B. Suresh, W. Burleson
Hardware Trojans have become a growing concern in the design of secure integrated circuits. In this work, we present a set of novel hardware Trojans aimed at evading detection methods, designed as part of the CSAW Embedded System Challenge 2010. We introduced and implemented unique Trojans based on side-channel analysis that leak the secret key in the reference encryption algorithm. These side-channel-based Trojans do not impact the functionality of the design to minimize the possibility of detection. We have demonstrated the statistical analysis approach to attack such Trojans. Besides, we introduced Trojans that modify either the functional behavior or the electrical characteristics of the reference design. Novel techniques such as a Trojan draining the battery of a device do not have an immediate impact and hence avoid detection, but affect the long term reliability of the system.
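The abstract does not spell out the statistical analysis used against the side-channel Trojans; the sketch below shows the general correlation-based idea on synthetic data, with a hypothetical one-byte leak (key value, leakage model, and trace count are illustrative, not taken from the paper).

```python
# Illustrative correlation analysis against a side-channel leak (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
SECRET_KEY_BYTE = 0x3C        # hypothetical byte leaked by the Trojan
N_TRACES = 2000

def hw(x: int) -> int:
    """Hamming weight, a common leakage model."""
    return bin(x).count("1")

# Synthetic traces: leakage proportional to HW(plaintext XOR key) plus noise.
plaintexts = rng.integers(0, 256, N_TRACES)
traces = np.array([hw(int(p) ^ SECRET_KEY_BYTE) for p in plaintexts], dtype=float)
traces += rng.normal(0.0, 1.0, N_TRACES)

def score(guess: int) -> float:
    """Correlation between the leakage predicted for a key guess and the traces."""
    predicted = np.array([hw(int(p) ^ guess) for p in plaintexts], dtype=float)
    return abs(np.corrcoef(predicted, traces)[0, 1])

best = max(range(256), key=score)
print(f"recovered key byte: {best:#04x}")   # expected 0x3c on this synthetic data
```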
Citations: 23
AURA: An application and user interaction aware middleware framework for energy optimization in mobile devices
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081393
Brad K. Donohoo, Chris Ohlsen, S. Pasricha
Mobile battery-operated devices are becoming an essential instrument for business, communication, and social interaction. In addition to the demand for an acceptable level of performance and a comprehensive set of features, users often desire extended battery lifetime. In fact, limited battery lifetime is one of the biggest obstacles facing the current utility and future growth of increasingly sophisticated “smart” mobile devices. This paper proposes a novel application-aware and user-interaction aware energy optimization middleware framework (AURA) for pervasive mobile devices. AURA optimizes CPU and screen backlight energy consumption while maintaining a minimum acceptable level of performance. The proposed framework employs a novel Bayesian application classifier and management strategies based on Markov Decision Processes to achieve energy savings. Real-world user evaluation studies on a Google Android based HTC Dream smartphone running the AURA framework demonstrate promising results, with up to 24% energy savings compared to the baseline device manager, and up to 5× savings over prior work on CPU and backlight energy co-optimization.
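As a rough illustration of the classification step described above (the feature set, classes, and frequency levels below are hypothetical, not AURA's), an application could be classified from coarse usage features with a Gaussian naive Bayes model and the predicted class mapped to a DVFS level:

```python
# Sketch: classify an application from coarse usage features, then map the class
# to a CPU frequency level. Features, classes, and levels are invented.
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training data: [touch-event rate, CPU utilization] per sample,
# labeled 0 = background, 1 = interactive, 2 = compute-heavy.
X = np.vstack([rng.normal([0.1, 0.1], 0.05, (50, 2)),
               rng.normal([0.8, 0.3], 0.05, (50, 2)),
               rng.normal([0.2, 0.9], 0.05, (50, 2))])
y = np.repeat([0, 1, 2], 50)

# Gaussian naive Bayes: per-class mean/variance, pick the most likely class.
means = np.array([X[y == c].mean(axis=0) for c in range(3)])
vars_ = np.array([X[y == c].var(axis=0) + 1e-6 for c in range(3)])

def classify(sample):
    log_lik = -0.5 * (((sample - means) ** 2) / vars_ + np.log(2 * np.pi * vars_))
    return int(log_lik.sum(axis=1).argmax())

FREQ_MHZ = {0: 300, 1: 600, 2: 1000}   # hypothetical DVFS level per class
print(FREQ_MHZ[classify(np.array([0.75, 0.25]))])   # likely 600 (interactive)
```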
Citations: 34
The convergence of HPC and embedded systems in our heterogeneous computing future
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081368
D. Kaeli, David Akodes
Recently we have seen two exciting trends that have been flooding the market: 1) the movement of graphics processing units into mainstream general-purpose platforms, and 2) the movement of multi-core embedded systems into tablet computing and smartphone spaces. These trends are forcing application developers to rethink how they are going to best utilize these many-core and multi-core heterogeneous platforms to provide new levels of cost/performance/power in a range of emerging application domains.
Citations: 5
Energy-aware and quality-scalable data placement and retrieval for disks in video server environments
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081447
Domenic Forte, Ankur Srivastava
As the popularity of video streaming over the Internet grows, energy consumption in video server environments which store and retrieve video data increases as well. Previous work has shown that video quality delivered to clients can be scaled in order to serve more concurrent video requests and/or reduce energy consumption of server disks. We propose a data placement strategy for such quality scaling methods which distributes video data within a disk based on its priority/importance. Results show that in doing so the disk can retrieve data with greater efficiency and serve lower quality video to more clients than previously investigated strategies.
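One way to picture the proposed placement is to sort video blocks by priority and fill the fastest disk regions first, so the most important (base-quality) data is always cheap to retrieve; the sketch below uses made-up zone names, capacities, and throughputs and is not the paper's exact policy.

```python
# Sketch: place video blocks on disk zones by priority (base layers first into
# the fastest zones). Zone sizes and throughputs are invented for illustration.
from dataclasses import dataclass

@dataclass
class Block:
    video_id: int
    layer: int        # 0 = base quality layer (highest priority), higher = enhancement

@dataclass
class Zone:
    name: str
    capacity_blocks: int
    throughput_mbps: float

def place(blocks, zones):
    """Assign blocks to zones: highest-priority blocks go to the fastest zones."""
    zones = sorted(zones, key=lambda z: z.throughput_mbps, reverse=True)
    blocks = sorted(blocks, key=lambda b: b.layer)            # base layers first
    placement, zi, used = {}, 0, 0
    for b in blocks:
        while zi < len(zones) and used >= zones[zi].capacity_blocks:
            zi, used = zi + 1, 0
        if zi == len(zones):
            raise RuntimeError("out of disk capacity")
        placement[(b.video_id, b.layer)] = zones[zi].name
        used += 1
    return placement

blocks = [Block(v, l) for v in range(2) for l in range(3)]
zones = [Zone("outer", 3, 120.0), Zone("middle", 3, 90.0), Zone("inner", 3, 60.0)]
print(place(blocks, zones))
```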
Citations: 0
Evaluation of issue queue delay: Banking tag RAM and identifying correct critical path
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081417
Kyohei Yamaguchi, Yuya Kora, H. Ando
The tradeoff between complexity and attained instructions per cycle is often an important issue in microarchitectural designs. In this design phase, quick quantification of the complexity (i.e., delay) of relevant structures is required. The issue queue is one of such complex structures for which it is difficult to estimate delay. In this paper, we evaluate the issue queue delay to aid microarchitectural design. Our study includes two features: a circuit design and evaluation. First, we introduce banking the tag RAM, which is one of the components comprising the issue queue, to reduce the delay. Unlike normal RAM, banking the tag RAM is not straightforward, because of its uniqueness in the organization of the issue queue. Second, we explore and identify a correct critical path in the issue queue. A previous study summed the critical path of each component in the issue queue to obtain the delay of the issue queue, but this does not provide the correct delay of the issue queue, because the critical paths of each component are not connected logically. In the evaluation assuming 32nm LSI technology, we obtained the delays of an issue queue with eight to 128 entries. The process of banking the tag RAM and identifying the correct critical path reduces the delay by up to 20%, compared with not banking the tag RAM and simply summing the critical path delay of each component.
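The abstract's point about summing per-component critical paths can be seen with a toy delay graph: the delay that matters is the longest logically connected path through the components, which can be shorter than the sum of each component's own worst-case path (all component names and delay values below are invented).

```python
# Toy example: summing each component's internal worst-case delay (naive) is not
# the delay of the longest logically connected path (correct). Numbers invented.
component_paths = {
    "wakeup":  {"a": 120, "b": 90},     # picoseconds per internal path
    "select":  {"a": 80,  "b": 110},
    "tag_ram": {"a": 150, "b": 140},
}
# Which internal path of one component actually drives which path of the next.
connects = {("wakeup", "b"): ("select", "b"), ("select", "b"): ("tag_ram", "a")}

naive = sum(max(paths.values()) for paths in component_paths.values())

def connected_delay(comp, path):
    """Total delay following only logically connected paths from (comp, path)."""
    total, key = component_paths[comp][path], (comp, path)
    while key in connects:
        comp, path = connects[key]
        total += component_paths[comp][path]
        key = (comp, path)
    return total

correct = max(connected_delay("wakeup", p) for p in component_paths["wakeup"])
print(naive, correct)   # naive (380 ps) overestimates the connected path (350 ps)
```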
Citations: 10
ARCc: A case for an architecturally redundant cache-coherence architecture for large multicores
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081431
O. Khan, H. Hoffmann, Mieszko Lis, Farrukh Hijaz, A. Agarwal, S. Devadas
This paper proposes an architecturally redundant cache-coherence architecture (ARCc) that combines the directory and shared-NUCA based coherence protocols to improve performance, energy and dependability. Both coherence mechanisms co-exist in the hardware and ARCc enables seamless transition between the two protocols. We present an online analytical model implemented in the hardware that predicts performance and triggers a transition between the two coherence protocols at application-level granularity. The ARCc architecture delivers up to 1.6× higher performance and up to 1.5× lower energy consumption compared to the directory-based counterpart. It does so by identifying applications which benefit from the large shared cache capacity of shared-NUCA because of lower off-chip accesses, or where remote-cache word accesses are efficient.
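A very rough sketch of the decision the abstract describes — predict each coherence mode's cost from observed per-application counters and switch to the cheaper one — is shown below; the cost model and all numbers are invented and merely stand in for the paper's hardware analytical model.

```python
# Sketch: pick a coherence mode per application from simple measured counters.
# The cost model is invented for illustration only.
def predict_cycles(counters, mode):
    if mode == "directory":
        # Private caching: cheap local hits, off-chip penalty on capacity misses.
        return (counters["l2_hits"] * 10
                + counters["offchip_misses"] * 200)
    else:  # "shared_nuca"
        # Larger effective capacity, but remote-slice word accesses cost extra.
        return (counters["l2_hits"] * 10
                + counters["remote_word_accesses"] * 30
                + counters["offchip_misses_shared"] * 200)

def choose_mode(counters):
    return min(("directory", "shared_nuca"),
               key=lambda m: predict_cycles(counters, m))

# Hypothetical application profile that benefits from the larger shared capacity.
counters = {"l2_hits": 1_000_000, "offchip_misses": 80_000,
            "remote_word_accesses": 300_000, "offchip_misses_shared": 20_000}
print(choose_mode(counters))   # "shared_nuca" under these made-up numbers
```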
Citations: 6
Pre-assignment RDL routing via extraction of maximal net sequence
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081377
Jin-Tai Yan, Zhi-Wei Chen
Given a set of IO connections between IO buffers and bump balls in a re-distribution routing layer, an efficient router is proposed to route all the IO connections for pre-assignment RDL routing in an area-IO flip-chip design. Based on the simplification of net renumbering and the extraction of the maximal net sequence for all the IO connections, all the connections can be firstly divided into local and global connections. After routing the global wires of all the local connections, the global wires of all the global connections are further assigned under the capacity constraint for RDL global routing. Finally, the global wires of all the IO connections are routed for RDL detailed routing by assigning feasible crossing points and physical paths. The experimental results show that our proposed pre-assignment RDL router can maintain 100% routability in 7 tested industrial circuits. Compared with Yan's pre-assignment RDL router[4] in total wirelength and CPU time, our proposed approach saves 3.7% of total wirelength and 27.0% of CPU time on the average.
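One plausible reading of the "maximal net sequence" step is a longest-increasing-subsequence computation over the net ordering, which yields a maximal set of connections routable without crossings; the sketch below illustrates that reading only and is not claimed to be the authors' exact algorithm.

```python
# Illustrative only: extract a maximal crossing-free net sequence when IO buffers
# are indexed 0..n-1 in order and bump_order[i] is the bump index of net i.
# A longest increasing subsequence of bump indices is one such maximal sequence.
import bisect

def maximal_net_sequence(bump_order):
    if not bump_order:
        return []
    tails, tail_idx = [], []          # patience-sorting piles (values / net indices)
    prev = [-1] * len(bump_order)     # back-pointers for reconstruction
    for i, b in enumerate(bump_order):
        pos = bisect.bisect_left(tails, b)
        if pos == len(tails):
            tails.append(b); tail_idx.append(i)
        else:
            tails[pos] = b; tail_idx[pos] = i
        prev[i] = tail_idx[pos - 1] if pos > 0 else -1
    seq, i = [], tail_idx[-1]         # walk back from the end of the longest pile
    while i != -1:
        seq.append(i); i = prev[i]
    return seq[::-1]                  # net indices, in buffer order

bump_order = [3, 0, 4, 1, 5, 2, 6]    # hypothetical bump index per IO-buffer index
local_nets = maximal_net_sequence(bump_order)
print(local_nets)                     # e.g. [1, 3, 5, 6]: routable without crossings
```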
Citations: 5
DPPC: Dynamic power partitioning and capping in chip multiprocessors
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081373
Kai Ma, Xiaorui Wang, Yefu Wang
A key challenge in chip multiprocessor (CMP) design is to optimize the performance within a power budget limited by the CMP's cooling, packaging, and power supply capacities. Most existing solutions rely solely on DVFS to adapt the power consumption of CPU cores, without coordinating with the last-level on-chip (e.g., L2) cache. This paper proposes DPPC, a chip-level power partitioning and capping strategy that can dynamically and explicitly partition the chip-level power budget among different CPU cores and the shared last-level cache in a CMP based on the workload characteristics measured online. DPPC features a novel performance-power model and an online model estimator to quantitatively estimate the performance contributed by each core and the cache with their respective local power budgets. DPPC then re-partitions the chip-level power budget among them for optimized CMP performance. The partitioned local power budgets for the CPU cores and cache are precisely enforced by power capping algorithms designed rigorously based on feedback control theory. Our experimental results demonstrate that DPPC achieves better CMP performance, within a given power budget, than several state-of-the-art power capping solutions.
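The capping side of DPPC can be pictured as a feedback loop that measures chip power and rescales the partitioned budgets until the measurement tracks the chip-level cap; the toy controller below (plant model, gain, and budget split are all invented) only illustrates that closed-loop idea, not the paper's controller design.

```python
# Toy feedback loop: scale partitioned power budgets until measured chip power
# settles at the chip-level cap. Plant model, gain, and numbers are invented.
CHIP_BUDGET_W = 60.0
budgets = {"core0": 20.0, "core1": 20.0, "l2_cache": 20.0}   # current partition

def measure_power(budgets):
    # Stand-in for real sensors: each component slightly overshoots its budget.
    return sum(1.08 * b for b in budgets.values())

GAIN = 0.4
for _ in range(20):
    error = CHIP_BUDGET_W - measure_power(budgets)     # negative when over the cap
    scale = 1.0 + GAIN * error / CHIP_BUDGET_W         # shrink or grow every budget
    budgets = {k: v * scale for k, v in budgets.items()}

print({k: round(v, 2) for k, v in budgets.items()},
      "->", round(measure_power(budgets), 2), "W")     # settles close to 60 W
```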
Citations: 5
A tool set for the design of asynchronous circuits with bundled-data implementation
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081379
Minoru Iizuka, N. Hamada, H. Saito, R. Yamaguchi, Minoru Yoshinaga
This paper proposes a tool set for the design of asynchronous circuits with bundled-data implementation. Using the proposed tool set with commercial CAD tools, asynchronous circuits with bundled-data implementation can be designed easily. Through the experiments, this paper evaluates synthesized circuits using the proposed tool set in terms of area, performance, power consumption, and energy consumption comparing with synchronous counterparts.
Citations: 15
Hybrid system level power consumption estimation for FPGA-based MPSoC
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081403
S. Rethinagiri, R. B. Atitallah, S. Niar, E. Senn, J. Dekeyser
This paper proposes an efficient Hybrid System Level (HSL) power estimation methodology for FPGA-based MPSoC. Within this methodology, the Functional Level Power Analysis (FLPA) is extended to set up generic power models for the different parts of the system. Then, a simulation framework is developed at the transactional level to evaluate accurately the activities used in the related power models. The combination of the above two parts lead to a hybrid power estimation that gives a better trade-off between accuracy and speed. The proposed methodology has several benefits: it considers the power consumption of the embedded system in its entirety and leads to accurate estimates without a costly and complex material. The proposed methodology is also scalable for exploring complex embedded architectures. The usefulness and effectiveness of our HSL methodology is validated through a typical mono-processor and multiprocessor embedded system designed around the Xilinx Virtex II Pro FPGA board. Our experiments performed on an explicit embedded platform show that the obtained power estimation results are less than 1.2% of error when compared to the real board measurements and faster compared to other power estimation tools.
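The heart of an FLPA-style estimator — per-component power models evaluated on activity values coming from the transaction-level simulation — might look like the sketch below; component names, activity parameters, and coefficients are hypothetical and would come from one-time characterization in practice.

```python
# Sketch: functional-level power estimation. Each component's power is a simple
# function of activity parameters reported by a transaction-level simulation.
# All component names, parameters, and coefficients are made up.
POWER_MODELS = {
    # P = static + sum(coefficient * activity), activities normalized to [0, 1]
    "processor":   {"static_mw": 120.0, "ipc_mw": 45.0, "cache_miss_rate_mw": 30.0},
    "memory":      {"static_mw": 80.0,  "access_rate_mw": 60.0},
    "fpga_fabric": {"static_mw": 150.0, "toggle_rate_mw": 90.0},
}

def component_power_mw(name, activities):
    model = POWER_MODELS[name]
    power = model["static_mw"]
    for param, value in activities.items():
        power += model[f"{param}_mw"] * value
    return power

# Activities per component as produced by the (hypothetical) TLM simulation.
sim_activities = {
    "processor":   {"ipc": 0.7, "cache_miss_rate": 0.1},
    "memory":      {"access_rate": 0.4},
    "fpga_fabric": {"toggle_rate": 0.25},
}

total_mw = sum(component_power_mw(n, a) for n, a in sim_activities.items())
print(round(total_mw, 1), "mW")   # 431.0 mW with these made-up numbers
```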
Citations: 30