
Latest papers from the 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools

A Formal Condition to Stop an Incremental Automatic Functional Diagnosis
Luca Amati, C. Bolchini, F. Salice, F. Franzoso
iAF2D (incremental Automatic Functional Fault Detective) is a methodology for identifying the faulty component in a complex system using data collected from a test session. It is an incremental approach based on a Bayesian Belief Network (BBN), where the model of the system under analysis is extracted from a faulty signature description. iAF2D reduces time, cost and effort during the diagnostic phase by selecting, step by step, the tests to be executed from the set of available tests. This paper focuses on the evolution of the BBN node probabilities in order to define a stop criterion that interrupts the diagnosis process when additional test outcomes would not provide further useful information for identifying the faulty candidate. The methodology is validated on a set of experimental results.
DOI: 10.1109/DSD.2010.98 (https://doi.org/10.1109/DSD.2010.98); published 2010-09-01
Citations: 5
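The idea of stopping an incremental diagnosis once further tests stop being useful can be sketched as follows. This is only an illustration under strong assumptions: it uses a single-fault, naive-Bayes simplification rather than a full BBN, and the stop rule (a posterior threshold on the leading candidate) is a stand-in, not the formal condition derived in the paper. The component names, test names, probabilities and the `threshold` parameter are all hypothetical.

```python
def update(posterior, fail_prob, failed):
    """Bayes update of P(component is the faulty one) after one test outcome.

    fail_prob[c] is the assumed P(test fails | c is the faulty component).
    """
    unnormalised = {}
    for component, p in posterior.items():
        likelihood = fail_prob[component] if failed else 1.0 - fail_prob[component]
        unnormalised[component] = p * likelihood
    total = sum(unnormalised.values())
    return {c: v / total for c, v in unnormalised.items()}


def diagnose(prior, tests, outcomes, threshold=0.95):
    """Run tests in order and stop once the leading candidate is confident enough."""
    posterior = dict(prior)
    executed = []
    for name, fail_prob in tests:
        best = max(posterior, key=posterior.get)
        if posterior[best] >= threshold:  # hypothetical stop rule
            break
        posterior = update(posterior, fail_prob, outcomes[name])
        executed.append(name)
    best = max(posterior, key=posterior.get)
    return best, posterior, executed


# Hypothetical system with three components and four available tests.
prior = {"cpu": 1 / 3, "ram": 1 / 3, "bus": 1 / 3}
tests = [
    ("t1", {"cpu": 0.9, "ram": 0.1, "bus": 0.1}),
    ("t2", {"cpu": 0.9, "ram": 0.1, "bus": 0.5}),
    ("t3", {"cpu": 0.8, "ram": 0.2, "bus": 0.1}),
    ("t4", {"cpu": 0.7, "ram": 0.3, "bus": 0.3}),
]
outcomes = {"t1": True, "t2": True, "t3": True, "t4": False}  # True = test failed

best, posterior, executed = diagnose(prior, tests, outcomes)
print(best, executed)
```

In this toy run the last test is never executed, which is the kind of saving a stop criterion is meant to provide.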
Reconfigurable Grid Alu Processor: Optimization and Design Space Exploration
Basher Shehan, Ralf Jahr, S. Uhrig, T. Ungerer
Currently, few architectural approaches propose new paths to raise the performance of conventional sequential instruction streams in the billion-transistor era. Many application programs could profit from processors that are able to speed up the execution of sequential applications beyond the performance of current superscalar processors. The Grid Alu Processor (GAP) is a runtime reconfigurable processor designed to accelerate a conventional sequential instruction stream without the need for recompilation. It comprises a superscalar processor front-end, a configuration unit, and an array of reconfigurable functional units (FUs), which is fully integrated into the pipeline. The configuration unit maps data-dependent and independent instructions simultaneously at runtime into the array of FUs. This paper evaluates the GAP architecture and optimizes the architecture, the number of FUs, and the configuration layers implemented in the array. The simulations show a significant speed-up for sequential applications on GAP in comparison to an out-of-order superscalar simulator (SimpleScalar). The GAP simulator outperforms SimpleScalar on average by about 50% on the basic architecture and by about 100% with an extended version that includes configuration layers.
DOI: 10.1109/DSD.2010.28 (https://doi.org/10.1109/DSD.2010.28); published 2010-09-01
Citations: 14
High Level Validation of an Optimization Algorithm for the Implementation of Adaptive Wavelet Transforms in FPGAs
R. Salvador, F. Moreno, T. Riesgo, L. Sekanina
The work reported in this paper describes the steps taken towards an FPGA-based implementation of evolvable wavelet transforms for image compression in embedded systems. An Evolutionary Algorithm (EA) for the design and optimization of the transform coefficients is tailored for a suitable System on Chip implementation. The computing requirements of the original algorithm have been cut down in several ways to adapt it for the FPGA implementation. More specifically, this paper addresses the validation of the algorithm using fixed-point arithmetic for the whole optimization process. The results show how high-quality transforms are evolved from scratch with limited-precision arithmetic. Preliminary results of the implementation in an FPGA device are also included.
DOI: 10.1109/DSD.2010.96 (https://doi.org/10.1109/DSD.2010.96); published 2010-09-01
Citations: 4
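To make the role of fixed-point arithmetic in the wavelet entry above more concrete, here is a minimal sketch of how a real-valued transform coefficient might be quantized and applied to integer samples. The number of fractional bits (`FRAC_BITS`), the helper names and the single lifting-style "predict" step are assumptions chosen for illustration; the abstract does not state the word lengths or the transform structure used by the authors.

```python
FRAC_BITS = 12  # assumed number of fractional bits, not taken from the paper


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a real coefficient to a signed fixed-point integer."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float (for inspection only)."""
    return q / (1 << frac_bits)


def predict_detail(left, right, odd, coeff_q, frac_bits=FRAC_BITS):
    """One lifting-style predict step using integer arithmetic only.

    detail = odd - coeff * (left + right), where coeff_q carries frac_bits
    fractional bits. The right shift drops them again (rounding toward
    negative infinity, as Python's >> does for negative values).
    """
    prediction = (coeff_q * (left + right)) >> frac_bits
    return odd - prediction


half = to_fixed(0.5)
detail = predict_detail(100, 104, 103, half)
quantization_error = abs(to_float(to_fixed(0.3)) - 0.3)
print(half, detail, quantization_error)
```

The quantization error of a coefficient is bounded by half of the least significant bit, which is the kind of precision limit an evolutionary search over coefficients would have to cope with.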
A Multicore Embedded Processor for Fingerprint Recognition
G. Danese, Mauro Giachero, F. Leporati, Nelson Nazzicari
Biometric identification systems exploit automated methods of recognition based on physiological or behavioural characteristics of people. Among these, fingerprints are very affordable biometric identifiers. In order to build embedded systems that perform real-time authentication, a fast computational unit for image processing is required. In this paper we propose a parallel architecture that efficiently implements the computationally demanding core of a matching algorithm based on Band Limited Phase Only spatial Correlation (BLPOC), executed by two concurrent computational units implemented on an Altera Stratix II family FPGA. The realised device is competitive with similar hardware solutions described in the literature and outperforms the processing capabilities of general-purpose PC processors.
DOI: 10.1109/DSD.2010.101 (https://doi.org/10.1109/DSD.2010.101); published 2010-09-01
Citations: 9
Instantiating GENESYS Application Architecture Modeling via UML 2.0 Constructs and MARTE Profile
Subayal Khan, Kari Tiensyrjä, J. Nurmi
Modeling complex and computationally intensive applications supported by modern mobile devices with standard modeling languages is a challenging task. Within the GENESYS process model, the application modeling phase is thus of key importance. GENESYS manages complexity by employing cross-domain and platform-based application design. The main contribution of this article is to describe the instantiation of GENESYS application architecture modeling via the MARTE profile and to present a methodology for validating the nonfunctional properties annotated in the application model.
DOI: 10.1109/DSD.2010.36 (https://doi.org/10.1109/DSD.2010.36); published 2010-09-01
Citations: 3
Description-Level Optimisation of Synthesisable Asynchronous Circuits
L. Tarazona, D. Edwards, A. Bardsley, L. Plana
The syntax-directed synthesis paradigm has been shown to be a powerful synthesis approach. However, its control-driven nature results in significant performance overhead. Some methods to reduce this overhead include peephole optimisations, control resynthesis and component optimisations. This work explores new methods of improving the performance of syntax-directed synthesised asynchronous circuits, using the Balsa synthesis system as the research framework. This includes investigating description styles and the usage of language constructs that exploit the directness of the synthesis method to obtain more concurrent and faster circuits. The techniques and optimisations presented here have been tested on a set of non-trivial examples including a 32-bit processor, a Viterbi decoder, and a channel-sliced wormhole router.
DOI: 10.1109/DSD.2010.71 (https://doi.org/10.1109/DSD.2010.71); published 2010-09-01
Citations: 2
Design of Testable Universal Logic Gate Targeting Minimum Wire-Crossings in QCA Logic Circuit
B. Sen, Anik Sengupta, M. Dalui, B. Sikdar
This work proposes a testable QCA (Quantum-Dot Cellular Automata) logic gate (UQCALG) realizing the universal functions. The design of UQCALG is based on the Coupled Majority Minority (CMVMIN) QCA structure, with the aim of reducing wire crossings as well as the number of clock cycles required to operate a QCA circuit. The characterization of defects in such a design leads to the synthesis of a test block, realized with majority and minority voters, that ensures the desired testability of a circuit. The experimental designs establish that the UQCALG can result in cost-effective designs of testable QCA logic circuits that may not be possible with a conventional ULG (Universal Logic Gate).
DOI: 10.1109/DSD.2010.114 (https://doi.org/10.1109/DSD.2010.114); published 2010-09-01
Citations: 18
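The majority and minority voters mentioned in the QCA entry above are the basic logic primitives of that technology, and a short truth-table sketch shows why they are enough to build universal functions. This models only the Boolean behaviour of ideal three-input voters; it does not model the UQCALG or CMVMIN structures themselves, and the function names are chosen here for illustration.

```python
from itertools import product


def majority(a, b, c):
    """Three-input majority voter: 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def minority(a, b, c):
    """Three-input minority voter: the complement of the majority."""
    return 1 - majority(a, b, c)


def nand(a, b):
    """NAND obtained by fixing one minority-voter input to 0."""
    return minority(a, b, 0)


def nor(a, b):
    """NOR obtained by fixing one minority-voter input to 1."""
    return minority(a, b, 1)


for a, b in product((0, 1), repeat=2):
    print(a, b, nand(a, b), nor(a, b))
```

Since NAND and NOR are each functionally complete, a single minority voter with one programmable input already behaves as a universal gate at the logic level.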
On Reducing Error Rate of Data Protected Using Systematic Unordered Codes in Asymmetric Channels
S. Piestrak
Berger-invert codes are coding schemes used to protect communication channels against all asymmetric errors and to decrease power consumption. This paper proposes a method of constructing modified Berger-invert codes that relies on choosing check parts with the smallest possible total weight and assigning low-weight check parts to the most numerous subsets of data with the largest Hamming weights. As a result, the error rate of the transmitted data can be reduced by up to about 23.5% for an 8-bit bus at no cost (no extra bus lines and no additional hardware for the encoding and decoding/checking circuitry).
DOI: 10.1109/DSD.2010.117 (https://doi.org/10.1109/DSD.2010.117); published 2010-09-01
Citations: 1
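As background for the entry above, the sketch below shows the classical Berger code, in which the check part is the binary count of 0 bits in the data word. It is not the Berger-invert code or the modified check-part assignment proposed in the paper; it only illustrates why a systematic unordered code detects asymmetric (here 1 to 0) errors. The function names and the sample data word are assumptions.

```python
def check_width(k):
    """Number of check bits for k data bits: ceil(log2(k + 1))."""
    return k.bit_length()


def berger_check(data, k=8):
    """Classical Berger check part: the number of 0 bits in the k-bit data word."""
    ones = bin(data & ((1 << k) - 1)).count("1")
    return k - ones


def is_valid(data, check, k=8):
    """A received (data, check) pair is accepted only if the zero count matches."""
    return berger_check(data, k) == check


data = 0b10110100            # four 1 bits, four 0 bits
check = berger_check(data)   # 4
corrupted = 0b00100100       # two 1 -> 0 errors in the data part

print(check_width(8), check, is_valid(data, check), is_valid(corrupted, check))
```

A 1 to 0 error in the data part can only raise the zero count, while a 1 to 0 error in the check part can only lower the stored count, so any combination of such errors leaves the two values mismatched.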
Power Consumption Modeling for DVFS Exploitation
A. Castagnetti, C. Belleudy, S. Bilavarn, M. Auguin
Many task scheduling algorithms and power management policies have been developed based on simplistic power models, which rarely take into account the power consumption of the different components of a real system. Most of the models on which the study of DVFS scheduling is based assume that the power consumption of a processor can be modelled with an E ∝ V² model. This hypothesis, even if partly true, is not generally applicable when considering the complete system, which consists of the processor, memories and power conversion circuits. In this paper we present a power and energy model for a DVFS-enabled mobile computing platform. The platform is based on a low-power SoC, which integrates the processor core and memory as well as other hardware accelerators. Our analysis includes a study of the power conversion components that supply the SoC. Starting from measurements, we first characterize the power consumption of the SoC and the converters; a power and energy model for the processor is then proposed. The model is able to predict the power consumption of the processor core with an average error of less than 10%. It is then used to analyse two DVFS scheduling techniques based on the EDF algorithm, Cycle Conserving and Look Ahead. The results show that the CPU energy saving computed using our model is far less than what would be expected using a model that does not take into account the effect of static power.
DOI: 10.1109/DSD.2010.55 (https://doi.org/10.1109/DSD.2010.55); published 2010-09-01
Citations: 27
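The closing claim of the DVFS entry above, that static power shrinks the savings predicted by an E ∝ V² model, can be illustrated with a small calculation. This sketch uses the common CMOS approximation P_dyn = C_eff · V² · f plus a constant static power term; it is not the measured model from the paper. All numbers (operating points, effective capacitance, static power) are hypothetical, and the platform is assumed to draw no power once the task has finished.

```python
def task_energy(cycles, volt, freq_hz, c_eff, p_static=0.0):
    """Energy in joules to run a task of `cycles` clock cycles at one operating point."""
    exec_time = cycles / freq_hz
    p_dynamic = c_eff * volt ** 2 * freq_hz
    return (p_dynamic + p_static) * exec_time


def dvfs_saving(cycles, high, low, c_eff, p_static):
    """Relative energy saved by running at `low` instead of `high` (volt, freq) points."""
    e_high = task_energy(cycles, high[0], high[1], c_eff, p_static)
    e_low = task_energy(cycles, low[0], low[1], c_eff, p_static)
    return 1.0 - e_low / e_high


CYCLES = 6e8
HIGH = (1.2, 600e6)   # hypothetical (volts, hertz)
LOW = (0.9, 300e6)    # hypothetical (volts, hertz)
C_EFF = 1e-9          # hypothetical effective switched capacitance in farads

ideal = dvfs_saving(CYCLES, HIGH, LOW, C_EFF, p_static=0.0)
realistic = dvfs_saving(CYCLES, HIGH, LOW, C_EFF, p_static=0.1)
print(f"dynamic-only saving: {ideal:.1%}, with static power: {realistic:.1%}")
```

With the dynamic term alone the saving depends only on the voltage ratio squared; once static power is included, the longer execution time at the lower frequency costs extra energy and the saving drops.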
Cyclic Redundancy Checking (CRC) Accelerator for the FlexCore Processor
M. Azhar, T. Hoang, P. Larsson-Edefors
A proven approach to increasing the performance of general-purpose processors is to add hardware accelerators. In its basic configuration, the FlexCore processor has a limited set of datapath units. However, thanks to a flexible datapath interconnect and a wide control word, the FlexCore datapath is explicitly designed to support the integration of special units that, on demand, can accelerate certain data-intensive applications. We present the integration of a versatile accelerator for several Cyclic Redundancy Checking (CRC) keys. Furthermore, we investigate the accelerator’s impact on processor execution time and energy efficiency, using the Power Stone CRC benchmark. Our evaluation shows that the accelerated 65-nm 2.7-ns FlexCore datapath is, for example, 86% more energy- and cycle-efficient than a datapath lacking the CRC accelerator.
DOI: 10.1109/DSD.2010.51 (https://doi.org/10.1109/DSD.2010.51); published 2010-09-01
Citations: 9
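For context on the CRC entry above, the bit-serial loop below is the kind of software computation that a CRC accelerator is meant to replace. It is a generic reflected CRC routine written for illustration, not the FlexCore unit or the benchmark code; the generator polynomial (the "key") is a parameter, and the two configurations shown are the widely used CRC-32 and CRC-16/ARC parameter sets.

```python
import zlib


def crc_reflected(data, poly, init, xor_out):
    """Bit-serial CRC with reflected input and output.

    `poly` is the bit-reversed generator polynomial. Each input byte costs
    eight shift/XOR iterations in software.
    """
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ poly
            else:
                crc >>= 1
    return crc ^ xor_out


def crc32(data):
    """CRC-32 as used by zlib and Ethernet."""
    return crc_reflected(data, poly=0xEDB88320, init=0xFFFFFFFF, xor_out=0xFFFFFFFF)


def crc16_arc(data):
    """CRC-16/ARC (generator 0x8005, reflected form 0xA001)."""
    return crc_reflected(data, poly=0xA001, init=0x0000, xor_out=0x0000)


message = b"123456789"
print(hex(crc32(message)), hex(crc16_arc(message)))
print(crc32(message) == zlib.crc32(message))
```

The inner loop runs eight times per byte, which is why moving it into a dedicated datapath unit can reduce both cycle count and energy.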