iAF2D (incremental Automatic Functional Fault Detective) is a methodology for identifying the faulty component in a complex system using data collected from a test session. It is an incremental approach based on a Bayesian Belief Network (BBN), where the model of the system under analysis is extracted from a faulty signature description. iAF2D reduces the time, cost and effort of the diagnostic phase by selecting, step by step, the tests to be executed from the set of available tests. This paper focuses on the evolution of the BBN node probabilities in order to define a stop criterion that interrupts the diagnosis process when additional test outcomes would not provide further useful information for identifying the faulty candidate. The methodology is validated on a set of experimental results.
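The idea of stopping when further tests would add no useful information can be illustrated with a small sketch. It is not the formal condition defined in the paper: it replaces the BBN with a flat single-fault model, and the names `posterior_update`, `expected_entropy_after`, `should_stop` and the threshold `min_gain` are all hypothetical. Each remaining test is described only by an assumed probability of failing given that a particular component is faulty.

```python
from math import log2

def posterior_update(prior, likelihood):
    """Bayes rule over candidates; likelihood[c] = P(observed outcome | c is faulty)."""
    unnorm = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(unnorm.values())
    return {c: p / total for c, p in unnorm.items()}

def entropy(dist):
    """Shannon entropy (bits) of the distribution over faulty candidates."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def expected_entropy_after(prior, p_fail):
    """Expected posterior entropy if a test with P(FAIL | c faulty) = p_fail[c] is run."""
    expected = 0.0
    for failed in (True, False):
        lik = {c: (p_fail[c] if failed else 1.0 - p_fail[c]) for c in prior}
        p_outcome = sum(prior[c] * lik[c] for c in prior)
        if p_outcome > 0:
            expected += p_outcome * entropy(posterior_update(prior, lik))
    return expected

def should_stop(prior, remaining_tests, min_gain=0.01):
    """Stop when no remaining test is expected to reduce entropy by min_gain bits."""
    current = entropy(prior)
    return all(current - expected_entropy_after(prior, p_fail) < min_gain
               for p_fail in remaining_tests.values())

# Hypothetical example: three candidate components, two possible tests.
belief = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
informative = {"t1": {"A": 0.9, "B": 0.1, "C": 0.1}}
uninformative = {"t2": {"A": 0.5, "B": 0.5, "C": 0.5}}
print(should_stop(belief, informative))    # False: t1 separates A from B and C
print(should_stop(belief, uninformative))  # True: t2 cannot change the belief
```

A test whose outcome is equally likely under every candidate leaves the belief unchanged, so its expected information gain is zero and the sketch stops.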
{"title":"A Formal Condition to Stop an Incremental Automatic Functional Diagnosis","authors":"Luca Amati, C. Bolchini, F. Salice, F. Franzoso","doi":"10.1109/DSD.2010.98","DOIUrl":"https://doi.org/10.1109/DSD.2010.98","url":null,"abstract":"iAF2D (incremental Automatic Functional Fault Detective) is a methodology for the identification of the faulty component in a complex system using data collected from a test session. It is an incremental approach based on a Bayesian Belief Network, where the model of the system under analysis is extracted from a faulty signature description. iAF2D reduces time, cost and efforts during the diagnostic phase by implementing a step-by-step selection of the tests to be executed from the set of available tests. This paper focuses on the evolution of the BBN nodes probabilities, to define a stop criterion to interrupt the diagnosis process when additional test outcomes would not provide further useful information for identifying the faulty candidate. Methodology validation is performed on a set of experimental results.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122487613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Currently, few architectural approaches propose new ways to raise the performance of conventional sequential instruction streams in the billion-transistor era. Many application programs could profit from processors that speed up the execution of sequential applications beyond the performance of current superscalar processors. The Grid Alu Processor (GAP) is a runtime reconfigurable processor designed to accelerate a conventional sequential instruction stream without the need for recompilation. It comprises a superscalar processor front-end, a configuration unit, and an array of reconfigurable functional units (FUs), which is fully integrated into the pipeline. The configuration unit maps data-dependent and independent instructions simultaneously at runtime into the array of FUs. This paper evaluates the GAP architecture and optimizes the architecture, the number of FUs, and the configuration layers implemented in the array. The simulations show a significant speed-up for sequential applications on GAP in comparison to an out-of-order superscalar simulator (Simple Scalar). The GAP simulator outperforms Simple Scalar on average by about 50% on the basic architecture and by about 100% with an extended version including configuration layers.
{"title":"Reconfigurable Grid Alu Processor: Optimization and Design Space Exploration","authors":"Basher Shehan, Ralf Jahr, S. Uhrig, T. Ungerer","doi":"10.1109/DSD.2010.28","DOIUrl":"https://doi.org/10.1109/DSD.2010.28","url":null,"abstract":"Currently few architectural approaches propose new paths to raise the performance of conventional sequential instruction streams in the time of the billions transistor era. Many application programs could profit from processors that are able to speed up the execution of sequential applications beyond the performance of current super scalar processors. The Grid Alu Processor (GAP) is a runtime reconfigurable processor designed for the acceleration of a conventional sequential instruction stream without the need of recompilation. It comprises a super scalar processor front-end, a configuration unit, and an array of reconfigurable functional units (FUs), which is fully integrated into the pipeline. The configuration unit maps data dependent and independent instructions simultaneously at runtime into the array of FUs. This paper evaluates the GAP architecture and optimizes the architecture, the number of FUs, and the configuration layers implemented in the array. The simulations show a significant speed-up for sequential applications on GAP in comparison to an out-of-order super scalar simulator (Simple Scalar). The GAP simulator outperforms Simple Scalar on average by about 50% on the basic architecture and about 100% with an extended version including configuration layers.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115148605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The work reported in this paper describes the steps taken towards an FPGA-based implementation of evolvable wavelet transforms for image compression in embedded systems. An Evolutionary Algorithm (EA) for the design and optimization of the transform coefficients is tailored for a System on Chip implementation. The computing requirements of the original algorithm have been cut down in several ways to adapt it for the FPGA implementation. More specifically, this paper addresses the validation of the algorithm using fixed-point arithmetic for the whole optimization process. The results show that high-quality transforms are evolved from scratch with limited-precision arithmetic. Preliminary results of the implementation in an FPGA device are also included.
{"title":"High Level Validation of an Optimization Algorithm for the Implementation of Adaptive Wavelet Transforms in FPGAs","authors":"R. Salvador, F. Moreno, T. Riesgo, L. Sekanina","doi":"10.1109/DSD.2010.96","DOIUrl":"https://doi.org/10.1109/DSD.2010.96","url":null,"abstract":"The work reported in this paper describes the steps given towards an FPGA-based implementation of evolvable wavelet transforms for image compression in embedded systems. An Evolutionary Algorithm (EA) for the design and optimization of the transform coefficients is tailored for a suitable System on Chip implementation. Several cut downs on the computing requirements have been done to the original algorithm, adapting it for the FPGA implementation. What this paper addresses more specifically is the validation of the algorithm using fixed point arithmetic for the whole optimization process. The results show how high quality transforms are evolved from scratch with limited precision arithmetic. Also, preliminary results of the implementation in an FPGA device are included.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124510892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G. Danese, Mauro Giachero, F. Leporati, Nelson Nazzicari
Biometric identification systems exploit automated methods of recognition based on people's physiological or behavioural characteristics. Among these, fingerprints are very affordable biometric identifiers. Building embedded systems that perform real-time authentication requires a fast computational unit for image processing. In this paper we propose a parallel architecture that efficiently implements the highly computationally demanding core of a matching algorithm based on Band Limited Phase Only spatial Correlation (BLPOC), executed by two concurrent computational units implemented on an Altera Stratix II family FPGA. The resulting device is competitive with similar hardware solutions described in the literature and outperforms the processing capabilities of general-purpose PC processors.
{"title":"A Multicore Embedded Processor for Fingerprint Recognition","authors":"G. Danese, Mauro Giachero, F. Leporati, Nelson Nazzicari","doi":"10.1109/DSD.2010.101","DOIUrl":"https://doi.org/10.1109/DSD.2010.101","url":null,"abstract":"Biometric identification systems exploit automated methods of recognition based on physiological or behavioural people characteristics. Among these, fingerprints are very affordable biometric identifiers. In order to build embedded systems performing real-time authentication, a fast computational unit for image processing is required. In this paper we propose a parallel architecture that efficiently implements the high computationally demanding core of a matching algorithm based on Band Limited Phase Only spatial Correlation (BLPOC), elaborated by two concurrent computational units implemented onto Stratix II family Altera FPGA. The realised device is competitive with those provided by similar hardware solutions described in literature and outperforms the elaboration capabilities of general purpose PC processors.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116266885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling complex and computationally intense applications supported by modern mobile devices via standard modeling languages is a challenging task. Within the GENESYS process model, the application modeling phase is thus of key importance. GENESYS manages complexity by employing cross-domain and platform-based application design. The main contribution of this article is to describe the instantiation of GENESYS application architecture modeling via the MARTE profile, together with a methodology for validating the nonfunctional properties annotated in the application model.
{"title":"Instantiating GENESYS Application Architecture Modeling via UML 2.0 Constructs and MARTE Profile","authors":"Subayal Khan, Kari Tiensyrjä, J. Nurmi","doi":"10.1109/DSD.2010.36","DOIUrl":"https://doi.org/10.1109/DSD.2010.36","url":null,"abstract":"Modeling of complex and computationally intense applications supported by modern mobile devices via standard modeling languages is a challenging task. Within the GENESYS process model the application modeling phase is thus of key importance. GENESYS manages complexity by employing cross domain and platform-based application design. The main contribution of this article is to describe the instantiation of GENESYS application architecture modeling via MARTE profile and describe a methodology for validation of nonfunctional properties annotated in the application model.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116524788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The syntax-directed synthesis paradigm has been shown to be a powerful synthesis approach. However, its control-driven nature results in significant performance overhead. Some methods to reduce this overhead include peephole optimisations, control resynthesis and component optimisations. This work explores new methods of improving the performance of syntax-directed synthesised asynchronous circuits, using the Balsa synthesis system as the research framework. This includes investigating description styles and the usage of language constructs that exploit the directness of the synthesis method to obtain more concurrent and faster circuits. The techniques and optimisations presented here have been tested on a set of non-trivial examples including a 32-bit processor, a Viterbi decoder, and a channel-sliced wormhole router.
{"title":"Description-Level Optimisation of Synthesisable Asynchronous Circuits","authors":"L. Tarazona, D. Edwards, A. Bardsley, L. Plana","doi":"10.1109/DSD.2010.71","DOIUrl":"https://doi.org/10.1109/DSD.2010.71","url":null,"abstract":"The syntax-directed synthesis paradigm has shown to be a powerful synthesis approach. However, its control-driven nature results in significant performance overhead. Some methods to reduce this overhead include peephole optimisations, control resynthesis and component optimisations. This work explores new methods of improving the performance of syntax-directed synthesised asynchronous circuits, using the Balsa synthesis system as the research framework. This includes investigating description styles and the usage of language constructs that exploit the directness of the synthesis method to obtain more concurrent and faster circuits. The techniques and optimisations presented here has been tested in a set of non-trivial examples including a 32-bit processor, a Viterbi decoder, and a channel-sliced wormhole router.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121841167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work proposes a testable QCA (Quantum-Dot Cellular Automata) logic gate (UQCALG) realizing the universal functions. The design of UQCALG is based on the Coupled Majority Minority (CMVMIN) QCA structure, with the aim of reducing wire crossings as well as the number of clock cycles required to operate a QCA circuit. The characterization of defects in such a design leads to the synthesis of a test block, realized with the majority and minority voters, that ensures the desired testability of a circuit. The experimental designs establish that the UQCALG can result in cost-effective design of testable QCA logic circuits that may not be possible with a conventional ULG (Universal Logic Gate).
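The majority and minority voters mentioned above can be illustrated at the purely Boolean level. The sketch below does not model the CMVMIN structure, QCA clocking, or the proposed UQCALG; it only checks the well-known property that fixing one input of a three-input majority voter yields AND or OR, and that the minority voter (its complement) yields NAND or NOR. The function names are chosen here for illustration.

```python
from itertools import product

def majority(a, b, c):
    """Three-input majority voter: 1 when at least two inputs are 1."""
    return int(a + b + c >= 2)

def minority(a, b, c):
    """Three-input minority voter: the complement of the majority voter."""
    return 1 - majority(a, b, c)

# Fixing the third input selects the two-input function that is realized.
for a, b in product((0, 1), repeat=2):
    assert majority(a, b, 0) == (a & b)       # AND
    assert majority(a, b, 1) == (a | b)       # OR
    assert minority(a, b, 0) == 1 - (a & b)   # NAND
    assert minority(a, b, 1) == 1 - (a | b)   # NOR
```

Since NAND and NOR are each functionally complete, a minority voter with one programmable input is enough to build any Boolean function, which is what makes such voters attractive as universal building blocks.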
{"title":"Design of Testable Universal Logic Gate Targeting Minimum Wire-Crossings in QCA Logic Circuit","authors":"B. Sen, Anik Sengupta, M. Dalui, B. Sikdar","doi":"10.1109/DSD.2010.114","DOIUrl":"https://doi.org/10.1109/DSD.2010.114","url":null,"abstract":"This work proposes a testable QCA (Quantum-Dot Cellular Automata) logic gate (UQCALG) realizing the universal functions. The design of UQCALG is based on the Coupled Majority Minority (CMVMIN) QCA structure with the target to reduce wire crossings as well as the number of clock cycles required to operate a QCA circuit. The characterization of defects in such design leads to synthesis of a test block, realized with the majority and minority voters, that ensures the desired testability of a circuit. The experimental designs establish that the UQCALG can result in cost effective design of testable QCA logic circuits that may not be possible with conventional ULG (Universal Logic Gate).","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114161278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Berger-invert codes are coding schemes used to protect communication channels against all asymmetric errors and to decrease power consumption. This paper proposes a method of constructing modified Berger-invert codes that relies on choosing check parts with the smallest possible total weight and assigning low-weight check parts to the most numerous subsets of data with the largest Hamming weights. As a result, the error rate of the transmitted data can be reduced by up to about 23.5% for an 8-bit bus at no cost (no extra bus lines and no increase in the hardware needed to implement the encoding and decoding/checking circuitry).
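As background, the classic Berger code (not the modified Berger-invert construction proposed in the paper, which is not reproduced here) attaches to each data word a check part equal to the number of 0 bits in the data. The sketch below, with hypothetical helper names and an assumed 8-bit data width, shows why this detects asymmetric 1→0 errors: such errors can only raise the zero count of the data and only lower the value of the check part, so the two can no longer agree.

```python
def berger_check(data, k=8):
    """Classic Berger check part: the number of 0 bits among the k data bits."""
    ones = bin(data & ((1 << k) - 1)).count("1")
    return k - ones

def is_valid(data, check, k=8):
    """A received (data, check) pair is accepted only if the zero count matches."""
    return berger_check(data, k) == check

word = 0b10110000                 # three 1s, five 0s
check = berger_check(word)        # 5, sent in a 4-bit check field for k = 8

# Asymmetric 1 -> 0 error in the data part: zero count rises to 6.
corrupted_data = word & ~0b00100000
# Asymmetric 1 -> 0 error in the check part: 0b0101 becomes 0b0100.
corrupted_check = check & ~0b0001

print(is_valid(word, check))                      # True
print(is_valid(corrupted_data, check))            # False
print(is_valid(word, corrupted_check))            # False
print(is_valid(corrupted_data, corrupted_check))  # False
```

The choice of which check part to assign to which group of data words is exactly the degree of freedom that the abstract describes exploiting; this sketch keeps the plain zero-count assignment.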
{"title":"On Reducing Error Rate of Data Protected Using Systematic Unordered Codes in Asymmetric Channels","authors":"S. Piestrak","doi":"10.1109/DSD.2010.117","DOIUrl":"https://doi.org/10.1109/DSD.2010.117","url":null,"abstract":"Berger-invert codes are coding schemes used to protect communication channels against all asymmetric errors and to decrease power consumption. This paper proposes a method of constructing modified Berger-invert codes that relies on the choice of check parts with the smallest possible total weight and assignment of low-weight check parts to the most numerous subsets of data with the largest Hamming weights. As a result, the error rate of the transmitted data can be reduced by up to about 23.5% for a 8-bit bus at no cost (no extra bus lines or increase of hardware to implement encoding and decoding/checking circuitry).","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127754727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Castagnetti, C. Belleudy, S. Bilavarn, M. Auguin
Many task scheduling algorithms and power management policies have been developed on the basis of simplistic power models, which rarely take into account the power consumption of the different components of a real system. Most of the models on which the study of DVFS scheduling is based assume that the power consumption of a processor can be modelled as E ∝ V². This hypothesis, even if partly true, is not generally applicable when considering the complete system, which consists of the processor, memories and power conversion circuits. In this paper we present a power and energy model for a DVFS-enabled mobile computing platform. The platform is based on a low-power SoC, which integrates the processor core and memory as well as other hardware accelerators. We include in our analysis the study of the power conversion components that supply the SoC. Starting from measurements, we first characterize the power consumption of the SoC and the converters; then a power and energy model for the processor is proposed. The model is able to predict the power consumption of the processor core with an average error of less than 10%. This model is then used to analyse two DVFS scheduling techniques based on the EDF algorithm, Cycle Conserving and Look Ahead. The results show that the CPU energy saving computed using our model is far less than what would be expected using a model that does not take into account the effect of static power.
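The effect of static power on the expected DVFS saving can be seen with a toy calculation. This is a textbook-style sketch, not the measured model from the paper: the operating points, the effective capacitance `c_eff`, and the constant static power are all made-up illustrative values, and static power is assumed not to change with voltage.

```python
def task_energy(cycles, freq_hz, volt, c_eff, p_static_w):
    """Energy (J) to run a fixed number of cycles at one operating point."""
    exec_time_s = cycles / freq_hz
    e_dynamic = c_eff * volt ** 2 * cycles   # P_dyn = C * V^2 * f, integrated over cycles / f
    e_static = p_static_w * exec_time_s      # static power is paid for the whole run time
    return e_dynamic + e_static

def relative_saving(cycles, fast, slow, c_eff, p_static_w):
    """Fraction of energy saved by running at the slow point instead of the fast one."""
    e_fast = task_energy(cycles, *fast, c_eff, p_static_w)
    e_slow = task_energy(cycles, *slow, c_eff, p_static_w)
    return 1.0 - e_slow / e_fast

# Hypothetical operating points: (frequency in Hz, supply voltage in V).
FAST = (600e6, 1.2)
SLOW = (300e6, 0.9)

ideal = relative_saving(3e8, FAST, SLOW, c_eff=1e-9, p_static_w=0.0)
with_static = relative_saving(3e8, FAST, SLOW, c_eff=1e-9, p_static_w=0.1)
print(f"dynamic-only model: {ideal:.1%} saving")
print(f"with static power:  {with_static:.1%} saving")
```

With these assumed numbers the dynamic-only model predicts a larger saving than the model that includes static power, because slowing the clock stretches the time over which static power is drawn.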
{"title":"Power Consumption Modeling for DVFS Exploitation","authors":"A. Castagnetti, C. Belleudy, S. Bilavarn, M. Auguin","doi":"10.1109/DSD.2010.55","DOIUrl":"https://doi.org/10.1109/DSD.2010.55","url":null,"abstract":"A lot of task scheduling algorithms and power management policies have been developed based on simplistic power models, which rarely take into account the effects of the power consumptions of the different components of a real system. Most of the models on which the study of the DVFS scheduling is based, make the assumption that the power consumption of a processor could be modelled as a E ∝ V 2 model. This hypothesis, even if partly true, is not generally applicable when considering the complete system, which consists of the processor, memories and power conversion circuits. In this paper we present a power and energy model for a DVFS enabled mobile computing platform. The platform is based on a low power SoC, which integrates both the processor core and memory, as well as other hardware accelerators. We include in our analisys the study of the power conversion components, which supply the SoC. Starting from measures, we first characterize the power consumption of the SoC and the converters, then a power and energy model for the processor is proposed. The model is able to predict the power consumption of the processor core with an average error less than 10%. This is then used to analyse two DVFS scheduling techniques based on the EDF algorithm, Cycle Conserving and Look Ahead. The results show that the CPU energy saving computed using our model, is far less than what would be expected using a model that does not take into account the effect of the static power.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123694235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A proven approach to increase performance of general-purpose processors is to add hardware accelerators. In its basic configuration, the FlexCore processor has a limited set of datapath units. But thanks to a flexible datapath interconnect and a wide control word, the FlexCore datapath is explicitly designed to support integration of special units that, on demand, can accelerate certain data-intensive applications. We present the integration of a versatile accelerator for several Cyclic Redundancy Checking (CRC) keys. Furthermore, we investigate the accelerator’s impact on processor execution time and energy efficiency, using the Power Stone CRC benchmark. Our evaluation shows that the accelerated 65-nm 2.7-ns FlexCore datapath is, for example, 86% more energy and cycle efficient than a datapath lacking the CRC accelerator.
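To make concrete what a CRC accelerator offloads, here is a plain software CRC computed one bit at a time. It is only a reference sketch of the widely used reflected CRC-32 (polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF); it says nothing about how the FlexCore accelerator is organised or which keys it supports, and the function name is chosen here for illustration.

```python
def crc32_bitwise(data: bytes, poly: int = 0xEDB88320) -> int:
    """Reflected CRC-32, processing one bit per inner-loop iteration."""
    crc = 0xFFFFFFFF                      # standard initial value
    for byte in data:
        crc ^= byte                       # fold the next byte into the low bits
        for _ in range(8):                # eight shift/conditional-XOR steps per byte
            if crc & 1:
                crc = (crc >> 1) ^ poly
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # standard final inversion

# Standard check string for CRC catalogues.
print(hex(crc32_bitwise(b"123456789")))   # 0xcbf43926
```

The eight dependent shift-and-XOR steps per input byte are the part that a general-purpose datapath executes slowly and that dedicated hardware can collapse into far fewer cycles.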
{"title":"Cyclic Redundancy Checking (CRC) Accelerator for the FlexCore Processor","authors":"M. Azhar, T. Hoang, P. Larsson-Edefors","doi":"10.1109/DSD.2010.51","DOIUrl":"https://doi.org/10.1109/DSD.2010.51","url":null,"abstract":"A proven approach to increase performance of general-purpose processors is to add hardware accelerators. In its basic configuration, the FlexCore processor has a limited set of datapath units. But thanks to a flexible datapath interconnect and a wide control word, the FlexCore datapath is explicitly designed to support integration of special units that, on demand, can accelerate certain data-intensive applications. We present the integration of a versatile accelerator for several Cyclic Redundancy Checking (CRC) keys. Furthermore, we investigate the accelerator’s impact on processor execution time and energy efficiency, using the Power Stone CRC benchmark. Our evaluation shows that the accelerated 65-nm 2.7-ns FlexCore datapath is, for example, 86% more energy and cycle efficient than a datapath lacking the CRC accelerator.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126528621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}