In most real-time multi-task systems, scheduling is handled by the operating system. The overhead of task management is significant in such systems, and strict isolation of real-time tasks is hard to provide. A hardware scheduler is proposed to address these problems. Unlike previous work, the proposed scheduler is embedded in the processor. A monitor-and-tuner unit measures and records the efficiency of every (task, processor core) pair, and tasks are adaptively assigned to the most efficient core. In the experiments, the hardware scheduler reduced the overhead of task management. In the prototype multi-core application-turned processor architecture (ATPA) system, it helped to exploit each application-specific core and increased total performance.
{"title":"Adaptive Hardware Real-Time Task Scheduler of Multi-Core ATPA Environment","authors":"Mi Zhou, L. Shang, Jiong Zhang, H. Jin","doi":"10.1109/AHS.2009.17","DOIUrl":"https://doi.org/10.1109/AHS.2009.17","url":null,"abstract":"In most real time multi-task systems, scheduling is handled by the operating systems. The overhead of task management is significant in such systems. And also, strict isolation of the real time tasks can hardly be provided. A hardware scheduler is proposed to address the above problems. Different from previous work, the proposed scheduler was embedded into the processor. A monitor-and-tuner unit was used to measure and record the efficiency of every two-tuple of a task and a processor core. The tasks will be adaptively assigned to the most efficient core. The hardware scheduler can reduce the overhead of task management in the experiments. In the prototype multi-core application-turned processor architecture (ATPA) system, it helped to exploit the utilization of each application specific core and increase the total performance.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115409378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
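The adaptive assignment described in the scheduler abstract above can be illustrated in software. This is only a minimal sketch under stated assumptions: the class name `MonitorAndTuner`, the exponential smoothing of measurements, and the rule of trying unmeasured cores first are all hypothetical choices made for illustration, and the paper's scheduler is a hardware unit whose internals the abstract does not specify.

```python
from collections import defaultdict


class MonitorAndTuner:
    """Software stand-in (hypothetical) for a unit that records an
    efficiency score for every (task, core) pair."""

    def __init__(self, cores):
        self.cores = list(cores)
        # task -> {core: smoothed efficiency score}
        self.scores = defaultdict(dict)

    def record(self, task, core, efficiency, alpha=0.5):
        """Blend a new measurement into the stored score.
        Exponential smoothing is an assumption, not taken from the paper."""
        old = self.scores[task].get(core)
        if old is None:
            self.scores[task][core] = efficiency
        else:
            self.scores[task][core] = (1 - alpha) * old + alpha * efficiency

    def best_core(self, task, busy=()):
        """Return the free core with the highest recorded efficiency.
        Cores not yet measured for this task are tried first, so that
        every (task, core) pair eventually gets a score."""
        free = [c for c in self.cores if c not in busy]
        if not free:
            return None
        unmeasured = [c for c in free if c not in self.scores[task]]
        if unmeasured:
            return unmeasured[0]
        return max(free, key=lambda c: self.scores[task][c])


# Usage with made-up core names and scores.
tuner = MonitorAndTuner(["dsp_core", "risc_core"])
tuner.record("fft", "dsp_core", 0.9)
tuner.record("fft", "risc_core", 0.4)
print(tuner.best_core("fft"))                     # dsp_core
print(tuner.best_core("fft", busy={"dsp_core"}))  # risc_core
```

Keeping the score table per (task, core) pair, rather than per core, is what lets the same core rank differently for different tasks.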
Constant advances in scaling have introduced several issues to the design of processing structures in new technologies. The closer one gets to nano-scale devices, the more necessary are methods to develop circuits that are able to tolerate high defect densities. At the same time, beyond area costs, there is pressure to keep energy and power dissipation at acceptable levels, which practically forbids classical redundancy. This paper presents a dynamic solution that provides reliability and reduces the energy of a microprocessor using a dynamically adaptive reconfigurable fabric. The approach combines a binary translation mechanism with the sleep transistor technique to ensure graceful degradation for software applications, while also reducing energy by shutting off the power supply of the unused and defective resources of the reconfigurable fabric.
{"title":"Dynamically Adapted Low-Energy Fault Tolerant Processors","authors":"M. Pereira, L. Carro","doi":"10.1109/AHS.2009.34","DOIUrl":"https://doi.org/10.1109/AHS.2009.34","url":null,"abstract":"The constant advances on scaling have introduced several issues to the design of processing structures in new technologies. The closer one gets to nano-scale devices, the more necessary are methods to develop circuits that are able to tolerate high defect densities. At the same time, beyond area costs, there is a pressure to maintain energy and power dissipation at acceptable levels, which practically forbids classical redundancy. This paper presents a dynamic solution to provide reliability and reduce energy of a microprocessor using a dynamically adaptive reconfigurable fabric. The approach combines the binary translation mechanism with the sleep transistor technique to ensure graceful degradation for software applications, while at the same time can reduce energy by shutting off the power supply of the unused and the defective resources of a reconfigurable fabric.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"1047 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113994903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we describe the implementation of gravitational force calculation for N-body simulations in the context of astrophysics. We describe high-performance implementations on general-purpose processors, GPUs, and FPGAs, and compare them using a number of criteria including speed, power efficiency, and cost of development. The results show that, for gravitational force calculation and many-body simulations in general, GPUs are very competitive in terms of performance and performance per dollar, whereas FPGAs are competitive in terms of performance per watt.
{"title":"A Comparative Study on ASIC, FPGAs, GPUs and General Purpose Processors in the O(N^2) Gravitational N-body Simulation","authors":"T. Hamada, K. Benkrid, Keigo Nitadori, M. Taiji","doi":"10.1109/AHS.2009.55","DOIUrl":"https://doi.org/10.1109/AHS.2009.55","url":null,"abstract":"In this paper, we describe the implementation of gravitational force calculation for N-body simulations in the context of Astrophysics. It will describe high performance implementations on general purpose processors, GPUs, and FPGAs, and compare them using a number of criteria including speed performance, power efficiency and cost of development. These results show that, for gravitational force calculation and many-body simulations in general, GPUs are very competitive in terms of performance and performance per dollar figures, whereas FPGAs are competitive in terms of performance per Watt figures.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123925218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
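For readers unfamiliar with the kernel being compared in the N-body abstract above, the O(N^2) gravitational force calculation is a direct sum over all pairs of bodies. The following is a plain-Python sketch of that sum, not any of the paper's optimized implementations; the function name `accelerations`, the default `G=1.0`, and the softening term are assumptions made for illustration.

```python
import math


def accelerations(positions, masses, G=1.0, softening=1e-3):
    """Direct-sum gravitational acceleration on every body.

    positions: list of [x, y, z] coordinates; masses: list of floats.
    The double loop visits every ordered pair, hence O(N^2) work.
    The softening length avoids division by zero for close encounters
    (a common convention, assumed here).
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = softening * softening
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Vector from body i to body j.
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps2
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * dx[k] * inv_r3
    return acc


# Two bodies one unit apart, softening disabled for an exact check.
a = accelerations([[0, 0, 0], [1, 0, 0]], [1.0, 2.0], softening=0.0)
print(a)  # [[2.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
```

Each inner-loop iteration is independent of the others for a fixed `i`, which is why this kernel maps well onto highly parallel hardware such as GPUs and FPGAs.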
Mario Porrmann, M. Purnaprajna, Christoph Puttmann
A dynamically reconfigurable on-chip multiprocessor architecture is presented, which can be adapted to changing application demands and to faults detected at run-time. The scalable architecture comprises lightweight embedded RISC processors that are interconnected by a hierarchical network-on-chip (NoC). Reconfigurability is integrated into the processors as well as into the NoC with minimal area and performance overhead. Adaptability of the architecture relies on a self-optimizing reconfiguration of the MPSoC at run-time. The resource-efficiency of the proposed architecture is analyzed based on FPGA and ASIC prototypes.
{"title":"Self-optimization of MPSoCs Targeting Resource Efficiency and Fault Tolerance","authors":"Mario Porrmann, M. Purnaprajna, Christoph Puttmann","doi":"10.1109/AHS.2009.52","DOIUrl":"https://doi.org/10.1109/AHS.2009.52","url":null,"abstract":"A dynamically reconfigurable on-chip multiprocessor architecture is presented, which can be adapted to changing application demands and to faults detected at run-time. The scalable architecture comprises lightweight embedded RISC processors that are interconnected by a hierarchical network-on-chip (NoC). Reconfigurability is integrated into the processors as well as into the NoC with minimal area and performance overhead. Adaptability of the architecture relies on a self-optimizing reconfiguration of the MPSoC at run-time. The resource-efficiency of the proposed architecture is analyzed based on FPGA and ASIC prototypes.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115898226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this work we present EvoCache, a novel approach for implementing application-specific caches. The key innovation of EvoCache is to make the function that maps memory addresses from the CPU address space to cache indices programmable. We support arbitrary Boolean mapping functions that are implemented within a small reconfigurable logic fabric. For finding suitable cache mapping functions we rely on techniques from the evolvable hardware domain and utilize an evolutionary optimization procedure. We evaluate the use of EvoCache in an embedded processor for two specific applications (JPEG and BZIP2 compression) with respect to execution time, cache miss rate and energy consumption. We show that the evolvable hardware approach for optimizing the cache functions not only significantly improves the cache performance for the training data used during optimization, but that the evolved mapping functions generalize very well. Compared to a conventional cache architecture, EvoCache applied to test data achieves a reduction in execution time of up to 14.31% for JPEG (10.98% for BZIP2), and in energy consumption by 16.43% for JPEG (10.70% for BZIP2). We also discuss the integration of EvoCache into the operating system and show that the area and delay overheads introduced by EvoCache are acceptable.
{"title":"EvoCaches: Application-specific Adaptation of Cache Mappings","authors":"Paul Kaufmann, Christian Plessl, M. Platzner","doi":"10.1109/AHS.2009.26","DOIUrl":"https://doi.org/10.1109/AHS.2009.26","url":null,"abstract":"In this work we present EvoCache, a novel approach for implementing application-specific caches. The key innovation of EvoCache is to make the function that maps memory addresses from the CPU address space to cache indices programmable. We support arbitrary Boolean mapping functions that are implemented within a small reconfigurable logic fabric. For finding suitable cache mapping functions we rely on techniques from the evolvable hardware domain and utilize an evolutionary optimization procedure. We evaluate the use of EvoCache in an embedded processor for two specific applications (JPEG and BZIP2 compression) with respect to execution time, cache miss rate and energy consumption. We show that the evolvable hardware approach for optimizing the cache functions not only significantly improves the cache performance for the training data used during optimization, but that the evolved mapping functions generalize very well. Compared to a conventional cache architecture, EvoCache applied to test data achieves a reduction in execution time of up to 14.31% for JPEG (10.98% for BZIP2), and in energy consumption by 16.43% for JPEG (10.70% for BZIP2). We also discuss the integration of EvoCache into the operating system and show that the area and delay overheads introduced by EvoCache are acceptable.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120954017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
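To make the idea of a programmable address-to-index mapping in the EvoCache abstract concrete, the sketch below simulates a direct-mapped cache whose index function is passed in as a parameter. Everything here is an illustrative assumption: the cache geometry, the function names, and in particular `xor_index`, which is a hand-written XOR fold standing in for a mapping function. It is not an evolved function from the paper, and the paper's evolutionary search is not reproduced.

```python
def simulate(trace, index_fn, num_sets=8, line_bytes=16):
    """Count misses of a direct-mapped cache for a list of byte addresses.

    index_fn maps a block number to a set index; making this function
    swappable is the point of the sketch.
    """
    lines = [None] * num_sets  # block number currently held by each set
    misses = 0
    for addr in trace:
        block = addr // line_bytes
        idx = index_fn(block) % num_sets
        if lines[idx] != block:
            misses += 1
            lines[idx] = block
    return misses


def modulo_index(block):
    """Conventional mapping: the low 3 bits of the block number."""
    return block & 7


def xor_index(block):
    """Hypothetical alternative: XOR the low 3 bits with the next 3 bits."""
    return (block ^ (block >> 3)) & 7


# Two addresses exactly one cache size (8 sets * 16 bytes) apart,
# accessed alternately, collide under the conventional mapping.
trace = [0, 128] * 4
print(simulate(trace, modulo_index))  # 8 (every access misses)
print(simulate(trace, xor_index))     # 2 (only the two cold misses)
```

The trace is deliberately pathological for the modulo mapping; a mapping tuned to one access pattern can be worse on another, which is why the abstract's claim about generalization from training data to test data matters.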
The demand for high-performance on-board processing in space applications has increased drastically because of the discrepancy between extremely high data volumes and low downlink channel capacity. In-flight reconfigurability and dynamic partial reconfiguration enhance space applications with re-programmable hardware and functionality that adapts at run-time, which improves both maintenance and performance and enables mission-specific adaptability on demand on board the spacecraft (S/C). Dynamic partial reconfiguration also improves resource utilization and reduces costs. Current space-qualified reprogrammable FPGA technologies provide large logic density and have already successfully demonstrated their suitability for space applications. To achieve such an advanced dynamically partially reconfigurable system, an appropriate FPGA architecture has to be chosen and the requirements for a highly reliable system have to be analyzed. In this paper, the currently available reprogrammable FPGA technologies are compared and their suitability for dynamic partial reconfiguration is outlined. The requirements for a highly reliable, fault-tolerant system are presented and a framework is proposed.
{"title":"Dynamic Partial Reconfiguration in Space Applications","authors":"B. Osterloh, H. Michalik, S. Habinc, B. Fiethe","doi":"10.1109/AHS.2009.13","DOIUrl":"https://doi.org/10.1109/AHS.2009.13","url":null,"abstract":"The demand for high-performance on-board processing in space applications drastically increased because of the discrepancy between extreme high data volume and low downlink channel capacity. Furthermore in-flight reconfigurability and dynamic partial reconfiguration enhances space applications with re-programmable hardware and at run-time adaptive functionality. Therefore it is a maintenance and performance improvement. Furthermore it enables mission specific adaptability on demand on board of S/C. Additionally dynamic partial reconfiguration is an improvement in terms of resource utilization and costs. Current space qualified reprogrammable FPGA technologies provide large logic density and have already successfully demonstrated their suitability for space applications. To achieve such an advanced dynamic partial reconfigurable system an appropriate FPGA architecture has to be chosen and the requirements to meet a high reliable system have to be analyzed. In this paper the current available reprogrammable FPGA technologies will be compared and their suitability for a dynamic partial reconfiguration will be outlined. The requirements to achieve a high reliable fault tolerant system will be presented and a framework is proposed.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125234334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Xydis, I. Triantafyllou, G. Economakos, K. Pekmestzi
Datapath synthesis incorporating complex operation templates has proven extremely efficient, especially for the Digital Signal Processing (DSP) application domain. However, only architectural-level optimizations have been reported for the specification and implementation of the operation templates. This paper introduces arithmetic-level optimizations for template-based datapath synthesis. A high-performance architecture for the implementation of DSP kernels is presented. It is based on flexible and arithmetically optimized components able to perform a large set of operation templates. A synthesis methodology for optimized mapping of DSP kernels onto the proposed architecture is also presented. Experimental results show significant gains in execution time, active chip area, and power dissipation in comparison to previously published flexible template-based datapaths.
{"title":"Flexible Datapath Synthesis through Arithmetically Optimized Operation Chaining","authors":"S. Xydis, I. Triantafyllou, G. Economakos, K. Pekmestzi","doi":"10.1109/AHS.2009.21","DOIUrl":"https://doi.org/10.1109/AHS.2009.21","url":null,"abstract":"Datapath synthesis incorporating complex operation templates has been proven extremely efficient especially for the Digital Signal Processing (DSP) application domain.However, only architectural level optimizations have been reported for the specification and implementation of the operation templates. This paper introduces the consideration of arithmetic level optimizations for template based datapath synthesis. A high performance architecture for the implementation of DSP kernels is presented. It is based on flexible and arithmetically optimized components able to perform a large set of operation templates. A synthesis methodology for optimized mapping of DSP kernels onto the proposed architecture is also presented. Experimental results are reported showing significant gains in execution time, active chip area and power dissipation in comparison to previously published flexible template-based data paths.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117102403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
O. Brousse, J. Guillot, G. Sassatelli, Thierry Gil, M. Robert, J. Moreno, A. Villa, E. Sanchez
This paper describes an agent-oriented framework supporting bio-inspired mechanisms, which takes advantage of the intrinsic hardware parallelism of the pervasive platform developed within the Perplexus IST European project. The proposed framework is a flexible and modular means to describe and simulate complex phenomena such as biologically plausible neural networks or culture dissemination. Associated with this framework, and based on the multiprocessor architecture of the Perplexus platform nodes, a tool suite capable of accelerating parallelizable agents is described. This contribution thus combines the software flexibility of agent-based programming with the efficiency of multiprocessor hardware execution. The framework has been successfully tested with two experiments: a proof-of-concept application made of robots that autonomously improve their behaviours according to their environment, and a spiking neural network simulation. These results indicate that the framework and its associated methodology are relevant to the simulation of complex phenomena.
{"title":"A Bio-Inspired Agent Framework for Hardware Accelerated Distributed Pervasive Applications","authors":"O. Brousse, J. Guillot, G. Sassatelli, Thierry Gil, M. Robert, J. Moreno, A. Villa, E. Sanchez","doi":"10.1109/AHS.2009.54","DOIUrl":"https://doi.org/10.1109/AHS.2009.54","url":null,"abstract":"This paper describes an agent oriented framework supporting bio-inspired mechanisms which takes profit of the intrinsic hardware parallelism of the pervasive platform developed within the Perplexus IST European project. The proposed framework is a flexible and modular means to describe and simulate complex phenomena such as biologically plausible neural networks or culture dissemination. Associated to this framework and based on the multiprocessor architecture of the Perplexus platform nodes, a tool suite capable of accelerating parallelizable agents is described. Therefore, this contribution combines the software flexibility of agent-based programming with the efficiency of multiprocessor hardware execution. This framework has been successfully tested with two experiments: a proof of concept application made of robots that autonomously improve their behaviours according to their environment and a spiking neural network simulation. These results prove that the framework and its associated methodology are relevant in the context of the simulation of complex phenomena.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131953370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A simple image enhancement technique based on evolvable hardware is presented. Visual appearance is improved by an evolved histogram-stretching transformation (an evolved circuit). Its performance is compared with the classical histogram equalization method using traditional measures of enhancement. Experimental results are presented to show that the proposed technique offers better performance than classical histogram equalization.
{"title":"Evolvable Hardware Based Gray-level Image Enhancement","authors":"Jie Li","doi":"10.1109/AHS.2009.12","DOIUrl":"https://doi.org/10.1109/AHS.2009.12","url":null,"abstract":"A simple image enhancement technique based upon evolvable hardware is presented. Improving visual appearance is achieved by evolved histogram stretching transformation (evolved circuit). The performance is compared with the classical histogram equalization method using traditional measures of enhancement. Experimental results will be presented to show that the proposed technique offers better performance than the classical histogram equalization method.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129853017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
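As background for the image enhancement abstract above, the sketch below shows the two classical gray-level transformations it refers to: a linear histogram stretch and histogram equalization. This is a minimal illustration on flat lists of integer pixel values. The function names are assumptions, and the linear stretch is only the textbook form of histogram stretching; the evolved transformation circuit from the paper is not reproduced here.

```python
def stretch(pixels, out_max=255):
    """Linear histogram stretch: map [min, max] of the image onto [0, out_max]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)  # flat image: nothing to stretch
    return [(p - lo) * out_max // (hi - lo) for p in pixels]


def equalize(pixels, levels=256):
    """Classical histogram equalization using the cumulative histogram."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of gray levels.
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:
        return list(pixels)  # single gray level: leave unchanged
    return [(cdf[p] - cdf_min) * (levels - 1) // (n - cdf_min) for p in pixels]


# A low-contrast image occupying only part of the gray range.
image = [50, 60, 60, 100, 150]
print(stretch(image))
print(equalize(image))
```

Stretching preserves the relative spacing of gray levels, while equalization redistributes them according to how often each level occurs, so the two generally give different results on the same image.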
M. Turnquist, E. Laulainen, Jani Mäkipää, H. Tenhunen, L. Koskinen
Emerging ubiquitous systems such as distributed sensor networks require ultra-low power consumption. The energy minimum, and thus the lowest possible power consumption of CMOS logic, is achieved in the sub-threshold region. The exponential dependence of the drain current on threshold voltage variations leads to increased overdesign if sub-threshold circuits are to be robust, so adaptive systems are required to provide robustness against variability. One approach to achieving adaptivity is timing error detection (TED) within the circuit. Presented here is a TED latch capable of sub-threshold operation. It was designed in 65 nm technology, has an operating voltage range of 0.25 V through 1.2 V, and has a minimum energy point (MEP) of 0.4 V. At the MEP, the average power consumption for one clock period and an activity factor of alpha=0.5 is 0.43 nW. The area of the TED latch is 101 um^2. A sub-threshold CORDIC implementation is presented to demonstrate the TED latch at a system level.
{"title":"Adaptive Sub-Threshold Test Circuit","authors":"M. Turnquist, E. Laulainen, Jani Mäkipää, H. Tenhunen, L. Koskinen","doi":"10.1109/AHS.2009.20","DOIUrl":"https://doi.org/10.1109/AHS.2009.20","url":null,"abstract":"Emerging ubiquitous systems such as distributed sensor networks require ultra-low power consumption. The energy minimum and thus, the lowest possible power consumption of CMOS logic, is achieved in the sub-threshold region. The exponential dependence of the drain current on threshold voltage variations leads to increased overdesign if sub-threshold circuits are to be robust. Adaptive systems are required to address variability robustness. One approach to achieve adaptivity is timing error detection (TED) within the circuit. Presented here is a TED latch capable of sub-threshold operation. It was designed in 65 nm technology, has an operating voltage range of 0.25 V through 1.2 V, and a minimum energy point (MEP) of 0.4 V. At the MEP, the average power consumption for one clock period and an activity factor of alpha=0.5 is 0.43 nW. The area of the TED latch is 101 um^2. A sub-threshold CORDIC implementation is presented to demonstrate the TED latch at a system level.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127823846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}