One design goal of future processors is to maximize performance per watt. However, the performance of general-purpose processors can hardly be improved merely by increasing clock frequency. This paper presents an application-specific reconfigurable processor architecture that is fine-tuned for high-performance computing. It benefits from customized application-specific hardware that significantly improves its efficiency. Compared with existing work on configurable processor architectures, the proposed architecture has higher functional density and lower power consumption per inch owing to its runtime partial reconfiguration ability. Moreover, it can adaptively change its architecture to further improve average performance and its feasibility for other applications.
{"title":"A New Application-Tuned Processor Architecture for High-Performance Reconfigurable Computing","authors":"L. Shang, Mi Zhou, Jiong Zhang, Hongbing Li","doi":"10.1109/AHS.2009.18","DOIUrl":"https://doi.org/10.1109/AHS.2009.18","url":null,"abstract":"One design goal of future processors is to maximize the performance per watt. However, the performance of general purpose processors can be hardly improved by barely increasing clock frequency. This paper presents an application specific reconfigurable processor architecture which is fine tuned for high performance computing. It benefits from the application specific hardware customized to significantly improve its efficiency. In comparison with the existing work on configurable processor architectures, the proposed architecture has higher functional density and lower power consumption per inch due to its runtime partial reconfiguration ability. Moreover, it can adaptively change its architecture to further promote the average performance and feasibility for other applications.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124882095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Samie, G. Dragffy, A. Popescu, A. Pipe, C. Melhuish
This paper is presented in conjunction with, and forms the first part of, the paper entitled “Prokaryotic Bio-Inspired Systems.” In this part we propose and investigate a novel prokaryotic cell-based bio-inspired model suitable for implementing self-healing bio-inspired systems. A key feature of our model is that system reliability can be increased with a minimal amount of hardware overhead. It also offers a bio-inspired compression/decompression technique that exploits the intimate relationship between different genes. Distributed DNA, highly dynamic and flexible routing resources, and optimized self-repair characteristics (using block and cell elimination) are some of the other advantages of the proposed model.
{"title":"Prokaryotic Bio-Inspired Model for Embryonics","authors":"M. Samie, G. Dragffy, A. Popescu, A. Pipe, C. Melhuish","doi":"10.1109/AHS.2009.45","DOIUrl":"https://doi.org/10.1109/AHS.2009.45","url":null,"abstract":"This paper is presented in conjunction with, and forms the first part of, the paper entitled “Prokaryotic Bio-Inspired Systems.” In this part we propose and investigate a novel prokaryotic cell-based bio-inspired model suitable to implement self-healing bio-inspired systems. A key feature of our model is that system reliability can be increased with a minimal amount of hardware overhead. It also offers a bio-inspired compression/decompression technique that exploits the intimate relationship between different genes. Distributed DNA, highly dynamic and flexible routing resources and optimized self-repair characteristics (using Block and cell elimination) are some of the other advantages of the proposed model.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124315663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we present a fully synchronous digital implementation of the Address Event Representation (AER) communication scheme that has been used in the PERPLEXUS chip to permit the emulation of large-scale, biologically inspired spiking neural network models. By introducing specific commands into the AER protocol, it is possible to distribute the AER bus among a large number of chips on which the functionality of the spiking neurons is emulated. A careful design of the AER encoder module using compact Content Addressable Memories (CAMs) allows for a feasible realization of large-scale models.
{"title":"Synchronous Digital Implementation of the AER Communication Scheme for Emulating Large-Scale Spiking Neural Networks Models","authors":"J. Moreno, J. Madrenas, L. Kotynia","doi":"10.1109/AHS.2009.14","DOIUrl":"https://doi.org/10.1109/AHS.2009.14","url":null,"abstract":"In this paper we shall present a fully synchronous digital implementation of the Address Event Representation (AER) communication scheme that has been used in the PERPLEXUS chip in order to permit the emulation of large-scale biologically inspired spiking neural networks models. By introducing specific commands in the AER protocol it is possible to distribute the AER bus among a large number of chips where the functionality of the spiking neurons is being emulated. A careful design of the AER encoder module using compact Content Addressable Memories (CAMs) allows for a feasible realization of large-scale models.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"552 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128805308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
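As a rough illustration of the address-event idea described in the abstract above, the following Python sketch sends each spike over a shared bus as the address of the neuron that fired. The packing of a chip identifier and a neuron identifier into one word, the 8-bit field width, and all function names are assumptions made for this example; they are not taken from the PERPLEXUS design or its protocol commands.

```python
# Hypothetical field width: 8 bits for the neuron index within a chip.
NEURON_BITS = 8


def encode_event(chip_id, neuron_id):
    """Pack (chip, neuron) into a single address word for the shared bus."""
    if not 0 <= neuron_id < (1 << NEURON_BITS):
        raise ValueError("neuron_id does not fit in the address field")
    return (chip_id << NEURON_BITS) | neuron_id


def decode_event(word):
    """Recover (chip, neuron) from an address word."""
    return word >> NEURON_BITS, word & ((1 << NEURON_BITS) - 1)


def serialize_spikes(spikes_by_chip):
    """Turn the spikes of one time step into an ordered list of bus words.

    spikes_by_chip maps a chip id to the neuron ids that fired on that chip.
    """
    return [
        encode_event(chip, neuron)
        for chip in sorted(spikes_by_chip)
        for neuron in sorted(spikes_by_chip[chip])
    ]
```

For example, `serialize_spikes({1: [3, 0], 0: [7]})` yields one word per spike, and `decode_event` applied to each word recovers which neuron on which chip fired.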
In this paper we describe a mapping methodology for heterogeneous reconfigurable architectures consisting of one or more SW processors and one or more reconfigurable units (FPGAs). The mapping methodology consists of separate tracks for a) the generation of the configurations for the FPGA by level-based and clustering-based temporal partitioning, and b) the scheduling of those configurations as well as the software tasks, based on two multiprocessor scheduling algorithms: a simple list-based scheduler and the more complex extended dynamic level scheduling algorithm. The mapping methodology is benchmarked by means of randomly created task graphs on an architecture of one SW processor and one FPGA. The results are compared to a 0-1 integer linear programming solution in terms of exploration time as well as the finish time of all tasks of the application. The results show that, in 90% of the investigated cases, the combination of level-based temporal partitioning and extended dynamic level scheduling gives the best performance in terms of finish time of the full task set.
{"title":"Scheduling Temporal Partitions in a Multiprocessing Paradigm for Reconfigurable Architectures","authors":"A. Popp, Y. Moullec, P. Koch","doi":"10.1109/AHS.2009.43","DOIUrl":"https://doi.org/10.1109/AHS.2009.43","url":null,"abstract":"In this paper we describe a mapping methodology for heterogeneous reconfigurable architectures consisting of one or more SW processors and one or more reconfigurable units, FPGAs. The mapping methodology consists of a separated track for a) the generation of the configurations for the FPGA by level-based and clustering-based temporal partitioning, and b) the scheduling of those configurations as well as the software tasks, based on two multiprocessor scheduling algorithms: a simple list-based scheduler and the more complex extended dynamic level scheduling algorithm. The mapping methodology is benchmarked by means of randomly created task graphs on an architecture of one SW processor and one FPGA. The results are compared to a 0-1 integer linear programming solution in terms of exploration time as well as the finish-time of all tasks of the application. The results show that, in 90% of the investigated cases, the combination of level-based temporal partitioning and extended dynamic level scheduling gives the best performance in terms of finish-time of the full task-set.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"2002 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127315819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
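To make the notion of a simple list-based scheduler in the abstract above more concrete, here is a minimal Python sketch of generic list scheduling on a task graph. It is not the scheduler evaluated in the paper: it assumes identical processing units, an acyclic graph, no reconfiguration or communication cost, and a static-level priority (longest path to an exit task). The function name and the data layout are hypothetical.

```python
def list_schedule(durations, deps, num_units=2):
    """Greedy list scheduling.

    durations: {task: execution time}
    deps:      {task: [predecessor tasks]} (tasks without entry have none)
    Returns {task: (unit, start, finish)}.
    """
    # Build successor lists so that static levels can be computed.
    succs = {t: [] for t in durations}
    for task, preds in deps.items():
        for p in preds:
            succs[p].append(task)

    level = {}

    def static_level(t):
        # Longest path from t to any exit task, including t itself.
        if t not in level:
            level[t] = durations[t] + max(
                (static_level(s) for s in succs[t]), default=0
            )
        return level[t]

    unit_free = [0] * num_units  # time at which each unit becomes idle
    finish = {}
    schedule = {}
    remaining = set(durations)
    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [t for t in remaining
                 if all(p in finish for p in deps.get(t, []))]
        # Highest static level first; ties broken by task name.
        task = min(ready, key=lambda t: (-static_level(t), t))
        earliest = max((finish[p] for p in deps.get(task, [])), default=0)
        # Choose the unit on which the task can start soonest.
        unit = min(range(num_units),
                   key=lambda u: max(unit_free[u], earliest))
        start = max(unit_free[unit], earliest)
        finish[task] = start + durations[task]
        unit_free[unit] = finish[task]
        schedule[task] = (unit, start, finish[task])
        remaining.remove(task)
    return schedule
```

A dynamic-level scheduler would differ mainly in how the next (task, unit) pair is chosen: the priority would be recomputed at each step from the current state of the units rather than fixed in advance.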
This paper presents the design and implementation of an FPGA-based web server for biological sequence alignment. Central to this web server is a set of highly parameterisable, scalable, and platform-independent FPGA cores for biological sequence alignment. The web server consists of an HTML-based interface, a MySQL database which holds user queries and results, a set of biological databases, a library of FPGA configurations, a host application servicing user requests, and an FPGA coprocessor for the acceleration of the sequence alignment operation. The paper presents a real implementation of this server on an HP ProLiant DL145 server with a Celoxica RCHTX FPGA board. Compared to an optimized pure software implementation, our FPGA-based web server achieved a speed-up of two orders of magnitude for a pairwise protein sequence alignment application based on the Smith-Waterman algorithm. The FPGA-based implementation has the added advantage of being over 100x more energy efficient.
{"title":"An FPGA-Based Web Server for High Performance Biological Sequence Alignment","authors":"Y. Liu, K. Benkrid, A. Benkrid, Server Kasap","doi":"10.1109/AHS.2009.59","DOIUrl":"https://doi.org/10.1109/AHS.2009.59","url":null,"abstract":"This paper presents the design and implementation of the FPGA-based web server for biological sequence alignment. Central to this web-server is a set of highly parameterisable, scalable, and platform-independent FPGA cores for biological sequence alignment. The web server consists of an HTML–based interface, a MySQL database which holds user queries and results, a set of biological databases, a library of FPGA configurations, a host application servicing user requests, and an FPGA coprocessor for the acceleration of the sequence alignment operation. The paper presents a real implementation of this server on an HP ProLiant DL145 server with a Celoxica RCHTX FPGA board. Compared to an optimized pure software implementation, our FPGA-based web server achieved a two order of magnitude speed-up for a pairwise protein sequence alignment application based on the Smith-Waterman algorithm. The FPGA-based implementation has the added advantage of being over 100x more energy efficient.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"864 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126966069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Within the field of neural electrophysiology, there exists a divide between experimentalists and computational modellers. This is caused by the different spheres of expertise required to perform each discipline, as well as the differing resource requirements of the two parties. This paper considers several forms of hardware acceleration for implementation within a laboratory alongside time-sensitive experimentation, and focuses on how the use of general-purpose computation on graphics processing units (GP-GPU) can allow parameter estimation to be performed in the laboratory, thereby acting as a bridge between the two halves of this field. This would facilitate rapid iterative model design, as well as allowing new forms of experimentation. The discussion concludes with a brief case study that reports the performance increases associated with a GPU implementation over a single-CPU approach. It should be noted that the proposed paradigm is not limited to neuroscience, as it would be beneficial to any discipline where unreliable, time-sensitive experimental procedures dominate exploration of the field.
{"title":"GP-GPU: Bridging the Gap between Modelling & Experimentation","authors":"T. F. Clayton, A. Murray, Iain A. B. Lindsay","doi":"10.1109/AHS.2009.60","DOIUrl":"https://doi.org/10.1109/AHS.2009.60","url":null,"abstract":"Within the field of neural electrophysiology, there exists a divide between experimentalists and computational modellers. This is caused by the different spheres of expertise required to perform each discipline, as well as the differing resource requirements of the two parties. This paper considers several forms of hardware acceleration for implementation within a laboratory alongside time sensitive experimentation, and focuses on how the use of general purpose computation on graphics processing units (GP-GPU) can allow parameter estimation to be performed in the laboratory, thereby acting as a bridge between the two halves of this field.This would facilitate rapid iterative model design, as well as allowing new forms of experimentation. This discussion is concluded with a brief case study that reports the performance increases associated with a GPU implementation over a single CPU approach. It should be noted that the proposed paradigm is not limited to neuroscience, as it would be beneficial to any discipline where unreliable time sensitive experimental procedures dominate exploration of the field.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124483403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It has been shown that evolutionary and developmental processes can be used for the emergence of scalability, robustness and fault-tolerance in hardware. However, designing a suitable representation for such processes is far from straightforward. Here, a bio-inspired developmental genotype-phenotype mapping for the evolution of spiking neural microcircuits in an FPGA is introduced, based on a digital neuron model and cortex structure suggested and verified previously by the authors. The new developmental process is based on complex multi-cellular protein-protein and gene-protein interactions and signaling. The suitability of the representation for the evolution of useful architectures, and its adaptability, are shown through statistical analysis and examples of scalability, modularity and fault-tolerance.
{"title":"A Multi-cellular Developmental Representation for Evolution of Adaptive Spiking Neural Microcircuits in an FPGA","authors":"Hooman Shayani, P. Bentley, A. Tyrrell","doi":"10.1109/AHS.2009.39","DOIUrl":"https://doi.org/10.1109/AHS.2009.39","url":null,"abstract":"It has been shown that evolutionary and developmental processes can be used for emergence of scalability, robustness and fault-tolerance in hardware. However, designing a suitable representation for such processes is far from straightforward. Here, a bio-inspired developmental genotype-phenotype mapping for evolution of spiking neural microcircuits in an FPGA is introduced, based on a digital neuron model and cortex structure suggested and verified previously by the authors. The new developmental process is based on complex multi-cellular protein-protein and gene-protein interactions and signaling. Suitability of the representation for evolution of useful architectures and its adaptability is shown through statistical analysis and examples of scalability, modularity and fault-tolerance.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121622879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Smith-Waterman (SW) algorithm is the most accurate sequence alignment approach used by computational biologists for DNA matching. However, its computational complexity makes SW impractical to use in a clinical environment compared to much faster but less accurate sequence alignment techniques such as BLAST. The high-performance computing community is examining alternative multi-core architectures, such as the IBM Cell Broadband Engine (BE) and Graphics Processing Units (GPUs), that address the limitations of modern cache-based designs. In this paper we investigate the performance of the IBM Cell BE architecture in the context of SW. We present an analysis of the architectural features of the Cell BE and study the architecture’s fitness for accelerating sequence alignment based on its parallel processing power, interconnect structure and communication protocols among the processing cores. We then evaluate the performance of the Cell BE against the state-of-the-art implementation of SW on NVIDIA’s Tesla GPU. Results show that, based on the memory architecture of the SW algorithm, the Cell BE performs much better than the Tesla GPU in terms of both cycle count and execution time. In terms of cycle count, compared to a purely serial implementation, the state-of-the-art GPU implementation delivers a 15x speedup, whereas our solution achieves a 64x speedup.
{"title":"Performance Analysis of IBM Cell Broadband Engine on Sequence Alignment","authors":"Yang Song, Gregory M. Striemer, A. Akoglu","doi":"10.1109/AHS.2009.16","DOIUrl":"https://doi.org/10.1109/AHS.2009.16","url":null,"abstract":"The Smith-Waterman (SW) algorithm is the most accurate sequence alignment approach used by computational biologists for DNA matching. However it’s computational complexity makes SW impractical to use in clinical environment compared to much faster but less accurate sequence alignment technique such as BLAST. High performance computing community is examining alternative multi core architectures such as IBM Cell Broadband Engine (BE) and Graphics Processing Units (GPUs) that address the limitations of modern cache based designs. In this paper we investigate the performance of IBM Cell BE architecture in the context of SW. We present an analysis on architectural features of the Cell BE, study the architecture’s fitness for accelerating sequence alignment based on its parallel processing power, interconnect structure and communication protocols among the processing cores. We then evaluate the performance of Cell BE against the state of art implementation of SW on NVIDIA’s Tesla GPU. Results show that based on the memory architecture of the SW algorithm, Cell BE performs much better than Tesla GPU in terms of both cycle count and execution time metrics. Compared to purely serial implementation, in terms of cycle count, while state of the art GPU implementation delivers 15x speedup, our solution achieves 64x speedup.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130173521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
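For readers unfamiliar with the Smith-Waterman recurrence that the abstract above refers to, the following plain Python sketch computes the best local alignment score with a linear gap penalty. It is a serial reference version only, not the Cell BE or GPU implementation from the paper; the scoring values (match 2, mismatch -1, gap -1) and the function name are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the diagonal alignment with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Each cell depends only on its upper, left, and upper-left neighbours, so all cells on one anti-diagonal can be computed independently of each other. That dependency pattern is what parallel implementations of the algorithm typically exploit.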
Demand for fast dynamic reconfiguration has increased, since dynamic reconfiguration can accelerate the performance of circuits implemented on a programmable device. Such dynamic reconfiguration necessitates two important features: fast reconfiguration and numerous contexts. However, because fast reconfiguration and numerous contexts are in a tradeoff relation on current VLSIs, optically reconfigurable gate arrays (ORGAs) have been developed to resolve this dilemma. ORGAs can realize a virtual gate count much larger than that of current VLSI chips by exploiting the large storage capacity of a holographic memory. Furthermore, ORGAs can realize fast reconfiguration through the use of high-bandwidth optical connections between a holographic memory and a programmable gate array VLSI. Among such developments, we have been developing dynamic optically reconfigurable gate arrays (DORGAs) that realize a high-gate-density VLSI using a photodiode memory architecture. This paper presents the first demonstration of a 16-context DORGA architecture. Furthermore, we present experimental results: 530–833 ns reconfiguration times and 5–9.375 µs retention times.
{"title":"A Sixteen-Context Dynamic Optically Reconfigurable Gate Array","authors":"M. Nakajima, Minoru Watanabe","doi":"10.1109/AHS.2009.64","DOIUrl":"https://doi.org/10.1109/AHS.2009.64","url":null,"abstract":"Demand for fast dynamic reconfiguration has increased since dynamic reconfiguration can accelerate the performance of implementation circuits on a programmable device. Such dynamic reconfiguration necessitates two important features: fast reconfiguration and numerous contexts. However, because fast reconfiguration and numerous contexts share a tradeoff relation on current VLSIs, optically reconfigurable gate arrays (ORGAs) have been developed to resolve this dilemma.ORGAs can realize a large virtual gate count that is much larger than those of current VLSI chips by exploiting the large storage capacity of a holographic memory. Furthermore, ORGAs can realize fast reconfiguration through use of large bandwidth optical connections between a holographic memory and a programmable gate array VLSI. Among such developments, we have been developing dynamic optically reconfigurable gate arrays (DORGAs)that realize a high gate density VLSI using a photodiode memory architecture. This paper presents the first demonstration of a 16-context DORGA architecture. Furthermore, we present experimental results: 530–833 ns reconfiguration times and 5-9.375 us retention times.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116475587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Configurable System-on-Chip (SoC) solutions based on state-of-the-art FPGAs are good candidates to fulfill the requirements of future high-end onboard payload applications. The reliability, performance and flexibility provided by SoCs can be further extended using a new communication paradigm, the Network-on-a-Chip (NoC). NoCs have the potential to solve the scalability problem of traditional on-chip bus systems but may introduce uncertainties due to contention for shared network resources. This paper explores NoC solutions that provide QoS and proposes a methodology for the seamless integration of payload data-handling protocols into a NoC architecture.
{"title":"Quality of Service in NoC for Reconfigurable Space Applications","authors":"A. F. Florit, S. Parkes, P. Mendham","doi":"10.1109/AHS.2009.58","DOIUrl":"https://doi.org/10.1109/AHS.2009.58","url":null,"abstract":"Configurable System-on-Chip (SoC) solutions based on state-of-the art FPGA are a good candidate to fulfill the requirements of future high end onboard payload applications. Reliability, performance and flexibility provided by SoCs can be further extended using a new communication paradigm, the Network-on-a-Chip (NoC). NoCs have the potential to solve the scalability problem of traditional on-chip bus systems but may introduce uncertainties due to contention for shared network resources. This paper explores NoC solutions that provide QoS and propose a methodology for the seamless integration of payload data-handling protocols into a NoC architecture.","PeriodicalId":318989,"journal":{"name":"2009 NASA/ESA Conference on Adaptive Hardware and Systems","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125553799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}