2009 NASA/ESA Conference on Adaptive Hardware and Systems: Latest Papers

A New Application-Tuned Processor Architecture for High-Performance Reconfigurable Computing
Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.18
L. Shang, Mi Zhou, Jiong Zhang, Hongbing Li
One design goal of future processors is to maximize performance per watt. However, the performance of general-purpose processors can hardly be improved by merely increasing clock frequency. This paper presents an application-specific reconfigurable processor architecture that is fine-tuned for high-performance computing. It benefits from customized application-specific hardware that significantly improves its efficiency. Compared with existing work on configurable processor architectures, the proposed architecture has higher functional density and lower power consumption per inch owing to its runtime partial reconfiguration ability. Moreover, it can adaptively change its architecture to further improve average performance and feasibility for other applications.
Citations: 0
Prokaryotic Bio-Inspired Model for Embryonics
Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.45
M. Samie, G. Dragffy, A. Popescu, A. Pipe, C. Melhuish
This paper is presented in conjunction with, and forms the first part of, the paper entitled “Prokaryotic Bio-Inspired Systems.” In this part we propose and investigate a novel prokaryotic cell-based bio-inspired model suitable for implementing self-healing bio-inspired systems. A key feature of our model is that system reliability can be increased with a minimal amount of hardware overhead. It also offers a bio-inspired compression/decompression technique that exploits the intimate relationship between different genes. Other advantages of the proposed model include distributed DNA, highly dynamic and flexible routing resources, and optimized self-repair characteristics (using block and cell elimination).
Citations: 37
Synchronous Digital Implementation of the AER Communication Scheme for Emulating Large-Scale Spiking Neural Networks Models
Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.14
J. Moreno, J. Madrenas, L. Kotynia
In this paper we present a fully synchronous digital implementation of the Address Event Representation (AER) communication scheme that has been used in the PERPLEXUS chip to permit the emulation of large-scale biologically inspired spiking neural network models. By introducing specific commands in the AER protocol, it is possible to distribute the AER bus among a large number of chips in which the functionality of the spiking neurons is emulated. A careful design of the AER encoder module using compact Content Addressable Memories (CAMs) allows for a feasible realization of large-scale models.
Citations: 7
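For readers unfamiliar with Address Event Representation, the following is a minimal Python sketch of the general idea only: each spike is sent over a shared bus as the address of the neuron that fired. The function names, the (tick, address) event layout, and the raster format are illustrative assumptions; the sketch does not model the PERPLEXUS chip, its protocol commands, or its CAM-based encoder.

```python
def aer_encode(raster):
    """Turn a spike raster into a list of address events.

    raster: list of per-tick lists of 0/1 flags, one flag per neuron
            (hypothetical layout chosen for this sketch).
    Returns (tick, address) tuples in bus order.
    """
    return [
        (tick, addr)
        for tick, flags in enumerate(raster)
        for addr, fired in enumerate(flags)
        if fired
    ]


def aer_decode(events, num_ticks, num_neurons):
    """Rebuild the spike raster from a list of (tick, address) events."""
    raster = [[0] * num_neurons for _ in range(num_ticks)]
    for tick, addr in events:
        raster[tick][addr] = 1
    return raster


if __name__ == "__main__":
    spikes = [[0, 1, 0, 0], [0, 0, 0, 0], [1, 0, 0, 1]]
    events = aer_encode(spikes)
    print(events)
    print(aer_decode(events, 3, 4) == spikes)
```

Only active neurons generate bus traffic, which is why the scheme suits sparse spiking activity.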
Scheduling Temporal Partitions in a Multiprocessing Paradigm for Reconfigurable Architectures
Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.43
A. Popp, Y. Moullec, P. Koch
In this paper we describe a mapping methodology for heterogeneous reconfigurable architectures consisting of one or more SW processors and one or more reconfigurable units (FPGAs). The mapping methodology consists of separate tracks for a) the generation of the configurations for the FPGA by level-based and clustering-based temporal partitioning, and b) the scheduling of those configurations as well as the software tasks, based on two multiprocessor scheduling algorithms: a simple list-based scheduler and the more complex extended dynamic level scheduling algorithm. The mapping methodology is benchmarked by means of randomly created task graphs on an architecture of one SW processor and one FPGA. The results are compared to a 0-1 integer linear programming solution in terms of exploration time as well as the finish time of all tasks of the application. The results show that, in 90% of the investigated cases, the combination of level-based temporal partitioning and extended dynamic level scheduling gives the best performance in terms of finish time of the full task set.
Citations: 5
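The abstract above mentions a simple list-based scheduler. As a rough illustration of that general technique, and not of the authors' implementation, here is a minimal Python sketch of greedy list scheduling on a task graph. The function name, the longest-task-first priority rule, and the treatment of both processing units as identical are assumptions made for the example; the paper targets one SW processor and one FPGA, which differ in practice.

```python
def list_schedule(durations, deps, num_units=2):
    """Greedy list scheduling of a task graph.

    durations: {task: execution time}
    deps:      {task: iterable of predecessor tasks}
    Returns {task: (unit, start, finish)}.
    """
    schedule = {}
    unit_free = [0] * num_units  # time at which each unit becomes idle
    pending = set(durations)
    while pending:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [t for t in pending
                 if all(d in schedule for d in deps.get(t, ()))]
        if not ready:
            raise ValueError("dependency cycle in task graph")
        # Priority rule (an assumption): longest task first, ties by name.
        task = min(ready, key=lambda t: (-durations[t], t))
        data_ready = max((schedule[d][2] for d in deps.get(task, ())),
                         default=0)
        # Place the task on the unit that lets it start earliest.
        unit = min(range(num_units),
                   key=lambda u: max(unit_free[u], data_ready))
        start = max(unit_free[unit], data_ready)
        finish = start + durations[task]
        schedule[task] = (unit, start, finish)
        unit_free[unit] = finish
        pending.remove(task)
    return schedule


if __name__ == "__main__":
    durations = {"a": 2, "b": 3, "c": 1, "d": 2}
    deps = {"c": ["a"], "d": ["a", "b"]}
    result = list_schedule(durations, deps)
    print(result)
    print("finish time:", max(f for _, _, f in result.values()))
```

The finish time of the last task is the metric the abstract uses to compare schedulers; a different priority rule would generally give a different schedule.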
An FPGA-Based Web Server for High Performance Biological Sequence Alignment
Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.59
Y. Liu, K. Benkrid, A. Benkrid, Server Kasap
This paper presents the design and implementation of an FPGA-based web server for biological sequence alignment. Central to this web server is a set of highly parameterisable, scalable, and platform-independent FPGA cores for biological sequence alignment. The web server consists of an HTML-based interface, a MySQL database which holds user queries and results, a set of biological databases, a library of FPGA configurations, a host application servicing user requests, and an FPGA coprocessor for the acceleration of the sequence alignment operation. The paper presents a real implementation of this server on an HP ProLiant DL145 server with a Celoxica RCHTX FPGA board. Compared to an optimized pure software implementation, our FPGA-based web server achieved a speed-up of two orders of magnitude for a pairwise protein sequence alignment application based on the Smith-Waterman algorithm. The FPGA-based implementation has the added advantage of being over 100x more energy efficient.
Citations: 19
GP-GPU: Bridging the Gap between Modelling & Experimentation
Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.60
T. F. Clayton, A. Murray, Iain A. B. Lindsay
Within the field of neural electrophysiology, there exists a divide between experimentalists and computational modellers. This is caused by the different spheres of expertise required to perform each discipline, as well as the differing resource requirements of the two parties. This paper considers several forms of hardware acceleration for implementation within a laboratory alongside time-sensitive experimentation, and focuses on how the use of general-purpose computation on graphics processing units (GP-GPU) can allow parameter estimation to be performed in the laboratory, thereby acting as a bridge between the two halves of this field. This would facilitate rapid iterative model design, as well as allowing new forms of experimentation. The discussion concludes with a brief case study that reports the performance increases associated with a GPU implementation over a single-CPU approach. It should be noted that the proposed paradigm is not limited to neuroscience, as it would be beneficial to any discipline where unreliable time-sensitive experimental procedures dominate exploration of the field.
Citations: 2
A Multi-cellular Developmental Representation for Evolution of Adaptive Spiking Neural Microcircuits in an FPGA
Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.39
Hooman Shayani, P. Bentley, A. Tyrrell
It has been shown that evolutionary and developmental processes can be used for emergence of scalability, robustness and fault-tolerance in hardware. However, designing a suitable representation for such processes is far from straightforward. Here, a bio-inspired developmental genotype-phenotype mapping for evolution of spiking neural microcircuits in an FPGA is introduced, based on a digital neuron model and cortex structure suggested and verified previously by the authors. The new developmental process is based on complex multi-cellular protein-protein and gene-protein interactions and signaling. Suitability of the representation for evolution of useful architectures and its adaptability is shown through statistical analysis and examples of scalability, modularity and fault-tolerance.
Citations: 4
Performance Analysis of IBM Cell Broadband Engine on Sequence Alignment
Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.16
Yang Song, Gregory M. Striemer, A. Akoglu
The Smith-Waterman (SW) algorithm is the most accurate sequence alignment approach used by computational biologists for DNA matching. However, its computational complexity makes SW impractical to use in a clinical environment compared to much faster but less accurate sequence alignment techniques such as BLAST. The high-performance computing community is examining alternative multi-core architectures such as the IBM Cell Broadband Engine (BE) and Graphics Processing Units (GPUs) that address the limitations of modern cache-based designs. In this paper we investigate the performance of the IBM Cell BE architecture in the context of SW. We present an analysis of the architectural features of the Cell BE and study the architecture’s fitness for accelerating sequence alignment based on its parallel processing power, interconnect structure, and communication protocols among the processing cores. We then evaluate the performance of the Cell BE against the state-of-the-art implementation of SW on NVIDIA’s Tesla GPU. Results show that, based on the memory architecture of the SW algorithm, the Cell BE performs much better than the Tesla GPU in terms of both cycle count and execution time metrics. Compared to a purely serial implementation, in terms of cycle count, the state-of-the-art GPU implementation delivers a 15x speedup, while our solution achieves a 64x speedup.
Citations: 3
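As background for the Smith-Waterman algorithm named in the abstract above, here is a minimal serial Python sketch of its scoring recurrence with a linear gap penalty. The function name and the scoring parameters (match=2, mismatch=-1, gap=-1) are illustrative assumptions, not values taken from the paper, and the sketch says nothing about how the Cell BE or GPU implementations are organised.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # gap in b
            left = h[i][j - 1] + gap    # gap in a
            # Clamping at 0 lets an alignment restart anywhere (local alignment).
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman("ACGT", "ACGT"))
    print(smith_waterman("TTACGTTT", "GGACGGG"))
```

The table has len(a) × len(b) cells, which is the quadratic cost the abstract refers to. Each cell depends only on its upper, left, and upper-left neighbours, so cells on the same anti-diagonal can be computed independently; parallel implementations commonly exploit this.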
A Sixteen-Context Dynamic Optically Reconfigurable Gate Array
Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.64
M. Nakajima, Minoru Watanabe
Demand for fast dynamic reconfiguration has increased since dynamic reconfiguration can accelerate the performance of implementation circuits on a programmable device. Such dynamic reconfiguration necessitates two important features: fast reconfiguration and numerous contexts. However, because fast reconfiguration and numerous contexts share a tradeoff relation on current VLSIs, optically reconfigurable gate arrays (ORGAs) have been developed to resolve this dilemma. ORGAs can realize a virtual gate count that is much larger than those of current VLSI chips by exploiting the large storage capacity of a holographic memory. Furthermore, ORGAs can realize fast reconfiguration through use of large-bandwidth optical connections between a holographic memory and a programmable gate array VLSI. Among such developments, we have been developing dynamic optically reconfigurable gate arrays (DORGAs) that realize a high-gate-density VLSI using a photodiode memory architecture. This paper presents the first demonstration of a 16-context DORGA architecture. Furthermore, we present experimental results: 530–833 ns reconfiguration times and 5–9.375 µs retention times.
Citations: 4
Quality of Service in NoC for Reconfigurable Space Applications
Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.58
A. F. Florit, S. Parkes, P. Mendham
Configurable System-on-Chip (SoC) solutions based on state-of-the-art FPGAs are good candidates to fulfill the requirements of future high-end onboard payload applications. The reliability, performance, and flexibility provided by SoCs can be further extended using a new communication paradigm, the Network-on-a-Chip (NoC). NoCs have the potential to solve the scalability problem of traditional on-chip bus systems but may introduce uncertainties due to contention for shared network resources. This paper explores NoC solutions that provide QoS and proposes a methodology for the seamless integration of payload data-handling protocols into a NoC architecture.
Citations: 1