2009 NASA/ESA Conference on Adaptive Hardware and Systems: Latest Papers


A New Application-Tuned Processor Architecture for High-Performance Reconfigurable Computing

2009 NASA/ESA Conference on Adaptive Hardware and Systems

Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.18

L. Shang, Mi Zhou, Jiong Zhang, Hongbing Li

One design goal of future processors is to maximize performance per watt. However, the performance of general-purpose processors can hardly be improved by merely increasing clock frequency. This paper presents an application-specific reconfigurable processor architecture that is fine-tuned for high-performance computing. It benefits from application-specific customized hardware, which significantly improves its efficiency. Compared with existing work on configurable processor architectures, the proposed architecture has higher functional density and lower power consumption per inch owing to its runtime partial reconfiguration ability. Moreover, it can adaptively change its architecture to further improve average performance and its feasibility for other applications.


Citations: 0

Evolutionary Algorithms in Unreliable Memory

2009 NASA/ESA Conference on Adaptive Hardware and Systems

Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.24

Haisoo Shin, Yun-Geun Lee, R. McKay, N. X. Hoai

Guaranteeing the underlying reliability of computer memory is becoming more difficult as chip dimensions scale down and as power limitations make lower voltages desirable. To date, the reliability of memory has been seen as the responsibility of the computer engineer, with any underlying unreliability hidden from programmers. However, it may make sense in future to shift this balance, optionally exposing the unreliability to programmers and permitting them to choose between higher and lower reliabilities. This is particularly relevant to the data-intensive applications which might potentially provide the "killer apps" for anticipated future many-core architectures. We simulated the effect of unreliable memory on the behaviour of a slightly re-programmed variant of a typical Genetic Algorithm (GA) on a range of optimisation problems. With only minor changes to the code, most variables held in unreliable memory, and error rates up to 10^-3, the memory unreliability had no real effect on the GA behaviour. For higher error rates, the effects became noticeable, and the behaviour of the GA was unacceptable once the error rate reached 10^-2.
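The kind of experiment described in this abstract can be illustrated with a small fault-injection sketch. Everything below is an assumption made for illustration: the OneMax objective, the tournament selection, the one-point crossover, the per-bit flip model, and the names `flip_bits` and `run_ga` are not taken from the paper, and the sketch makes no attempt to reproduce its results.

```python
import random

def flip_bits(bits, error_rate, rng):
    """Return a copy of bits in which each bit is flipped with probability error_rate."""
    return [b ^ 1 if rng.random() < error_rate else b for b in bits]

def onemax(bits):
    """Toy fitness (an assumed objective): the number of 1 bits in the genome."""
    return sum(bits)

def run_ga(error_rate, n_bits=64, pop_size=40, generations=60, seed=0):
    """Run a small GA whose population is stored in simulated unreliable memory."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def pick():
        # Binary tournament selection over the (possibly corrupted) population.
        a, b = rng.sample(pop, 2)
        return a if onemax(a) >= onemax(b) else b

    for _ in range(generations):
        # Simulated unreliable memory: every stored genome may be corrupted.
        pop = [flip_bits(ind, error_rate, rng) for ind in pop]
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)                # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = flip_bits(child, 1.0 / n_bits, rng)   # ordinary GA mutation
            nxt.append(child)
        pop = nxt
    return max(onemax(ind) for ind in pop)

if __name__ == "__main__":
    for rate in (0.0, 1e-3, 1e-2):
        print(rate, run_ga(rate))
```

Sweeping `error_rate` in this way gives a rough feel for how memory faults interact with the GA's own mutation operator. Any thresholds seen with this toy setup depend on the assumed problem and parameters.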


Citations: 0

Synchronous Digital Implementation of the AER Communication Scheme for Emulating Large-Scale Spiking Neural Networks Models

2009 NASA/ESA Conference on Adaptive Hardware and Systems

Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.14

J. Moreno, J. Madrenas, L. Kotynia

This paper presents a fully synchronous digital implementation of the Address Event Representation (AER) communication scheme used in the PERPLEXUS chip to permit the emulation of large-scale, biologically inspired spiking neural network models. Introducing specific commands into the AER protocol makes it possible to distribute the AER bus among a large number of chips in which the functionality of the spiking neurons is emulated. A careful design of the AER encoder module using compact Content Addressable Memories (CAMs) allows for a feasible realization of large-scale models.


Citations: 7

Scheduling Temporal Partitions in a Multiprocessing Paradigm for Reconfigurable Architectures

2009 NASA/ESA Conference on Adaptive Hardware and Systems

Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.43

A. Popp, Y. Moullec, P. Koch

In this paper we describe a mapping methodology for heterogeneous reconfigurable architectures consisting of one or more SW processors and one or more reconfigurable units (FPGAs). The mapping methodology consists of separate tracks for a) the generation of the configurations for the FPGA by level-based and clustering-based temporal partitioning, and b) the scheduling of those configurations as well as the software tasks, based on two multiprocessor scheduling algorithms: a simple list-based scheduler and the more complex extended dynamic level scheduling algorithm. The mapping methodology is benchmarked by means of randomly created task graphs on an architecture of one SW processor and one FPGA. The results are compared to a 0-1 integer linear programming solution in terms of exploration time as well as the finish time of all tasks of the application. The results show that, in 90% of the investigated cases, the combination of level-based temporal partitioning and extended dynamic level scheduling gives the best performance in terms of finish time of the full task set.
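To make the "simple list-based scheduler" concrete, here is a minimal sketch of generic list scheduling on a task graph with one CPU and one FPGA. It is not the authors' implementation: the function name `list_schedule`, the static-level priority, and the fixed task-to-resource assignment are assumptions, and reconfiguration time and communication cost are ignored. The task graph is assumed to be acyclic.

```python
from functools import lru_cache

def list_schedule(durations, deps, resource):
    """Greedy list scheduling.

    durations: task -> execution time
    deps:      task -> set of predecessor tasks (tasks without entries have none)
    resource:  task -> name of the unit it runs on, e.g. "cpu" or "fpga"
    Returns a dict mapping each task to its finish time.
    """
    succs = {t: [] for t in durations}
    for t, preds in deps.items():
        for p in preds:
            succs[p].append(t)

    @lru_cache(maxsize=None)
    def level(t):
        # Static level: longest path (in execution time) from t to an exit task.
        return durations[t] + max((level(s) for s in succs[t]), default=0)

    free_at = {r: 0 for r in set(resource.values())}
    finish = {}
    remaining = set(durations)
    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [t for t in remaining if all(p in finish for p in deps.get(t, ()))]
        task = max(ready, key=lambda t: (level(t), t))  # deterministic tie-break
        start = max([free_at[resource[task]]] + [finish[p] for p in deps.get(task, ())])
        finish[task] = start + durations[task]
        free_at[resource[task]] = finish[task]
        remaining.remove(task)
    return finish

if __name__ == "__main__":
    durations = {"a": 2, "b": 3, "c": 1, "d": 2}
    deps = {"c": {"a", "b"}, "d": {"c"}}
    resource = {"a": "cpu", "b": "fpga", "c": "cpu", "d": "fpga"}
    print(list_schedule(durations, deps, resource))
```

In the example, tasks `a` and `b` run in parallel on the two units, `c` waits for both, and the finish time of the full task set is the largest value in the returned dict.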


Citations: 5

An FPGA-Based Web Server for High Performance Biological Sequence Alignment

2009 NASA/ESA Conference on Adaptive Hardware and Systems

Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.59

Y. Liu, K. Benkrid, A. Benkrid, Server Kasap

This paper presents the design and implementation of an FPGA-based web server for biological sequence alignment. Central to this web server is a set of highly parameterisable, scalable, and platform-independent FPGA cores for biological sequence alignment. The web server consists of an HTML-based interface, a MySQL database which holds user queries and results, a set of biological databases, a library of FPGA configurations, a host application servicing user requests, and an FPGA coprocessor for the acceleration of the sequence alignment operation. The paper presents a real implementation of this server on an HP ProLiant DL145 server with a Celoxica RCHTX FPGA board. Compared to an optimized pure software implementation, our FPGA-based web server achieved a speed-up of two orders of magnitude for a pairwise protein sequence alignment application based on the Smith-Waterman algorithm. The FPGA-based implementation has the added advantage of being over 100x more energy efficient.
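For readers unfamiliar with the Smith-Waterman algorithm mentioned above, the following is a minimal software sketch of its local-alignment score. The scoring parameters (`match=2`, `mismatch=-1`, linear `gap=-1`) and the function name are illustrative assumptions. A protein alignment would normally use a substitution matrix, and this plain Python version says nothing about how the FPGA cores are organised.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Uses the standard recurrence
        H[i][j] = max(0, H[i-1][j-1] + s(a_i, b_j), H[i-1][j] + gap, H[i][j-1] + gap)
    while keeping only the previous row of the matrix in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best

if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))  # four matches at +2 each
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on an anti-diagonal can be computed in parallel. That dependency pattern is what makes the algorithm a common target for hardware acceleration.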


Citations: 19

GP-GPU: Bridging the Gap between Modelling & Experimentation

2009 NASA/ESA Conference on Adaptive Hardware and Systems

Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.60

T. F. Clayton, A. Murray, Iain A. B. Lindsay

Within the field of neural electrophysiology, there exists a divide between experimentalists and computational modellers. This is caused by the different spheres of expertise required for each discipline, as well as the differing resource requirements of the two parties. This paper considers several forms of hardware acceleration for implementation within a laboratory alongside time-sensitive experimentation, and focuses on how general-purpose computation on graphics processing units (GP-GPU) can allow parameter estimation to be performed in the laboratory, thereby acting as a bridge between the two halves of this field. This would facilitate rapid iterative model design, as well as allowing new forms of experimentation. The discussion concludes with a brief case study that reports the performance increases of a GPU implementation over a single-CPU approach. It should be noted that the proposed paradigm is not limited to neuroscience, as it would be beneficial to any discipline where unreliable, time-sensitive experimental procedures dominate exploration of the field.


Citations: 2

A Multi-cellular Developmental Representation for Evolution of Adaptive Spiking Neural Microcircuits in an FPGA

2009 NASA/ESA Conference on Adaptive Hardware and Systems

Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.39

Hooman Shayani, P. Bentley, A. Tyrrell

It has been shown that evolutionary and developmental processes can be used for the emergence of scalability, robustness and fault-tolerance in hardware. However, designing a suitable representation for such processes is far from straightforward. Here, a bio-inspired developmental genotype-phenotype mapping for the evolution of spiking neural microcircuits in an FPGA is introduced, based on a digital neuron model and cortex structure suggested and verified previously by the authors. The new developmental process is based on complex multi-cellular protein-protein and gene-protein interactions and signaling. The suitability of the representation for the evolution of useful architectures, and its adaptability, are shown through statistical analysis and examples of scalability, modularity and fault-tolerance.


Citations: 4

Performance Analysis of IBM Cell Broadband Engine on Sequence Alignment

2009 NASA/ESA Conference on Adaptive Hardware and Systems

Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.16

Yang Song, Gregory M. Striemer, A. Akoglu

The Smith-Waterman (SW) algorithm is the most accurate sequence alignment approach used by computational biologists for DNA matching. However, its computational complexity makes SW impractical to use in clinical environments compared to much faster but less accurate sequence alignment techniques such as BLAST. The high-performance computing community is examining alternative multi-core architectures, such as the IBM Cell Broadband Engine (BE) and Graphics Processing Units (GPUs), that address the limitations of modern cache-based designs. In this paper we investigate the performance of the IBM Cell BE architecture in the context of SW. We present an analysis of the architectural features of the Cell BE and study the architecture's fitness for accelerating sequence alignment based on its parallel processing power, interconnect structure, and communication protocols among the processing cores. We then evaluate the performance of the Cell BE against the state-of-the-art implementation of SW on NVIDIA's Tesla GPU. Results show that, based on the memory architecture of the SW algorithm, the Cell BE performs much better than the Tesla GPU in terms of both cycle count and execution time metrics. Compared to a purely serial implementation, in terms of cycle count, the state-of-the-art GPU implementation delivers a 15x speedup, whereas our solution achieves a 64x speedup.


Citations: 3

A Sixteen-Context Dynamic Optically Reconfigurable Gate Array

2009 NASA/ESA Conference on Adaptive Hardware and Systems

Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.64

M. Nakajima, Minoru Watanabe

Demand for fast dynamic reconfiguration has increased since dynamic reconfiguration can accelerate the performance of circuits implemented on a programmable device. Such dynamic reconfiguration necessitates two important features: fast reconfiguration and numerous contexts. However, because fast reconfiguration and numerous contexts share a tradeoff relation on current VLSIs, optically reconfigurable gate arrays (ORGAs) have been developed to resolve this dilemma. ORGAs can realize a virtual gate count that is much larger than those of current VLSI chips by exploiting the large storage capacity of a holographic memory. Furthermore, ORGAs can realize fast reconfiguration through the use of large-bandwidth optical connections between a holographic memory and a programmable gate array VLSI. Among such developments, we have been developing dynamic optically reconfigurable gate arrays (DORGAs) that realize a high gate density VLSI using a photodiode memory architecture. This paper presents the first demonstration of a 16-context DORGA architecture. Furthermore, we present experimental results: 530–833 ns reconfiguration times and 5–9.375 µs retention times.


Citations: 4

Quality of Service in NoC for Reconfigurable Space Applications

2009 NASA/ESA Conference on Adaptive Hardware and Systems

Pub Date : 2009-07-29 DOI: 10.1109/AHS.2009.58

A. F. Florit, S. Parkes, P. Mendham

Configurable System-on-Chip (SoC) solutions based on state-of-the-art FPGAs are good candidates to fulfill the requirements of future high-end onboard payload applications. The reliability, performance and flexibility provided by SoCs can be further extended using a new communication paradigm, the Network-on-a-Chip (NoC). NoCs have the potential to solve the scalability problem of traditional on-chip bus systems but may introduce uncertainties due to contention for shared network resources. This paper explores NoC solutions that provide QoS and proposes a methodology for the seamless integration of payload data-handling protocols into a NoC architecture.


Citations: 1
