
Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.97TB100186): latest papers

A parallel hardware evolvable computer POLYP
U. Tangen, L. Schulte, J. McCaskill
Previous work (J.S. McCaskill et al., 1996; 1997) has shown the power of massively parallel configurable hardware (NGEN) in conjunction with dataflow architectures for the simulation of evolving populations. NGEN is flexible computer hardware for rapid custom-circuit simulation of fine-grained physical processes via a massively parallel architecture, e.g. 144 hardware-configurable field-programmable gate arrays (FPGAs; Xilinx XC4008). NGEN is optimized to implement dataflow architectures and systolic algorithms for large problems and is equipped with high-speed distributed SRAM (144*8*256 kBit, 15 ns access time) on the chip-to-chip interconnect. Microconfigurable FPGAs allow a further step toward closing the gap between microelectronics and biology in the area of information processing. A design for a massively parallel microconfigurable computer (POLYP) is presented. It is designed to allow online evolution in hardware with significant locally controllable memory resources. It is also designed for high-throughput dataflow applications with large problem sizes. Additionally, an evolvable interface between high-rate measurement devices is provided to allow adaptive processing coupled with real-time experimental environments. The computer represents the next logical step beyond the massively parallel computer NGEN towards evolvable hardware interacting with biology.
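As a rough check on the SRAM figure quoted in the abstract, the sketch below multiplies out 144*8*256 kBit. The abstract does not say what the factor 8 counts, nor whether a kBit is 1000 or 1024 bits, so the names `UNITS_PER_FPGA` and `BITS_PER_KBIT` and their interpretation are assumptions made only for this illustration.

```python
# Factors as written in the abstract: 144 * 8 * 256 kBit.
NUM_FPGAS = 144        # stated in the abstract
UNITS_PER_FPGA = 8     # assumption: the abstract does not say what the factor 8 counts
KBIT_PER_UNIT = 256    # stated in the abstract
BITS_PER_KBIT = 1024   # assumption: binary kilobit

# Total capacity in kBit, bytes, and MiB under the assumptions above.
total_kbit = NUM_FPGAS * UNITS_PER_FPGA * KBIT_PER_UNIT
total_bytes = total_kbit * BITS_PER_KBIT // 8
total_mib = total_bytes / 2**20

print(total_kbit, "kBit =", total_mib, "MiB")
```

Under these assumptions the distributed SRAM totals 294,912 kBit, or 36 MiB.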
DOI: https://doi.org/10.1109/FPGA.1997.624625 · Published: 1997-04-16
Citations: 15
Mapping applications to the RaPiD configurable architecture
C. Ebeling, Darren C. Cronquist, Paul Franklin, Jason Secosky, Stefan G. Berg
The goal of the RaPiD (Reconfigurable Pipelined Datapath) architecture is to provide high performance configurable computing for a range of computationally-intensive applications that demand special-purpose hardware. This is accomplished by mapping the computation into a deep pipeline using a configurable array of coarse-grained computational units. A key feature of RaPiD is the combination of static and dynamic control. While the underlying computational pipelines are configured statically, a limited amount of dynamic control is provided which greatly increases the range and capability of applications that can be mapped to RaPiD. This paper illustrates this mapping and configuration for several important applications including a FIR filter, 2-D DCT, motion estimation, and parametric curve generation; it also shows how static and dynamic control are used to perform complex computations.
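To make the idea of mapping a computation onto a deep, statically configured pipeline concrete, here is a minimal software sketch of a FIR filter in transposed form, where each stage holds one fixed tap weight and one partial-sum register. It is not the RaPiD mapping described in the paper; the function name `fir_pipeline` and the register layout are assumptions chosen for illustration, and the dynamic-control aspect is not modelled.

```python
def fir_pipeline(samples, weights):
    """Transposed-form FIR: y[n] = sum over k of weights[k] * x[n - k].

    Each pipeline stage k holds a fixed weight (the static configuration)
    and a register carrying a partial sum toward the output.
    """
    num_taps = len(weights)
    # regs[k] is the partial sum held in stage k; regs[num_taps] is a zero sentinel.
    regs = [0] * (num_taps + 1)
    outputs = []
    for x in samples:
        # All stages update in the same clock tick: each multiplies the
        # broadcast sample by its weight and adds its neighbour's old register.
        regs = [weights[k] * x + regs[k + 1] for k in range(num_taps)] + [0]
        outputs.append(regs[0])
    return outputs


def fir_reference(samples, weights):
    """Direct convolution, used only to cross-check the pipeline model."""
    return [
        sum(w * samples[n - k] for k, w in enumerate(weights) if n - k >= 0)
        for n in range(len(samples))
    ]


result = fir_pipeline([1, 0, 0, 2], [1, 2, 3])
print(result)
```

The weights never change while data streams through, which mirrors the statically configured part of the pipeline; any per-cycle decisions would belong to the dynamic control the abstract mentions.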
DOI: https://doi.org/10.1109/FPGA.1997.624610 · Published: 1997-04-16
Citations: 124
Speech recognition HMM training on reconfigurable parallel processor
HyunJeong Yun, Aaron Smith, H. Silverman
Armstrong III is a 20-node multicomputer that is currently operational. In addition to a RISC processor, each node contains reconfigurable resources implemented with FPGAs. The in-circuit reprogrammability of static-RAM-based FPGAs allows the computational capabilities of a node to be dynamically matched to the computational requirements of an application. Most reconfigurable computers in existence today rely solely on a large number of FPGAs to perform computations. In contrast, the paper demonstrates the utility of a small number of FPGAs coupled to a RISC processor with a simple interconnect. The article describes a substantive example application that performs HMM training for speech recognition on the reconfigurable platform.
DOI: https://doi.org/10.1109/FPGA.1997.624627 · Published: 1997-04-16
Citations: 13
Incremental reconfiguration for pipelined applications
H. Schmit
This paper examines the implementation of pipelined applications using run-time reconfiguration. Throughput and latency of pipelined applications can be significantly improved when reconfiguration is performed at the level of individual pipeline stages, as opposed to configuration of the entire FPGA. If reconfiguration and execution can be performed simultaneously, the performance of a pipelined application approaches its theoretical maximum. This paper proposes a new FPGA configuration mechanism, called striping, that supports pipeline stage reconfiguration and simultaneous configuration and execution. Additionally, the use of the pipeline stage as the atomic unit of reconfiguration introduces a design abstraction that enables the development of families of upwardly-compatible FPGAs and virtual hardware design.
DOI: https://doi.org/10.1109/FPGA.1997.624604 · Published: 1997-04-16
Citations: 106
Computing kernels implemented with a wormhole RTR CCM
Ray Bittner, P. Athanas
The wormhole run-time reconfiguration (RTR) computing paradigm is a method for creating high performance computational pipelines. The scalability, distributed control and data flow features of the paradigm allow it to fit neatly into the configurable computing machine (CCM) domain. To date, the field has been dominated by large bit-oriented devices whose flexibility can lead to lowered silicon utilization efficiencies. In an effort to raise this efficiency, the Colt CCM has been created based on the wormhole RTR paradigm. This paper outlines methods of implementation and performance for several common operations using these concepts. They serve as indicators of the diversity of algorithms that can be instantiated through the high-speed run-time reconfiguration that these devices make possible. Particular attention is paid to floating point multiplication. Also discussed is the topic of data-dependent computation, which would seem to run counter to the wormhole RTR paradigm. The paper concludes with a summary of performance of the three computations.
DOI: https://doi.org/10.1109/FPGA.1997.624609 · Published: 1997-04-16
Citations: 11
Compilation tools for run-time reconfigurable designs
W. Luk, N. Shirazi, P. Cheung
This paper describes a framework and tools for automating the production of designs which can be partially reconfigured at run time. The tools include: a partial evaluator, which produces configuration files for a given design, where the number of configurations can be minimised by a process, known as compile-time sequencing; an incremental configuration calculator, which takes the output of the partial evaluator and generates an initial configuration file and incremental configuration files that partially update preceding configurations; and a tool which further optimises designs for FPGAs supporting simultaneous configuration of multiple cells. While many of our techniques are independent of the design language and device used, our tools currently target Xilinx 6200 devices. Simultaneous configuration, for example, can be used to reduce the time for reconfiguring an adder to a subtractor from time linear with respect to its size to constant time at best and logarithmic time at worst.
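The idea of an incremental configuration file that "partially updates preceding configurations" can be sketched as a difference between two configurations. The model below is a toy, not the authors' tool or the Xilinx 6200 format: a configuration is assumed to be a dictionary from a cell address to a configuration word, and the helper names `incremental_config` and `apply_config`, the cell names, and the adder/subtractor encoding are all hypothetical.

```python
def incremental_config(prev, nxt):
    """Return only the cells that must be rewritten to turn prev into nxt.

    A value of None marks a cell that is configured in prev but unused in nxt.
    Configuration words themselves are assumed never to be None.
    """
    delta = {cell: word for cell, word in nxt.items() if prev.get(cell) != word}
    for cell in prev.keys() - nxt.keys():
        delta[cell] = None
    return delta


def apply_config(current, delta):
    """Apply an incremental update to a full configuration."""
    result = dict(current)
    for cell, word in delta.items():
        if word is None:
            result.pop(cell, None)
        else:
            result[cell] = word
    return result


# Hypothetical 4-bit adder and subtractor sharing an unchanged output register.
adder = {("bit", i): "ADD" for i in range(4)}
adder[("cin", 0)] = "0"
adder[("out_reg", 0)] = "REG"

subtractor = {("bit", i): "SUB" for i in range(4)}
subtractor[("cin", 0)] = "1"
subtractor[("out_reg", 0)] = "REG"

delta = incremental_config(adder, subtractor)
print(len(delta), "of", len(subtractor), "cells rewritten")
```

In this toy model the number of rewritten cells grows with the adder width, which corresponds to the linear-time case in the abstract; writing several cells with one operation, as the abstract's simultaneous configuration does, is not modelled here.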
DOI: https://doi.org/10.1109/FPGA.1997.624605 · Published: 1997-04-16
Citations: 109
Systems performance measurement on PCI Pamette
L. Moll, M. Shand
We describe the use of a reconfigurable board to obtain information on the performance that can be expected on particular systems. Our goal is to use the reconfigurability of the board's interface to test a system and discover not only the maximum bandwidth and best latency attainable, but also the way to reliably achieve these figures. The board we present uses the now widespread PCI bus. PCI is sufficiently complex, and its implementations sufficiently varied, that it is impossible to guess the performance that a specific board can obtain on a specific computer with only the technical characteristics of the two in hand. We observe astonishing performance differences between almost identical systems and comparable figures between small PCs and big servers. Our performance tests can be an end in themselves; however, they also serve to demonstrate the value of a reconfigurable bus interface. With the same board, we can test and choose a system, make informed architectural decisions on the hardware/software interface, and then finely tune the bus interface to get maximum and predictable figures in the running application.
DOI: https://doi.org/10.1109/FPGA.1997.624612 · Published: 1997-04-16
Citations: 44
High level compilation for fine grained FPGAs
M. Gokhale, D. Gomersall
The authors present an integrated tool set to generate highly optimized hardware computation blocks from a C language subset. By starting with a C language description of the algorithm, they address the problem of making FPGA processors accessible to programmers as opposed to hardware designers. Their work is specifically targeted to fine grained FPGAs such as the National Semiconductor CLAy™ FPGA family. Such FPGAs exhibit extremely high performance on regular data path circuits, which are more prevalent in computationally oriented hardware applications. Dense packing of data path functional elements makes it possible to fit the computation on one or a small number of chips, and the use of local routing resources makes it possible to clock the chip at a high rate. By developing a lower level tool suite that exploits the regular, geometric nature of fine grained FPGAs, and mapping the compiler output to this tool suite, they greatly improve performance over traditional high level synthesis to fine grained FPGAs.
DOI: https://doi.org/10.1109/FPGA.1997.624616 · Published: 1997-04-16
Citations: 41
Datapath-oriented FPGA mapping and placement for configurable computing
T. Callahan, J. Wawrzynek
Widespread acceptance of FPGA-based reconfigurable coprocessors will be expedited if compilation time for FPGA configurations can be reduced to be comparable to software compilation. This research achieves this goal, generating complete datapath layouts in fractions of a second rather than hours. Our algorithm, adapted from instruction selection in compilers, packs multiple operations into single rows of CLBs when possible, while preserving a regular bit-slice layout. Furthermore, placement and thus routing delays are considered simultaneously with packing, so that the total delay, not just the CLB delay, is optimized.
DOI: https://doi.org/10.1109/FPGA.1997.624624 · Published: 1997-04-16
Citations: 2
A time-multiplexed FPGA
S. Trimberger, D. Carberry, Anders Johnson, Jennifer Wong
This paper describes the architecture of a time-multiplexed FPGA. Eight configurations of the FPGA are stored in on-chip memory. This inactive on-chip memory is distributed around the chip, and accessible so that the entire configuration of the FPGA can be changed in a single cycle of the memory. The entire configuration of the FPGA can be loaded from this on-chip memory in 30 ns. Inactive memory is accessible as block RAM for applications. The FPGA is based on the Xilinx XC4000E FPGA, and includes extensions for dealing with state saving and forwarding and for increased routing demand due to time-multiplexing the hardware.
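The behaviour described in the abstract (several stored configurations, a fast switch between them, and state carried from one configuration to the next) can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `TimeMultiplexedFabric`, its methods, and the representation of a configuration as a Python function are hypothetical and say nothing about the actual XC4000E-based circuitry. Only the count of eight stored configurations comes from the abstract.

```python
NUM_PLANES = 8  # from the abstract: eight configurations held in on-chip memory


class TimeMultiplexedFabric:
    """Toy model: one set of hardware, several stored configurations."""

    def __init__(self):
        self.planes = [None] * NUM_PLANES  # stored configurations
        self.active = 0                    # index of the active configuration
        self.saved = {}                    # state forwarded between configurations

    def load_plane(self, index, logic):
        if not 0 <= index < NUM_PLANES:
            raise IndexError("plane index out of range")
        self.planes[index] = logic

    def switch_to(self, index):
        # Modelled as a single step, standing in for the fast on-chip swap.
        if self.planes[index] is None:
            raise ValueError("plane not loaded")
        self.active = index

    def step(self):
        # Run the active configuration against the saved state.
        self.planes[self.active](self.saved)


def run_schedule(fabric, order):
    """Activate the planes in the given order, one step each."""
    for index in order:
        fabric.switch_to(index)
        fabric.step()
    return fabric.saved


def add_slice(state):
    state["s"] = state["a"] + state["b"]   # first slice of the design


def mul_slice(state):
    state["y"] = state["s"] * state["c"]   # second slice reuses the saved sum


fabric = TimeMultiplexedFabric()
fabric.load_plane(0, add_slice)
fabric.load_plane(1, mul_slice)
fabric.saved.update(a=2, b=3, c=4)
result = run_schedule(fabric, [0, 1])
print(result["y"])
```

The dictionary `saved` plays the role of the state-saving and forwarding extensions mentioned in the abstract: the value computed under one configuration remains available after the switch to the next.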
DOI: https://doi.org/10.1109/FPGA.1997.624601 · Published: 1997-04-16
Citations: 424