Third International ACM Symposium on Field-Programmable Gate Arrays: Latest Papers

The Design of RPM: An FPGA-based Multiprocessor Emulator

Third International ACM Symposium on Field-Programmable Gate Arrays

Pub Date : 1995-02-15 DOI: 10.1145/201310.201321

Koray Öner, L. Barroso, S. Iman, Jaeheon Jeong, Krishnan Ramamurthy, M. Dubois

Recent advances in Field-Programmable Gate Arrays (FPGA) and programmable interconnects have made it possible to build efficient hardware emulation engines. In addition, improvements in Computer-Aided Design (CAD) tools, mainly in synthesis tools, greatly simplify the design of large circuits. The RPM (Rapid Prototype Engine for Multiprocessors) Project leverages these two technological advances. Its goal is to develop a common hardware platform for the emulation of multiprocessor systems with different architectures. For cost reasons, the use of FPGAs in RPM is limited to the memory controllers, while the rest of the emulator, including the processors, memories and interconnect, is built with off-the-shelf components. A flexible non-intrusive event logging mechanism is included at all levels of the memory hierarchy, making it possible to monitor the emulation in very fine detail. This paper presents the hardware design of RPM.

Citations: 31

Applications of Slack Neighborhood Graphs to Timing Driven Optimization Problems in FPGAs

Third International ACM Symposium on Field-Programmable Gate Arrays

Pub Date : 1995-02-15 DOI: 10.1145/201310.201329

Anmol Mathur, Kuang-Chien Chen, C. Liu

In this paper we examine three different problems related to FPGA placement: timing driven placement of a technology mapped circuit, timing driven reconfiguration for yield enhancement and fault tolerance in FPGAs and timing driven design re-engineering for FPGAs. We show that timing driven relocation which transforms an infeasible placement into a feasible one is a key problem the solution of which will lead to good algorithms for all three of these optimization problems. We introduce the concept of a slack neighborhood graph (SNG) as a general tool for timing driven relocation of modules in an infeasible placement with a bounded increase in critical path delay. The slack neighborhood graph approach provides a unified approach to the solution of the three timing driven optimization problems of interest in this paper.

Citations: 6

A Field-Programmable Mixed-Analog-Digital Array

Third International ACM Symposium on Field-Programmable Gate Arrays

Pub Date : 1995-02-15 DOI: 10.1145/201310.201327

P. Chow, P. Gulak

A novel field-programmable mixed-analog-digital array (FPMA) is proposed, which contains a field-programmable analog array, a field-programmable digital array, and a mixed-signal interface. This device is intended to be used for the rapid implementation of mixed-signal circuits. The resource and architectural requirements for this array are determined by analyzing a set of sample circuits. The mixed-signal interface is constructed from converter blocks that contain configurable A/D and D/A converters, which gives some flexibility in the specification of the interface. A 1.2 µm CMOS prototype IC has been designed to demonstrate the feasibility of FPMA technology.

Citations: 33

Techniques for FPGA Implementation of Video Compression Systems

Third International ACM Symposium on Field-Programmable Gate Arrays

Pub Date : 1995-02-15 DOI: 10.1145/201310.201334

B. Schoner, J. Villasenor, S. Molloy, R. Jain

Real-time video compression is a challenging subject for FPGA implementation because it typically has high computational complexity and requires high data throughput. Previous implementations have used parallel banks of FPGAs or DSPs to meet these requirements. Using design techniques that maximize FPGA utilization, we have implemented two video compression systems, each of which uses a single FPGA. In the first system, algorithmic optimizations are made to create a low-complexity implementation that exploits the in-system programmability of the FPGA. This low-complexity implementation performs well, but is limited to a single compression algorithm. In the second system, the FPGA is augmented with an external, low-complexity video signal processor (VSP). This combination of ASIC and FPGA is flexible enough to implement four common compression algorithms, and powerful enough to execute them in real time.

Citations: 30

TIERS: Topology IndependEnt Pipelined Routing and Scheduling for VirtualWire™ Compilation

Third International ACM Symposium on Field-Programmable Gate Arrays

Pub Date : 1995-02-15 DOI: 10.1145/201310.201314

C. Selvidge, A. Agarwal, M. Dahl, J. Babb

TIERS is a new pipelined routing and scheduling algorithm implemented in a complete VirtualWire™ compilation and synthesis system. TIERS is described and compared to prior work both analytically and quantitatively. TIERS improves system speed by as much as a factor of 2.5 over prior work. TIERS routing results for both Altera and Xilinx based FPGA systems are provided.

Citations: 39

Simultaneous Depth and Area Minimization in LUT-based FPGA Mapping

Third International ACM Symposium on Field-Programmable Gate Arrays

Pub Date : 1995-02-15 DOI: 10.1145/201310.201322

J. Cong, Yean-Yow Hwang

In this paper, we present an improvement of the FlowMap algorithm, named CutMap, which combines depth and area minimization during the mapping process by computing min-cost min-height K-feasible cuts for critical nodes for depth minimization and computing min-cost K-feasible cuts for non-critical nodes for area minimization. Like the FlowMap algorithm, CutMap guarantees depth-optimal mapping solutions in polynomial time, but it uses considerably fewer K-LUTs. We have implemented CutMap and tested it on the MCNC logic synthesis benchmarks. For depth-optimal mapping solutions, CutMap uses 15% fewer K-LUTs than FlowMap. We also tested CutMap followed by the depth relaxation routines in the FlowMap_r algorithm, which achieves area minimization by depth relaxation. CutMap followed by FlowMap_r performs better than FlowMap_r alone.
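
To make the notion of a K-feasible cut concrete, here is a minimal sketch in Python. It is not the CutMap or FlowMap procedure: the abstract does not give that procedure, so the sketch swaps in plain exhaustive cut enumeration, which is only practical for small networks. The function name `k_feasible_cuts`, the `fanins` dictionary layout, and the example network are all assumptions made for illustration.

```python
from itertools import product


def k_feasible_cuts(fanins, node, k, memo=None):
    """Enumerate cuts of `node` with at most k nodes (illustrative only).

    `fanins` maps each gate to the list of nodes driving it; nodes that
    do not appear as keys are treated as primary inputs. A cut is a set
    of nodes that every path from the primary inputs to `node` must cross.
    """
    if memo is None:
        memo = {}
    if node in memo:
        return memo[node]
    # The trivial cut {node} is always kept so that fanouts can merge it.
    cuts = {frozenset([node])}
    inputs = fanins.get(node, [])
    if inputs:
        per_input = [k_feasible_cuts(fanins, i, k, memo) for i in inputs]
        # Pick one cut per fanin and merge; keep the result if it fits in k.
        for combo in product(*per_input):
            merged = frozenset().union(*combo)
            if len(merged) <= k:
                cuts.add(merged)
    memo[node] = cuts
    return cuts


# Hypothetical network: x = f(a, b), y = g(c, d), z = h(x, y).
example_fanins = {"x": ["a", "b"], "y": ["c", "d"], "z": ["x", "y"]}
cuts_of_z = k_feasible_cuts(example_fanins, "z", k=3)
```

With K = 3, the cut {a, b, c, d} is rejected because it has four nodes, so a single 3-input LUT rooted at `z` can cover `x` or `y` but not both. A mapper in the spirit of the abstract would then choose among the feasible cuts by height for critical nodes and by cost for non-critical ones.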

Citations: 102

High-Level Bit-Serial Datapath Synthesis for Multi-FPGA Systems

Third International ACM Symposium on Field-Programmable Gate Arrays

Pub Date : 1995-02-15 DOI: 10.1145/201310.201336

T. Isshiki, W. Dai

Field-programmable hardware exhibits a new trend towards computation-intensive applications. The basic idea is to customize the hardware architecture completely for a given application so that logic resources are allocated efficiently and effectively, improving performance by several orders of magnitude over a general-purpose processor implementation. At the same time, because the hardware is reconfigurable, it still covers a wide variety of applications.

Citations: 23

Multiple FPGA Partitioning with Performance Optimization

Third International ACM Symposium on Field-Programmable Gate Arrays

Pub Date : 1995-02-15 DOI: 10.1145/201310.201333

Kalapi Roy-Neogi, C. Sechen

We address the problem of partitioning a technology mapped FPGA circuit onto multiple FPGAs of a specific target technology. The physical characteristics of the multiple FPGA system (MFS) pose additional constraints to the circuit partitioning algorithms: the capacity of each FPGA, the timing constraints, the number of I/Os per FPGA, and the pre-designed interconnection patterns of the MFS. Existing partitioning techniques which minimize just the cut sizes of partitions fail to satisfy the above challenges. We therefore present a rectilinear partitioning algorithm which efficiently and accurately handles timing specifications. The signal path delays are estimated during partitioning using a timing model specific to a multiple FPGA architecture. The model combines all possible delay factors in a system with multiple FPGA chips of a target technology. A new dynamic net-weighting scheme was incorporated to minimize the number of pin-outs for each chip. Finally, we have developed a graph-based global router for pin assignment which can handle the pre-routed connections of our MFS structure. We successfully partitioned the MCNC Xilinx FPGA benchmarks producing 100% routable designs with high utilization levels in all cases. Using the performance optimization capabilities in our approach we have successfully partitioned these benchmarks satisfying the critical path constraints and achieving a significant reduction in the longest path delay. An average reduction of 17% in the longest path delay was achieved at the cost of 5% in total wire length. We have proved the effectiveness of our performance optimization technique by verifying the timing predictions of our partitioner with the actual delays obtained after placement and routing of a partitioned MFS. Partitioning results obtained with the Xilinx mapped MCNC benchmarks are encouraging.

Citations: 42

Using Architectural "Families" to Increase FPGA Speed and Density

Third International ACM Symposium on Field-Programmable Gate Arrays

Pub Date : 1995-02-15 DOI: 10.1145/201310.201312

Vaughn Betz, Jonathan Rose

In order to narrow the speed and density gap between FPGAs and MPGAs we propose the development of "families" of FPGAs. Each FPGA family is targeted at a single maximum logic capacity, and consists of several "siblings", or FPGAs of different yet complementary architectures. Any given application circuit is implemented in the sibling with the most appropriate architecture. With properly chosen siblings, one can develop a family of FPGAs which will have better speed and density than any single FPGA. We apply this concept to create two different FPGA families, one composed of architectures with different types of hard-wired logic blocks and the other created from architectures with different types of heterogeneous logic blocks. We found that a family composed of eight chips with different hard-wired logic block architectures simultaneously improves density by 12 to 14% and speed by 18 to 20% over the best single hard-wired FPGA.
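
The selection rule in this abstract (each circuit goes to the sibling whose architecture suits it best) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the cost-table layout, and every number in `example_costs` are hypothetical and are not taken from the paper.

```python
def best_single(costs):
    """Return the one architecture with the lowest total cost over all circuits.

    `costs` maps architecture name -> {circuit name -> cost}, where cost
    could be area or delay (lower is better).
    """
    return min(costs, key=lambda arch: sum(costs[arch].values()))


def family_total(costs, siblings):
    """Total cost when each circuit uses the cheapest sibling in the family."""
    circuits = next(iter(costs.values())).keys()
    return sum(min(costs[arch][c] for arch in siblings) for c in circuits)


# Hypothetical area figures for two architectures and two circuits.
example_costs = {
    "A": {"c1": 10, "c2": 20},
    "B": {"c1": 14, "c2": 15},
}
single = best_single(example_costs)                    # "B" (total 29)
family = family_total(example_costs, ["A", "B"])       # 10 + 15 = 25
```

Because the family takes a per-circuit minimum, its total can never exceed that of the best single architecture among its siblings; how much it gains depends on how complementary the siblings are.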

Citations: 13

HGA: A Hardware-Based Genetic Algorithm

Third International ACM Symposium on Field-Programmable Gate Arrays

Pub Date : 1995-02-15 DOI: 10.1145/201310.201319

S. Scott, A. Samal, S. Seth

A genetic algorithm (GA) is a robust problem-solving method based on natural selection. Hardware's speed advantage and its ability to parallelize offer great rewards to genetic algorithms. Speedups of 1-3 orders of magnitude have been observed when frequently used software routines were implemented in hardware by way of reprogrammable field-programmable gate arrays (FPGAs). Reprogrammability is essential in a general-purpose GA engine because certain GA modules require changeability (e.g. the function to be optimized by the GA). Thus a hardware-based GA is both feasible and desirable. A fully functional hardware-based genetic algorithm (the HGA) is presented here as a proof-of-concept system. It was designed using VHDL to allow for easy scalability. It is designed to act as a coprocessor with the CPU of a PC. The user programs the FPGAs which implement the function to be optimized. Other GA parameters may also be specified by the user. Simulation results and performance analyses of the HGA are presented. A prototype HGA is described and compared to a similar GA implemented in software. In the simple tests, the prototype took about 6% as many clock cycles to run as the software-based GA. Further suggested improvements could realistically make the HGA 2-3 orders of magnitude faster than the software-based GA.
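
For readers unfamiliar with the software baseline, a generic genetic algorithm can be sketched in a few lines of Python. This is not the HGA design or the software GA it was compared against: the abstract does not specify the operators, so tournament selection, single-point crossover, bit-flip mutation, elitism, and all parameter values below are assumptions chosen for illustration. The `fitness` argument plays the role of the user-supplied function to be optimized.

```python
import random


def run_ga(fitness, n_bits=16, pop_size=20, generations=40,
           crossover_rate=0.7, mutation_rate=0.02, seed=0):
    """Maximize `fitness` over fixed-length bit strings (illustrative sketch)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Binary tournament: the fitter of two random members wins.
        a, b = rng.choice(pop), rng.choice(pop)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best member forward unchanged
        while len(next_pop) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_bits)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Flip each bit independently with probability mutation_rate.
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
        best = max(pop, key=fitness)
    return best


def one_max(bits):
    """Toy fitness function: count the 1 bits."""
    return sum(bits)


solution = run_ga(one_max)
```

Selection, crossover, and mutation are the fixed stages of such a loop, while only the fitness function changes per problem, which matches the abstract's point that reprogrammability is needed for the function being optimized.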

Citations: 188
