
Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.97TB100186): latest articles

Mapping a real-time video algorithm to a context-switched FPGA
S. Kelem
This paper describes the implementation of a real-time video algorithm on a context-switched FPGA. The FPGA is based on the Xilinx XC4000E FPGA, and includes extensions for dealing with state saving and forwarding and for increased routing demand due to time-multiplexing the hardware. The algorithm makes use of special features of this architecture to achieve high utilization of the silicon at run time. Two configuration planes are programmed as distributed RAM and two planes perform replications of the calculation in parallel. The interplay between the CLB architecture, communication between configuration planes, context-switching overhead, and the end-user application is examined as we map the algorithm onto this architecture.
DOI: 10.1109/FPGA.1997.625366 | Published: 1997-04-16
Citations: 2
Fault simulation on reconfigurable hardware
M. Abramovici, P. R. Menon
The authors introduce a new approach to fault simulation, using reconfigurable hardware to implement a critical path tracing algorithm. The performance estimate shows that the approach is at least an order of magnitude faster than the serial fault emulation used in prior work.
DOI: 10.1109/FPGA.1997.624618 | Published: 1997-04-16
Citations: 28
Comparison of arithmetic architectures for Reed-Solomon decoders in reconfigurable hardware
C. Paar, M. Rosner
Reed-Solomon (RS) error correction codes are widely used in modern communication systems such as compact disk players or satellite communication links. RS codes rely on arithmetic in finite, or Galois, fields. The specific field GF(2^8) is of central importance for many practical systems. The most costly, and thus most critical, elementary operations in RS decoders are multiplication and inversion in Galois fields. Although there have been considerable efforts in the area of Galois field arithmetic architectures, there appears to be very little reported work on Galois field arithmetic for reconfigurable hardware. This contribution provides a systematic comparison of two promising arithmetic architecture classes. The first one is based on a standard base representation, and the second one is based on composite fields. For both classes a multiplier and an inverter for GF(2^8) are described and theoretical gate counts are provided. Using a design entry based on a VHDL description, each architecture is mapped to a popular FPGA and EPLD device. For each mapping, both an area and a speed optimization were performed. Absolute values with respect to logic cell counts and critical path simulations are provided. The results show that the composite field architectures can have great advantages on both types of reconfigurable platforms. In particular it is found that composite field multipliers can be more than twice as fast as polynomial base multipliers on FPGAs.
DOI: 10.1109/FPGA.1997.624622 | Published: 1997-04-16
Citations: 49
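To make the polynomial-basis arithmetic named in the Reed-Solomon abstract above concrete, here is a minimal software sketch of multiplication and inversion in GF(2^8). It is not the paper's hardware architecture. The reduction polynomial 0x11D and the names `gf256_mul` and `gf256_inv` are assumptions chosen for illustration, and the inversion uses plain exponentiation rather than a composite-field method.

```python
# Assumed reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (not taken from the paper).
REDUCTION_POLY = 0x11D


def gf256_mul(a, b):
    """Shift-and-add multiplication of two GF(2^8) elements in polynomial basis."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & 0x100:            # degree reached 8: reduce modulo the polynomial
            a ^= REDUCTION_POLY
    return result


def gf256_inv(a):
    """Inverse via a^254, which holds because a^255 == 1 for every nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf256_mul(result, base)
        base = gf256_mul(base, base)
        exp >>= 1
    return result
```

Each loop iteration of `gf256_mul` corresponds roughly to one row of AND/XOR gates in a bit-parallel multiplier, which is why multiplier and inverter gate counts dominate the cost comparison in the abstract.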
Increased FPGA capacity enables scalable, flexible CCMs: an example from image processing
J. Greenbaum, Michael Baxter
The need to partition computation across multiple programmable devices in array architecture CCMs leads to performance bottlenecks in data flow through the computer and wiring delays between adjacent devices. However, significant improvements in FPGA capacities have brought one to a threshold where direct inter-chip connections are not required because an entire algorithm can be implemented on a single device for important problems in areas such as image processing. One can now implement architectures that are similar to today's parallel computers in which interprocessor communication is done through shared memory or dedicated communication hardware. The benefits of this approach are system-wide scalability and flexibility. The authors illustrate this new style of CCM with examples from image processing, in particular a novel FPGA implementation of block motion estimation (as for MPEG encoding). Based on the lessons learned from these specific examples, they generalize and speculate on implications for new CCM architectures.
DOI: 10.1109/FPGA.1997.624621 | Published: 1997-04-16
Citations: 9
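The abstract above names block motion estimation without describing the matching rule used. As a rough software illustration only, the sketch below does an exhaustive search with a sum-of-absolute-differences (SAD) cost. The SAD criterion, the full-search strategy, and the names `sad` and `best_motion_vector` are all assumptions, not details from the paper.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def best_motion_vector(cur, ref, bx, by, n, search):
    """Try every displacement within +/- `search` pixels and return
    (cost, dx, dy) for the lowest-cost candidate that stays inside `ref`."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + n <= height
                      and 0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

The candidate displacements are independent of each other, which is the property that makes this kind of search a natural fit for parallel hardware.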
Efficient implementation of the DCT on custom computers
N. Bergmann, Yuk Ying Chung, B. Gunther
The discrete cosine transform (DCT) is a key step in many image and video coding applications, and its efficient implementation has been extensively studied for software implementations and for custom VLSI. We analyse the use of the distributed arithmetic algorithm for the efficient implementation of the DCT in reconfigurable logic.
DOI: 10.1109/FPGA.1997.624628 | Published: 1997-04-16
Citations: 7
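Distributed arithmetic, mentioned in the DCT abstract above, replaces the multipliers in a fixed-coefficient inner product with a lookup table addressed by one bit from each input. Each DCT output is such an inner product against fixed cosine coefficients. The sketch below shows the general technique in software, assuming integer coefficients and two's complement inputs. The names `build_lut` and `da_inner_product` are hypothetical, and nothing here is taken from the paper's design.

```python
def build_lut(coeffs):
    """Precompute, for every bit pattern `addr`, the sum of the coefficients
    whose corresponding address bit is set."""
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << len(coeffs))
    ]


def da_inner_product(lut, xs, bits):
    """Compute sum(coeffs[k] * xs[k]) bit-serially using only table lookups,
    shifts and adds. Each x must fit in `bits`-bit two's complement."""
    mask = (1 << bits) - 1
    words = [x & mask for x in xs]          # two's complement bit patterns
    acc = 0
    for b in range(bits):
        addr = 0
        for k, w in enumerate(words):       # gather bit b of every input
            addr |= ((w >> b) & 1) << k
        term = lut[addr] << b
        if b == bits - 1:
            acc -= term                     # the sign bit has weight -2^(bits-1)
        else:
            acc += term
    return acc
```

For example, with `coeffs = [3, -5, 7, 2]`, calling `da_inner_product(build_lut(coeffs), [1, -2, 3, -4], 4)` gives the same value as the direct sum of products. The table grows as 2^N for N inputs, so practical designs often split long inner products across several smaller tables.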
Implementation of single precision floating point square root on FPGAs
Yamin Li, Wanming Chu
The square root operation is hard to implement on FPGAs because of the complexity of the algorithms. In this paper, we present a non-restoring square root algorithm and two very simple single precision floating point square root implementations based on the algorithm on FPGAs. One is a low-cost iterative implementation that uses a traditional adder/subtracter. The operation latency is 25 clock cycles and the issue rate is 24 clock cycles. The other is a high-throughput pipelined implementation that uses multiple adder/subtracters. The operation latency is 15 clock cycles and the issue rate is one clock cycle. This means that the pipelined implementation is capable of accepting a square root instruction on every clock cycle.
DOI: 10.1109/FPGA.1997.624623 | Published: 1997-04-16
Citations: 123
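The square root abstract above relies on a non-restoring digit recurrence, in which each step needs a single add or subtract and a negative partial remainder is carried forward instead of being restored. The sketch below is a generic integer version of that idea, not the paper's floating point datapath. The function name `nonrestoring_isqrt` and the `bits` parameter are assumptions made for illustration.

```python
def nonrestoring_isqrt(d, bits=32):
    """Return (q, r) with q = floor(sqrt(d)) and r = d - q*q, consuming the
    radicand two bits at a time with one add or subtract per step."""
    if bits % 2 or not 0 <= d < (1 << bits):
        raise ValueError("d must be a non-negative integer that fits in an even `bits`")
    q = 0   # root bits produced so far
    r = 0   # partial remainder; it may go negative and is not restored
    for i in range(bits // 2 - 1, -1, -1):
        pair = (d >> (2 * i)) & 3           # next two radicand bits
        if r >= 0:
            r = 4 * r + pair - (4 * q + 1)  # subtract the trial value
        else:
            r = 4 * r + pair + (4 * q + 3)  # add back instead of restoring
        q = 2 * q + (1 if r >= 0 else 0)    # sign of r selects the new root bit
    if r < 0:
        r += 2 * q + 1                      # one final correction gives the true remainder
    return q, r
```

Because every iteration has the same add-or-subtract shape, the loop can either reuse one adder/subtracter over many cycles or be unrolled into a pipeline of them, which mirrors the iterative and pipelined variants the abstract contrasts.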
Defect tolerance on the Teramac custom computer
Bruce Culbertson, R. Amerson, R. Carter, P. Kuekes, G. Snider
Teramac is a large custom computer which works correctly despite the fact that three quarters of its FPGAs contain defects. This is accomplished through unprecedented use of defect tolerance, which substantially reduces Teramac's cost and permits it to have an unusually complex interconnection network. Teramac tolerates defective resources, like gates and wires, that are introduced during the manufacture of its FPGAs and other components, and during assembly of the system. We have developed methods to precisely locate defects. User designs are mapped onto the system by a completely automated process that avoids the defects and hides the defect tolerance from the user. Defective components are not physically removed from the system.
DOI: 10.1109/FPGA.1997.624611 | Published: 1997-04-16
Citations: 133
Laser defect correction applications to FPGA based custom computers
G. Chapman, B. Dufort
The complexity and speed of monolithic FPGA based custom computers have been set by the presence of defective sections, which limit chip area. Test FPGAs show that laser link defect avoidance routing around flawed blocks generates delays of less than 50% of those of active switches, making the error cell distribution nearly invisible.
DOI: 10.1109/FPGA.1997.624626 | Published: 1997-04-16
Citations: 0
A wireless LAN demodulator in a Pamette: design and experience
T. McDermott, P. Ryan, M. Shand, D. Skellern, T. Percival, N. Weste
We have implemented the digital section of a wireless local area network (WLAN) demodulator in a reconfigurable interface card called the PCI Pamette. The entire baseband section of the demodulator has been implemented using the Pamette and a simple analog-to-digital mezzanine board. This is the second version of the demodulator, the first being a card-based design using a mixture of discrete and reconfigurable logic. The Pamette design took far less time to complete than the card-based one. Moreover, the reconfigurable substrate is much more versatile. We describe the Pamette implementation and discuss our experiences with the two different design styles and technologies.
DOI: 10.1109/FPGA.1997.624603 | Published: 1997-04-16
Citations: 12
An FPGA architecture for DRAM-based systolic computations
N. Margolus
We propose an FPGA chip architecture based on a conventional FPGA logic array core, in which I/O pins are clocked at a much higher rate than that of the logic array that they serve. Wide data paths within the chip are time multiplexed at the edge of the chip into much faster and narrower data paths that run off-chip. This kind of arrangement makes it possible to interface a relatively slow FPGA core with high speed memories and data streams, and is useful for many pin-limited FPGA applications. For efficient use of the highest bandwidth DRAMs, our proposed chip includes a RAMBUS DRAM interface, a burst-transfer controller, and burst buffers. This proposal is motivated by our work with virtual processor cellular automata (CA) machines, a kind of SIMD computer. Our next generation of CA machines requires reconfigurable FPGA-like processors coupled to the highest speed DRAMs and SRAMs available. Unfortunately, no current FPGA chips have appropriate DRAM I/O support or the speed needed to easily interface with pipelined SRAMs. The chips proposed would make a wide range of large-scale CA simulations of 3D physical systems practical and economical; such simulations are currently well beyond the reach of any existing computer. These chips would also be well suited to a broad range of other simulation, graphics and DSP-like applications.
DOI: 10.1109/FPGA.1997.624599 | Published: 1997-04-16
Citations: 24