
2007 3rd Southern Conference on Programmable Logic: Latest Papers

Execution of Algorithms Using a Dynamic Dataflow Model for Reconfigurable Hardware - Commands in Dataflow Graph
Pub Date : 2007-06-18 DOI: 10.1109/SPL.2007.371755
V. Astolfi, Jorge Luiz e Silva
Many modern scientific and engineering applications, such as weather forecasting, medical diagnostics, artificial intelligence, and industrial automation, demand increased computational capacity. Current high-performance architectures are built around parallel processing. One such architecture is the dataflow model, which exploits parallelism in a natural way. This paper briefly describes the dataflow model and its dynamic dataflow graph (DDFG), the basic structure used to execute dataflow programs. DDFGs are proposed for control-flow statements of the C language, such as do-while and switch. The results of a proof of concept for the control-flow DDFGs are presented at the end of the paper.
Citations: 2
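The firing rule behind a dataflow graph can be illustrated with a small sketch. The Python below is a hypothetical illustration, not the DDFG from the paper: the node names (`add`, `copy`, `less_than`, `branch`), the arc names, and the function `run_do_while` are all assumptions, and the iteration tags that a dynamic dataflow machine attaches to tokens are left out. It executes `do { x = x + 1; } while (x < limit);` by letting a node fire only when every one of its input arcs holds a token.

```python
from collections import deque


def run_do_while(x0, limit):
    """Token-driven sketch of: do { x = x + 1; } while (x < limit);"""
    # Every arc is a FIFO of tokens. All names here are illustrative assumptions.
    arcs = {name: deque() for name in
            ("x_in", "sum", "to_cmp", "to_branch", "cond", "x_out")}
    arcs["x_in"].append(x0)  # the initial token enters the loop body

    def add(x):
        arcs["sum"].append(x + 1)

    def copy(x):
        # Duplicate the token so the comparison and the branch each get one.
        arcs["to_cmp"].append(x)
        arcs["to_branch"].append(x)

    def less_than(x):
        arcs["cond"].append(x < limit)

    def branch(x, cond):
        # True routes the value back into the loop, False routes it out.
        arcs["x_in" if cond else "x_out"].append(x)

    nodes = [
        ("add", ["x_in"], add),
        ("copy", ["sum"], copy),
        ("less_than", ["to_cmp"], less_than),
        ("branch", ["to_branch", "cond"], branch),
    ]

    trace = []
    fired = True
    while fired:  # stop when no node is enabled any more
        fired = False
        for name, inputs, operation in nodes:
            if all(arcs[arc] for arc in inputs):  # firing rule
                operation(*(arcs[arc].popleft() for arc in inputs))
                trace.append(name)
                fired = True
    return arcs["x_out"].popleft(), trace


if __name__ == "__main__":
    value, trace = run_do_while(0, 3)
    print(value, len(trace))
```

Because the body sits before the comparison in the graph, it runs once even when the condition is false on entry, which is the do-while behaviour.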
A Novel Hardware/Software Codesign Methodology Based on Dynamic Reconfiguration with Impulse C and Codeveloper
Pub Date : 2007-06-18 DOI: 10.1109/SPL.2007.371754
A. Antola, M. Santambrogio, M. Fracassi, P. Gotti, C. Sandionigi
The design of embedded systems has changed rapidly over the last decade. Two main factors are responsible: hardware/software codesign and dynamic reconfiguration. The work presented in this paper investigates how to treat reconfiguration as an explicit dimension in the design flow for embedded systems. It addresses the challenge introduced by partial dynamic reconfiguration by proposing a novel design flow that uses the CoDeveloper framework to speed up the design process. The proposed flow allows designers to define the desired specification in a high-level design language such as C. Finally, the paper provides results showing how the proposed flow gives designers more information for making correct decisions during the design of an embedded system.
Citations: 26
Soft Error Tolerant Carry-Select Adders Implemented into Altera FPGAs
Pub Date : 2007-06-18 DOI: 10.1109/SPL.2007.371749
E. Mesquita, H. Franck, L. Agostini, J. Guntzel
The drastic shrinking of transistor dimensions is making circuits more susceptible to radiation-induced soft errors. While single-event upsets are beginning to be a concern for electronic systems fabricated in nanometer CMOS technology at sea level, single-event transients (SETs) are also expected to be a serious problem for upcoming technologies. Thanks to their high logic density and fast turnaround time, FPGAs are currently the main fabric used to implement electronic systems. However, to provide high logic density, FPGA devices are also fabricated in state-of-the-art CMOS technology and are therefore also susceptible to soft errors. This paper presents a novel technique to protect carry-select adders against SETs. The technique is based on triple modular redundancy (TMR) and exploits the duplication inherent in carry-select adders to reduce resource overhead.
Citations: 5
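For readers unfamiliar with the two building blocks named in the abstract, the following Python sketch models a carry-select adder and a plain TMR voter at the bit level. It is a generic textbook arrangement under stated assumptions, not the reduced-overhead scheme of the paper: the function names, the 4-bit block size, and the `fault_mask` parameter used to inject a transient into one copy are all hypothetical.

```python
def ripple_add(a, b, carry_in, width):
    """Add two width-bit values; return (sum bits, carry out)."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + carry_in
    return total & mask, total >> width


def carry_select_add(a, b, width=8, block=4):
    """Carry-select adder: each block is computed for carry-in 0 and 1,
    then the real incoming carry selects one of the two results."""
    result, carry = 0, 0
    for low in range(0, width, block):
        bits = min(block, width - low)
        a_blk = (a >> low) & ((1 << bits) - 1)
        b_blk = (b >> low) & ((1 << bits) - 1)
        sum0, carry0 = ripple_add(a_blk, b_blk, 0, bits)  # assumes carry-in 0
        sum1, carry1 = ripple_add(a_blk, b_blk, 1, bits)  # assumes carry-in 1
        block_sum, carry = (sum1, carry1) if carry else (sum0, carry0)
        result |= block_sum << low
    return result, carry


def majority(x, y, z):
    """Bitwise 2-of-3 majority voter."""
    return (x & y) | (x & z) | (y & z)


def tmr_add(a, b, width=8, fault_mask=0):
    """Plain TMR: three adder copies and a voter on the sum bits.
    fault_mask flips bits in one copy to mimic a transient (an assumption)."""
    copy0, _ = carry_select_add(a, b, width)
    copy1, _ = carry_select_add(a, b, width)
    copy2, _ = carry_select_add(a, b, width)
    copy0 ^= fault_mask  # inject the fault into a single copy
    return majority(copy0, copy1, copy2)


if __name__ == "__main__":
    print(tmr_add(100, 27), tmr_add(100, 27, fault_mask=0b10100))
```

The two speculative sums per block are the duplication that a carry-select adder already contains; plain TMR as sketched here triplicates all of it.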
A Reconfigurable FPGA-Based Architecture for Modular Nodes in Wireless Sensor Networks
Pub Date : 2007-06-18 DOI: 10.1109/SPL.2007.371750
J. Portilla, T. Riesgo, Á. de Castro
A reconfigurable platform for sensor networks is presented. Its features allow a node to be reused easily across several applications, avoiding a redesign of the system from scratch. The node includes an FPGA, which is the core of its reconfiguration capabilities. Several hardware interfaces for standard sensor protocols such as I2C and PWM have been developed and implemented in the FPGA. Remote reconfiguration is an important feature that sensor networks can exploit to improve global performance.
Citations: 44
Towards Fine and Medium Grain Dynamic Functional Extraction for HW/SW Acceleration
Pub Date : 2007-06-18 DOI: 10.1109/SPL.2007.371730
V. Matev, E. de la Torre, T. Riesgo
In this paper, an acceleration method for hardware platforms for embedded systems is presented. The target system is a Xilinx Virtex II Pro™ with an embedded PowerPC™. The PowerPC™ operates as a general-purpose processor, while the reconfigurable FPGA fabric is used as a reconfigurable co-processor. A comparison experiment of HW acceleration at different grain levels is carried out, and results are shown for an MPEG audio decoding algorithm. A HW/SW interface that lets the processor communicate with custom hardware synthesized in the reconfigurable fabric is shown. Algorithm analysis is done by profiling, and the partitioning decision follows a fine-to-medium grain philosophy, which allows more hardware reuse and simpler, faster reconfiguration. Repetitive functional blocks in the algorithm were detected and implemented in the FPGA logic, and corresponding generic software functionality for writing and reading data in the co-processor unit was developed.
Citations: 2
FPGA-Based Platform for Image and Video Processing Embedded Systems
Pub Date : 2007-06-18 DOI: 10.1109/SPL.2007.371743
F. J. Toledo, J.J. Martinez, J. Ferrández
As image and video processing applications move towards consumer markets, there is a clear need to replace PC-based software solutions with embedded processors. In this context, the enhanced characteristics of modern FPGA devices make it possible to build whole systems with improved performance and reduced cost. In this paper we describe a platform for developing fully FPGA-based embedded systems for image and video processing applications. It is a hardware/software system that makes the design process easier and faster. It also enables interaction with the user and run-time customization of the processing algorithms.
Citations: 14
A×B ≠ B×A in Terms of Power Consumption: Some Examples on FPGA
Pub Date : 2007-06-18 DOI: 10.1109/SPL.2007.371759
E. Boemo, G. Sutter
This paper shows that, under certain conditions, digital arithmetic circuits do not obey the commutative property in terms of power consumption: the power consumed by the operation A×B differs from that consumed by B×A. As a consequence, power can be saved simply by permuting the circuit inputs whenever any of the following three conditions holds: a) the data to be processed has a strong temporal correlation; b) the delays of the circuit paths are highly unbalanced; c) one input is distributed by broadcast while the other is local. To verify these hypotheses, several binary multipliers were built and measured. The resulting power reduction was between 12% and 28% on Virtex FPGAs.
Citations: 0
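A toy model can show why operand order may matter for switching activity even though the product is the same. The Python sketch below is an assumption-laden illustration, not the authors' measurement setup: it models a sequential shift-and-add multiplier and uses the number of accumulator bit flips as a crude stand-in for dynamic power. The function name `accumulator_toggles` and the 8-bit width are hypothetical.

```python
def accumulator_toggles(multiplicand, multiplier, width=8):
    """Shift-and-add multiplication that counts accumulator bit flips.

    Returns (product, toggles). The toggle count is only a rough proxy for
    dynamic power in this toy model; it is not a measured quantity.
    """
    accumulator = 0
    toggles = 0
    for bit in range(width):
        if (multiplier >> bit) & 1:
            updated = accumulator + (multiplicand << bit)
        else:
            updated = accumulator
        # Hamming distance between the old and new register contents.
        toggles += bin(accumulator ^ updated).count("1")
        accumulator = updated
    return accumulator, toggles


if __name__ == "__main__":
    print(accumulator_toggles(7, 3))  # 7 as multiplicand, 3 as multiplier
    print(accumulator_toggles(3, 7))  # operands swapped
```

In this model both calls return the product 21, but the accumulator flips 5 bits in the first ordering and 7 in the second, so the two orderings are not equivalent in switching activity.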
TCL/TK for EDA Tools
Pub Date : 2007-06-18 DOI: 10.1109/SPL.2007.371732
E. Todorovich, O. Cadenas
The Tcl/Tk scripting language has become the de facto standard for EDA tools. This paper explains how to start working with Tcl/Tk using simple examples. Two complete applications are presented to show the capabilities of the language in more detail. One script automates determining the average power consumption of a digital system. A second script creates a virtual display driven by the simulation of a graphics card.
Citations: 4
Memory Optimized Architecture for Efficient Gauss-Jordan Matrix Inversion
Pub Date : 2007-06-18 DOI: 10.1109/SPL.2007.371720
Gonçalo
This paper presents a new architecture for an efficient Gauss-Jordan matrix inversion algorithm on reconfigurable hardware platforms. The results show that currently available reconfigurable computing technology can easily achieve significantly higher floating-point performance than high-end CPUs running state-of-the-art routines for large matrix operations. For common reconfigurable systems, where the FPGAs are directly coupled to the on-board memory, the achievable performance scales directly with the number of simultaneous memory accesses that can be realized. A new dedicated reconfigurable architecture is proposed and analysed, and the results show a 2x performance improvement over the previous implementation while using only half of the memory and half of the floating-point units. Benchmarking against Matlab, which features high-performance matrix inversion routines, shows that a 100 MHz FPGA can easily surpass the performance of 3.2 GHz Intel Pentium IV processors. This is possible with only 5 dual-port memory banks or 9 single-port memory banks connected to the FPGA.
Citations: 16
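As background for the algorithm named in the title, here is a minimal software sketch of textbook Gauss-Jordan inversion with partial pivoting, written in plain Python. It says nothing about the memory organisation or the floating-point pipeline of the proposed architecture; the function name `gauss_jordan_inverse` and the `eps` singularity threshold are assumptions made for this illustration.

```python
def gauss_jordan_inverse(matrix, eps=1e-12):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I]."""
    n = len(matrix)
    # Build the augmented matrix [A | I] as lists of floats.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < eps:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot element becomes 1.
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row, above and below.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                if factor != 0.0:
                    aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


if __name__ == "__main__":
    for row in gauss_jordan_inverse([[4, 7], [2, 6]]):
        print(row)
```

Each elimination step reads and updates whole rows, which is consistent with the abstract's remark that performance scales with the number of simultaneous memory accesses.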
Fast Placement-Intact Logic Perturbation Targeting for FPGA Performance Improvement
Pub Date : 2007-06-18 DOI: 10.1109/SPL.2007.371725
C.L. Zhou, W. Tang, Yu-Liang Wu
This work presents a novel, accurate, and fast post-layout logic perturbation method for improving LUT-based FPGA routing without affecting the placement. ATPG-based rewiring techniques are used to design the rewiring engine, which is embedded into VPR, currently the most powerful academic FPGA CAD tool. Compared with VPR's high-quality results, our method reduces critical path delay by up to 31.74% (10% on average) without disturbing placement or sacrificing area. The CPU time used by the rewiring engine is only 5% of the total time consumed by VPR's placement and routing. All the benchmark circuits can be placed and routed within 3 minutes, which is much faster than the SPFD approach. This paper also analyzes the power of the ATPG-based rewiring techniques in LUT-based FPGAs. Experimental results show that 3% of all nets can be replaced by their alternative wires to improve FPGA performance.
Citations: 5