Many modern scientific and engineering applications, such as weather forecasting, medical diagnostics, artificial intelligence, and industrial automation, demand increased computational capacity. Current high-performance architectures are built around parallel processing. One of these architectures is the dataflow model, which exploits parallelism in a natural form. This paper briefly describes the dataflow model and its dynamic dataflow graph (DDFG), which is the basic structure used to execute dataflow programs. DDFGs are proposed for control flow statements of the C language, such as do-while and switch. The results of a proof of concept for the control flow DDFGs are presented at the end of the paper.
{"title":"Execution of Algorithms Using a Dynamic Dataflow Model for Reconfigurable Hardware - Commands in Dataflow Graph","authors":"V. Astolfi, Jorge Luiz e Silva","doi":"10.1109/SPL.2007.371755","DOIUrl":"https://doi.org/10.1109/SPL.2007.371755","url":null,"PeriodicalId":419253,"journal":{"name":"2007 3rd Southern Conference on Programmable Logic","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129788058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
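The firing rule behind the dataflow abstract above can be illustrated with a small Python sketch. It is not the DDFG proposed in the paper: it assumes a simplified model in which each node holds FIFO queues of untagged tokens and fires as soon as every input port has a token. The node names (`body`, `compare`, `branch`, `sink`) and the helper `do_while_increment` are hypothetical, chosen only to show how a C `do-while` loop might map onto a branch node that routes a value either back into the loop or out of it.

```python
from collections import deque


class Node:
    """A dataflow operator that fires once every input port holds a token."""

    def __init__(self, name, n_inputs, fire):
        self.name = name
        self.inputs = [deque() for _ in range(n_inputs)]
        # fire(*tokens) returns a list of (target_node, port, value) tuples.
        self.fire = fire

    def ready(self):
        # An empty deque is falsy, so this is True only if all ports have a token.
        return all(self.inputs)


def run(nodes, initial_tokens):
    """Deliver the initial tokens, then fire ready nodes until none remain."""
    for node, port, value in initial_tokens:
        node.inputs[port].append(value)
    fired = True
    while fired:
        fired = False
        for node in nodes:
            if node.ready():
                tokens = [queue.popleft() for queue in node.inputs]
                for target, port, value in node.fire(*tokens):
                    target.inputs[port].append(value)
                fired = True


def do_while_increment(x0, n):
    """Dataflow form of: x = x0; do { x = x + 1; } while (x < n); return x;"""
    result = []
    sink = Node("sink", 1, lambda v: result.append(v) or [])
    # The branch node's behaviour is attached below, once 'body' exists.
    branch = Node("branch", 2, None)
    compare = Node("compare", 1, lambda v: [(branch, 1, v < n)])
    # The loop body sends the new value to the comparison and to the branch.
    body = Node("body", 1, lambda v: [(compare, 0, v + 1), (branch, 0, v + 1)])
    # Route the value back into the loop while the condition holds, else out.
    branch.fire = lambda v, cond: [(body, 0, v)] if cond else [(sink, 0, v)]
    run([body, compare, branch, sink], [(body, 0, x0)])
    return result[0]
```

Because the initial token enters the body directly, the body always executes at least once, which is the defining property of `do-while`; calling `do_while_increment(5, 3)` therefore still increments once before the condition fails.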
A. Antola, M. Santambrogio, M. Fracassi, P. Gotti, C. Sandionigi
The design of embedded systems has changed rapidly during the last decade. Two main factors are responsible: hardware/software codesign and dynamic reconfiguration. The work presented in this paper investigates how reconfiguration can be treated as an explicit dimension in the design flow for embedded systems. It addresses the challenge introduced by partial dynamic reconfiguration by proposing a novel design flow that uses the CoDeveloper framework to speed up the design process. The proposed flow allows designers to define the desired specification in a high-level design language such as C. Finally, the paper provides results showing how the proposed flow gives designers more of the information they need to make correct decisions during the design of an embedded system.
{"title":"A Novel Hardware/Software Codesign Methodology Based on Dynamic Reconfiguration with Impulse C and Codeveloper","authors":"A. Antola, M. Santambrogio, M. Fracassi, P. Gotti, C. Sandionigi","doi":"10.1109/SPL.2007.371754","DOIUrl":"https://doi.org/10.1109/SPL.2007.371754","url":null,"PeriodicalId":419253,"journal":{"name":"2007 3rd Southern Conference on Programmable Logic","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130968965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The drastic shrinking of transistor dimensions is making circuits more susceptible to radiation-induced soft errors. While single-event upsets are becoming a concern for electronic systems fabricated in nanometer CMOS technology at sea level, single-event transients (SETs) are also expected to be a serious problem for upcoming technologies. Thanks to their high logic density and fast turnaround time, FPGAs are currently the main fabric used to implement electronic systems. However, to provide high logic density, FPGA devices are also fabricated in state-of-the-art CMOS technology and are therefore susceptible to soft errors as well. This paper presents a novel technique to protect carry-select adders against SETs. The technique is based on triple modular redundancy (TMR) and exploits the duplication inherent in carry-select adders to reduce resource overhead.
{"title":"Soft Error Tolerant Carry-Select Adders Implemented into Altera FPGAs","authors":"E. Mesquita, H. Franck, L. Agostini, J. Guntzel","doi":"10.1109/SPL.2007.371749","DOIUrl":"https://doi.org/10.1109/SPL.2007.371749","url":null,"PeriodicalId":419253,"journal":{"name":"2007 3rd Southern Conference on Programmable Logic","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128287197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
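The two building blocks named in the carry-select adder abstract above can be sketched at the behavioural level in Python. This is an illustration under stated assumptions, not the paper's circuit: `carry_select_add` shows where the inherent duplication comes from (each block is computed for both possible carry-ins), and `tmr_add` shows plain triple modular redundancy with a bitwise majority voter. The resource-sharing optimization the paper proposes is not reproduced here, and all function names, the 8-bit width, the 4-bit block size, and the `fault_mask` parameter are hypothetical.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote over three integer words."""
    return (a & b) | (a & c) | (b & c)


def ripple_add(a, b, carry_in, width):
    """Add two width-bit words with a carry-in; return (sum, carry_out)."""
    total = a + b + carry_in
    mask = (1 << width) - 1
    return total & mask, total >> width


def carry_select_add(a, b, width=8, block=4):
    """Carry-select addition (width is assumed to be a multiple of block).

    Every block is computed twice, once for carry-in 0 and once for
    carry-in 1, and the real incoming carry selects one of the results.
    """
    result, carry = 0, 0
    mask = (1 << block) - 1
    for shift in range(0, width, block):
        xa, xb = (a >> shift) & mask, (b >> shift) & mask
        sum0, cout0 = ripple_add(xa, xb, 0, block)  # speculative copy, carry-in 0
        sum1, cout1 = ripple_add(xa, xb, 1, block)  # speculative copy, carry-in 1
        block_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= block_sum << shift
    return result, carry


def tmr_add(a, b, width=8, fault_mask=0):
    """Plain TMR: vote over three adder results, one optionally corrupted.

    fault_mask flips bits in the first copy only, modelling a transient
    that hits a single redundant module.
    """
    good, _ = carry_select_add(a, b, width)
    copies = [good ^ fault_mask, good, good]
    return majority(*copies)
```

With these definitions, `carry_select_add(200, 100)` yields the 8-bit sum 44 with a carry-out of 1 (300 = 256 + 44), and `tmr_add(200, 100, fault_mask=0b1010)` still returns 44 because the voter outvotes the single corrupted copy.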
A reconfigurable platform for sensor networks is presented. Its features allow a node to be reused easily across several applications, avoiding the need to redesign the system from scratch. Each node includes an FPGA, which is the core of its reconfiguration capabilities. Several hardware interfaces for standard sensor protocols such as I2C and PWM have been developed and implemented in the FPGA. Remote reconfiguration is an important feature that sensor networks can exploit to improve global performance.
{"title":"A Reconfigurable FPGA-Based Architecture for Modular Nodes in Wireless Sensor Networks","authors":"J. Portilla, T. Riesgo, Á. de Castro","doi":"10.1109/SPL.2007.371750","DOIUrl":"https://doi.org/10.1109/SPL.2007.371750","url":null,"PeriodicalId":419253,"journal":{"name":"2007 3rd Southern Conference on Programmable Logic","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129783076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, an acceleration method for embedded-system hardware platforms is presented. The target system is a Xilinx Virtex II Pro™ with an embedded PowerPC™. The PowerPC operates as a general-purpose processor, while the FPGA fabric is used as a reconfigurable co-processor. HW acceleration at different grain levels is compared experimentally, and results are shown for an MPEG audio decoding algorithm. A HW/SW interface that lets the processor communicate with custom hardware synthesized in the reconfigurable fabric is shown. The algorithm is analyzed by profiling, and the partitioning decision follows a fine-to-medium grain philosophy, which allows more hardware reuse and simpler, faster reconfiguration. Repetitive functional blocks in the algorithm were detected and implemented in the FPGA logic, and corresponding generic software functionality for writing and reading data in the co-processor unit was developed.
{"title":"Towards Fine and Medium Grain Dynamic Functional Extraction for HW/SW Acceleration","authors":"V. Matev, E. de la Torre, T. Riesgo","doi":"10.1109/SPL.2007.371730","DOIUrl":"https://doi.org/10.1109/SPL.2007.371730","url":null,"PeriodicalId":419253,"journal":{"name":"2007 3rd Southern Conference on Programmable Logic","volume":"260 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132543979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As image and video processing applications move towards consumer markets, there is a clear need to replace PC-based software solutions with embedded processors. In this context, the enhanced characteristics of modern FPGA devices make it possible to build whole systems with improved performance and reduced cost. In this paper we describe a platform for developing fully FPGA-based embedded systems for image and video processing applications. It is a hardware/software system that makes the design process easier and faster. It also enables user interaction and run-time customization of the processing algorithms.
{"title":"FPGA-Based Platform for Image and Video Processing Embedded Systems","authors":"F. J. Toledo, J.J. Martinez, J. Ferrández","doi":"10.1109/SPL.2007.371743","DOIUrl":"https://doi.org/10.1109/SPL.2007.371743","url":null,"PeriodicalId":419253,"journal":{"name":"2007 3rd Southern Conference on Programmable Logic","volume":"207 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132666234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper shows that, under certain conditions, digital arithmetic circuits do not obey the commutative property in terms of power consumption. That is, the power consumed by the operation A×B differs from that consumed by B×A. As a consequence, power can be saved simply by permuting the circuit inputs whenever any of the following three conditions holds: a) the data to be processed have a strong temporal correlation; b) the delays of the circuit paths are highly unbalanced; c) one of the inputs is delivered by broadcast-type communication while the other is local. To verify these hypotheses, several binary multipliers were constructed and measured. The resulting power reduction was between 12% and 28% in Virtex FPGAs.
{"title":"A×B ≠ B×A in Terms of Power Consumption: Some Examples on FPGA","authors":"E. Boemo, G. Sutter","doi":"10.1109/SPL.2007.371759","DOIUrl":"https://doi.org/10.1109/SPL.2007.371759","url":null,"PeriodicalId":419253,"journal":{"name":"2007 3rd Southern Conference on Programmable Logic","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130595134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Tcl/Tk scripting language has become the de facto standard for EDA tools. This paper explains how to start working with Tcl/Tk using simple examples. Two complete applications are presented to show the capabilities of the language in more detail. The first script automates the evaluation of the average power consumption of a digital system. The second creates a virtual display driven by the simulation of a graphics card.
{"title":"TCL/TK for EDA Tools","authors":"E. Todorovich, O. Cadenas","doi":"10.1109/SPL.2007.371732","DOIUrl":"https://doi.org/10.1109/SPL.2007.371732","url":null,"PeriodicalId":419253,"journal":{"name":"2007 3rd Southern Conference on Programmable Logic","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115879045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a new architecture for an efficient Gauss-Jordan matrix inversion algorithm on reconfigurable hardware platforms. The results show that currently available reconfigurable computing technology can easily achieve significantly higher floating-point performance than high-end CPUs running state-of-the-art routines for large matrix operations. For common reconfigurable systems, where the FPGAs are directly coupled to the on-board memory, the achievable performance scales directly with the number of realizable simultaneous memory accesses. A new dedicated reconfigurable architecture is proposed and analysed, and the results show a 2x performance improvement over the previous implementation while using only half of the memory and half of the floating-point units. Benchmarking against Matlab, which features high-performance matrix inversion routines, shows that a 100 MHz FPGA can easily surpass the performance of a 3.2 GHz Intel Pentium IV processor. This is possible with only 5 dual-port memory banks or 9 single-port memory banks connected to the FPGA.
{"title":"Memory Optimized Architecture for Efficient Gauss-Jordan Matrix Inversion","authors":"Gonçalo","doi":"10.1109/SPL.2007.371720","DOIUrl":"https://doi.org/10.1109/SPL.2007.371720","url":null,"PeriodicalId":419253,"journal":{"name":"2007 3rd Southern Conference on Programmable Logic","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123714493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
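For readers unfamiliar with the algorithm named in the abstract above, the following is a minimal software sketch of textbook Gauss-Jordan inversion in plain Python. It says nothing about the paper's hardware architecture or memory organization; partial pivoting, the singularity threshold of 1e-12, and the function name `gauss_jordan_inverse` are assumptions made for this illustration.

```python
def gauss_jordan_inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Build the augmented matrix [A | I] using floats.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Partial pivoting: choose the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Normalize the pivot row so that the pivot element becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row, above and below.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * pv for v, pv in zip(aug[r], aug[col])]
    # The left half is now the identity; the right half holds the inverse.
    return [row[n:] for row in aug]
```

Each elimination step reads and updates whole rows of the augmented matrix, which gives some intuition for why the abstract ties achievable performance to the number of simultaneous memory accesses.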
This work presents a novel, accurate, and fast post-layout logic perturbation method for improving LUT-based FPGA routing without affecting the placement. ATPG-based rewiring techniques are used to build the rewiring engine, which is embedded into VPR, currently the most powerful academic FPGA CAD tool. Compared with VPR's high-quality results, our method reduces critical path delay by up to 31.74% (10% on average) without disturbing placement or sacrificing area. The CPU time used by the rewiring engine is only 5% of the total time consumed by VPR's placement and routing. All benchmark circuits can be placed and routed within 3 minutes, which is much faster than the SPFD approach. This paper also analyzes the power of ATPG-based rewiring techniques in LUT-based FPGAs. Experimental results show that 3% of all nets can be replaced by their alternative wires to improve FPGA performance.
{"title":"Fast Placement-Intact Logic Perturbation Targeting for FPGA Performance Improvement","authors":"C.L. Zhou, W. Tang, Yu-Liang Wu","doi":"10.1109/SPL.2007.371725","DOIUrl":"https://doi.org/10.1109/SPL.2007.371725","url":null,"PeriodicalId":419253,"journal":{"name":"2007 3rd Southern Conference on Programmable Logic","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125581905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}