Pub Date: 2002-12-16 DOI: 10.1109/FPT.2002.1188673
Michael Guntsch, M. Middendorf, B. Scheuermann, O. Diessel, H. ElGindy, H. Schmeck, K. So
We propose to modify a type of ant algorithm called Population based Ant Colony Optimization (P-ACO) to allow implementation on an FPGA architecture. Ant algorithms are adapted from the natural behavior of ants and used to find good solutions to combinatorial optimization problems. The general layout on the FPGA and an algorithmic description are covered. The most notable achievements featured in this paper are a runtime reduction and the approximation of the heuristic function by a small set of favored decisions that changes over time.
Title: Population based ant colony optimization on FPGA. Venue: 2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings.
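As a rough illustration of the population-based update that gives P-ACO its name, the sketch below keeps a FIFO population of k tours and adds or removes a fixed pheromone amount as tours enter or leave. This is an assumption about the general P-ACO scheme, not the FPGA design described in the abstract; the class name `PopulationACO`, the parameters `k`, `tau_init`, and `delta`, and the symmetric TSP-style tour encoding are hypothetical choices made for illustration.

```python
from collections import deque


def tour_edges(tour):
    """Yield the consecutive (i, j) edges of a closed tour."""
    n = len(tour)
    return [(tour[i], tour[(i + 1) % n]) for i in range(n)]


class PopulationACO:
    """Hypothetical sketch of a population-based pheromone update."""

    def __init__(self, n, k, tau_init=1.0, delta=0.5):
        # Every pheromone entry starts at tau_init (assumed initial value).
        self.tau = [[tau_init] * n for _ in range(n)]
        self.k = k                  # maximum population size
        self.delta = delta          # pheromone added per stored tour
        self.population = deque()   # FIFO queue of stored tours

    def _deposit(self, tour, amount):
        # Symmetric problem assumed: update both (i, j) and (j, i).
        for i, j in tour_edges(tour):
            self.tau[i][j] += amount
            self.tau[j][i] += amount

    def insert(self, tour):
        """Add a tour; if the population is full, the oldest tour leaves."""
        if len(self.population) == self.k:
            oldest = self.population.popleft()
            self._deposit(oldest, -self.delta)  # undo the leaving tour
        self.population.append(tour)
        self._deposit(tour, self.delta)         # reward the entering tour


aco = PopulationACO(n=4, k=2)
for t in ([0, 1, 2, 3], [0, 2, 1, 3], [0, 1, 3, 2]):
    aco.insert(t)
print(aco.tau[1][3])
```

In this sketch each update touches only the edges of the entering and leaving tours, so the work per iteration grows linearly with the tour length; that locality is one plausible reason a scheme of this kind suits a hardware implementation.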
Pub Date: 2002-12-16 DOI: 10.1109/FPT.2002.1188707
John Hopf, Stewart Itzstein, D. Kearney
Development of high-level Hardware Description Languages (HDLs) is an integral area of research in Reconfigurable Computing (RC). There is an apparent need to enhance the available development tools and to achieve more abstraction in languages, making hardware development easier for software programmers. The lack of a unified hardware/software language and difficulties in system verification are further issues currently being faced. To overcome these issues, we propose a Hardware Join Java language that uses the high-level syntax and semantics of Java with additions to support reconfigurable hardware description. The language adopts Join Java semantics to allow specification of concurrency without the inherent complexity of Java's standard thread and monitor mechanisms. From a specification, hardware classes will be compiled and linked with VHDL source code. Standard Java classes can be used for the software part of an application and will serve as an interface.
Title: Hardware Join Java: a high level language for reconfigurable hardware development. Venue: 2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings.
Pub Date: 2002-12-16 DOI: 10.1109/FPT.2002.1188703
M. Tahoori
In this paper, we present coarse-grain and fine-grain diagnosis techniques to identify a faulty element in FPGA interconnects. The fault models we use are stuck-open and resistive-open defects in interconnects. The presented technique requires only a small number of configurations while offering high-resolution diagnosis. We implemented this technique on real FPGA chips and verified it using fault emulation.
Title: Diagnosis of open defects in FPGA interconnect. Venue: 2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings.
Pub Date: 2002-12-16 DOI: 10.1109/FPT.2002.1188705
A. Muthukaruppan, S. Suresh, V. Kamakoti
This paper establishes a handshake between the fields of "parallel genetic algorithms" and reconfigurable systems to provide a solution to the routing problem for FPGAs that attempts to enhance the performance of the circuit implemented by the FPGA. We propose to solve the routing problem in three phases, the first two of which use genetic algorithms to transform an initial population of randomly suggested routings into a population that contains solutions approximating the optimal one.
Title: A novel three phase parallel genetic approach to routing for field programmable gate arrays. Venue: 2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings.
Pub Date: 2002-12-16 DOI: 10.1109/FPT.2002.1188724
F. Rodríguez, J. Campelo, J. J. Serrano
Designing a complete SoC or reusing SoC components to create a complete system is a common task nowadays. The flexibility of current design flows gives the designer an unprecedented capability to incorporate increasingly demanded features, such as error detection and correction mechanisms, to increase system dependability. This is especially true for programmable devices, where rapid design and implementation methodologies are coupled with testing environments that are easily generated and used. This paper describes the design of the HORUS processor, a RISC processor augmented with a concurrent error detection mechanism, and the architectural modifications needed on the original design to minimize the resulting performance penalty.
Title: Delivering error detection capabilities into a field programmable device: the HORUS processor case study. Venue: 2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings.
Pub Date: 2002-12-16 DOI: 10.1109/FPT.2002.1188670
Dennis K. Y. Tong, Pui Sze Lo, Kin-Hong Lee, P. Leong
This paper describes system-level issues encountered in a high-performance implementation of a Rijndael encryption core on a memory-slot based reconfigurable computing platform called Pilchard. The Rijndael algorithm was adopted in 2000 by the US National Institute of Standards and Technology (NIST) as the Advanced Encryption Standard (AES). In the implementation of Rijndael, changing the number of unrolled rounds in the encryption core can affect the performance of the system. It is shown that, for the design presented, the highest performance of 755 Mbit/sec was achieved by implementing a core with a single round. Although it is relatively easy to implement a high-performance core on an FPGA, I/O bottlenecks make high system-level performance more difficult to achieve. To optimize the performance of the host/FPGA interface, special instructions from the Intel Pentium III streaming SIMD extensions (SSE) were used along with write-combining memory operations. These features enabled the measured system throughput to reach 445 Mbit/sec, which, although still lower than that of the AES core alone, was double that of an unoptimized interface.
Title: A system level implementation of Rijndael on a memory-slot based FPGA card. Venue: 2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings.
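The gap between core and system throughput in the Rijndael abstract can be made concrete with a little arithmetic. The sketch below uses only the two figures quoted there (755 Mbit/sec for the core, 445 Mbit/sec measured at system level); the unoptimized figure is inferred from the phrase "double that of an unoptimized interface" and is an assumption, as are all variable and function names.

```python
# Figures quoted in the abstract (Mbit/sec).
CORE_MBPS = 755.0     # best core throughput (single-round core)
SYSTEM_MBPS = 445.0   # measured throughput with the optimized interface


def interface_efficiency(system_mbps, core_mbps):
    """Fraction of the core's throughput that survives the host/FPGA interface."""
    return system_mbps / core_mbps


efficiency = interface_efficiency(SYSTEM_MBPS, CORE_MBPS)

# Assumption: "double that of an unoptimized interface" is taken literally.
unoptimized_mbps = SYSTEM_MBPS / 2

print(f"interface efficiency: {efficiency:.1%}")
print(f"inferred unoptimized throughput: {unoptimized_mbps} Mbit/sec")
```

Under these assumptions the optimized interface delivers a little under 60% of the core's rate, which is consistent with the abstract's point that I/O, not the core, limits system performance.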
Pub Date: 2002-12-16 DOI: 10.1109/FPT.2002.1188667
K. Yeung, S. Chan
This paper proposes a new architecture for the implementation of multiplier-less FIR digital filters with programmable sum-of-powers-of-two (SOPOT) or canonical signed digit (CSD) coefficient representations. The multiplier-less FIR filter is implemented as a direct-form structure with the filter coefficients in SOPOT representation, so that each multiplication can be realized as a limited number of shifts and additions. Traditional VLSI implementations of multiplier-less FIR filters are usually hardwired, and the filter coefficients cannot be programmed online. The proposed architecture is highly modular in structure, and its modules can be connected to implement a multiplier-less FIR filter with arbitrary filter order and number of SOPOT terms using programmable SOPOT coefficients. The structure is also pipelined to achieve a high data throughput rate at low hardware cost. The proposed architecture was implemented and tested using an Altera FLEX 10K Field Programmable Gate Array (FPGA). Finite wordlength effects such as signal roundoff and overflow errors are also taken into account. A design example is given to demonstrate the feasibility of the proposed architecture.
Title: Multiplier-less FIR digital filters using programmable sum-of-power-of-two (SOPOT) coefficients. Venue: 2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings.
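To show what "shifts and additions" means for the SOPOT/CSD coefficients in the abstract above, here is a minimal software sketch. It assumes integer (already scaled) coefficients and integer samples, and it does not model the paper's hardware, pipelining, or wordlength handling; the function names `to_csd`, `sopot_multiply`, and `fir_sopot` are hypothetical.

```python
def to_csd(value):
    """Decompose an integer into signed power-of-two terms (sign, shift).

    The result is in canonical signed digit form: no two adjacent shifts
    are both used, which keeps the number of terms small.
    """
    terms = []
    shift = 0
    while value != 0:
        if value & 1:
            # +1 if the two low bits are 01, -1 if they are 11.
            digit = 2 - (value & 3)
            terms.append((digit, shift))
            value -= digit
        value >>= 1
        shift += 1
    return terms


def sopot_multiply(x, terms):
    """Multiply x by the coefficient described by terms using shifts and adds only."""
    acc = 0
    for sign, shift in terms:
        if sign > 0:
            acc += x << shift
        else:
            acc -= x << shift
    return acc


def fir_sopot(samples, coefficients):
    """Direct-form FIR filter whose taps are applied with shift-and-add."""
    taps = [to_csd(c) for c in coefficients]
    history = [0] * len(taps)          # delay line, newest sample first
    output = []
    for s in samples:
        history = [s] + history[:-1]
        output.append(sum(sopot_multiply(h, t) for h, t in zip(history, taps)))
    return output


print(to_csd(7))                        # 7 = 8 - 1
print(fir_sopot([1, 2, 3, 4], [7, -3, 5]))
```

The coefficient 7 needs only two terms (8 - 1) instead of three (4 + 2 + 1), which is the kind of saving that makes signed-digit coefficients attractive when each term costs an adder.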
Pub Date: 2002-12-16 DOI: 10.1109/FPT.2002.1188666
U. Malik, K. So, O. Diessel
The Circal process algebra is being used to explore the behavioural specification of systems that are mapped to field programmable logic circuits. In this paper we report on the implementation and performance of an interpreter for system specifications given in the Circal language. In contrast to the typical design flow for field programmable technology in which designs are statically partitioned, synthesised, and mapped to pre-allocated resources, in this system the specified circuits are extracted from behavioural specifications that are partitioned, elaborated, mapped, and configured at run time as control passes through them. We report on the details of a design that targets the Celoxica RC1000 co-processor and assess preliminary performance results for this implementation. The results clearly demonstrate our method is a practical approach to overcome resource constraints, particularly in applications where these change at run time. The results also establish a benchmark against which to measure future improvements and alternative methods.
Title: Resource-aware run-time elaboration of behavioural FPGA specifications. Venue: 2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings.
Pub Date: 2002-12-16 DOI: 10.1109/FPT.2002.1188674
Derek B. Gottlieb, Jeffrey J. Cook, Joshua D. Walstrom, Steve Ferrera, Chi-Wei Wang, N. Carter
In order to pose a successful challenge to conventional processor architectures, reconfigurable computing systems must achieve significantly better performance than conventional programmable processors by both greatly reducing the number of clock cycles required to execute a wide range of applications and achieving high clock rates when implemented in deep-submicron fabrication technologies. In this paper, we describe the architecture of Amalgam, a clustered programmable-reconfigurable processor that integrates multiple conventional processors and blocks of reconfigurable logic onto a single chip. Amalgam's distributed architecture allows implementation at high clock rates by limiting the impact of wire delay on cycle time and delivers an average of 13.7× speedup on our benchmark applications when compared to an equivalent architecture that contains only a single programmable processor.
Title: Clustered programmable-reconfigurable processors. Venue: 2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings.
Pub Date: 2002-12-16 DOI: 10.1109/FPT.2002.1188663
A. Derbyshire, W. Luk
This paper explores representations and compilation of run-time parametrisable FPGA designs. We develop methods to produce designs with many run-time parameters, which would otherwise require an impractical number of bitstreams to be generated at compile time. Run-time parametrisation facilitates specialisation, which can be used to remove logic to produce a smaller and faster design. Our approach involves a source description based on Structural VHDL that allows designers to specify what parameters are available at compile time and at run time. Using this approach, converting a compile-time parameter into a run-time parameter or vice versa is straightforward. The source description does not contain explicit information on how to modify the design at run time. We describe a compilation scheme that can be used to extract this information, generate a run-time representation of the design, and rapidly instantiate this representation at run time. We present techniques that allow a parametrised design to be incrementally modified in order to minimise the reconfiguration overhead. Our compiler implementation generates a Java program that uses the JBits API to implement the run-time representation and functions to incrementally modify the design. DES and AES encryption designs are used to illustrate our approach.
Title: Compiling run-time parametrisable designs. Venue: 2002 IEEE International Conference on Field-Programmable Technology, 2002. (FPT). Proceedings.
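The "impractical number of bitstreams" mentioned in the abstract above follows from simple counting: if every parameter combination were compiled ahead of time, the number of bitstreams would be the product of the number of values each parameter can take. The sketch below illustrates that count. Treating a 56-bit DES key as a single specialisation parameter is a hypothetical example chosen here, not a configuration stated in the abstract, and `bitstreams_needed` is an illustrative name.

```python
from math import prod


def bitstreams_needed(param_bit_widths):
    """Count the bitstreams required if every parameter value were precompiled.

    Each parameter of width w bits can take 2**w values, and every
    combination would need its own bitstream.
    """
    return prod(2 ** w for w in param_bit_widths)


# Two 8-bit parameters already need 65,536 precompiled bitstreams.
print(bitstreams_needed([8, 8]))

# Hypothetical example: specialising a design on a 56-bit DES key.
print(bitstreams_needed([56]))
```

Even the modest two-parameter case is large, and the key-specialised case is far beyond what could be generated or stored, which is the motivation for building the specialised configuration at run time instead.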