Virtual wires: overcoming pin limitations in FPGA-based logic emulators
J. Babb, R. Tessier, A. Agarwal
Pub Date: 1993-11-01 | DOI: 10.1109/FPGA.1993.279469
Existing FPGA-based logic emulators use only a fraction of the potential communication bandwidth because they dedicate each FPGA pin (physical wire) to a single emulated signal (logical wire). Virtual wires overcome pin limitations by intelligently multiplexing each physical wire among multiple logical wires and pipelining these connections at the maximum clocking frequency of the FPGA. A virtual wire represents a connection from a logical output on one FPGA to a logical input on another FPGA. Virtual wires not only increase usable bandwidth but also relax the absolute limits imposed on gate utilization. The resulting improvement in bandwidth reduces the need for global interconnect, allowing effective use of low-dimension inter-chip connections (such as nearest-neighbor). Nearest-neighbor topologies, coupled with the ability of virtual wires to overlap communication with computation, can even improve emulation speed. The authors present the concept of virtual wires and describe their first implementation, a 'softwire' compiler that uses static routing and relies on minimal hardware support. Results from compiling netlists for the 18K-gate Sparcle microprocessor and the 86K-gate Alewife Communications and Cache Controller indicate that virtual wires can increase FPGA gate utilization beyond 80 percent without a significant slowdown in emulation speed.
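The multiplexing idea in this abstract can be pictured in software. The sketch below (illustrative only, not from the paper; all names are hypothetical) serializes several logical wires onto one physical pin over a sequence of fast clock phases, then recovers them on the receiving side:

```python
# Illustrative sketch of virtual wires: several logical wires share one
# physical pin by time-multiplexing over fast "virtual" clock phases.

def multiplex(logical_values, phases):
    """Serialize the logical wires onto one pin over `phases` fast
    clock phases (phases >= number of wires)."""
    assert phases >= len(logical_values)
    pin_stream = []
    for t in range(phases):
        # In phase t the pin carries logical wire t, or idles (None).
        pin_stream.append(logical_values[t] if t < len(logical_values) else None)
    return pin_stream

def demultiplex(pin_stream, n_wires):
    """Recover the logical wires on the receiving FPGA."""
    return [pin_stream[t] for t in range(n_wires)]

signals = [1, 0, 1, 1]            # four logical wires
stream = multiplex(signals, 4)    # one physical pin, four phases
assert demultiplex(stream, 4) == signals
```

In hardware the same effect is achieved with shift registers clocked faster than the emulation clock, which is why pipelining at the FPGA's maximum frequency matters.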
Virtual computing and the Virtual Computer
Steven Casselman
Pub Date: 1993-04-05 | DOI: 10.1109/FPGA.1993.279480
Virtual computing is an entirely new form of supercomputing that allows an algorithm to be implemented directly in hardware. Based on the Xilinx FPGA and ICube's FPID, the Virtual Computer is completely reconfigurable in every respect. Computing machines based on reconfigurable logic are hyper-scalable, meaning they scale up better than one-to-one.
Fine grain parallelism on a MIMD machine using FPGAs
Frédéric Raimbault, Dominique Lavenier, Stéphane Rubini, Bernard Pottier
Pub Date: 1993-04-05 | DOI: 10.1109/FPGA.1993.279485
The article presents the use of an FPGA chip (Xilinx 3090) to set up a fast systolic communication agent on a linear asynchronous network of transputer processors; the machine is called ArMen. The authors' work relies on the systolic programming environment ReLaCS, a close cousin to the C programming language. ReLaCS provides synchronous communication operators to simplify the programming of the data transfers that occur in systolic algorithms. The ReLaCS compiler generates C programs that perform the computation process and the data-management process of a systolic network.
Hardware acceleration of divide-and-conquer paradigms: a case study
W. Luk, V. Lok, I. Page
Pub Date: 1993-04-05 | DOI: 10.1109/FPGA.1993.279463
The authors describe a method for speeding up divide-and-conquer algorithms with a hardware coprocessor, using sorting as an example. The method employs a conventional processor for the 'divide' and 'merge' phases, while the 'conquer' phase is handled by a purpose-built coprocessor. It is shown how transformation techniques from the Ruby language can be adopted in developing a family of systolic sorters, and how one of the resulting designs is prototyped in eight FPGAs on a PC coprocessor board known as CHS2*4 from Algotronix. The execution of the hardware unit is embedded in a sorting program, with the PC host merging the sorted sequences from the hardware sorter. The performance of this implementation is compared against various sorting algorithms on a number of PC systems.
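The division of labour described in this abstract can be sketched in a few lines. In the sketch below (an illustration, not the paper's implementation; the chunk size and function names are assumptions) the host splits the input into fixed-size chunks, a stand-in for the systolic sorter handles each chunk, and the host merges the results:

```python
import heapq

CHUNK = 8  # assumed capacity of the hardware sorter

def systolic_sort(chunk):
    # Stand-in for the FPGA systolic sorter; software sort for illustration.
    return sorted(chunk)

def hardware_assisted_sort(data):
    # 'Divide' on the host, 'conquer' on the coprocessor, 'merge' on the host.
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    sorted_chunks = [systolic_sort(c) for c in chunks]
    return list(heapq.merge(*sorted_chunks))

assert hardware_assisted_sort([5, 3, 9, 1, 7, 2]) == [1, 2, 3, 5, 7, 9]
```

The appeal of the split is that the 'conquer' phase dominates the work for small chunks and maps naturally onto a fixed-size systolic array, while merging is cheap and sequential on the host.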
High performance analysis and control of complex systems using dynamically reconfigurable silicon and optical fiber memory
L. Wood
Pub Date: 1993-04-05 | DOI: 10.1109/FPGA.1993.279470
M is a highly parallel asynchronous computer for the analysis and control of complex systems. A complex system is a system with many interacting components; examples include applications in molecular biology, economics, and signal processing. M's asynchronous computations reproduce the structural dynamics of a system using high-fidelity behavioral modeling. Programs are composed of an application model, an environment model, and a distributed subsumption operating system. Processes are implemented using position-independent instructions (broadcast automata) that operate in parallel on strings of binary data. All of M's FPGA-based fine-grained parallel processing nodes are double-buffered, asynchronous, and highly pipelined. The fiber system memory is optically multiplexed and asynchronous. The technology will extend new gigabit ATM optical networks with integrated high performance computing services.
FPGA computing in a data parallel C
Maya Gokhale, Ron Minnich
Pub Date: 1993-04-05 | DOI: 10.1109/FPGA.1993.279474
The authors demonstrate a new technique for automatically synthesizing digital logic from a high-level algorithmic description in a data parallel language. The methodology has been implemented using the Splash 2 reconfigurable logic arrays for programs written in Data-parallel Bit-serial C (dbC). The translator generates a VHDL description of a SIMD processor array with one or more processors per Xilinx 4010 FPGA. The instruction set of each processor is customized to the dbC program being processed. In addition to the usual arithmetic operations, nearest-neighbor communication, host-to-processor communication, and global reductions are supported.
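The SIMD operations listed in this abstract can be modeled compactly. The sketch below is an illustrative software model (class and method names are hypothetical, not dbC syntax): each processing element holds one value, and the array supports a nearest-neighbor shift and a global reduction:

```python
# Illustrative model of a SIMD processor array: one value per PE,
# with nearest-neighbour communication and a global reduction.

class SIMDArray:
    def __init__(self, values):
        self.values = list(values)        # one value per processor

    def shift_right(self, fill=0):
        # Nearest-neighbour communication: each PE receives its left
        # neighbour's value; the leftmost PE receives `fill`.
        self.values = [fill] + self.values[:-1]
        return self

    def reduce_add(self):
        # Global reduction across all processors.
        return sum(self.values)

pe = SIMDArray([1, 2, 3, 4])
assert pe.reduce_add() == 10
assert pe.shift_right().values == [0, 1, 2, 3]
```

In the hardware realization these primitives become wires between adjacent FPGAs (the shift) and a reduction tree back to the host (the global sum), which is why the translator can specialize each processor's instruction set to exactly the operations the dbC program uses.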
Data-folding in SRAM configurable FPGAs
P. Foulk
Pub Date: 1993-04-05 | DOI: 10.1109/FPGA.1993.279467
FPGAs which are configured by static RAM can be rapidly changed from one logic configuration to another. This raises the possibility of configuring the logic to implement a function for a specific set of values, i.e. folding the inputs into the logic design. The paper discusses data folding with respect to Algotronix FPGAs, presenting a text searching circuit as an example. This folded circuit saves at least half the logic over a conventional circuit, and very much more if data folding is taken as far as possible. The paper also presents performance figures for the folded circuit, discusses other applications, and suggests features that are desirable if data folding is to be practicable, most of which are possessed by the Algotronix CAL array.
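Data folding is essentially partial evaluation in hardware. The software analogue below (illustrative; the names are hypothetical) specializes a text matcher for one fixed pattern at 'configuration' time, the way a folded circuit bakes the search value into its logic instead of reading it from registers:

```python
# Software analogue of data folding: specialize the matcher for a fixed
# pattern once, rather than comparing against a pattern held as data.

def make_folded_matcher(pattern):
    # Specialization happens once, at 'configuration' time.
    n = len(pattern)
    def match_at(text, i):
        return text[i:i + n] == pattern
    return match_at

find_cat = make_folded_matcher("cat")
text = "concatenate"
hits = [i for i in range(len(text)) if find_cat(text, i)]
assert hits == [3]
```

The hardware saving comes from the same source as the software one: a comparison against a constant needs no storage for the constant and lets constant-driven logic be simplified away.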
PRISM-II compiler and architecture
M. Wazlowski, L. Agarwal, T. Lee, A. Smith, E. Lam, P. Athanas, H. Silverman, S. Ghosh
Pub Date: 1993-04-05 | DOI: 10.1109/FPGA.1993.279484
This paper discusses the architecture and compiler for a general-purpose metamorphic computing platform called PRISM-II. PRISM-II improves the performance of many computationally intensive tasks by augmenting the functionality of the core processor with new instructions that match the characteristics of targeted applications. In essence, PRISM (processor reconfiguration through instruction set metamorphosis) is a general-purpose hardware platform that behaves like an application-specific platform. Two methods for hardware synthesis, one using VHDL Designer and the other using X-BLOX, are presented and synthesis results are compared.
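The instruction-set augmentation described here can be pictured as plugging an application-specific operation into a small core. The sketch below is a loose software illustration (all names are hypothetical, and the 'custom instruction' runs in software rather than on an FPGA): a base instruction set is extended with a population-count operation synthesized for one hot spot of a program:

```python
# Loose illustration of instruction-set augmentation: a core instruction
# set extended with one application-specific operation.

BASE_OPS = {
    "add": lambda a, b: a + b,
    "xor": lambda a, b: a ^ b,
}

def popcount(a, _b=None):
    # The 'new instruction' that PRISM would map onto the FPGA.
    return bin(a).count("1")

def make_cpu(custom_ops):
    ops = dict(BASE_OPS, **custom_ops)
    def execute(op, a, b=0):
        return ops[op](a, b)
    return execute

cpu = make_cpu({"popc": popcount})
assert cpu("add", 2, 3) == 5
assert cpu("popc", 0b1011) == 3
```

The point of the metamorphosis is that the set of custom operations is derived per application by the compiler, not fixed at processor design time.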
The CM-2X: a hybrid CM-2/Xilinx prototype
Steven Cuccaro, Craig E. Reese
Pub Date: 1993-04-05 | DOI: 10.1109/FPGA.1993.279471
This paper describes the CM-2X prototype. This one-of-a-kind machine is the result of a Supercomputing Research Center/Thinking Machines Corporation joint effort to examine the suitability of a hybrid combination of the CM-2 architecture and Xilinx programmable gate array technology. In addition to a description of the CM-2X and Xilinx architecture, a simple application example is provided that illustrates many of the issues involved in programming the machine.
A field programmable accelerator for compiled-code applications
D. Lewis, M.H. van Ierssel, D. H. Wong
Pub Date: 1993-04-05 | DOI: 10.1109/FPGA.1993.279478
The paper describes a special purpose application accelerator using field programmable gate arrays to accelerate a range of applications. The accelerator is designed to support applications by allowing the user to implement a processor with an instruction set designed for the specific application being accelerated, using specialized instructions to implement critical fragments of the application. A compiled-code software organization is used to reduce overhead operations. A prototype has been built, and the first application to be ported to it, logic simulation, is underway.