Pub Date: 2022-12-05
DOI: 10.1109/icfpt56656.2022.9974448
W. Zhang, R. Cheung, Yuning Liang, Hiroki Nakahara
{"title":"Message from the General Chair and Program Co-Chairs","authors":"W. Zhang, R. Cheung, Yuning Liang, Hiroki Nakahara","doi":"10.1109/icfpt56656.2022.9974448","DOIUrl":"https://doi.org/10.1109/icfpt56656.2022.9974448","url":null,"abstract":"","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"2021 1","pages":"1"},"PeriodicalIF":0.0,"publicationDate":"2022-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75340110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerator-in-Switch: A Novel Cooperation Framework for FPGAs and GPUs","authors":"H. Amano","doi":"10.1109/FPT.2018.00010","DOIUrl":"https://doi.org/10.1109/FPT.2018.00010","url":null,"abstract":"","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"266 1","pages":"22"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77816777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA Accelerated HPC and Data Analytics","authors":"M. Strickland","doi":"10.1109/FPT.2018.00009","DOIUrl":"https://doi.org/10.1109/FPT.2018.00009","url":null,"abstract":"","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"2017 1","pages":"21"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86709331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Novel Neural Network Applications on New Python Enabled Platforms","authors":"K. Vissers","doi":"10.1109/FPT.2018.00011","DOIUrl":"https://doi.org/10.1109/FPT.2018.00011","url":null,"abstract":"","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"55 1","pages":"23"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90405595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-12-01
DOI: 10.1109/FPT.2016.7929179
Yonghua Lin
IBM is a leader in accelerator cloud technology in industry. IBM Supervessel Cloud is the first cloud to provide an FPGA accelerator service and an FPGA DevOps service to developers. In this talk, Yonghua Lin, the Supervessel Cloud leader, will share her view on why FPGA service in the cloud is important and how it could accelerate Cognitive Computing in the cloud. She will also introduce the key technologies supporting FPGA service in the cloud. Supervessel Cloud has offered FPGA as a service for more than 18 months, and the service has been used by developers from different countries. Yonghua will also share the gaps identified from working with these users, and her vision for the future.
{"title":"FPGA as service in public Cloud: Why and how","authors":"Yonghua Lin","doi":"10.1109/FPT.2016.7929179","DOIUrl":"https://doi.org/10.1109/FPT.2016.7929179","url":null,"abstract":"IBM is a leader in accelerator cloud technology in industry. IBM Supervessel Cloud is the first cloud to provide an FPGA accelerator service and an FPGA DevOps service to developers. In this talk, Yonghua Lin, the Supervessel Cloud leader, will share her view on why FPGA service in the cloud is important and how it could accelerate Cognitive Computing in the cloud. She will also introduce the key technologies supporting FPGA service in the cloud. Supervessel Cloud has offered FPGA as a service for more than 18 months, and the service has been used by developers from different countries. Yonghua will also share the gaps identified from working with these users, and her vision for the future.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"85 1","pages":"3"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89861222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-12-01
DOI: 10.1109/FPT.2016.7929177
J. Anderson
High-level synthesis (HLS) was first proposed in the 1980s. After spending decades on the sidelines of mainstream RTL digital design, there has been tremendous buzz around HLS technology in recent years. Indeed, HLS is on the upswing as a design methodology for field-programmable gate arrays (FPGAs), promising to improve designer productivity and, ultimately, to make FPGA technology accessible to software engineers with limited hardware expertise. The hope is that, down the road, software developers could use HLS to realize FPGA-based accelerators customized to their applications that work in tandem with standard processors to raise computational throughput and energy efficiency. The further hope is that such HLS-generated accelerators operate close to the speed and energy efficiency of human-expert-designed accelerators. In this talk, I will overview the trends behind the recent drive towards FPGA HLS and why the need for, and use of, HLS will only become more pronounced in the coming years. I will argue that HLS, as opposed to traditional RTL design, is on the “right side of history”. The talk will highlight current HLS research directions and expose some of the challenges that may hinder HLS's uptake in the digital design community. I will also describe work underway on the LegUp HLS project at the University of Toronto, a publicly available HLS tool that has been downloaded by over 4000 groups from around the world.
{"title":"High-level synthesis - the right side of history","authors":"J. Anderson","doi":"10.1109/FPT.2016.7929177","DOIUrl":"https://doi.org/10.1109/FPT.2016.7929177","url":null,"abstract":"High-level synthesis (HLS) was first proposed in the 1980s. After spending decades on the sidelines of mainstream RTL digital design, there has been tremendous buzz around HLS technology in recent years. Indeed, HLS is on the upswing as a design methodology for field-programmable gate arrays (FPGAs), promising to improve designer productivity and, ultimately, to make FPGA technology accessible to software engineers with limited hardware expertise. The hope is that, down the road, software developers could use HLS to realize FPGA-based accelerators customized to their applications that work in tandem with standard processors to raise computational throughput and energy efficiency. The further hope is that such HLS-generated accelerators operate close to the speed and energy efficiency of human-expert-designed accelerators. In this talk, I will overview the trends behind the recent drive towards FPGA HLS and why the need for, and use of, HLS will only become more pronounced in the coming years. I will argue that HLS, as opposed to traditional RTL design, is on the “right side of history”. The talk will highlight current HLS research directions and expose some of the challenges that may hinder HLS's uptake in the digital design community. I will also describe work underway on the LegUp HLS project at the University of Toronto, a publicly available HLS tool that has been downloaded by over 4000 groups from around the world.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"1 1","pages":"1"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82917649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-12-01
DOI: 10.1109/FPT.2014.7082766
Wenqiang Wang, Kaiyuan Guo, Mengyuan Gu, Yuchun Ma, Yu Wang
FPGA-based acceleration of matrix operations is a promising solution for mobile systems. However, most related work focuses on a single operation rather than a complete system. In this paper, we explore the possibility of integrating multiple matrix accelerators with a master processor and propose a universal floating-point matrix processor. The processor supports multiple matrix-matrix operations (Level 3 BLAS), and the matrix size is unlimited. The key component of the processor is a shared matrix cache that enables on-chip communication between different accelerators. This structure reduces the external memory bandwidth requirement and improves overall performance. To improve whole-system performance, an asynchronous instruction execution mechanism is further proposed in the hardware-software interface to reduce the workload of the master processor. We demonstrate the system on a DE3 development board and achieve a computing performance of about 19 GFLOPS. Experiments show the proposed processor achieves higher performance and energy efficiency than some state-of-the-art embedded processors, including the ARM Cortex-A9 and the NIOS II/f soft-core processor. The performance of the processor is even comparable to some desktop processors.
{"title":"A universal FPGA-based floating-point matrix processor for mobile systems","authors":"Wenqiang Wang, Kaiyuan Guo, Mengyuan Gu, Yuchun Ma, Yu Wang","doi":"10.1109/FPT.2014.7082766","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082766","url":null,"abstract":"FPGA-based acceleration of matrix operations is a promising solution for mobile systems. However, most related work focuses on a single operation rather than a complete system. In this paper, we explore the possibility of integrating multiple matrix accelerators with a master processor and propose a universal floating-point matrix processor. The processor supports multiple matrix-matrix operations (Level 3 BLAS), and the matrix size is unlimited. The key component of the processor is a shared matrix cache that enables on-chip communication between different accelerators. This structure reduces the external memory bandwidth requirement and improves overall performance. To improve whole-system performance, an asynchronous instruction execution mechanism is further proposed in the hardware-software interface to reduce the workload of the master processor. We demonstrate the system on a DE3 development board and achieve a computing performance of about 19 GFLOPS. Experiments show the proposed processor achieves higher performance and energy efficiency than some state-of-the-art embedded processors, including the ARM Cortex-A9 and the NIOS II/f soft-core processor. The performance of the processor is even comparable to some desktop processors.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"51 1","pages":"139-146"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75285538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
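The shared-matrix-cache idea in the abstract above can be sketched in a few lines of Python. This is a toy software model, not the paper's hardware: the class name, the load/gemm/store interface, and the word-counting scheme are our own illustrative assumptions. It shows why keeping an intermediate result on chip saves external memory traffic when two Level 3 BLAS operations are chained.

```python
def matmul(a, b):
    """Plain Python matrix multiply, standing in for a hardware GEMM unit."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

class MatrixProcessorModel:
    """Toy model of chaining matrix accelerators through a shared matrix cache.

    Illustrative only: the paper's shared cache is on-chip BRAM shared by
    hardware accelerators; here a dict plays that role and we simply count
    the words that cross the simulated external memory bus.
    """
    def __init__(self):
        self.cache = {}          # shared matrix cache ("on-chip" storage)
        self.external_words = 0  # words moved over the external memory bus

    def load(self, name, matrix):
        self.external_words += len(matrix) * len(matrix[0])  # DRAM -> chip
        self.cache[name] = matrix

    def gemm(self, a, b, out):
        # The result stays in the shared cache, so another accelerator can
        # consume it directly; this is what cuts external bandwidth demand.
        self.cache[out] = matmul(self.cache[a], self.cache[b])

    def store(self, name):
        m = self.cache[name]
        self.external_words += len(m) * len(m[0])            # chip -> DRAM
        return m

p = MatrixProcessorModel()
ones = [[1.0] * 4 for _ in range(4)]
p.load("A", ones); p.load("B", ones); p.load("E", ones)
p.gemm("A", "B", "C")   # C = A @ B, kept on chip
p.gemm("C", "E", "D")   # D = C @ E reuses C with no external round trip
D = p.store("D")        # only the final result is written back
```

With the shared cache, only the three inputs and the final result cross the external bus (64 words in this tiny example); without it, the intermediate C would cost an additional write and read.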
Pub Date: 2014-12-01
DOI: 10.1109/FPT.2014.7082798
Fubing Mao, Wei Zhang, Bingsheng He
Partial reconfiguration (PR) is an advanced capability of FPGAs: after the initial configuration, specific regions of the device can be reconfigured while the remaining regions stay active (or are held inactive in a shutdown mode). It provides many benefits for industry, e.g., sharing the same hardware resources among different applications.
{"title":"Towards automatic partial reconfiguration in FPGAs","authors":"Fubing Mao, Wei Zhang, Bingsheng He","doi":"10.1109/FPT.2014.7082798","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082798","url":null,"abstract":"Partial reconfiguration (PR) is an advanced capability of FPGAs: after the initial configuration, specific regions of the device can be reconfigured while the remaining regions stay active (or are held inactive in a shutdown mode). It provides many benefits for industry, e.g., sharing the same hardware resources among different applications.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"70 1","pages":"286-287"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77920035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-12-01
DOI: 10.1109/FPT.2014.7082824
Susumu Mashimo, K. Fukuda, M. Amagasaki, M. Iida, M. Kuga, T. Sueyoshi
In this article, we present the design of a Blokus Duo engine for the ICFPT 2014 Design Competition. Our design is implemented on a Xilinx Zynq-7000 SoC ZC706 Evaluation Kit, and we employ the minimax algorithm with alpha-beta pruning. The ARM processor runs the search algorithm, and a handwritten hardware accelerator performs the evaluation within the 1-second limit imposed by the competition. One key to a stronger Blokus Duo player is evaluating more states of the game; our Blokus Duo engine evaluates 12.3 times as many nodes of the game search tree as an Intel Core i7-3770T.
{"title":"Blokus Duo engine on a Zynq","authors":"Susumu Mashimo, K. Fukuda, M. Amagasaki, M. Iida, M. Kuga, T. Sueyoshi","doi":"10.1109/FPT.2014.7082824","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082824","url":null,"abstract":"In this article, we present the design of a Blokus Duo engine for the ICFPT 2014 Design Competition. Our design is implemented on a Xilinx Zynq-7000 SoC ZC706 Evaluation Kit, and we employ the minimax algorithm with alpha-beta pruning. The ARM processor runs the search algorithm, and a handwritten hardware accelerator performs the evaluation within the 1-second limit imposed by the competition. One key to a stronger Blokus Duo player is evaluating more states of the game; our Blokus Duo engine evaluates 12.3 times as many nodes of the game search tree as an Intel Core i7-3770T.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"7 1","pages":"374-377"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84254655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
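The engine's search strategy, minimax with alpha-beta pruning, can be sketched generically in Python. This is the textbook algorithm, not the engine's Blokus-specific move generator or evaluation function; the nested-list tree encoding below is our own illustrative assumption.

```python
def alphabeta(node, depth, alpha, beta, maximizing, children, value):
    """Textbook minimax with alpha-beta pruning over an abstract game tree.

    children(node) lists successor states; value(node) scores a leaf.
    """
    kids = children(node)
    if depth == 0 or not kids:
        return value(node)
    if maximizing:
        best = float("-inf")
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, children, value))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # beta cutoff: the minimizer will avoid this line
        return best
    best = float("inf")
    for child in kids:
        best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                   True, children, value))
        beta = min(beta, best)
        if beta <= alpha:
            break  # alpha cutoff: the maximizer already has something better
    return best

# Demo on a tiny two-ply tree encoded as nested lists (leaves are scores).
tree = [[3, 5], [2, 9]]
best = alphabeta(tree, 2, float("-inf"), float("inf"), True,
                 children=lambda n: n if isinstance(n, list) else [],
                 value=lambda n: n)
```

In the tiny tree above, the leaf 9 is never evaluated: once the second subtree is known to be worth at most 2, it cannot beat the 3 the maximizer has already secured. Pruning of this kind lets an engine spend its fixed 1-second budget on deeper, more promising lines.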
Pub Date: 2014-12-01
DOI: 10.1109/FPT.2014.7082781
T. Moorthy, S. Gopalakrishnan
We describe the design of a PC-to-FPGA data streaming platform that enables hardware acceleration of gigabyte-scale input data. Specifically, the acceleration is an FPGA implementation of the Dialign algorithm, which performs both global and local alignment of query biological sequences against relatively larger reference strands. Earlier implementations of this algorithm could not be scaled to handle gigabyte-length reference sequences, nor megabyte-length query sequences, due to the inherent limitations of available memory and logic on single-FPGA platforms. We solve these issues by designing an Ethernet channel to stream the reference sequence, and we describe the novel use of SATA-based solid-state drives (SSDs) to time-multiplex the FPGA logic so that it also handles larger query sequences. In doing so, this paper also presents a general method for achieving gigabyte-depth FIFOs on commercially available FPGA development boards, which benefits data-intensive acceleration even outside the bioinformatics application domain. Through the development of our acceleration logic and careful coupling of the required I/O peripherals, we have demonstrated a processing time of 28.61 minutes for a 200 base-pair query sequence aligned against a 1 GB reference sequence, a rate limited only by SATA 2 SSD write speeds. The present runtime offers a 38× speedup (18.36 hours down to 28.61 minutes) compared to standalone PC-based processing.
{"title":"Gigabyte-scale alignment acceleration of biological sequences via Ethernet streaming","authors":"T. Moorthy, S. Gopalakrishnan","doi":"10.1109/FPT.2014.7082781","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082781","url":null,"abstract":"We describe the design of a PC-to-FPGA data streaming platform that enables hardware acceleration of gigabyte-scale input data. Specifically, the acceleration is an FPGA implementation of the Dialign algorithm, which performs both global and local alignment of query biological sequences against relatively larger reference strands. Earlier implementations of this algorithm could not be scaled to handle gigabyte-length reference sequences, nor megabyte-length query sequences, due to the inherent limitations of available memory and logic on single-FPGA platforms. We solve these issues by designing an Ethernet channel to stream the reference sequence, and we describe the novel use of SATA-based solid-state drives (SSDs) to time-multiplex the FPGA logic so that it also handles larger query sequences. In doing so, this paper also presents a general method for achieving gigabyte-depth FIFOs on commercially available FPGA development boards, which benefits data-intensive acceleration even outside the bioinformatics application domain. Through the development of our acceleration logic and careful coupling of the required I/O peripherals, we have demonstrated a processing time of 28.61 minutes for a 200 base-pair query sequence aligned against a 1 GB reference sequence, a rate limited only by SATA 2 SSD write speeds. The present runtime offers a 38× speedup (18.36 hours down to 28.61 minutes) compared to standalone PC-based processing.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"54 1","pages":"227-230"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76939245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
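The gigabyte-depth FIFO technique described above can be sketched in Python: a small fast stage spills overflow to a backing store and refills from it in order. Here a temporary file stands in for the SATA SSD; the record format, capacity, and class name are our own illustrative assumptions, not details from the paper.

```python
import struct
import tempfile
from collections import deque

class DiskBackedFIFO:
    """Sketch of a FIFO far deeper than its fast buffer by spilling to disk.

    The deque models the small fast stage (on-chip BRAM on an FPGA board);
    the temp file models the SSD that provides the gigabyte-scale depth.
    """
    REC = struct.Struct("<i")  # fixed-width 4-byte records, like a FIFO word

    def __init__(self, ram_capacity=4):
        self.ram = deque()                    # fast "on-chip" stage
        self.cap = ram_capacity
        self.disk = tempfile.TemporaryFile()  # stands in for the SATA SSD
        self.rd = 0                           # disk read offset (oldest record)
        self.wr = 0                           # disk write offset

    def push(self, value):
        # Once anything has spilled, later pushes must also spill,
        # otherwise FIFO ordering would be violated.
        if self.rd < self.wr or len(self.ram) >= self.cap:
            self.disk.seek(self.wr)
            self.disk.write(self.REC.pack(value))
            self.wr += self.REC.size
        else:
            self.ram.append(value)

    def pop(self):
        value = self.ram.popleft()
        # Refill the fast stage from disk, oldest spilled record first.
        while self.rd < self.wr and len(self.ram) < self.cap:
            self.disk.seek(self.rd)
            self.ram.append(self.REC.unpack(self.disk.read(self.REC.size))[0])
            self.rd += self.REC.size
        return value

fifo = DiskBackedFIFO(ram_capacity=2)  # tiny fast stage to force spilling
for v in range(6):
    fifo.push(v)                       # 0,1 stay in RAM; 2..5 spill to "disk"
drained = [fifo.pop() for _ in range(6)]
```

The one subtlety is ordering: after the first spill, every later push must also go to the backing store until the spilled region drains, or records would overtake each other.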