Pub Date: 2022-12-05
DOI: 10.1109/icfpt56656.2022.9974448
W. Zhang, R. Cheung, Yuning Liang, Hiroki Nakahara
{"title":"Message from the General Chair and Program Co-Chairs","authors":"W. Zhang, R. Cheung, Yuning Liang, Hiroki Nakahara","doi":"10.1109/icfpt56656.2022.9974448","DOIUrl":"https://doi.org/10.1109/icfpt56656.2022.9974448","url":null,"abstract":"","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"2021 1","pages":"1"},"PeriodicalIF":0.0,"publicationDate":"2022-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75340110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerator-in-Switch: A Novel Cooperation Framework for FPGAs and GPUs","authors":"H. Amano","doi":"10.1109/FPT.2018.00010","DOIUrl":"https://doi.org/10.1109/FPT.2018.00010","url":null,"abstract":"","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"266 1","pages":"22"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77816777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA Accelerated HPC and Data Analytics","authors":"M. Strickland","doi":"10.1109/FPT.2018.00009","DOIUrl":"https://doi.org/10.1109/FPT.2018.00009","url":null,"abstract":"","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"2017 1","pages":"21"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86709331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Novel Neural Network Applications on New Python Enabled Platforms","authors":"K. Vissers","doi":"10.1109/FPT.2018.00011","DOIUrl":"https://doi.org/10.1109/FPT.2018.00011","url":null,"abstract":"","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"55 1","pages":"23"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90405595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-12-01
DOI: 10.1109/FPT.2016.7929179
Yonghua Lin
IBM is a leader in accelerator cloud technology in industry. IBM Supervessel Cloud is the first cloud to provide an FPGA accelerator service and an FPGA DevOps service to developers. In this talk, Yonghua Lin, the Supervessel Cloud leader, will share her view on why FPGA service in the cloud is important and how it could accelerate Cognitive Computing in the cloud. She will also introduce the key technologies supporting FPGA service in the cloud. Supervessel Cloud has offered FPGA as a service for more than 18 months, and the service has been used by developers from different countries. Yonghua will also share the gaps identified from working with these users, and her vision for the future.
{"title":"FPGA as service in public Cloud: Why and how","authors":"Yonghua Lin","doi":"10.1109/FPT.2016.7929179","DOIUrl":"https://doi.org/10.1109/FPT.2016.7929179","url":null,"abstract":"IBM is a leader in accelerator cloud technology in industry. IBM Supervessel Cloud is the first cloud to provide an FPGA accelerator service and an FPGA DevOps service to developers. In this talk, Yonghua Lin, the Supervessel Cloud leader, will share her view on why FPGA service in the cloud is important and how it could accelerate Cognitive Computing in the cloud. She will also introduce the key technologies supporting FPGA service in the cloud. Supervessel Cloud has offered FPGA as a service for more than 18 months, and the service has been used by developers from different countries. Yonghua will also share the gaps identified from working with these users, and her vision for the future.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"85 1","pages":"3"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89861222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2016-12-01
DOI: 10.1109/FPT.2016.7929177
J. Anderson
High-level synthesis (HLS) was first proposed in the 1980s. After spending decades on the sidelines of mainstream RTL digital design, there has been tremendous buzz around HLS technology in recent years. Indeed, HLS is on the upswing as a design methodology for field-programmable gate arrays (FPGAs), promising to improve designer productivity and, ultimately, to make FPGA technology accessible to software engineers with limited hardware expertise. The hope is that, down the road, software developers could use HLS to realize FPGA-based accelerators customized to their applications that work in tandem with standard processors to raise computational throughput and energy efficiency. The further hope is that such HLS-generated accelerators operate close to the speed and energy efficiency of human-expert-designed accelerators. In this talk, I will overview the trends behind the recent drive towards FPGA HLS and why the need for, and use of, HLS will only become more pronounced in the coming years. I will argue that HLS, as opposed to traditional RTL design, is on the “right side of history”. The talk will highlight current HLS research directions and expose some of the challenges that may hinder HLS's uptake in the digital design community. I will also describe work underway on the LegUp HLS project at the University of Toronto, a publicly available HLS tool that has been downloaded by over 4000 groups from around the world.
{"title":"High-level synthesis - the right side of history","authors":"J. Anderson","doi":"10.1109/FPT.2016.7929177","DOIUrl":"https://doi.org/10.1109/FPT.2016.7929177","url":null,"abstract":"High-level synthesis (HLS) was first proposed in the 1980s. After spending decades on the sidelines of mainstream RTL digital design, there has been tremendous buzz around HLS technology in recent years. Indeed, HLS is on the upswing as a design methodology for field-programmable gate arrays (FPGAs), promising to improve designer productivity and, ultimately, to make FPGA technology accessible to software engineers with limited hardware expertise. The hope is that, down the road, software developers could use HLS to realize FPGA-based accelerators customized to their applications that work in tandem with standard processors to raise computational throughput and energy efficiency. The further hope is that such HLS-generated accelerators operate close to the speed and energy efficiency of human-expert-designed accelerators. In this talk, I will overview the trends behind the recent drive towards FPGA HLS and why the need for, and use of, HLS will only become more pronounced in the coming years. I will argue that HLS, as opposed to traditional RTL design, is on the “right side of history”. The talk will highlight current HLS research directions and expose some of the challenges that may hinder HLS's uptake in the digital design community. I will also describe work underway on the LegUp HLS project at the University of Toronto, a publicly available HLS tool that has been downloaded by over 4000 groups from around the world.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"1 1","pages":"1"},"PeriodicalIF":0.0,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82917649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-12-01
DOI: 10.1109/FPT.2014.7082766
Wenqiang Wang, Kaiyuan Guo, Mengyuan Gu, Yuchun Ma, Yu Wang
FPGA-based acceleration of matrix operations is a promising solution for mobile systems. However, most related work focuses on a single operation rather than a complete system. In this paper, we explore the possibility of integrating multiple matrix accelerators with a master processor and propose a universal floating-point matrix processor. The processor supports multiple matrix-matrix operations (Level 3 BLAS), and the matrix size is unlimited. The key component of the processor is a shared matrix cache that enables on-chip communication between different accelerators. This structure reduces the external memory bandwidth requirement and improves overall performance. To improve whole-system performance, an asynchronous instruction execution mechanism is further proposed in the hardware-software interface to reduce the workload of the master processor. We demonstrate the system on a DE3 development board and achieve a computing performance of about 19 GFLOPS. Experiments show the proposed processor achieves higher performance and energy efficiency than some state-of-the-art embedded processors, including the ARM Cortex-A9 and the NIOS II/f soft-core processor. The performance of the processor is even comparable to some desktop processors.
{"title":"A universal FPGA-based floating-point matrix processor for mobile systems","authors":"Wenqiang Wang, Kaiyuan Guo, Mengyuan Gu, Yuchun Ma, Yu Wang","doi":"10.1109/FPT.2014.7082766","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082766","url":null,"abstract":"FPGA-based acceleration of matrix operations is a promising solution for mobile systems. However, most related work focuses on a single operation rather than a complete system. In this paper, we explore the possibility of integrating multiple matrix accelerators with a master processor and propose a universal floating-point matrix processor. The processor supports multiple matrix-matrix operations (Level 3 BLAS), and the matrix size is unlimited. The key component of the processor is a shared matrix cache that enables on-chip communication between different accelerators. This structure reduces the external memory bandwidth requirement and improves overall performance. To improve whole-system performance, an asynchronous instruction execution mechanism is further proposed in the hardware-software interface to reduce the workload of the master processor. We demonstrate the system on a DE3 development board and achieve a computing performance of about 19 GFLOPS. Experiments show the proposed processor achieves higher performance and energy efficiency than some state-of-the-art embedded processors, including the ARM Cortex-A9 and the NIOS II/f soft-core processor. The performance of the processor is even comparable to some desktop processors.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"51 1","pages":"139-146"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75285538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
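The shared-matrix-cache idea in the abstract above can be sketched in a few lines of Python. This is a toy software model, not the paper's hardware: the class name, the load/gemm/store interface, and the word-counting scheme are our own illustrative assumptions. It shows why keeping an intermediate result on chip saves external memory traffic when two Level 3 BLAS operations are chained.

```python
def matmul(a, b):
    """Plain Python matrix multiply, standing in for a hardware GEMM unit."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

class MatrixProcessorModel:
    """Toy model of chaining matrix accelerators through a shared matrix cache.

    Illustrative only: the paper's shared cache is on-chip BRAM shared by
    hardware accelerators; here a dict plays that role and we simply count
    the words that cross the simulated external memory bus.
    """
    def __init__(self):
        self.cache = {}          # shared matrix cache ("on-chip" storage)
        self.external_words = 0  # words moved over the external memory bus

    def load(self, name, matrix):
        self.external_words += len(matrix) * len(matrix[0])  # DRAM -> chip
        self.cache[name] = matrix

    def gemm(self, a, b, out):
        # The result stays in the shared cache, so another accelerator can
        # consume it directly; this is what cuts external bandwidth demand.
        self.cache[out] = matmul(self.cache[a], self.cache[b])

    def store(self, name):
        m = self.cache[name]
        self.external_words += len(m) * len(m[0])            # chip -> DRAM
        return m

p = MatrixProcessorModel()
ones = [[1.0] * 4 for _ in range(4)]
p.load("A", ones); p.load("B", ones); p.load("E", ones)
p.gemm("A", "B", "C")   # C = A @ B, kept on chip
p.gemm("C", "E", "D")   # D = C @ E reuses C with no external round trip
D = p.store("D")        # only the final result is written back
```

With the shared cache, only the three inputs and the final result cross the external bus (64 words in this tiny example); without it, the intermediate C would cost an additional write and read.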
Pub Date: 2014-12-01
DOI: 10.1109/FPT.2014.7082798
Fubing Mao, Wei Zhang, Bingsheng He
Partial reconfiguration (PR) is an advanced capability of FPGAs: after the initial configuration, specific regions of the device can be reconfigured while the remaining regions stay active (or are held inactive in a shutdown mode). It provides many benefits for industry, e.g., sharing the same hardware resources among different applications.
{"title":"Towards automatic partial reconfiguration in FPGAs","authors":"Fubing Mao, Wei Zhang, Bingsheng He","doi":"10.1109/FPT.2014.7082798","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082798","url":null,"abstract":"Partial reconfiguration (PR) is an advanced capability of FPGAs: after the initial configuration, specific regions of the device can be reconfigured while the remaining regions stay active (or are held inactive in a shutdown mode). It provides many benefits for industry, e.g., sharing the same hardware resources among different applications.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"70 1","pages":"286-287"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77920035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2014-12-01
DOI: 10.1109/FPT.2014.7082824
Susumu Mashimo, K. Fukuda, M. Amagasaki, M. Iida, M. Kuga, T. Sueyoshi
In this article, we present the design of a Blokus Duo engine for the ICFPT 2014 Design Competition. Our design is implemented on a Xilinx Zynq-7000 SoC ZC706 Evaluation Kit, and we employ the minimax algorithm with alpha-beta pruning. The ARM processor runs the search algorithm, and a handwritten hardware accelerator performs the evaluation within the 1-second limit imposed by the competition. One key to a stronger Blokus Duo player is evaluating more states of the game; our Blokus Duo engine evaluates 12.3 times as many nodes of the game search tree as an Intel Core i7-3770T.
{"title":"Blokus Duo engine on a Zynq","authors":"Susumu Mashimo, K. Fukuda, M. Amagasaki, M. Iida, M. Kuga, T. Sueyoshi","doi":"10.1109/FPT.2014.7082824","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082824","url":null,"abstract":"In this article, we present the design of a Blokus Duo engine for the ICFPT 2014 Design Competition. Our design is implemented on a Xilinx Zynq-7000 SoC ZC706 Evaluation Kit, and we employ the minimax algorithm with alpha-beta pruning. The ARM processor runs the search algorithm, and a handwritten hardware accelerator performs the evaluation within the 1-second limit imposed by the competition. One key to a stronger Blokus Duo player is evaluating more states of the game; our Blokus Duo engine evaluates 12.3 times as many nodes of the game search tree as an Intel Core i7-3770T.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"7 1","pages":"374-377"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84254655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
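The engine's search strategy, minimax with alpha-beta pruning, can be sketched generically in Python. This is the textbook algorithm, not the engine's Blokus-specific move generator or evaluation function; the nested-list tree encoding below is our own illustrative assumption.

```python
def alphabeta(node, depth, alpha, beta, maximizing, children, value):
    """Textbook minimax with alpha-beta pruning over an abstract game tree.

    children(node) lists successor states; value(node) scores a leaf.
    """
    kids = children(node)
    if depth == 0 or not kids:
        return value(node)
    if maximizing:
        best = float("-inf")
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                       False, children, value))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # beta cutoff: the minimizer will avoid this line
        return best
    best = float("inf")
    for child in kids:
        best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                   True, children, value))
        beta = min(beta, best)
        if beta <= alpha:
            break  # alpha cutoff: the maximizer already has something better
    return best

# Demo on a tiny two-ply tree encoded as nested lists (leaves are scores).
tree = [[3, 5], [2, 9]]
best = alphabeta(tree, 2, float("-inf"), float("inf"), True,
                 children=lambda n: n if isinstance(n, list) else [],
                 value=lambda n: n)
```

In the tiny tree above, the leaf 9 is never evaluated: once the second subtree is known to be worth at most 2, it cannot beat the 3 the maximizer has already secured. Pruning of this kind lets an engine spend its fixed 1-second budget on deeper, more promising lines.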
Pub Date: 2014-12-01
DOI: 10.1109/FPT.2014.7082781
T. Moorthy, S. Gopalakrishnan
We describe the design of a PC-to-FPGA data streaming platform that enables hardware acceleration of gigabyte-scale input data. Specifically, the acceleration is an FPGA implementation of the Dialign algorithm, which performs both global and local alignment of query biological sequences against relatively larger reference strands. Earlier implementations of this algorithm could not be scaled to handle gigabyte-length reference sequences, nor megabyte-length query sequences, due to the inherent limitations of available memory and logic on single-FPGA platforms. We solve these issues by designing an Ethernet channel to stream the reference sequence, and we describe the novel use of SATA-based solid-state drives (SSDs) to time-multiplex the FPGA logic so that it also handles larger query sequences. In doing so, this paper also presents a general method for achieving gigabyte-depth FIFOs on commercially available FPGA development boards, which benefits data-intensive acceleration even outside the bioinformatics application domain. Through the development of our acceleration logic and careful coupling of the required I/O peripherals, we have demonstrated a processing time of 28.61 minutes for a 200 base-pair query sequence aligned against a 1 GB reference sequence, a rate limited only by SATA 2 SSD write speeds. The present runtime offers a 38× speedup (18.36 hours down to 28.61 minutes) compared to standalone PC-based processing.
{"title":"Gigabyte-scale alignment acceleration of biological sequences via Ethernet streaming","authors":"T. Moorthy, S. Gopalakrishnan","doi":"10.1109/FPT.2014.7082781","DOIUrl":"https://doi.org/10.1109/FPT.2014.7082781","url":null,"abstract":"We describe the design of a PC-to-FPGA data streaming platform that enables hardware acceleration of gigabyte-scale input data. Specifically, the acceleration is an FPGA implementation of the Dialign algorithm, which performs both global and local alignment of query biological sequences against relatively larger reference strands. Earlier implementations of this algorithm could not be scaled to handle gigabyte-length reference sequences, nor megabyte-length query sequences, due to the inherent limitations of available memory and logic on single-FPGA platforms. We solve these issues by designing an Ethernet channel to stream the reference sequence, and we describe the novel use of SATA-based solid-state drives (SSDs) to time-multiplex the FPGA logic so that it also handles larger query sequences. In doing so, this paper also presents a general method for achieving gigabyte-depth FIFOs on commercially available FPGA development boards, which benefits data-intensive acceleration even outside the bioinformatics application domain. Through the development of our acceleration logic and careful coupling of the required I/O peripherals, we have demonstrated a processing time of 28.61 minutes for a 200 base-pair query sequence aligned against a 1 GB reference sequence, a rate limited only by SATA 2 SSD write speeds. The present runtime offers a 38× speedup (18.36 hours down to 28.61 minutes) compared to standalone PC-based processing.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"54 1","pages":"227-230"},"PeriodicalIF":0.0,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76939245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
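The gigabyte-depth FIFO technique described above can be sketched in Python: a small fast stage spills overflow to a backing store and refills from it in order. Here a temporary file stands in for the SATA SSD; the record format, capacity, and class name are our own illustrative assumptions, not details from the paper.

```python
import struct
import tempfile
from collections import deque

class DiskBackedFIFO:
    """Sketch of a FIFO far deeper than its fast buffer by spilling to disk.

    The deque models the small fast stage (on-chip BRAM on an FPGA board);
    the temp file models the SSD that provides the gigabyte-scale depth.
    """
    REC = struct.Struct("<i")  # fixed-width 4-byte records, like a FIFO word

    def __init__(self, ram_capacity=4):
        self.ram = deque()                    # fast "on-chip" stage
        self.cap = ram_capacity
        self.disk = tempfile.TemporaryFile()  # stands in for the SATA SSD
        self.rd = 0                           # disk read offset (oldest record)
        self.wr = 0                           # disk write offset

    def push(self, value):
        # Once anything has spilled, later pushes must also spill,
        # otherwise FIFO ordering would be violated.
        if self.rd < self.wr or len(self.ram) >= self.cap:
            self.disk.seek(self.wr)
            self.disk.write(self.REC.pack(value))
            self.wr += self.REC.size
        else:
            self.ram.append(value)

    def pop(self):
        value = self.ram.popleft()
        # Refill the fast stage from disk, oldest spilled record first.
        while self.rd < self.wr and len(self.ram) < self.cap:
            self.disk.seek(self.rd)
            self.ram.append(self.REC.unpack(self.disk.read(self.REC.size))[0])
            self.rd += self.REC.size
        return value

fifo = DiskBackedFIFO(ram_capacity=2)  # tiny fast stage to force spilling
for v in range(6):
    fifo.push(v)                       # 0,1 stay in RAM; 2..5 spill to "disk"
drained = [fifo.pop() for _ in range(6)]
```

The one subtlety is ordering: after the first spill, every later push must also go to the backing store until the spilled region drains, or records would overtake each other.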