高级合成是否已准备就绪?计算金融案例研究

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI:10.1109/FPT.2014.7082747

G. Inggs, Shane T. Fleming, David B. Thomas, W. Luk

{"title":"高级合成是否已准备就绪?计算金融案例研究","authors":"G. Inggs, Shane T. Fleming, David B. Thomas, W. Luk","doi":"10.1109/FPT.2014.7082747","DOIUrl":null,"url":null,"abstract":"High Level Synthesis (HLS) tools for Field Programmable Gate Arrays (FPGAs) have made considerable progress, and are now sufficiently mature that a novice developer could create functionally correct implementation with limited understanding of the target hardware. In this case study, a novice developer considers a benchmark of financial problems for implementation upon FPGA via HLS. This novice starts by extending an existing implementation for a CPU or GPU using tools such as Xilinx's Vivado HLS, the Altera OpenCL SDK or Maxeler's MaxCompiler. When their direct source code translation inevitably didn't meet performance expectations, this developer then applies optimisations such as exploiting task or pipeline parallelism as well as C-slowing. When a combination of these optimisations are considered for a range of devices and process technologies, an acceleration of up to 220 times is achieved using these tools, the sort of acceleration expected of custom architectures. Compared to the 31 times improvement shown by an optimised Multicore CPU implementation, the 60 times improvement by a GPU and 207 times by a Xeon Phi, these results suggest that HLS is indeed ready for industrial adoption.","PeriodicalId":6877,"journal":{"name":"2014 International Conference on Field-Programmable Technology (FPT)","volume":"66 1","pages":"12-19"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"27","resultStr":"{\"title\":\"Is high level synthesis ready for business? A computational finance case study\",\"authors\":\"G. Inggs, Shane T. Fleming, David B. Thomas, W. Luk\",\"doi\":\"10.1109/FPT.2014.7082747\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"High Level Synthesis (HLS) tools for Field Programmable Gate Arrays (FPGAs) have made considerable progress, and are now sufficiently mature that a novice developer could create functionally correct implementation with limited understanding of the target hardware. In this case study, a novice developer considers a benchmark of financial problems for implementation upon FPGA via HLS. This novice starts by extending an existing implementation for a CPU or GPU using tools such as Xilinx's Vivado HLS, the Altera OpenCL SDK or Maxeler's MaxCompiler. When their direct source code translation inevitably didn't meet performance expectations, this developer then applies optimisations such as exploiting task or pipeline parallelism as well as C-slowing. When a combination of these optimisations are considered for a range of devices and process technologies, an acceleration of up to 220 times is achieved using these tools, the sort of acceleration expected of custom architectures. Compared to the 31 times improvement shown by an optimised Multicore CPU implementation, the 60 times improvement by a GPU and 207 times by a Xeon Phi, these results suggest that HLS is indeed ready for industrial adoption.\",\"PeriodicalId\":6877,\"journal\":{\"name\":\"2014 International Conference on Field-Programmable Technology (FPT)\",\"volume\":\"66 1\",\"pages\":\"12-19\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"27\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 International Conference on Field-Programmable Technology (FPT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FPT.2014.7082747\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on Field-Programmable Technology (FPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FPT.2014.7082747","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 27

摘要

用于现场可编程门阵列(fpga)的高级综合(HLS)工具已经取得了相当大的进展，并且现在已经足够成熟，以至于新手开发人员可以在对目标硬件了解有限的情况下创建功能正确的实现。在本案例研究中，新手开发人员考虑通过HLS在FPGA上实现财务问题的基准。这个新手首先使用Xilinx的Vivado HLS、Altera OpenCL SDK或Maxeler的MaxCompiler等工具扩展CPU或GPU的现有实现。当他们的直接源代码翻译不可避免地不能满足性能期望时，该开发人员就会应用优化，例如利用任务或管道并行性以及c减慢。当考虑将这些优化组合用于一系列设备和工艺技术时，使用这些工具可实现高达220倍的加速，这是自定义架构所期望的那种加速。与优化后的多核CPU实现的31倍改进、GPU的60倍改进和Xeon Phi的207倍改进相比，这些结果表明HLS确实已经为工业应用做好了准备。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Is high level synthesis ready for business? A computational finance case study

High Level Synthesis (HLS) tools for Field Programmable Gate Arrays (FPGAs) have made considerable progress, and are now sufficiently mature that a novice developer could create functionally correct implementation with limited understanding of the target hardware. In this case study, a novice developer considers a benchmark of financial problems for implementation upon FPGA via HLS. This novice starts by extending an existing implementation for a CPU or GPU using tools such as Xilinx's Vivado HLS, the Altera OpenCL SDK or Maxeler's MaxCompiler. When their direct source code translation inevitably didn't meet performance expectations, this developer then applies optimisations such as exploiting task or pipeline parallelism as well as C-slowing. When a combination of these optimisations are considered for a range of devices and process technologies, an acceleration of up to 220 times is achieved using these tools, the sort of acceleration expected of custom architectures. Compared to the 31 times improvement shown by an optimised Multicore CPU implementation, the 60 times improvement by a GPU and 207 times by a Xeon Phi, these results suggest that HLS is indeed ready for industrial adoption.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2014 International Conference on Field-Programmable Technology (FPT)

自引率

0.00%

发文量

期刊最新文献

Message from the General Chair and Program Co-Chairs Accelerator-in-Switch: A Novel Cooperation Framework for FPGAs and GPUs FPGA Accelerated HPC and Data Analytics Novel Neural Network Applications on New Python Enabled Platforms High-level synthesis - the right side of history