Reconfigurable Hardware Generation of Multigrid Solvers with Conjugate Gradient Coarse-Grid Solution

Parallel Process. Lett., Pub Date: 2018-12-01, DOI: 10.1142/S0129626418500160

Christian Schmitt, Moritz Schmid, S. Kuckuk, H. Köstler, Jürgen Teich, Frank Hannig

Citations: 3

Reconfigurable Hardware Generation of Multigrid Solvers with Conjugate Gradient Coarse-Grid Solution

Field-programmable gate arrays (FPGAs) are an increasingly popular accelerator technology, and not only in high-performance computing (HPC). However, they use a completely different programming paradigm and tool set than central processing units (CPUs) or even graphics processing units (GPUs). This adds extra development steps and requires special knowledge, which hinders their widespread use in scientific computing. To bridge this programmability gap, domain-specific languages (DSLs) are a popular choice for generating low-level implementations from an abstract algorithm description. In this work, we demonstrate our approach for generating numerical solver implementations based on the multigrid method for FPGAs from the same code base that is also used to generate code for CPUs using a hybrid parallelization of MPI and OpenMP. Our approach yields a hardware design that can compute up to 11 V-cycles per second on a mid-range FPGA, with an input grid size of 4096 × 4096 and the coarsest-grid solution computed using the conjugate gradient (CG) method, beating vectorized, multi-threaded execution on an Intel Xeon processor.
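The abstract names the algorithm but does not show it. As a rough illustration of a multigrid V-cycle whose coarsest-grid problem is solved with CG, here is a minimal pure-Python sketch. Everything specific in it is an assumption made for illustration: the 1D Poisson model problem with zero Dirichlet boundaries, the weighted-Jacobi smoother, full-weighting restriction, linear interpolation, and all function names. It is not the paper's generated code and does not reproduce its 4096 × 4096 setup.

```python
import math


def apply_A(u, h):
    """Apply the 1D Poisson operator (-u'' with zero Dirichlet boundaries)."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def residual(u, f, h):
    """Return f - A u."""
    return [fi - ai for fi, ai in zip(f, apply_A(u, h))]


def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi: u <- u + omega * D^{-1} (f - A u), with D = 2 / h^2."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [ui + omega * (h * h / 2.0) * ri for ui, ri in zip(u, r)]
    return u


def restrict(r):
    """Full weighting from 2m+1 fine points to m coarse points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(m)]


def prolong(e, n_fine):
    """Linear interpolation from the coarse grid to n_fine fine points."""
    out = [0.0] * n_fine
    for j, ej in enumerate(e):
        out[2 * j] += 0.5 * ej
        out[2 * j + 1] += ej
        out[2 * j + 2] += 0.5 * ej
    return out


def cg(f, h, tol=1e-10, max_iter=100):
    """Conjugate gradient for A u = f, starting from the zero vector."""
    u = [0.0] * len(f)
    r = list(f)
    p = list(r)
    rr = dot(r, r)
    rr0 = rr
    for _ in range(max_iter):
        if rr <= tol * tol * rr0:  # relative residual small enough
            break
        Ap = apply_A(p, h)
        alpha = rr / dot(p, Ap)
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return u


def v_cycle(u, f, h, coarsest_n=3):
    """One V-cycle; grids with at most coarsest_n points are solved by CG."""
    n = len(u)
    if n <= coarsest_n:
        # Coarsest grid: ignore the incoming guess and solve with CG.
        return cg(f, h)
    u = smooth(u, f, h)                          # pre-smoothing
    r_coarse = restrict(residual(u, f, h))       # restrict the residual
    e_coarse = v_cycle([0.0] * len(r_coarse), r_coarse, 2.0 * h, coarsest_n)
    e = prolong(e_coarse, n)                     # interpolate the correction
    u = [ui + ei for ui, ei in zip(u, e)]
    return smooth(u, f, h)                       # post-smoothing
```

To use the sketch, pick a grid with 2^k - 1 interior points (for example 63, so h = 1/64), start from a zero guess, and call `v_cycle` repeatedly until `norm(residual(u, f, h))` is small enough. Rediscretizing the operator with spacing 2h on each coarser level keeps the code short; other coarse-operator choices are possible.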
