Stefan Groth, D. Grünewald, J. Teich, Frank Hannig
Proceedings of the 17th ACM International Conference on Computing Frontiers, published 2020-05-11. DOI: https://doi.org/10.1145/3387902.3392628
A runtime system for finite element methods in a partitioned global address space
As exascale performance approaches, applications in high-performance computing (HPC) must scale to an ever-increasing number of compute nodes. The Global Address Space Programming Interface (GASPI) communication API promises to meet this challenge by providing a highly flexible and efficient programming model in a partitioned global address space (PGAS). Mesh-based solvers for partial differential equations (PDEs) are well suited to supercomputers because of their high computational intensity. Implementing such solvers is highly interdisciplinary, so hardware-specific parallelization techniques need to be abstracted away from the development of numerical algorithms. We present an open-source run-time system (RTS) that distributes and parallelizes device-agnostic kernels, which define algorithms on unstructured grids. We describe how the RTS abstracts the common parts of iterative solvers and explain how these components are parallelized and distributed. We demonstrate the efficiency of our approach with several microbenchmarks and an implementation of the discontinuous Galerkin method (DGM). The results show that almost all synchronization overhead can be hidden and that the RTS imposes only a small computational cost.
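The idea of hiding synchronization overhead behind computation can be illustrated with a minimal sketch. It is not the RTS described in the abstract and does not use the GASPI API: a thread pool stands in for one-sided communication, the mesh is a periodic 1D grid rather than an unstructured one, and all names (`kernel`, `fetch_halo`, `partitioned_step`) are hypothetical. The sketch only shows the general pattern of starting the halo exchange first, updating interior cells that need no remote data, and waiting for neighbor values only when the boundary cells are updated.

```python
from concurrent.futures import ThreadPoolExecutor


def kernel(left, center, right):
    """Hypothetical per-cell kernel: a three-point average."""
    return (left + center + right) / 3.0


def serial_step(u):
    """Reference: apply the kernel to every cell of a periodic 1D mesh."""
    n = len(u)
    return [kernel(u[(i - 1) % n], u[i], u[(i + 1) % n]) for i in range(n)]


def fetch_halo(parts, k):
    """Stand-in for communication: read the neighbors' boundary values."""
    p = len(parts)
    return parts[(k - 1) % p][-1], parts[(k + 1) % p][0]


def partitioned_step(parts, pool):
    """One iteration over all partitions; `parts` itself is never modified."""
    # Assumption of this sketch: each partition has a distinct first and last cell.
    assert all(len(u) >= 2 for u in parts)
    # Start all halo fetches before doing any computation.
    halos = [pool.submit(fetch_halo, parts, k) for k in range(len(parts))]
    new_parts = []
    for k, u in enumerate(parts):
        out = [0.0] * len(u)
        # Interior cells depend only on local data, so this loop can
        # proceed while the halo fetch is still pending.
        for i in range(1, len(u) - 1):
            out[i] = kernel(u[i - 1], u[i], u[i + 1])
        # Boundary cells are the only ones that wait for remote values.
        left_halo, right_halo = halos[k].result()
        out[0] = kernel(left_halo, u[0], u[1])
        out[-1] = kernel(u[-2], u[-1], right_halo)
        new_parts.append(out)
    return new_parts
```

Splitting a 12-cell mesh into three partitions of four cells and calling `partitioned_step` inside a `ThreadPoolExecutor` context should give the same values as `serial_step` on the unsplit mesh, since both apply the same kernel to the same inputs in the same order.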