{"title":"稀疏网格组合技术的高度可扩展算法","authors":"P. Strazdins, Md. Mohsin Ali, B. Harding","doi":"10.1109/IPDPSW.2015.76","DOIUrl":null,"url":null,"abstract":"Many petascale and exascale scientific simulations involve the time evolution of systems modelled as Partial Differential Equations (PDEs). The sparse grid combination technique (SGCT) is a cost-effective method for solve time-evolving PDEs, especially for higher-dimensional problems. It consists of evolving PDE over a set of grids of differing resolution in each dimension, and then combining the results to approximate the solution of the PDE on a grid of high resolution in all dimensions. It can also be extended to support algorithmic-based fault-tolerance, which is also important for computations at this scale. In this paper, we present two new parallel algorithms for the SGCT that supports the full distributed memory parallelization over the dimensions of the component grids, as well as over the grids as well. The direct algorithm is so called because it directly implements a SGCT combination formula. The second algorithm converts each component grid into their hierarchical surpluses, and then uses the direct algorithm on each of the hierarchical surpluses. The conversion to/from the hierarchical surpluses is also an important algorithm in its own right. An analysis of both indicates the direct algorithm minimizes the number of messages, whereas the hierarchical surplus minimizes memory consumption and offers a reduction in bandwidth by a factor of 1 -- 2 -- d, where d is the dimensionality of the SGCT. However, this is offset by its incomplete parallelism and factor of two load imbalance in practical scenarios. Our analysis also indicates both are suitable in a bandwidth-limiting regime. Experimental results including the strong and weak scalability of the algorithms indicates that, for scenarios of practical interest, both are sufficiently scalable to support large-scale SGCT but the direct algorithm has generally better performance, to within a factor of 2. Hierarchical surplus formation is much less communication intensive, but shows less scalability with increasing core counts.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Highly Scalable Algorithms for the Sparse Grid Combination Technique\",\"authors\":\"P. Strazdins, Md. Mohsin Ali, B. Harding\",\"doi\":\"10.1109/IPDPSW.2015.76\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many petascale and exascale scientific simulations involve the time evolution of systems modelled as Partial Differential Equations (PDEs). The sparse grid combination technique (SGCT) is a cost-effective method for solve time-evolving PDEs, especially for higher-dimensional problems. It consists of evolving PDE over a set of grids of differing resolution in each dimension, and then combining the results to approximate the solution of the PDE on a grid of high resolution in all dimensions. It can also be extended to support algorithmic-based fault-tolerance, which is also important for computations at this scale. In this paper, we present two new parallel algorithms for the SGCT that supports the full distributed memory parallelization over the dimensions of the component grids, as well as over the grids as well. The direct algorithm is so called because it directly implements a SGCT combination formula. The second algorithm converts each component grid into their hierarchical surpluses, and then uses the direct algorithm on each of the hierarchical surpluses. The conversion to/from the hierarchical surpluses is also an important algorithm in its own right. An analysis of both indicates the direct algorithm minimizes the number of messages, whereas the hierarchical surplus minimizes memory consumption and offers a reduction in bandwidth by a factor of 1 -- 2 -- d, where d is the dimensionality of the SGCT. However, this is offset by its incomplete parallelism and factor of two load imbalance in practical scenarios. Our analysis also indicates both are suitable in a bandwidth-limiting regime. Experimental results including the strong and weak scalability of the algorithms indicates that, for scenarios of practical interest, both are sufficiently scalable to support large-scale SGCT but the direct algorithm has generally better performance, to within a factor of 2. Hierarchical surplus formation is much less communication intensive, but shows less scalability with increasing core counts.\",\"PeriodicalId\":340697,\"journal\":{\"name\":\"2015 IEEE International Parallel and Distributed Processing Symposium Workshop\",\"volume\":\"64 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-05-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE International Parallel and Distributed Processing Symposium Workshop\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW.2015.76\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2015.76","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Highly Scalable Algorithms for the Sparse Grid Combination Technique
Many petascale and exascale scientific simulations involve the time evolution of systems modelled as Partial Differential Equations (PDEs). The sparse grid combination technique (SGCT) is a cost-effective method for solve time-evolving PDEs, especially for higher-dimensional problems. It consists of evolving PDE over a set of grids of differing resolution in each dimension, and then combining the results to approximate the solution of the PDE on a grid of high resolution in all dimensions. It can also be extended to support algorithmic-based fault-tolerance, which is also important for computations at this scale. In this paper, we present two new parallel algorithms for the SGCT that supports the full distributed memory parallelization over the dimensions of the component grids, as well as over the grids as well. The direct algorithm is so called because it directly implements a SGCT combination formula. The second algorithm converts each component grid into their hierarchical surpluses, and then uses the direct algorithm on each of the hierarchical surpluses. The conversion to/from the hierarchical surpluses is also an important algorithm in its own right. An analysis of both indicates the direct algorithm minimizes the number of messages, whereas the hierarchical surplus minimizes memory consumption and offers a reduction in bandwidth by a factor of 1 -- 2 -- d, where d is the dimensionality of the SGCT. However, this is offset by its incomplete parallelism and factor of two load imbalance in practical scenarios. Our analysis also indicates both are suitable in a bandwidth-limiting regime. Experimental results including the strong and weak scalability of the algorithms indicates that, for scenarios of practical interest, both are sufficiently scalable to support large-scale SGCT but the direct algorithm has generally better performance, to within a factor of 2. Hierarchical surplus formation is much less communication intensive, but shows less scalability with increasing core counts.