稀疏网格组合技术的高度可扩展算法

2015 IEEE International Parallel and Distributed Processing Symposium Workshop Pub Date : 2015-05-25 DOI:10.1109/IPDPSW.2015.76

P. Strazdins, Md. Mohsin Ali, B. Harding

{"title":"稀疏网格组合技术的高度可扩展算法","authors":"P. Strazdins, Md. Mohsin Ali, B. Harding","doi":"10.1109/IPDPSW.2015.76","DOIUrl":null,"url":null,"abstract":"Many petascale and exascale scientific simulations involve the time evolution of systems modelled as Partial Differential Equations (PDEs). The sparse grid combination technique (SGCT) is a cost-effective method for solve time-evolving PDEs, especially for higher-dimensional problems. It consists of evolving PDE over a set of grids of differing resolution in each dimension, and then combining the results to approximate the solution of the PDE on a grid of high resolution in all dimensions. It can also be extended to support algorithmic-based fault-tolerance, which is also important for computations at this scale. In this paper, we present two new parallel algorithms for the SGCT that supports the full distributed memory parallelization over the dimensions of the component grids, as well as over the grids as well. The direct algorithm is so called because it directly implements a SGCT combination formula. The second algorithm converts each component grid into their hierarchical surpluses, and then uses the direct algorithm on each of the hierarchical surpluses. The conversion to/from the hierarchical surpluses is also an important algorithm in its own right. An analysis of both indicates the direct algorithm minimizes the number of messages, whereas the hierarchical surplus minimizes memory consumption and offers a reduction in bandwidth by a factor of 1 -- 2 -- d, where d is the dimensionality of the SGCT. However, this is offset by its incomplete parallelism and factor of two load imbalance in practical scenarios. Our analysis also indicates both are suitable in a bandwidth-limiting regime. Experimental results including the strong and weak scalability of the algorithms indicates that, for scenarios of practical interest, both are sufficiently scalable to support large-scale SGCT but the direct algorithm has generally better performance, to within a factor of 2. Hierarchical surplus formation is much less communication intensive, but shows less scalability with increasing core counts.","PeriodicalId":340697,"journal":{"name":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Highly Scalable Algorithms for the Sparse Grid Combination Technique\",\"authors\":\"P. Strazdins, Md. Mohsin Ali, B. Harding\",\"doi\":\"10.1109/IPDPSW.2015.76\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many petascale and exascale scientific simulations involve the time evolution of systems modelled as Partial Differential Equations (PDEs). The sparse grid combination technique (SGCT) is a cost-effective method for solve time-evolving PDEs, especially for higher-dimensional problems. It consists of evolving PDE over a set of grids of differing resolution in each dimension, and then combining the results to approximate the solution of the PDE on a grid of high resolution in all dimensions. It can also be extended to support algorithmic-based fault-tolerance, which is also important for computations at this scale. In this paper, we present two new parallel algorithms for the SGCT that supports the full distributed memory parallelization over the dimensions of the component grids, as well as over the grids as well. The direct algorithm is so called because it directly implements a SGCT combination formula. The second algorithm converts each component grid into their hierarchical surpluses, and then uses the direct algorithm on each of the hierarchical surpluses. The conversion to/from the hierarchical surpluses is also an important algorithm in its own right. An analysis of both indicates the direct algorithm minimizes the number of messages, whereas the hierarchical surplus minimizes memory consumption and offers a reduction in bandwidth by a factor of 1 -- 2 -- d, where d is the dimensionality of the SGCT. However, this is offset by its incomplete parallelism and factor of two load imbalance in practical scenarios. Our analysis also indicates both are suitable in a bandwidth-limiting regime. Experimental results including the strong and weak scalability of the algorithms indicates that, for scenarios of practical interest, both are sufficiently scalable to support large-scale SGCT but the direct algorithm has generally better performance, to within a factor of 2. Hierarchical surplus formation is much less communication intensive, but shows less scalability with increasing core counts.\",\"PeriodicalId\":340697,\"journal\":{\"name\":\"2015 IEEE International Parallel and Distributed Processing Symposium Workshop\",\"volume\":\"64 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-05-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE International Parallel and Distributed Processing Symposium Workshop\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW.2015.76\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Parallel and Distributed Processing Symposium Workshop","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2015.76","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 13

摘要

许多千万亿级和百亿亿级的科学模拟都涉及到用偏微分方程(PDEs)建模的系统的时间演化。稀疏网格组合技术(SGCT)是求解时间演化偏微分方程的一种经济有效的方法，尤其适用于高维问题。它包括在一组不同维度分辨率的网格上演化PDE，然后将结果组合在所有维度的高分辨率网格上近似求解PDE。它还可以扩展为支持基于算法的容错，这对于这种规模的计算也很重要。在本文中，我们提出了两种新的SGCT并行算法，它们支持组件网格维度上的完全分布式内存并行化，以及网格上的完全分布式内存并行化。直接算法之所以被称为直接算法，是因为它直接实现了SGCT组合公式。第二种算法将每个组件网格转换为其分层剩余，然后在每个分层剩余上使用直接算法。从层次盈余到层次盈余的转换本身也是一个重要的算法。对这两种算法的分析表明，直接算法最大限度地减少了消息数量，而分层剩余最小化了内存消耗，并提供了1 - 2 - d的带宽减少，其中d是SGCT的维度。但是，在实际应用中，由于其不完全并行性和双负载不平衡的因素，这些缺点被抵消了。我们的分析还表明，两者都适用于带宽限制制度。实验结果表明，对于实际场景，两种算法都具有足够的可扩展性来支持大规模的SGCT，但直接算法通常具有更好的性能，在2倍之内。分层盈余形成的通信密集程度要低得多，但随着核心数的增加，其可扩展性较差。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Highly Scalable Algorithms for the Sparse Grid Combination Technique

Many petascale and exascale scientific simulations involve the time evolution of systems modelled as Partial Differential Equations (PDEs). The sparse grid combination technique (SGCT) is a cost-effective method for solve time-evolving PDEs, especially for higher-dimensional problems. It consists of evolving PDE over a set of grids of differing resolution in each dimension, and then combining the results to approximate the solution of the PDE on a grid of high resolution in all dimensions. It can also be extended to support algorithmic-based fault-tolerance, which is also important for computations at this scale. In this paper, we present two new parallel algorithms for the SGCT that supports the full distributed memory parallelization over the dimensions of the component grids, as well as over the grids as well. The direct algorithm is so called because it directly implements a SGCT combination formula. The second algorithm converts each component grid into their hierarchical surpluses, and then uses the direct algorithm on each of the hierarchical surpluses. The conversion to/from the hierarchical surpluses is also an important algorithm in its own right. An analysis of both indicates the direct algorithm minimizes the number of messages, whereas the hierarchical surplus minimizes memory consumption and offers a reduction in bandwidth by a factor of 1 -- 2 -- d, where d is the dimensionality of the SGCT. However, this is offset by its incomplete parallelism and factor of two load imbalance in practical scenarios. Our analysis also indicates both are suitable in a bandwidth-limiting regime. Experimental results including the strong and weak scalability of the algorithms indicates that, for scenarios of practical interest, both are sufficiently scalable to support large-scale SGCT but the direct algorithm has generally better performance, to within a factor of 2. Hierarchical surplus formation is much less communication intensive, but shows less scalability with increasing core counts.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2015 IEEE International Parallel and Distributed Processing Symposium Workshop

自引率

0.00%

发文量

期刊最新文献

Accelerating Large-Scale Single-Source Shortest Path on FPGA Relocation-Aware Floorplanning for Partially-Reconfigurable FPGA-Based Systems iWAPT Introduction and Committees Computing the Pseudo-Inverse of a Graph's Laplacian Using GPUs Optimizing Defensive Investments in Energy-Based Cyber-Physical Systems