STS-k: a multilevel sparse triangular solution scheme for NUMA multicores

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis. Pub Date: 2015-11-15. DOI: 10.1145/2807591.2807667

H. Kabir, J. Booth, G. Aupy, A. Benoit, Y. Robert, P. Raghavan


Citations: 19

Abstract

We consider techniques to improve the performance of parallel sparse triangular solution on non-uniform memory architecture (NUMA) multicores by extending earlier coloring and level set schemes for single-core multiprocessors. We develop STS-k, where k represents a small number of transformations for latency reduction from increased spatial and temporal locality of data accesses. We propose a graph model of data reuse to inform the development of STS-k and to prove that computing an optimal cost schedule is NP-complete. We observe significant speed-ups with STS-3 on 32-core Intel Westmere-EX and 24-core AMD 'MagnyCours' processors. Incremental gains solely from the 3-level transformations in STS-3 for a fixed ordering correspond to reductions in execution times by factors of 1.4 (Intel) and 1.5 (AMD) for level sets and 2 (Intel) and 2.2 (AMD) for coloring. On average, execution times are reduced by a factor of 6 (Intel) and 4 (AMD) for STS-3 with coloring compared to a reference implementation using level sets.
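For readers unfamiliar with the level set scheme that the abstract takes as its baseline, the following is a minimal sketch of the general idea, not the STS-k method and not the authors' reference implementation. It assumes a lower-triangular matrix in CSR form with the diagonal stored in every row; the names `level_sets` and `solve_by_levels` are hypothetical. Rows are grouped so that every row in a level depends only on rows in earlier levels, which means the rows within one level could be solved in parallel (the inner loop here is sequential for simplicity).

```python
from typing import List


def level_sets(row_ptr: List[int], col_idx: List[int]) -> List[List[int]]:
    """Group rows of a lower-triangular CSR matrix into dependency levels."""
    n = len(row_ptr) - 1
    level = [0] * n
    for i in range(n):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[p]
            if j < i:  # x[i] needs x[j], so row i sits at least one level later
                level[i] = max(level[i], level[j] + 1)
    sets: List[List[int]] = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


def solve_by_levels(row_ptr: List[int], col_idx: List[int],
                    values: List[float], b: List[float]) -> List[float]:
    """Solve L x = b one level at a time (assumes a nonzero stored diagonal)."""
    x = [0.0] * len(b)
    for rows in level_sets(row_ptr, col_idx):
        # Rows in the same level are mutually independent; a parallel
        # implementation would distribute this loop across cores.
        for i in rows:
            acc = b[i]
            diag = 1.0
            for p in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[p]
                if j == i:
                    diag = values[p]
                else:
                    acc -= values[p] * x[j]
            x[i] = acc / diag
    return x


# Illustrative 4x4 system (made-up data):
# L = [[2,0,0,0],[0,1,0,0],[1,1,4,0],[0,0,2,2]], b = L @ [1,2,3,4]
row_ptr = [0, 1, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 2, 3]
values = [2.0, 1.0, 1.0, 1.0, 4.0, 2.0, 2.0]
b = [2.0, 2.0, 15.0, 14.0]

print(level_sets(row_ptr, col_idx))
print(solve_by_levels(row_ptr, col_idx, values, b))
```

In this small example rows 0 and 1 have no dependencies and form the first level, row 2 depends on both, and row 3 depends on row 2, so the solve proceeds in three steps. The number of levels bounds the available parallelism, which is the kind of limitation that motivates alternatives such as coloring.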


