SDS-Sort: Scalable Dynamic Skew-aware Parallel Sorting

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing. Pub Date: 2016-05-31. DOI: 10.1145/2907294.2907300

Bin Dong, S. Byna, Kesheng Wu


Citations: 10

Abstract

Parallel sorting is an essential algorithm in large-scale data analytics on distributed-memory systems. As the number of processes increases, existing parallel sorting algorithms can become inefficient because of unbalanced workloads. A common cause of load imbalance is data skew, which occurs frequently in application data sets from physics, biology, and the earth and planetary sciences. In this work, we introduce a new scalable dynamic skew-aware parallel sorting algorithm, named SDS-Sort. It uses a skew-aware partition method to guarantee a tighter upper bound on the workload of each process. To improve load balance among parallel processes, existing algorithms usually add extra variables to the sorting key, which increases the time needed to complete the sorting operation. SDS-Sort allows a user to select any sorting key without sacrificing performance. SDS-Sort also provides optimizations, including adaptive local merging, overlapping of data exchange and data processing, and dynamic selection of data processing algorithms for different hardware configurations and for partially ordered data. SDS-Sort uses local-sampling-based partitioning to further reduce its overhead. We tested SDS-Sort extensively on Edison, a Cray XC30 supercomputer. Timing measurements show that SDS-Sort can scale to 130K CPU cores and deliver a sorting throughput of 117 TB/min. In tests with real application data from large science projects, SDS-Sort outperforms HykSort, a state-of-the-art parallel sorting algorithm, by 3.4X.
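The load-imbalance problem that the abstract describes can be illustrated with a small sketch. This is not the SDS-Sort implementation. It is a generic sample-sort style partitioner, written under the assumption that heavily duplicated keys produce repeated splitters, and it shows one simple way to spread such keys across buckets without extending the sorting key. The function names (pick_splitters, naive_partition, skew_aware_partition) and the round-robin policy are hypothetical choices made for this illustration only.

```python
import bisect
from collections import Counter


def pick_splitters(sorted_samples, p):
    """Pick p-1 evenly spaced splitters from an already sorted sample list."""
    n = len(sorted_samples)
    return [sorted_samples[(i * n) // p] for i in range(1, p)]


def naive_partition(data, splitters):
    """Return bucket sizes when bucket i takes keys in (splitters[i-1], splitters[i]].

    All copies of a duplicated key land in a single bucket.
    """
    sizes = [0] * (len(splitters) + 1)
    for key in data:
        sizes[bisect.bisect_left(splitters, key)] += 1
    return sizes


def skew_aware_partition(data, splitters):
    """Return bucket sizes when a key that equals a repeated splitter is
    dealt round-robin over the buckets that share that splitter value.

    Global order is preserved: the extra buckets hold only that key.
    (Illustrative policy, assumed for this sketch.)
    """
    sizes = [0] * (len(splitters) + 1)
    next_slot = Counter()  # per-key round-robin position
    for key in data:
        lo = bisect.bisect_left(splitters, key)
        hi = bisect.bisect_right(splitters, key)
        if hi > lo:
            # key equals splitters[lo:hi]; buckets lo..hi-1 may all hold it
            span = hi - lo
            sizes[lo + next_slot[key] % span] += 1
            next_slot[key] += 1
        else:
            sizes[lo] += 1
    return sizes


if __name__ == "__main__":
    # Skewed input: the key 7 accounts for 91 of 100 records.
    data = [7] * 90 + list(range(10))
    splitters = pick_splitters(sorted(data), 4)
    print(splitters)                              # [7, 7, 7]
    print(naive_partition(data, splitters))       # [98, 0, 0, 2]
    print(skew_aware_partition(data, splitters))  # [38, 30, 30, 2]
```

In this toy input the naive partitioner sends 98 of 100 records to one bucket, while the duplicate-aware variant caps the largest bucket at 38. The sorting key itself is left unchanged, which is the property the abstract contrasts with approaches that add extra variables to the key.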


