Nan Hu , Yutong Lu , Zhuo Tang , Zhiyong Liu , Dan Huang , Zhiguang Chen
Topo: Towards a fine-grained topological data processing framework on Tianhe-3 supercomputer
Journal of Parallel and Distributed Computing. Published 2024-05-31. DOI: 10.1016/j.jpdc.2024.104926
https://www.sciencedirect.com/science/article/pii/S074373152400090X
Citations: 0
Abstract
Big data frameworks are widely deployed in supercomputers for analyzing large-scale datasets. Topological data processing is an emerging approach that focuses on analyzing the topological structures in high-dimensional scientific data. However, incorporating topological data processing into current big data frameworks presents three main challenges: (1) The frequent data exchange poses challenges to the traditional coarse-grained parallelism. (2) The spatial topology makes parallel programming harder using oversimplified MapReduce APIs. (3) The massive intermediate data and NUMA architecture hinder resource utilization and scalability on novel supercomputers and many-core processors.
In this paper, we present Topo, a generic distributed framework that enhances topological data processing on many-core supercomputers. Topo relies on three concepts. (1) It employs fine-grained parallelism, with awareness of topological structures in datasets, to support interactions among collaborative workers before each shuffle phase. (2) It provides intuitive APIs for topological data operations. (3) It implements efficient collective I/O and NUMA-aware dynamic task scheduling to improve multi-threading and load balancing. We evaluate Topo's performance on the Tianhe-3 supercomputer, which uses state-of-the-art ARM many-core processors. Execution-time measurements show that, compared with popular frameworks, Topo achieves average speedups of 5.3× and 6.3×, and maximum speedups of 8.4× and 20×, on HPC workloads and big data benchmarks, respectively. Topo also reduces total execution time on skewed datasets by 41%.
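The abstract does not describe how Topo's scheduler works, so the following is only a toy illustration of why dynamic task scheduling can help with skewed data in general. It compares a static round-robin assignment with a greedy scheme in which each idle worker pulls the next task from a shared queue. The function names, the workload, and the worker count are all hypothetical, and NUMA placement is not modeled.

```python
import heapq


def static_makespan(task_costs, num_workers):
    """Pre-assign tasks round-robin; return the finish time of the slowest worker."""
    loads = [0] * num_workers
    for index, cost in enumerate(task_costs):
        loads[index % num_workers] += cost
    return max(loads)


def dynamic_makespan(task_costs, num_workers):
    """Give each task to whichever worker becomes idle first (greedy list scheduling)."""
    available_at = [0] * num_workers  # min-heap of times at which workers become idle
    heapq.heapify(available_at)
    for cost in task_costs:
        earliest = heapq.heappop(available_at)
        heapq.heappush(available_at, earliest + cost)
    return max(available_at)


# Hypothetical skewed workload: a few heavy tasks among many light ones.
skewed_costs = [8, 1, 1, 1, 8, 1, 1, 1, 8, 1, 1, 1]

if __name__ == "__main__":
    print("static :", static_makespan(skewed_costs, 4))
    print("dynamic:", dynamic_makespan(skewed_costs, 4))
```

In this made-up workload the round-robin assignment happens to send every heavy task to the same worker, while the pull-based scheme spreads them out. The numbers illustrate the general effect only and say nothing about the 41% figure reported above.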
About the journal:
This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing.
The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics; again covering the full range from the design to the use of our targeted systems.