
Latest Publications: 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

INT Based Network-Aware Task Scheduling for Edge Computing
Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00131
B. Shrestha, Richard Cziva, Engin Arslan
Edge computing promises low-latency computation for delay-sensitive applications by processing data close to its source. Task scheduling in edge computing is, however, not immune to performance fluctuations, as the dynamic and unpredictable nature of network traffic can adversely affect data transfer performance between end devices and edge servers. In this paper, we leverage In-band Network Telemetry (INT) to gather fine-grained, temporal statistics about network conditions and incorporate network awareness into task scheduling for edge computing. Unlike legacy network monitoring techniques that collect port-level or flow-level statistics on the order of tens of seconds, INT offers highly accurate network visibility by capturing network telemetry at packet-level granularity, thereby presenting a unique opportunity to detect network congestion precisely. Our experimental analysis using various workload types and network congestion scenarios reveals that enhancing edge-computing task scheduling with high-precision network telemetry can yield up to a 40% reduction in data transfer times and up to a 30% reduction in total task execution times by favoring edge servers in uncongested (or mildly congested) sections of the network when scheduling tasks.
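The scheduling policy the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation; the server names, rates, and the `queue_delay_s` field (standing in for INT-reported path congestion) are all hypothetical:

```python
def estimate_completion(task_bytes, server):
    # transfer time over the server's network path + compute time on the server
    transfer = task_bytes / server["bandwidth_Bps"] + server["queue_delay_s"]
    compute = task_bytes / server["proc_rate_Bps"]
    return transfer + compute

def schedule(task_bytes, servers):
    # favor the server with the lowest estimated completion time
    return min(servers, key=lambda s: estimate_completion(task_bytes, s))

servers = [
    {"name": "edge-A", "bandwidth_Bps": 125e6, "queue_delay_s": 0.200, "proc_rate_Bps": 500e6},
    {"name": "edge-B", "bandwidth_Bps": 125e6, "queue_delay_s": 0.005, "proc_rate_Bps": 400e6},
]
best = schedule(10e6, servers)  # the uncongested edge-B wins despite slower compute
```

With fresh telemetry, the congested edge-A is passed over even though its compute rate is higher, which is exactly the trade-off the abstract's 40%/30% figures come from.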
Citations: 0
Message from the HIPS 2021 Workshop Co-Chairs
Pub Date : 2021-06-01 DOI: 10.1109/ipdpsw52791.2021.00063
Citations: 0
Efficient Parallel Multigrid Methods on Manycore Clusters with Double/Single Precision Computing
Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00114
K. Nakajima, T. Ogita, Masatoshi Kawai
The parallel multigrid method is expected to play an important role in scientific computing on exascale supercomputer systems for solving large-scale linear equations with sparse coefficient matrices. Because solving sparse linear systems is a strongly memory-bound process, an efficient storage method for the coefficient matrices is a crucial issue. In previous work, the authors applied the sliced ELL format to parallel conjugate gradient solvers with multigrid preconditioning (MGCG) for an application simulating 3D groundwater flow through heterogeneous porous media (pGW3D-LVM), and excellent performance was obtained on large-scale multicore/manycore clusters. In the present work, the authors introduce SELL-C-σ with double/single-precision computing to the MGCG solver and evaluate its performance with OpenMP/MPI hybrid parallel programming models on the Oakforest-PACS (OFP) system at JCAHPC, using up to 2,048 Intel Xeon Phi nodes. Because SELL-C-σ is well suited to wide-SIMD architectures such as the Xeon Phi, the improvement over sliced ELL was more than 35% for double precision and more than 45% for single precision on OFP. Finally, accuracy verification was conducted based on the method proposed by the authors for solving linear equations whose sparse coefficient matrices have the M-property.
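For readers unfamiliar with the format, the row-sorting and slicing idea behind SELL-C-σ can be sketched in a few lines. This is a simplified illustration storing only values, not the authors' code; a real implementation also stores column indices and slice offsets:

```python
def sell_c_sigma(rows, C, sigma):
    """rows: list of sparse rows (nonzero values only)."""
    order = []
    for start in range(0, len(rows), sigma):
        # sort rows by length (descending) within each window of sigma rows
        window = list(range(start, min(start + sigma, len(rows))))
        window.sort(key=lambda r: len(rows[r]), reverse=True)
        order.extend(window)
    slices = []
    for s in range(0, len(order), C):
        # pad each slice of C rows only up to that slice's own longest row,
        # so short rows pay far less padding than in plain ELL
        chunk = [list(rows[r]) for r in order[s:s + C]]
        width = max(len(r) for r in chunk)
        slices.append([r + [0] * (width - len(r)) for r in chunk])
    return order, slices

order, slices = sell_c_sigma([[1], [2, 3, 4], [5, 6], [7]], C=2, sigma=4)
```

Sorting within a σ-window groups rows of similar length into the same C-row slice, which is what makes the format efficient on wide-SIMD hardware such as the Xeon Phi.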
Citations: 1
Message from the HPS 2021 Workshop Chairs
Pub Date : 2021-06-01 DOI: 10.1109/ipdpsw52791.2021.00148
Citations: 0
A Processor Selection Method based on Execution Time Estimation for Machine Learning Programs
Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00116
Kou Murakami, K. Komatsu, Masayuki Sato, Hiroaki Kobayashi
In recent years, machine learning has become widespread. As machine learning algorithms have become more complex and the amount of data to be handled has grown, the execution times of machine learning programs have been increasing. Processors called accelerators can help execute a machine learning program in a short time. However, processors, including accelerators, have different characteristics; it is therefore unclear whether existing machine learning programs are executed on the appropriate processor. This paper proposes a method for selecting a processor suitable for each machine learning program. In the proposed method, the selection is based on an estimate of the execution time of the machine learning program on each processor, and the target program does not need to be executed in advance. The experimental results clarify that the proposed method can achieve up to 5.3 times faster execution than the original NumPy implementation. These results show that the proposed method can be used in a system that automatically selects the processor, so that each machine learning program can easily be executed on the best processor.
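The selection step itself reduces to an argmin over per-processor time estimates. A toy sketch, assuming a simple roofline-style cost model with hypothetical processor parameters (the paper's actual estimation model is more elaborate):

```python
def predict_time(profile, proc):
    # roofline-style bound: limited by the slower of compute and memory traffic,
    # plus a fixed launch/transfer overhead for accelerators
    compute = profile["flops"] / proc["flops_per_s"]
    memory = profile["bytes"] / proc["bytes_per_s"]
    return max(compute, memory) + proc["launch_overhead_s"]

def select_processor(profile, procs):
    # no execution needed: pick the processor with the lowest estimate
    return min(procs, key=lambda p: predict_time(profile, p))

procs = [
    {"name": "cpu", "flops_per_s": 1e10, "bytes_per_s": 5e10, "launch_overhead_s": 0.0},
    {"name": "gpu", "flops_per_s": 1e13, "bytes_per_s": 1e12, "launch_overhead_s": 0.01},
]
small = {"flops": 1e6, "bytes": 1e6}    # overhead dominates: stay on the CPU
large = {"flops": 1e12, "bytes": 1e11}  # compute dominates: offload to the GPU
```

Even this crude model captures the paper's point: the "best" processor depends on the program, so a small program should not pay accelerator overhead while a large one should.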
Citations: 1
FPGA Acceleration of Zstd Compression Algorithm
Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00035
Jianyu Chen, M.A.F.M. Daverveldt, Z. Al-Ars
With the continued increase in the amount of big data generated and stored in various application domains, such as high-frequency trading, compression techniques are becoming ever more important for reducing communication-bandwidth and storage-capacity requirements. Zstandard (Zstd) is emerging as an important compression algorithm for big data sets, capable of achieving a good compression ratio at higher speed than comparable algorithms. In this paper, we introduce the architecture of a new hardware compression kernel for Zstd that allows the algorithm to be used for real-time compression of big data streams. In addition, we optimize the proposed architecture for the specific use case of streaming high-frequency trading data. The optimized kernel is implemented on a Xilinx Alveo U200 board. Our optimized implementation allows us to fit ten kernel blocks on one board, achieving a compression throughput of about 8.6 GB/s and a compression ratio of about 23.6%. The hardware implementation is open source and publicly available at https://github.com/ChenJianyunp/Hardware-Zstd-Compression-Unit.
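The reported figures are ratios of output to input size and bytes processed per second. A sketch of how such numbers are measured, using zlib from the Python standard library as a stand-in, since Zstd bindings are a third-party package that may not be installed:

```python
import time
import zlib

def benchmark(data, level=3):
    """Measure compression ratio and throughput for one buffer."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(compressed) / len(data)  # the paper's 23.6% corresponds to 0.236 here
    throughput = len(data) / elapsed     # bytes per second
    return ratio, throughput

# repetitive tick-like data, the kind of stream the paper targets
data = b"high-frequency trading tick stream " * 10000
ratio, throughput = benchmark(data)
```

Software throughput measured this way is what the 8.6 GB/s hardware figure should be compared against; the same two metrics apply regardless of the codec.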
Citations: 9
A Machine Learning Approach to Predict Timing Delays During FPGA Placement
Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00026
T. Martin, G. Grewal, S. Areibi
Timing-driven placement tools for FPGAs rely on accurate delay estimates for nets in order to identify and optimize critical paths. In this paper, we propose a machine-learning framework for predicting net delay to reduce the miscorrelation between placement and detailed routing. Features relevant to timing delay are engineered based on the characteristics of nets, the available routing resources, and the behavior of the detailed router. Our results show an accuracy above 94%, and when the predictor is integrated within an FPGA analytical placer, Critical Path Delay (CPD) improves by 10% on average compared to a static delay model.
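As a toy stand-in for the learned predictor (the paper trains an ML model on engineered features; the feature names and delay values below are hypothetical), a 1-nearest-neighbour lookup illustrates the feature-to-delay mapping the framework learns:

```python
def predict_delay(net, training):
    """Predict a net's delay from its most similar training net (1-NN)."""
    def dist(a, b):
        # squared Euclidean distance over three engineered features
        return sum((a[k] - b[k]) ** 2 for k in ("fanout", "span", "congestion"))
    nearest = min(training, key=lambda t: dist(net, t))
    return nearest["delay_ns"]

training = [
    {"fanout": 2, "span": 5, "congestion": 0.1, "delay_ns": 1.2},
    {"fanout": 30, "span": 40, "congestion": 0.8, "delay_ns": 6.5},
]
query = {"fanout": 28, "span": 35, "congestion": 0.7}
predicted = predict_delay(query, training)
```

The point of the sketch is the interface, not the model: the placer queries predicted delays per net during optimization instead of relying on a static delay table.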
Citations: 6
TurboBFS: GPU Based Breadth-First Search (BFS) Algorithms in the Language of Linear Algebra
Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00084
Oswaldo Artiles, F. Saeed
Graphs used to model the human brain, omics data, or social networks are huge, and manual inspection of these graphs is impossible. A popular and fundamental method for making sense of these large graphs is the well-known Breadth-First Search (BFS) algorithm. However, BFS suffers from a large computational cost, especially for big graphs of interest. More recently, the use of Graphics Processing Units (GPUs) has been promising, but challenging because of the limited global memory of GPUs and the irregular structure of real-world graphs. In this paper, we present a GPU-based linear-algebraic formulation and implementation of BFS, called TurboBFS, that exhibits excellent scalability on unweighted, undirected or directed sparse graphs of arbitrary structure. We demonstrate that our algorithms achieve up to 40 GTEPs and are on average 15.7x, 5.8x, and 1.8x faster than the state-of-the-art algorithms implemented in the SuiteSparse:GraphBLAS, GraphBLAST, and gunrock libraries, respectively. The codes implementing the algorithms proposed in this paper are available at https://github.com/pcdslab.
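The "language of linear algebra" view of BFS can be sketched in pure Python: each level expansion is a sparse matrix-vector product over a Boolean semiring, written here as a union of adjacency rows masked by the unvisited set. This illustrates the formulation only, not the TurboBFS GPU kernels:

```python
def bfs_levels(adj, source):
    """adj: dict node -> list of neighbours (one sparse adjacency 'row' per node)."""
    level = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # y = A^T x over the (OR, AND) semiring: union of the frontier's
        # adjacency rows, masked by the set of not-yet-visited nodes
        frontier = {v for u in frontier for v in adj[u] if v not in level}
        for v in frontier:
            level[v] = depth
    return level

levels = bfs_levels({0: [1, 2], 1: [3], 2: [3], 3: []}, 0)
```

On a GPU, that set-comprehension line becomes the masked sparse matrix-vector product the libraries named above all expose; the loop structure is otherwise identical.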
Citations: 1
Load Balancing Schemes for Large Synthetic Population-Based Complex Simulators
Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00156
Bogdan Mucenic, Chaitanya Kaligotla, Abby Stevens, J. Ozik, Nicholson T. Collier, C. Macal
We present our development of load-balancing algorithms to efficiently distribute and parallelize the running of large-scale complex agent-based modeling (ABM) simulators on High-Performance Computing (HPC) resources. Our algorithm is based on partitioning the co-location network that emerges from an ABM's underlying synthetic population. Variations of this algorithm are applied experimentally to investigate two algorithmic choices that affect run-time performance. We report the results of these experiments on the CityCOVID ABM, built to model the spread of COVID-19 in the Chicago metropolitan region.
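A greedy sketch of the co-location-partitioning idea (an illustration, not the authors' algorithm): assign each agent to the lightest-loaded rank among those already hosting its co-location neighbours, falling back to the globally lightest rank, so that agents who meet stay on the same rank while loads stay even:

```python
def partition(colocation, n_ranks):
    """colocation: dict agent -> list of co-located agents."""
    assign, load = {}, [0] * n_ranks
    # visit high-degree agents first so hubs anchor their neighbourhoods
    for node in sorted(colocation, key=lambda n: len(colocation[n]), reverse=True):
        neighbour_ranks = {assign[v] for v in colocation[node] if v in assign}
        candidates = neighbour_ranks or set(range(n_ranks))
        rank = min(candidates, key=lambda r: load[r])
        assign[node] = rank
        load[rank] += 1
    return assign, load

# two disjoint co-location cliques end up on separate, equally loaded ranks
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
assign, load = partition(graph, 2)
```

Keeping co-located agents on one rank is what cuts cross-rank communication; balancing `load` is the competing objective the paper's experiments explore.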
Citations: 2
Leveraging High Dimensional Spatial Graph Embedding as a Heuristic for Graph Algorithms
Pub Date : 2021-06-01 DOI: 10.1109/IPDPSW52791.2021.00086
Peter Oostema, F. Franchetti
Spatial graph embedding is a technique for placing graphs in space, used for visualization and graph analytics. The general goal is to place connected nodes close together while spreading all others apart. Previous work has looked at spatial graph embedding in 2 or 3 dimensions, using high-performance libraries and fast N-body simulation algorithms. We expand into higher dimensions to find what this can be useful for. Using an arbitrary number of dimensions allows every unweighted graph to have exact edge lengths, since n nodes can all be placed equidistantly at the vertices of an (n − 1)-dimensional simplex. This increases the complexity of the simulation, so we provide an efficient GPU implementation in high dimensions. Although high-dimensional embeddings cannot easily be visualized, they find a consistent structure that can be used for graph analytics. Problems this has been used to solve include graph isomorphism and graph coloring.
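The dimensionality claim in concrete form: placing n nodes at the standard basis vectors of R^n (which span the vertices of an (n − 1)-dimensional simplex) makes every pair of nodes exactly √2 apart, so any unweighted graph admits exact, uniform edge lengths. A quick check:

```python
import math

def simplex_embedding(n):
    # node i sits at standard basis vector e_i of R^n; the n points are the
    # vertices of a regular (n-1)-simplex embedded in R^n
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

pts = simplex_embedding(5)
# every pair of embedded nodes is the same distance sqrt(2) apart
```

The N-body simulation then only has to pull connected pairs below √2 and push unconnected pairs above it, which is the structure the heuristics exploit.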
Citations: 2