Graph Coloring on the GPU and Some Techniques to Improve Load Imbalance
Shuai Che, Gregory P. Rodgers, Bradford M. Beckmann, S. Reinhardt
2015 IEEE International Parallel and Distributed Processing Symposium Workshop, published 2015-05-25
DOI: 10.1109/IPDPSW.2015.74 (https://doi.org/10.1109/IPDPSW.2015.74)
Citations: 9
Abstract
Graphics processing units (GPUs) have been increasingly used to accelerate irregular applications such as graph and sparse-matrix computation. Graph coloring is a key building block for many graph applications. The first step of many graph applications is graph coloring/partitioning to obtain sets of independent vertices for subsequent parallel computations. However, parallelization and optimization of coloring for GPUs have been a challenge for programmers. This paper studies approaches to implementing graph coloring on a GPU and characterizes their program behaviors with different graph structures. We also investigate load imbalance, which can be the main cause for performance bottlenecks. We evaluate the effectiveness of different optimization techniques, including the use of work stealing and the design of a hybrid algorithm. We are able to improve graph coloring performance by approximately 25% compared to a baseline GPU implementation on an AMD Radeon HD 7950 GPU. We also analyze some important factors affecting performance.
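To make the idea of coloring as a way "to obtain sets of independent vertices" concrete, here is a minimal Python sketch of a round-based, independent-set style coloring. It is an illustration only, not the paper's GPU implementation: the function name `color_by_independent_sets`, the random-priority rule, and the sample `graph` are all assumptions introduced for this example. In each round, every uncolored vertex whose priority beats all of its uncolored neighbors takes the current color, so each color class is an independent set.

```python
import random


def color_by_independent_sets(adj, seed=0):
    """Color an undirected graph in rounds (illustrative sketch).

    adj maps each vertex to the set of its neighbors (no self-loops).
    Returns a dict mapping each vertex to a color index.
    """
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)
    # Unique priorities, so two adjacent vertices can never both win a round.
    priority = {v: i for i, v in enumerate(order)}

    color = {}
    uncolored = set(adj)
    current = 0
    while uncolored:
        # On a GPU, each vertex would typically be checked by its own thread.
        # The scan over adj[v] costs time proportional to the vertex degree,
        # which is where skewed degree distributions cause load imbalance.
        winners = {
            v for v in uncolored
            if all(priority[v] > priority[u] for u in adj[v] if u in uncolored)
        }
        for v in winners:
            color[v] = current
        uncolored -= winners
        current += 1
    return color


# Hypothetical sample graph: a 5-cycle with one chord (0-2).
graph = {
    0: {1, 2, 4},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4},
    4: {0, 3},
}
colors = color_by_independent_sets(graph)

# Group vertices by color: each group can be processed in parallel
# because no two vertices in the same group share an edge.
independent_sets = {}
for vertex, c in colors.items():
    independent_sets.setdefault(c, set()).add(vertex)
```

Each round always colors at least one vertex (the uncolored vertex with the highest priority), so the loop terminates. The number of colors used depends on the random priorities and is not guaranteed to be minimal.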