Carl Pearson, M. Almasri, Omer Anjum, Vikram Sharma Mailthody, Zaid Qureshi, R. Nagi, Jinjun Xiong, Wen-mei W. Hwu
{"title":"Update on Triangle Counting on GPU","authors":"Carl Pearson, M. Almasri, Omer Anjum, Vikram Sharma Mailthody, Zaid Qureshi, R. Nagi, Jinjun Xiong, Wen-mei W. Hwu","doi":"10.1109/HPEC.2019.8916547","DOIUrl":null,"url":null,"abstract":"This work presents an update to the triangle-counting portion of the subgraph isomorphism static graph challenge. This work is motivated by a desire to understand the impact of CUDA unified memory on the triangle-counting problem. First, CUDA unified memory is used to overlap reading large graph data from disk with graph data structures in GPU memory. Second, we use CUDA unified memory hints to solve multi-GPU performance scaling challenges present in our last submission. Finally, we improve the single-GPU kernel performance from our past submission by introducing a work-stealing dynamic algorithm GPU kernel with persistent threads, which makes performance adaptive for large graphs without requiring a graph analysis phase.","PeriodicalId":184253,"journal":{"name":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC.2019.8916547","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13
Abstract
This work presents an update to the triangle-counting portion of the subgraph isomorphism static graph challenge. This work is motivated by a desire to understand the impact of CUDA unified memory on the triangle-counting problem. First, CUDA unified memory is used to overlap reading large graph data from disk with graph data structures in GPU memory. Second, we use CUDA unified memory hints to solve multi-GPU performance scaling challenges present in our last submission. Finally, we improve the single-GPU kernel performance from our past submission by introducing a work-stealing dynamic algorithm GPU kernel with persistent threads, which makes performance adaptive for large graphs without requiring a graph analysis phase.