Wagner M. Nunan Zola, L. C. E. Bona, Fabiano Silva
{"title":"Fast GPU parallel N-Body tree traversal with Simulated Wide-Warp","authors":"Wagner M. Nunan Zola, L. C. E. Bona, Fabiano Silva","doi":"10.1109/PADSW.2014.7097874","DOIUrl":null,"url":null,"abstract":"The Barnes-Hut algorithm is a widely used approximation method for the N-Body simulation problem. The irregular nature of this tree walking code presents interesting challenges for its computation on parallel systems. Additional problems arise in effectively exploiting the processing capacity of GPU architectures. We propose and investigate the applicability of software Simulated Wide-Warps (SWW) in this context. To this extent, we explicitly deal with dynamic irregular patterns in data accesses with data remapping and data transformation, by controlling execution flow divergence of threads. We present a new compact data-structure for the tree layout, GPU parallel algorithms for tree transformation and parallel walking using SWW. Benefits of our techniques are in transposing the tree algorithm to execute regular patterns to match the GPU model. Our experiments show significant performance improvement over the best known GPU solutions to this algorithm.","PeriodicalId":421740,"journal":{"name":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)","volume":"170 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PADSW.2014.7097874","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
The Barnes-Hut algorithm is a widely used approximation method for the N-Body simulation problem. The irregular nature of this tree walking code presents interesting challenges for its computation on parallel systems. Additional problems arise in effectively exploiting the processing capacity of GPU architectures. We propose and investigate the applicability of software Simulated Wide-Warps (SWW) in this context. To this extent, we explicitly deal with dynamic irregular patterns in data accesses with data remapping and data transformation, by controlling execution flow divergence of threads. We present a new compact data-structure for the tree layout, GPU parallel algorithms for tree transformation and parallel walking using SWW. Benefits of our techniques are in transposing the tree algorithm to execute regular patterns to match the GPU model. Our experiments show significant performance improvement over the best known GPU solutions to this algorithm.