Distributing Simplex-Shaped Nested for-Loops to Identify Carcinogenic Gene Combinations
Sajal Dash, Mohammad Alaul Haque Monil, Junqi Yin, R. Anandakrishnan, Feiyi Wang
2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2023. DOI: 10.1109/IPDPS54959.2023.00101
Abstract
Cancer is a leading cause of death in the US, and it results from a combination of two to nine genetic mutations. Identifying the five-hit combinations responsible for several cancer types is computationally intractable even with the fastest supercomputers in the USA. Iterating through the nested loops the process requires presents a simplex-shaped workload with irregular memory access patterns. Distributing this workload efficiently across thousands of GPUs is challenging because the simplex-shaped (triangular/tetrahedral) iteration space must be divided into similarly shaped pieces of equal volume, and the irregular memory access patterns create imbalanced compute utilization across nodes. We developed a generalized solution for distributing a simplex-shaped workload by partially coalescing the nested for-loops, and we minimized the memory access overhead through efficient use of limited shared memory, a dynamic scheduler, and loop tiling. For 4-hit combinations, we achieved 90%-100% strong-scaling efficiency on up to 3,594 V100 GPUs on the Summit supercomputer. Finally, we designed and implemented a distributed algorithm to identify 5-hit combinations for four different cancer types; the identified combinations differentiate between cancer and normal samples with 86.59%-88.79% precision and 84.42%-90.91% recall. We also demonstrated the robustness of our solution by porting the code to another leadership-class computing platform, Crusher, a testbed for the fastest supercomputer, Frontier. On Crusher, we achieved 98% strong-scaling efficiency on 50 nodes (400 AMD MI250X GCDs), demonstrating the computational readiness of Frontier for scientific applications.
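The idea of partially coalescing a simplex-shaped nested loop can be illustrated with a minimal sketch. The snippet below is not the paper's GPU implementation; it only shows, for the simplest 2-hit case, how the triangular iteration space {(i, j) : 0 <= i < j < n} can be flattened into a single index range and split into contiguous chunks of equal volume across workers. The helper names `unrank_pair` and `chunk` are hypothetical illustrations; the paper's actual solution generalizes this to higher-dimensional simplexes on thousands of GPUs and adds a dynamic scheduler, shared-memory reuse, and loop tiling.

```python
from math import comb

def unrank_pair(k, n):
    """Map a flat index k in [0, C(n, 2)) back to the pair (i, j) with 0 <= i < j < n.

    This coalesces the two nested loops
        for i in range(n):
            for j in range(i + 1, n):
    into a single 1-D index space, so the triangular workload can be cut
    into equal-sized contiguous slices instead of unequal "rows".
    """
    for i in range(n):
        row = n - 1 - i          # number of pairs whose first element is i
        if k < row:
            return i, i + 1 + k
        k -= row
    raise IndexError("flat index out of range")

def chunk(rank, nranks, n):
    """Contiguous slice of flat indices assigned to one worker (equal volume +/- 1)."""
    total = comb(n, 2)
    lo = rank * total // nranks
    hi = (rank + 1) * total // nranks
    return range(lo, hi)

# Example: split all gene pairs of n = 10 items across 4 workers.
n, nranks = 10, 4
for rank in range(nranks):
    pairs = [unrank_pair(k, n) for k in chunk(rank, nranks, n)]
    print(f"rank {rank}: {len(pairs)} pairs, first few: {pairs[:3]}")
```

In a naive row-wise split, worker 0 would get the longest rows of the triangle and the last worker almost nothing; flattening the index space first is what makes the per-worker volumes equal, which is the load-balancing property the abstract refers to.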