分布简单形嵌套for- loop以识别致癌基因组合

2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2023-05-01 DOI:10.1109/IPDPS54959.2023.00101

Sajal Dash, Mohammad Alaul Haque Monil, Junqi Yin, R. Anandakrishnan, Feiyi Wang

{"title":"分布简单形嵌套for- loop以识别致癌基因组合","authors":"Sajal Dash, Mohammad Alaul Haque Monil, Junqi Yin, R. Anandakrishnan, Feiyi Wang","doi":"10.1109/IPDPS54959.2023.00101","DOIUrl":null,"url":null,"abstract":"Cancer is a leading cause of death in the US, and it results from a combination of two-nine genetic mutations. Identifying five-hit combinations responsible for several cancer types is computationally intractable even with the fastest super-computers in the USA. Iterating through nested loops required by the process presents a simplex-shaped workload with irregular memory access patterns. Distributing this workload efficiently across thousands of GPUs offers a challenge in dividing simplex-shaped (triangular/tetrahedral) workload into similar shapes with equal volume. Irregular memory access patterns create imbalanced compute utilization across nodes. We developed a generalized solution for distributing a simplex-shaped workload by partially coalescing the nested for-loops, minimizing the memory access overhead by efficiently utilizing limited shared memory, a dynamic scheduler, and loop tiling. For 4-hit combinations, we achieved a 90% − 100% strong scaling efficiency for up to 3594 V100 GPUs on the Summit supercomputer. Finally, we designed and implemented a distributed algorithm to identify 5-hit combinations for four different cancer types, and the identified combinations can differentiate between cancer and normal samples with 86.59−88.79% precision and 84.42 − 90.91% recall. We also demonstrated the robustness of our solution by porting the code to another leadership class computing platform Crusher, a testbed for the fastest supercomputer Frontier. On Crusher, we achieved 98% strong scaling efficiency on 50 nodes (400 AMD MI250X GCDs) and demonstrated the computational readiness of Frontier for scientific applications.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Distributing Simplex-Shaped Nested for-Loops to Identify Carcinogenic Gene Combinations\",\"authors\":\"Sajal Dash, Mohammad Alaul Haque Monil, Junqi Yin, R. Anandakrishnan, Feiyi Wang\",\"doi\":\"10.1109/IPDPS54959.2023.00101\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cancer is a leading cause of death in the US, and it results from a combination of two-nine genetic mutations. Identifying five-hit combinations responsible for several cancer types is computationally intractable even with the fastest super-computers in the USA. Iterating through nested loops required by the process presents a simplex-shaped workload with irregular memory access patterns. Distributing this workload efficiently across thousands of GPUs offers a challenge in dividing simplex-shaped (triangular/tetrahedral) workload into similar shapes with equal volume. Irregular memory access patterns create imbalanced compute utilization across nodes. We developed a generalized solution for distributing a simplex-shaped workload by partially coalescing the nested for-loops, minimizing the memory access overhead by efficiently utilizing limited shared memory, a dynamic scheduler, and loop tiling. For 4-hit combinations, we achieved a 90% − 100% strong scaling efficiency for up to 3594 V100 GPUs on the Summit supercomputer. Finally, we designed and implemented a distributed algorithm to identify 5-hit combinations for four different cancer types, and the identified combinations can differentiate between cancer and normal samples with 86.59−88.79% precision and 84.42 − 90.91% recall. We also demonstrated the robustness of our solution by porting the code to another leadership class computing platform Crusher, a testbed for the fastest supercomputer Frontier. On Crusher, we achieved 98% strong scaling efficiency on 50 nodes (400 AMD MI250X GCDs) and demonstrated the computational readiness of Frontier for scientific applications.\",\"PeriodicalId\":343684,\"journal\":{\"name\":\"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS54959.2023.00101\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS54959.2023.00101","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

在美国，癌症是导致死亡的主要原因之一，它是由29种基因突变共同导致的。即使使用美国最快的超级计算机，确定导致几种癌症类型的五击组合在计算上也是困难的。通过进程所需的嵌套循环进行迭代，呈现出具有不规则内存访问模式的简单型工作负载。将这种工作负载高效地分布到数千个gpu上，这对将简单形状(三角形/四面体)工作负载划分为具有相同体积的类似形状提出了挑战。不规则的内存访问模式导致节点间计算利用率不平衡。我们开发了一种通用的解决方案，通过部分地合并嵌套的for循环来分发简单形状的工作负载，通过有效地利用有限的共享内存、动态调度器和循环平铺来最小化内存访问开销。对于4击组合，我们在Summit超级计算机上实现了高达3594个V100 gpu的90% - 100%的强大扩展效率。最后，我们设计并实现了一种分布式算法来识别4种不同癌症类型的5命中组合，识别出的组合可以区分癌症和正常样本，准确率为86.59 ~ 88.79%，召回率为84.42 ~ 90.91%。我们还通过将代码移植到另一个领先级计算平台Crusher(最快的超级计算机Frontier的测试平台)来展示我们解决方案的健壮性。在Crusher上，我们在50个节点(400 AMD MI250X gcd)上实现了98%的强大缩放效率，并展示了Frontier在科学应用中的计算就绪性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Distributing Simplex-Shaped Nested for-Loops to Identify Carcinogenic Gene Combinations

Cancer is a leading cause of death in the US, and it results from a combination of two-nine genetic mutations. Identifying five-hit combinations responsible for several cancer types is computationally intractable even with the fastest super-computers in the USA. Iterating through nested loops required by the process presents a simplex-shaped workload with irregular memory access patterns. Distributing this workload efficiently across thousands of GPUs offers a challenge in dividing simplex-shaped (triangular/tetrahedral) workload into similar shapes with equal volume. Irregular memory access patterns create imbalanced compute utilization across nodes. We developed a generalized solution for distributing a simplex-shaped workload by partially coalescing the nested for-loops, minimizing the memory access overhead by efficiently utilizing limited shared memory, a dynamic scheduler, and loop tiling. For 4-hit combinations, we achieved a 90% − 100% strong scaling efficiency for up to 3594 V100 GPUs on the Summit supercomputer. Finally, we designed and implemented a distributed algorithm to identify 5-hit combinations for four different cancer types, and the identified combinations can differentiate between cancer and normal samples with 86.59−88.79% precision and 84.42 − 90.91% recall. We also demonstrated the robustness of our solution by porting the code to another leadership class computing platform Crusher, a testbed for the fastest supercomputer Frontier. On Crusher, we achieved 98% strong scaling efficiency on 50 nodes (400 AMD MI250X GCDs) and demonstrated the computational readiness of Frontier for scientific applications.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

自引率

0.00%

发文量