Employing Artificial Intelligence to Steer Exascale Workflows with Colmena

Logan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson, Yadu Babuji, Alexander Brace, Ryan Chard, Kyle Chard, Rajeev Thakur, Ian Foster
{"title":"Employing Artificial Intelligence to Steer Exascale Workflows with Colmena","authors":"Logan Ward, J. Gregory Pauloski, Valerie Hayot-Sasson, Yadu Babuji, Alexander Brace, Ryan Chard, Kyle Chard, Rajeev Thakur, Ian Foster","doi":"arxiv-2408.14434","DOIUrl":null,"url":null,"abstract":"Computational workflows are a common class of application on supercomputers,\nyet the loosely coupled and heterogeneous nature of workflows often fails to\ntake full advantage of their capabilities. We created Colmena to leverage the\nmassive parallelism of a supercomputer by using Artificial Intelligence (AI) to\nlearn from and adapt a workflow as it executes. Colmena allows scientists to\ndefine how their application should respond to events (e.g., task completion)\nas a series of cooperative agents. In this paper, we describe the design of\nColmena, the challenges we overcame while deploying applications on exascale\nsystems, and the science workflows we have enhanced through interweaving AI.\nThe scaling challenges we discuss include developing steering strategies that\nmaximize node utilization, introducing data fabrics that reduce communication\noverhead of data-intensive tasks, and implementing workflow tasks that cache\ncostly operations between invocations. These innovations coupled with a variety\nof application patterns accessible through our agent-based steering model have\nenabled science advances in chemistry, biophysics, and materials science using\ndifferent types of AI. Our vision is that Colmena will spur creative solutions\nthat harness AI across many domains of scientific computing.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"26 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Distributed, Parallel, and Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.14434","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Computational workflows are a common class of application on supercomputers, yet the loosely coupled and heterogeneous nature of workflows often fails to take full advantage of their capabilities. We created Colmena to leverage the massive parallelism of a supercomputer by using Artificial Intelligence (AI) to learn from and adapt a workflow as it executes. Colmena allows scientists to define how their application should respond to events (e.g., task completion) as a series of cooperative agents. In this paper, we describe the design of Colmena, the challenges we overcame while deploying applications on exascale systems, and the science workflows we have enhanced through interweaving AI. The scaling challenges we discuss include developing steering strategies that maximize node utilization, introducing data fabrics that reduce communication overhead of data-intensive tasks, and implementing workflow tasks that cache costly operations between invocations. These innovations coupled with a variety of application patterns accessible through our agent-based steering model have enabled science advances in chemistry, biophysics, and materials science using different types of AI. Our vision is that Colmena will spur creative solutions that harness AI across many domains of scientific computing.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用人工智能引导 Colmena 的超大规模工作流
计算工作流是超级计算机上常见的一类应用,但工作流的松散耦合和异构特性往往无法充分利用其能力。我们创建了 Colmena,利用人工智能(AI)来学习和调整工作流,从而充分利用超级计算机的大规模并行性。Colmena 允许科学家定义他们的应用程序应如何作为一系列合作代理对事件(如任务完成)做出响应。在本文中,我们将介绍Colmena的设计、我们在超大规模系统上部署应用时克服的挑战,以及我们通过人工智能交织增强的科学工作流。我们讨论的扩展挑战包括开发可最大限度提高节点利用率的转向策略、引入可减少数据密集型任务通信开销的数据结构,以及实施可在调用之间缓存高成本操作的工作流任务。这些创新加上我们基于代理的转向模型所提供的各种应用模式,使得化学、生物物理和材料科学领域利用不同类型的人工智能取得了科学进步。我们的愿景是,Colmena 将推动在科学计算的多个领域利用人工智能的创造性解决方案。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Massively parallel CMA-ES with increasing population Communication Lower Bounds and Optimal Algorithms for Symmetric Matrix Computations Energy Efficiency Support for Software Defined Networks: a Serverless Computing Approach CountChain: A Decentralized Oracle Network for Counting Systems Delay Analysis of EIP-4844
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1