Automatic Parallelism Management

Proceedings of the ACM on Programming Languages · Published: 2024-01-05 · DOI: 10.1145/3632880 · Impact Factor: 2.2 (Q2, Computer Science, Software Engineering)
Sam Westrick, M. Fluet, Mike Rainey, Umut A. Acar
{"title":"Automatic Parallelism Management","authors":"Sam Westrick, M. Fluet, Mike Rainey, Umut A. Acar","doi":"10.1145/3632880","DOIUrl":null,"url":null,"abstract":"On any modern computer architecture today, parallelism comes with a modest cost, born from the creation and management of threads or tasks. Today, programmers battle this cost by manually optimizing/tuning their codes to minimize the cost of parallelism without harming its benefit, performance. This is a difficult battle: programmers must reason about architectural constant factors hidden behind layers of software abstractions, including thread schedulers and memory managers, and their impact on performance, also at scale. In languages that support higher-order functions, the battle hardens: higher order functions can make it difficult, if not impossible, to reason about the cost and benefits of parallelism. Motivated by these challenges and the numerous advantages of high-level languages, we believe that it has become essential to manage parallelism automatically so as to minimize its cost and maximize its benefit. This is a challenging problem, even when considered on a case-by-case, application-specific basis. But if a solution were possible, then it could combine the many correctness benefits of high-level languages with performance by managing parallelism without the programmer effort needed to ensure performance. This paper proposes techniques for such automatic management of parallelism by combining static (compilation) and run-time techniques. Specifically, we consider the Parallel ML language with task parallelism, and describe a compiler pipeline that embeds \"potential parallelism\" directly into the call-stack and avoids the cost of task creation by default. We then pair this compilation pipeline with a run-time system that dynamically converts potential parallelism into actual parallel tasks. Together, the compiler and run-time system guarantee that the cost of parallelism remains low without losing its benefit. We prove that our techniques have no asymptotic impact on the work and span of parallel programs and thus preserve their asymptotic properties. We implement the proposed techniques by extending the MPL compiler for Parallel ML and show that it can eliminate the burden of manual optimization while delivering good practical performance.","PeriodicalId":20697,"journal":{"name":"Proceedings of the ACM on Programming Languages","volume":"48 19","pages":"1118 - 1149"},"PeriodicalIF":2.2000,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Programming Languages","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3632880","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract

On any modern computer architecture today, parallelism comes with a modest cost, borne of the creation and management of threads or tasks. Today, programmers battle this cost by manually optimizing and tuning their code to minimize the cost of parallelism without harming its benefit: performance. This is a difficult battle: programmers must reason about architectural constant factors hidden behind layers of software abstractions, including thread schedulers and memory managers, and about their impact on performance, including at scale. In languages that support higher-order functions, the battle hardens: higher-order functions can make it difficult, if not impossible, to reason about the costs and benefits of parallelism. Motivated by these challenges and by the numerous advantages of high-level languages, we believe that it has become essential to manage parallelism automatically so as to minimize its cost and maximize its benefit. This is a challenging problem, even when considered on a case-by-case, application-specific basis. But if a solution were possible, it could combine the many correctness benefits of high-level languages with performance by managing parallelism without the programmer effort needed to ensure performance. This paper proposes techniques for such automatic management of parallelism that combine static (compile-time) and run-time techniques. Specifically, we consider the Parallel ML language with task parallelism and describe a compiler pipeline that embeds "potential parallelism" directly into the call stack and avoids the cost of task creation by default. We then pair this compilation pipeline with a run-time system that dynamically converts potential parallelism into actual parallel tasks. Together, the compiler and run-time system guarantee that the cost of parallelism remains low without losing its benefit. We prove that our techniques have no asymptotic impact on the work and span of parallel programs and thus preserve their asymptotic properties. We implement the proposed techniques by extending the MPL compiler for Parallel ML and show that they eliminate the burden of manual optimization while delivering good practical performance.
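To make the tuning burden concrete, the following is a minimal sketch, in Parallel ML as compiled by MPL, of the kind of manually tuned fork-join code the abstract alludes to. The names fib, seqFib, and cutoff are hypothetical illustrations, and ForkJoin.par is assumed to be MPL's fork-join primitive of type (unit -> 'a) * (unit -> 'b) -> 'a * 'b; this is not code from the paper.

    (* Illustrative sketch: manually tuned parallel Fibonacci.  The programmer
       picks a grain size by hand so that task creation is amortized, and keeps
       a separate sequential copy for small subproblems. *)

    val cutoff = 20  (* hand-picked grain size; machine- and scheduler-specific *)

    (* Sequential base case: plain recursion, no tasks created. *)
    fun seqFib n =
      if n < 2 then n else seqFib (n - 1) + seqFib (n - 2)

    (* Parallel version: only spawn tasks for subproblems above the cutoff. *)
    fun fib n =
      if n <= cutoff then
        seqFib n
      else
        let
          val (a, b) =
            ForkJoin.par (fn () => fib (n - 1),
                          fn () => fib (n - 2))
        in
          a + b
        end

    val result = fib 40

Choosing cutoff well requires exactly the architecture- and scheduler-specific reasoning the abstract describes. Under the automatic parallelism management proposed in the paper, such a cutoff and the duplicated sequential code path should become unnecessary: each use of the fork-join primitive only records potential parallelism on the call stack, and the run-time system promotes it to actual parallel tasks as needed.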
Source journal: Proceedings of the ACM on Programming Languages
CiteScore: 5.20
Self-citation rate: 22.20%
Articles published: 192
Latest articles in this journal:
- ReLU Hull Approximation
- An Axiomatic Basis for Computer Programming on the Relaxed Arm-A Architecture: The AxSL Logic
- The Essence of Generalized Algebraic Data Types
- Explicit Effects and Effect Constraints in ReML
- Indexed Types for a Statically Safe WebAssembly