Automatic Parallelism Management
Sam Westrick, M. Fluet, Mike Rainey, Umut A. Acar
Proceedings of the ACM on Programming Languages, volume 48 19, pages 1118-1149
Journal Article, published 2024-01-05
DOI: https://doi.org/10.1145/3632880
Citations: 0
Abstract
On any modern computer architecture, parallelism comes with a modest cost, arising from the creation and management of threads or tasks. Today, programmers battle this cost by manually optimizing and tuning their code to minimize the cost of parallelism without harming its benefit, namely performance. This is a difficult battle: programmers must reason about architectural constant factors hidden behind layers of software abstractions, including thread schedulers and memory managers, and about their impact on performance, including at scale. In languages that support higher-order functions, the battle hardens: higher-order functions can make it difficult, if not impossible, to reason about the costs and benefits of parallelism. Motivated by these challenges and the numerous advantages of high-level languages, we believe that it has become essential to manage parallelism automatically so as to minimize its cost and maximize its benefit. This is a challenging problem, even when considered on a case-by-case, application-specific basis. But if a solution were possible, it could combine the many correctness benefits of high-level languages with performance, managing parallelism without the programmer effort otherwise needed to ensure performance. This paper proposes techniques for such automatic management of parallelism by combining static (compilation) and run-time techniques. Specifically, we consider the Parallel ML language with task parallelism and describe a compiler pipeline that embeds "potential parallelism" directly into the call stack and avoids the cost of task creation by default. We then pair this compilation pipeline with a run-time system that dynamically converts potential parallelism into actual parallel tasks. Together, the compiler and run-time system guarantee that the cost of parallelism remains low without losing its benefit. We prove that our techniques have no asymptotic impact on the work and span of parallel programs and thus preserve their asymptotic properties. We implement the proposed techniques by extending the MPL compiler for Parallel ML and show that it can eliminate the burden of manual optimization while delivering good practical performance.
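The abstract does not spell out how potential parallelism is converted into tasks, so the following is only a rough illustration of the general idea, not the paper's algorithm or MPL's implementation. It is a toy, single-threaded Python model in which each `par` call merely records a frame on a stack, and a periodic "heartbeat" promotes the oldest unpromoted frame into a counted task. The names `Scheduler`, `heartbeat_interval`, `tick`, and `par` are hypothetical, and the heartbeat-style promotion policy is an assumption made for the sake of the sketch.

```python
class Scheduler:
    """Toy, single-threaded model of promoting potential parallelism.

    Assumption: promotion happens on a periodic "heartbeat" measured in
    abstract work units. Nothing here runs in parallel; we only count how
    many tasks a runtime following this policy would have created.
    """

    def __init__(self, heartbeat_interval):
        self.heartbeat_interval = heartbeat_interval  # hypothetical tuning knob
        self.ticks = 0          # work units since the last heartbeat
        self.frames = []        # live par frames, oldest first
        self.par_calls = 0      # how often the program asked for parallelism
        self.tasks_created = 0  # how often a frame was actually promoted

    def tick(self):
        """Account for one unit of work; on a heartbeat, promote one frame."""
        self.ticks += 1
        if self.ticks >= self.heartbeat_interval:
            self.ticks = 0
            for frame in self.frames:
                if not frame["promoted"]:
                    frame["promoted"] = True
                    self.tasks_created += 1
                    break

    def par(self, f, g):
        """Record potential parallelism for g, then run f and g in order."""
        self.par_calls += 1
        frame = {"promoted": False}
        self.frames.append(frame)  # cheap default path: no task is created
        left = f()
        self.frames.pop()
        # A real runtime would have run a promoted g on another worker;
        # this model simply runs it here and keeps the counters.
        right = g()
        return left, right


def fib(sched, n):
    """Naive parallel-style Fibonacci that requests parallelism at every level."""
    sched.tick()
    if n < 2:
        return n
    a, b = sched.par(lambda: fib(sched, n - 1), lambda: fib(sched, n - 2))
    return a + b


sched = Scheduler(heartbeat_interval=100)
result = fib(sched, 15)
print(result, sched.par_calls, sched.tasks_created)
```

In this model the number of promoted tasks is bounded by the number of heartbeats rather than by the number of `par` calls, which is one way to picture how task-creation cost could stay low even when a program exposes very fine-grained parallelism.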