任务并行汇编语言的不妥协的并行性

Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation Pub Date : 2021-06-19 DOI:10.1145/3453483.3460969

Mike Rainey, Kyle C. Hale, Nikos Hardavellas, Simone Campanoni, Umut A. Acar

{"title":"任务并行汇编语言的不妥协的并行性","authors":"Mike Rainey, Kyle C. Hale, Nikos Hardavellas, Simone Campanoni, Umut A. Acar","doi":"10.1145/3453483.3460969","DOIUrl":null,"url":null,"abstract":"Achieving parallel performance and scalability involves making compromises between parallel and sequential computation. If not contained, the overheads of parallelism can easily outweigh its benefits, sometimes by orders of magnitude. Today, we expect programmers to implement this compromise by optimizing their code manually. This process is labor intensive, requires deep expertise, and reduces code quality. Recent work on heartbeat scheduling shows a promising approach that manifests the potentially vast amounts of available, latent parallelism, at a regular rate, based on even beats in time. The idea is to amortize the overheads of parallelism over the useful work performed between the beats. Heartbeat scheduling is promising in theory, but the reality is complicated: it has no known practical implementation. In this paper, we propose a practical approach to heartbeat scheduling that involves equipping the assembly language with a small set of primitives. These primitives leverage existing kernel and hardware support for interrupts to allow parallelism to remain latent, until a heartbeat, when it can be manifested with low cost. Our Task Parallel Assembly Language (TPAL) is a compact, RISC-like assembly language. We specify TPAL through an abstract machine and implement the abstract machine as compiler transformations for C/C++ code and a specialized run-time system. We present an evaluation on both the Linux and the Nautilus kernels, considering a range of heartbeat interrupt mechanisms. The evaluation shows that TPAL can dramatically reduce the overheads of parallelism without compromising scalability.","PeriodicalId":20557,"journal":{"name":"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation","volume":"os-48 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2021-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Task parallel assembly language for uncompromising parallelism\",\"authors\":\"Mike Rainey, Kyle C. Hale, Nikos Hardavellas, Simone Campanoni, Umut A. Acar\",\"doi\":\"10.1145/3453483.3460969\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Achieving parallel performance and scalability involves making compromises between parallel and sequential computation. If not contained, the overheads of parallelism can easily outweigh its benefits, sometimes by orders of magnitude. Today, we expect programmers to implement this compromise by optimizing their code manually. This process is labor intensive, requires deep expertise, and reduces code quality. Recent work on heartbeat scheduling shows a promising approach that manifests the potentially vast amounts of available, latent parallelism, at a regular rate, based on even beats in time. The idea is to amortize the overheads of parallelism over the useful work performed between the beats. Heartbeat scheduling is promising in theory, but the reality is complicated: it has no known practical implementation. In this paper, we propose a practical approach to heartbeat scheduling that involves equipping the assembly language with a small set of primitives. These primitives leverage existing kernel and hardware support for interrupts to allow parallelism to remain latent, until a heartbeat, when it can be manifested with low cost. Our Task Parallel Assembly Language (TPAL) is a compact, RISC-like assembly language. We specify TPAL through an abstract machine and implement the abstract machine as compiler transformations for C/C++ code and a specialized run-time system. We present an evaluation on both the Linux and the Nautilus kernels, considering a range of heartbeat interrupt mechanisms. The evaluation shows that TPAL can dramatically reduce the overheads of parallelism without compromising scalability.\",\"PeriodicalId\":20557,\"journal\":{\"name\":\"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation\",\"volume\":\"os-48 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3453483.3460969\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3453483.3460969","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 8

摘要

实现并行性能和可伸缩性需要在并行计算和顺序计算之间做出妥协。如果不加以控制，并行性的开销很容易超过它的好处，有时甚至超过它的数量级。今天，我们期望程序员通过手动优化他们的代码来实现这种折衷。这个过程是劳动密集型的，需要深厚的专业知识，并且降低了代码质量。最近关于心跳调度的工作显示了一种很有前途的方法，该方法显示了潜在的大量可用性，潜在的并行性，以规律的速率，基于时间上的均匀心跳。其思想是将并行性的开销分摊到节拍之间执行的有用工作上。心跳调度在理论上很有希望，但现实很复杂:它没有已知的实际实现。在本文中，我们提出了一种实用的心跳调度方法，该方法包括为汇编语言配备一小组原语。这些原语利用现有的内核和硬件对中断的支持，使并行性保持潜伏状态，直到出现心跳时，才能以低成本表现出来。我们的任务并行汇编语言(TPAL)是一种紧凑的、类似risc的汇编语言。我们通过一个抽象机器来指定TPAL，并将抽象机器实现为C/ c++代码的编译器转换和一个专门的运行时系统。我们对Linux和Nautilus内核进行了评估，考虑了一系列心跳中断机制。评估表明，TPAL可以在不影响可伸缩性的情况下显著降低并行性的开销。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Task parallel assembly language for uncompromising parallelism

Achieving parallel performance and scalability involves making compromises between parallel and sequential computation. If not contained, the overheads of parallelism can easily outweigh its benefits, sometimes by orders of magnitude. Today, we expect programmers to implement this compromise by optimizing their code manually. This process is labor intensive, requires deep expertise, and reduces code quality. Recent work on heartbeat scheduling shows a promising approach that manifests the potentially vast amounts of available, latent parallelism, at a regular rate, based on even beats in time. The idea is to amortize the overheads of parallelism over the useful work performed between the beats. Heartbeat scheduling is promising in theory, but the reality is complicated: it has no known practical implementation. In this paper, we propose a practical approach to heartbeat scheduling that involves equipping the assembly language with a small set of primitives. These primitives leverage existing kernel and hardware support for interrupts to allow parallelism to remain latent, until a heartbeat, when it can be manifested with low cost. Our Task Parallel Assembly Language (TPAL) is a compact, RISC-like assembly language. We specify TPAL through an abstract machine and implement the abstract machine as compiler transformations for C/C++ code and a specialized run-time system. We present an evaluation on both the Linux and the Nautilus kernels, considering a range of heartbeat interrupt mechanisms. The evaluation shows that TPAL can dramatically reduce the overheads of parallelism without compromising scalability.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助