{"title":"OpenStream: a data-flow approach to solving the von Neumann bottlenecks","authors":"Antoniu Pop","doi":"10.1145/2463596.2486782","DOIUrl":null,"url":null,"abstract":"As single-threaded performance is reaching its limits, the prevailing trend in multi-core and embedded MPSoC architectures is to provide an ever increasing number of processing units. This convergence leads to shared concerns, like scalability and programmability. Exploiting such architectures poses tremendous challenges to application programmers and to compiler/runtime developers alike. Uncovering raw parallelism is often insufficient in and of itself: improving performance requires changing the code structure to harness complex parallel hardware and memory hierarchies; translating more processing units into effective performance gains involves a combination of target-specific optimizations, subtle concurrency concepts and non-deterministic algorithms.\n In this presentation, we examine the limitations of current, von Neumann architectures and the impact on programmability of the drift from hardware-managed complexity to an increasing reliance on software solutions. We first propose OpenStream, a high-level data-flow programming model, as a pragmatic answer from the application programmer's perspective. Recognizing that the burden cannot be borne by either programmers or compilers alone, OpenStream is designed to strike a fair balance: programmers provide abstract information about their applications and leave the compiler and runtime system with the responsibility of lowering these abstractions to well-orchestrated threads and memory management. 
In the second part, we adopt the runtime developer's perspective and examine these impacts through the example of the implementation and proof of concurrent lock-free algorithms, a cornerstone of runtime system implementation, critically important in the context of relaxed memory consistency models.","PeriodicalId":344517,"journal":{"name":"M-SCOPES","volume":"57 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"M-SCOPES","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2463596.2486782","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
As single-threaded performance reaches its limits, the prevailing trend in multi-core and embedded MPSoC architectures is to provide an ever-increasing number of processing units. This convergence leads to shared concerns, such as scalability and programmability. Exploiting such architectures poses tremendous challenges to application programmers and to compiler/runtime developers alike. Uncovering raw parallelism is often insufficient in and of itself: improving performance requires restructuring code to harness complex parallel hardware and memory hierarchies, and translating more processing units into effective performance gains involves a combination of target-specific optimizations, subtle concurrency concepts, and non-deterministic algorithms.
In this presentation, we examine the limitations of current von Neumann architectures and the impact on programmability of the drift from hardware-managed complexity to an increasing reliance on software solutions. We first propose OpenStream, a high-level data-flow programming model, as a pragmatic answer from the application programmer's perspective. Recognizing that the burden cannot be borne by either programmers or compilers alone, OpenStream is designed to strike a fair balance: programmers provide abstract information about their applications and leave the compiler and runtime system with the responsibility of lowering these abstractions to well-orchestrated threads and memory management.

In the second part, we adopt the runtime developer's perspective and examine these impacts through the example of the implementation and proof of concurrent lock-free algorithms, a cornerstone of runtime system implementation that is critically important in the context of relaxed memory consistency models.
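To give a flavor of the programming model described above, the following sketch shows OpenStream-style annotations in the spirit of the published examples: tasks declare `input` and `output` clauses on a stream variable, and the compiler/runtime derives the data-flow dependences. The names `produce` and `consume` are hypothetical placeholders, and the exact clause syntax may differ from the actual OpenStream compiler.

```c
/* Hypothetical producer/consumer pipeline, OpenStream-style (sketch only;
 * requires the OpenStream research compiler, not a standard OpenMP toolchain). */
int x;  /* used as a stream: each task gets a private view of its elements */

for (int i = 0; i < N; ++i) {
    #pragma omp task output (x)   /* producer: appends one element to stream x */
    x = produce(i);

    #pragma omp task input (x)    /* consumer: reads the matching element */
    consume(x);
}
#pragma omp taskwait              /* wait for the whole pipeline to drain */
```

The point of the model is that the programmer only states what each task reads and writes; the scheduling, synchronization, and buffering of the stream are left to the compiler and runtime, as the abstract argues.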
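The second part of the talk concerns lock-free algorithms under relaxed memory consistency. As a minimal, generic illustration (not the algorithms from the talk), here is a single-producer/single-consumer ring buffer in C11 atomics, where acquire/release orderings make the data writes visible before the index updates that publish them:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8  /* capacity; power of two keeps the index math cheap */

/* Illustrative SPSC queue: one producer thread calls push, one consumer
 * thread calls pop; no locks, only acquire/release atomics on the indices. */
typedef struct {
    _Atomic size_t head;   /* next slot to write; advanced only by producer */
    _Atomic size_t tail;   /* next slot to read; advanced only by consumer */
    int data[RING_SIZE];
} spsc_ring;

bool spsc_push(spsc_ring *r, int v) {
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;                       /* full */
    r->data[head % RING_SIZE] = v;
    /* release: the data write above must be visible before head advances */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

bool spsc_pop(spsc_ring *r, int *out) {
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail == head)
        return false;                       /* empty */
    *out = r->data[tail % RING_SIZE];
    /* release: the slot is free for reuse only after tail advances */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Even this small example shows why the talk stresses proof: dropping either release/acquire pair to `relaxed` compiles and usually runs, yet admits executions where the consumer reads a slot before the producer's data write becomes visible.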