{"title":"自顶向下的编程方法和工具与stars -支持可扩展的编程范例:扩展抽象","authors":"Rosa M. Badia","doi":"10.1145/2133173.2133182","DOIUrl":null,"url":null,"abstract":"Current supercomputers are evolving to clusters with a very large number of nodes, and what is more, the nodes are each time becoming more complex composed of several multicore chips and GPUs. With such architectures, the application developers are every time facing a more complex task. On the other hand, most HPC applications are scientific legacy codes written in MPI and designed for at most thousands of processors. Current efforts deal with extending these applications to scale to larger number of cores and to be combined with CUDA or OpenCL to efficienly run on GPUs.\n To evolve a given application to be suitable to run in new heterogeneous supercomputers, application developers can take different alternatives. Optimizations to improve the MPI bottlenecks, for example, by using asynchronous communications, or optimizations on the sequential code to improve its locality, or optimizations at the node level to avoid resource contention, to list a few.\n This paper proposes a methodology to enable current MPI applications to be improved using the MPI/StarSs programming model. StarSs [2] is a task-based programming model that enables to parallelize sequential applications by means of annotating the code with compiler directives. What is more important, it supports their execution in heterogeneous platforms, including clusters of GPUs. Also it nicely hybridizes with MPI [1], and enables the overlap of communication and computation.\n The approach is based on the generation at execution time of a directed acyclic graph (DAG), where the nodes of the graph denote tasks in the application and edges denote data dependences between tasks. Once a partial DAG has been generated, the StarSs runtime is able to schedule the tasks to the different cores or GPUs of the platform.\n Another relevant aspect is that the programming model offers to the application developers a single name space while the actual memory addresses can be distributed (as in a cluster or a node with GPUs). The StarSs runtime maintains a hierarchical directory with information about where to find each block of data and different software caches are maintained in each of the distributed memory spaces. The runtime is responsible for transferring the data between the different memory spaces and for keeping the coherence.\n While the programming model itself comes with a very simple syntax, identifying tasks may sometimes not be as easy as one can predict, especially when trying to taskify MPI applications. With the purpose of simplifying this process, a set of tools has been developed to conform with the framework: Ssgrind, that helps identifying tasks and the directionality of the tasksâǍŹ parameters, Ayudame and Temanejo, to help debugging StarSs applications, and Paraver, Cube and Scalasca, that enable a detailed performance analysis of the applications. 
The extended version of the paper will detail the programming methodology outlined illustrating it with examples.","PeriodicalId":259517,"journal":{"name":"ACM SIGPLAN Symposium on Scala","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Top down programming methodology and tools with StarSs - enabling scalable programming paradigms: extended abstract\",\"authors\":\"Rosa M. Badia\",\"doi\":\"10.1145/2133173.2133182\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Current supercomputers are evolving to clusters with a very large number of nodes, and what is more, the nodes are each time becoming more complex composed of several multicore chips and GPUs. With such architectures, the application developers are every time facing a more complex task. On the other hand, most HPC applications are scientific legacy codes written in MPI and designed for at most thousands of processors. Current efforts deal with extending these applications to scale to larger number of cores and to be combined with CUDA or OpenCL to efficienly run on GPUs.\\n To evolve a given application to be suitable to run in new heterogeneous supercomputers, application developers can take different alternatives. Optimizations to improve the MPI bottlenecks, for example, by using asynchronous communications, or optimizations on the sequential code to improve its locality, or optimizations at the node level to avoid resource contention, to list a few.\\n This paper proposes a methodology to enable current MPI applications to be improved using the MPI/StarSs programming model. StarSs [2] is a task-based programming model that enables to parallelize sequential applications by means of annotating the code with compiler directives. What is more important, it supports their execution in heterogeneous platforms, including clusters of GPUs. Also it nicely hybridizes with MPI [1], and enables the overlap of communication and computation.\\n The approach is based on the generation at execution time of a directed acyclic graph (DAG), where the nodes of the graph denote tasks in the application and edges denote data dependences between tasks. Once a partial DAG has been generated, the StarSs runtime is able to schedule the tasks to the different cores or GPUs of the platform.\\n Another relevant aspect is that the programming model offers to the application developers a single name space while the actual memory addresses can be distributed (as in a cluster or a node with GPUs). The StarSs runtime maintains a hierarchical directory with information about where to find each block of data and different software caches are maintained in each of the distributed memory spaces. The runtime is responsible for transferring the data between the different memory spaces and for keeping the coherence.\\n While the programming model itself comes with a very simple syntax, identifying tasks may sometimes not be as easy as one can predict, especially when trying to taskify MPI applications. With the purpose of simplifying this process, a set of tools has been developed to conform with the framework: Ssgrind, that helps identifying tasks and the directionality of the tasksâǍŹ parameters, Ayudame and Temanejo, to help debugging StarSs applications, and Paraver, Cube and Scalasca, that enable a detailed performance analysis of the applications. 
The extended version of the paper will detail the programming methodology outlined illustrating it with examples.\",\"PeriodicalId\":259517,\"journal\":{\"name\":\"ACM SIGPLAN Symposium on Scala\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-11-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM SIGPLAN Symposium on Scala\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2133173.2133182\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM SIGPLAN Symposium on Scala","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2133173.2133182","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Top down programming methodology and tools with StarSs - enabling scalable programming paradigms: extended abstract
Rosa M. Badia
Current supercomputers are evolving into clusters with a very large number of nodes, and the nodes themselves are becoming increasingly complex, composed of several multicore chips and GPUs. With such architectures, application developers face an ever more complex task. On the other hand, most HPC applications are scientific legacy codes written in MPI and designed for at most thousands of processors. Current efforts deal with extending these applications to scale to larger numbers of cores and with combining them with CUDA or OpenCL so that they run efficiently on GPUs.
To evolve a given application so that it is suitable to run on new heterogeneous supercomputers, application developers can take different alternatives: optimizations that relieve MPI bottlenecks, for example by using asynchronous communications; optimizations of the sequential code to improve its locality; or optimizations at the node level to avoid resource contention, to list a few.
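As an illustration of the first alternative, a nonblocking halo exchange lets communication proceed while independent computation runs. The fragment below is a generic sketch of that pattern; the buffer and helper names are illustrative and not taken from the paper.

```c
/* Minimal sketch of overlapping communication and computation with
 * nonblocking MPI.  Buffer and helper names are illustrative only. */
#include <mpi.h>

void compute_interior(void);   /* work that needs no remote data (assumed) */
void compute_boundary(void);   /* work that needs the received halos (assumed) */

void exchange_and_compute(double *send_lo, double *send_hi,
                          double *recv_lo, double *recv_hi, int n,
                          int left, int right, MPI_Comm comm)
{
    MPI_Request reqs[4];

    /* Post receives and sends to both neighbours without blocking. */
    MPI_Irecv(recv_lo, n, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Irecv(recv_hi, n, MPI_DOUBLE, right, 1, comm, &reqs[1]);
    MPI_Isend(send_hi, n, MPI_DOUBLE, right, 0, comm, &reqs[2]);
    MPI_Isend(send_lo, n, MPI_DOUBLE, left,  1, comm, &reqs[3]);

    compute_interior();                      /* overlaps with the transfers */

    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);

    compute_boundary();                      /* now the halos are available */
}
```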
This paper proposes a methodology that enables current MPI applications to be improved using the MPI/StarSs programming model. StarSs [2] is a task-based programming model that parallelizes sequential applications by annotating the code with compiler directives. More importantly, it supports their execution on heterogeneous platforms, including clusters of GPUs. It also hybridizes nicely with MPI [1] and enables the overlap of communication and computation.
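As a rough sketch of what such annotations look like, the fragment below marks a compute kernel and an MPI halo exchange as tasks. The directive spelling follows the SMPSs flavour of StarSs (#pragma css task with input/output/inout clauses); other StarSs instances such as OmpSs spell the clauses in/out/inout, so the exact syntax here should be read as an assumption, and BS, NB and the kernels are illustrative rather than taken from the paper.

```c
/* Hypothetical MPI/StarSs-style taskification; syntax and names assumed. */
#include <mpi.h>

#define BS 1024          /* block size      (illustrative) */
#define NB 64            /* blocks per rank (illustrative) */

/* Compute task: the runtime derives dependences from the clauses. */
#pragma css task input(halo) inout(block)
void update_block(double halo[BS], double block[BS])
{
    for (int i = 0; i < BS; ++i)
        block[i] += halo[i];                 /* placeholder kernel */
}

/* Communication task: wrapping the MPI call in a task makes it schedulable
 * like any other task, so it can overlap with independent work. */
#pragma css task inout(halo)
void exchange_halo(double halo[BS], int left, int right, MPI_Comm comm)
{
    MPI_Sendrecv_replace(halo, BS, MPI_DOUBLE,
                         right, 0, left, 0, comm, MPI_STATUS_IGNORE);
}

void iterate(double blocks[NB][BS], double halo[BS],
             int left, int right, MPI_Comm comm, int iters)
{
    for (int it = 0; it < iters; ++it) {
        exchange_halo(halo, left, right, comm);   /* generates a comm task   */
        for (int b = 0; b < NB; ++b)
            update_block(halo, blocks[b]);        /* generates compute tasks */
    }
    #pragma css barrier   /* wait for all outstanding tasks (assumed syntax) */
}
```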
The approach is based on the generation, at execution time, of a directed acyclic graph (DAG) in which the nodes denote tasks in the application and the edges denote data dependences between tasks. Once a partial DAG has been generated, the StarSs runtime is able to schedule the tasks onto the different cores or GPUs of the platform.
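To make the DAG construction concrete, the hypothetical fragment below (using the same assumed SMPSs-style clauses as the sketch above) shows how the directionality clauses on four task invocations induce dependence edges, and therefore which tasks the runtime may execute concurrently.

```c
/* Illustrative only (not from the paper): how directionality clauses induce
 * the dependence DAG.  t2 and t3 are independent, so the runtime may run
 * them concurrently on different cores or GPUs; t4 must wait for both. */
#define BS 1024

#pragma css task output(x)
void produce(double x[BS]);

#pragma css task input(x) output(y)
void scale(double x[BS], double y[BS]);

#pragma css task input(a, b) output(c)
void combine(double a[BS], double b[BS], double c[BS]);

void build_dag(double x[BS], double y1[BS], double y2[BS], double z[BS])
{
    produce(x);          /* t1: writes x                              */
    scale(x, y1);        /* t2: reads x          -> edge t1 -> t2     */
    scale(x, y2);        /* t3: reads x          -> edge t1 -> t3     */
    combine(y1, y2, z);  /* t4: reads y1 and y2  -> edges t2,t3 -> t4 */
}
```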
Another relevant aspect is that the programming model offers application developers a single name space, while the actual memory addresses can be distributed (as in a cluster or in a node with GPUs). The StarSs runtime maintains a hierarchical directory with information about where to find each block of data, and software caches are maintained in each of the distributed memory spaces. The runtime is responsible for transferring data between the different memory spaces and for keeping them coherent.
While the programming model itself comes with a very simple syntax, identifying tasks may not always be as easy as one might expect, especially when trying to taskify MPI applications. To simplify this process, a set of tools has been developed to complement the framework: Ssgrind, which helps identify tasks and the directionality of their parameters; Ayudame and Temanejo, which help debug StarSs applications; and Paraver, Cube and Scalasca, which enable a detailed performance analysis of the applications. The extended version of the paper will detail the programming methodology outlined here, illustrating it with examples.