Hypertasking Support for Dynamically Redistributable and Resizeable Arrays on the iPSC
M. Baber
The Sixth Distributed Memory Computing Conference, 1991. Proceedings, April 1991
DOI: 10.1109/DMCC.1991.633086
Static allocations of arrays on multicomputers have two major shortcomings. First, algorithms often employ more than one reference pattern for a given array, resulting in the need for more than one mapping between the array elements and the multicomputer nodes. Secondly, it is desirable to provide easily resizeable arrays, especially for multigrid algorithms. This paper describes extensions to the hypertasking paracompiler which provide both dynamically resizeable and redistributable arrays. Hypertasking is a parallel programming tool that transforms C programs containing comment-directives into SPMD C programs that can be run on any size hypercube without recompilation for each cube size.

* Supported in part by: Defense Advanced Research Projects Agency, Information Science and Technology Office, Research in Concurrent Computing Systems, ARPA Order No. 6402, 6402-1; Program Code No. 8E20 & 9E20, issued by DARPA/CMO under Contract #MDA-972-89-C-0034.

Introduction

This paper describes extensions to hypertasking [1], a domain decomposition tool that operates on comment-directives inserted into ordinary sequential C source code. The extensions support run-time redistribution and resizing of arrays. Hypertasking is one of several projects [4,5,6,8] that have proposed or produced source-to-source compilers for parallel architectures. I refer to this class of software tools as paracompilers to distinguish them from the sequential source-to-object compilers they are built upon. A fundamental question for paracompiler designers is whether to make decisions about data and control decomposition at compile-time or at run-time. If decisions are made at compile-time, the logic does not have to be repeated every time the program is executed and it is possible to optimize the code for known parameters. Unfortunately, compile-time decisions are also inflexible. Hypertasking makes all significant decisions about decomposition at run-time.
A run-time initialization routine is called by each node to assign values to the members of an array definition structure. The C code generated by the paracompiler references the values in the structure instead of constants chosen at compile-time. The resulting code is surprisingly efficient. Furthermore, because it is relatively straightforward to change the decomposition variables in the array definition structure, run-time decomposition greatly facilitates the implementation of dynamic array resizing and redistribution features such as those described in this paper. This paper will begin with an overview of the hypertasking programming model to provide a framework for the new features. Beginning with redistributable arrays, the purpose and performance of the new features are discussed with reference to example programs. Finally, conclusions and goals for future research are presented.

Hypertasking overview

Hypertasking is designed to make it easy for software developers to port their existing data parallel applications to a multicomputer without making their code hardware specific. Hypertasking library routines decompose arrays in any or all dimensions, but the number of nodes allocated in any given dimension is controlled by the hypertasking run-time library, and is always a power of two to preserve locality of reference within the logical node mesh. All arrays are decomposed into regular rectangular sub-blocks with sides as equal as possible given the previous constraints. Guard rings [3] for each sub-block are provided. The term "guard ring" tends to imply a 2-D problem decomposed in both dimensions, but the concept is extended in this implementation to multiple dimensions. This paper uses guard wrapper as a general term encompassing guard rings in 2-D decompositions, guard shells in 3-D, and so on. A guard wrapper could be one array element thick for a 2-D 5-point stencil, or two elements thick for a 2-D 9-point stencil, for example.
Each array element is stored on one or more nodes, though it is only owned by one node. Array assignments and references are automatically rewritten by the paracompiler (the hype command) so that any node can transparently read or write any element in the distributed virtual array, but communication costs make non-local reads and writes expensive. Application algorithms should exhibit good locality of reference to make hypertasking worthwhile. Hypertasking supports directives to decompose arrays in any or all dimensions, to limit loops to indices that are local for a given array, and to update guard wrappers with boundary values from neighboring nodes. The tool consists of a paracompiler and library routines. Figure 1 depicts the basic hypertasking usage model.

[Figure 1 labels: "Can be run on a single node as a reference for speedups"; "C Compiler and Linker".]

Redistributable arrays

The array redistribution features are implemented as two new directives for the paracompiler and two new run-time library routines.

The REDISTRIBUTE directive

The REDISTRIBUTE directive is similar to the original ARRAY directive except that it is executable instead of declarative. The arguments are the same, allowing the user to specify the thickness of the guard wrapper and whether or not to distribute each dimension of the array.