Pub Date: 1994-12-01 | DOI: 10.1109/88.345962
Dedos: a distributed real-time environment
D. Hammer, E. Luit, O. V. Roosmalen, P. V. D. Stok, J. Verhoosel
Until now, little research has been done on methods to combine the seemingly incompatible paradigms of hard and soft real-time systems. To address this, we have developed Dedos, a dependable distributed operating system. The driving forces behind the project are twofold: to meet the demand for dependable distributed control systems, especially in the area of embedded systems and industrial control; and to increase the productivity and quality of application programming for distributed control. Our current focus is on hard real-time issues; soft real-time needs are handled by conventional techniques. However, our work has raised interesting questions about the communication between the soft and hard real-time tasks of the system, which is necessary to pass externally specified control parameters and control status information. The problem is that the data set must always be consistent (concurrency atomicity), but hard real-time activities can never be delayed by soft real-time ones. Other intriguing questions concern the integration of the reliability and security concepts used in the two parts of the system. In this paper, however, we limit our discussion to the Dedos development model, the Dedos programming model, hard real-time scheduling, and the distributed algorithms needed to implement the Dedos execution environment.
Pub Date: 1994-09-01 | DOI: 10.1109/M-PDT.1994.329791
Task Parallelism in a High Performance Fortran Framework
T. Gross, D. O'Hallaron, J. Subhlok
Exploiting both data and task parallelism in a single framework is the key to achieving good performance for a variety of applications.
{"title":"Task Parallelism in a High Performance Fortran Framework","authors":"T. Gross, D. O'Hallaron, J. Subhlok","doi":"10.1109/M-PDT.1994.329791","DOIUrl":"https://doi.org/10.1109/M-PDT.1994.329791","url":null,"abstract":"Exploiting both data and task parallelism in a single framework is the key to achieving good performance for a variety of applications.","PeriodicalId":325213,"journal":{"name":"IEEE Parallel & Distributed Technology: Systems & Applications","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130029812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1994-09-01 | DOI: 10.1109/M-PDT.1994.329803
Extending HPF for Advanced Data-Parallel Applications
B. Chapman, H. Zima, P. Mehrotra
High Performance Fortran can support regular numerical algorithms, but it cannot adequately express advanced applications such as particle-in-cell codes or unstructured mesh solvers. This article addresses this problem and outlines possible development paths.
{"title":"Extending HPF for Advanced Data-Parallel Applications","authors":"B. Chapman, H. Zima, P. Mehrotra","doi":"10.1109/M-PDT.1994.329803","DOIUrl":"https://doi.org/10.1109/M-PDT.1994.329803","url":null,"abstract":"High Performance Fortran can support regular numerical algorithms, but it cannot adequately express advanced applications such as particle-in-cell codes or unstructured mesh solvers.This article addresses this problem and outlines possible development paths.","PeriodicalId":325213,"journal":{"name":"IEEE Parallel & Distributed Technology: Systems & Applications","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117182971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1994-09-01 | DOI: 10.1109/M-PDT.1994.329794
Task Parallelism and High-Performance Languages
Ian T. Foster
The definition of High Performance Fortran (HPF) is a significant event in the maturation of parallel computing: it represents the first parallel language that has gained widespread support from vendors and users. This paper examines how to incorporate support for task parallelism into such a language. The term task parallelism refers to the explicit creation of multiple threads of control, or tasks, which synchronize and communicate under programmer control. Task and data parallelism are complementary rather than competing programming models. While task parallelism is more general and can be used to implement algorithms that are not amenable to data-parallel solutions, many problems can benefit from a mixed approach, with, for example, a task-parallel coordination layer integrating multiple data-parallel computations. Other problems admit both data- and task-parallel solutions, with the better choice depending on machine characteristics, compiler performance, or personal taste. For these reasons, we believe that a general-purpose high-performance language should integrate both task- and data-parallel constructs. The challenge is to do so in a way that provides the expressivity needed for applications while preserving the flexibility and portability of a high-level language. In this paper, we examine and illustrate the considerations that motivate the use of task parallelism. We also describe one particular approach to task parallelism in Fortran, namely the Fortran M extensions. Finally, we contrast Fortran M with other proposed approaches and discuss the implications of this work for task parallelism and high-performance languages.
{"title":"Task Parallelism and High-Performance Languages","authors":"Ian T Foster","doi":"10.1109/M-PDT.1994.329794","DOIUrl":"https://doi.org/10.1109/M-PDT.1994.329794","url":null,"abstract":"The definition of High Performance Fortran (HPF) is a significant event in the maturation of parallel computing: it represents the first parallel language that has gained widespread support from vendors and users. The subject of this paper is to incorporate support for task parallelism. The term task parallelism refers to the explicit creation of multiple threads of control, or tasks, which synchronize and communicate under programmer control. Task and data parallelism are complementary rather than competing programming models. While task parallelism is more general and can be used to implement algorithms that are not amenable to data-parallel solutions, many problems can benefit from a mixed approach, with for example a task-parallel coordination layer integrating multiple data-parallel computations. Other problems admit to both data- and task-parallel solutions, with the better solution depending on machine characteristics, compiler performance, or personal taste. For these reasons, we believe that a general-purpose high-performance language should integrate both task- and data-parallel constructs. The challenge is to do so in a way that provides the expressivity needed for applications, while preserving the flexibility and portability of a high-level language. In this paper, we examine and illustrate the considerations that motivate the use of task parallelism. We also describe one particular approach to task parallelism in Fortran, namely the Fortran M extensions. Finally, we contrast Fortran M with other proposed approaches and discuss the implications of this work for task parallelism and high-performance languages.","PeriodicalId":325213,"journal":{"name":"IEEE Parallel & Distributed Technology: Systems & Applications","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115934664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1994-09-01 | DOI: 10.1109/M-PDT.1994.329801
Requirements for Data-Parallel Programming Environments
Vikram S. Adve, A. Carle, Elana D. Granston, S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, J. Mellor-Crummey, S. Warren, C. Tseng
An effective data-parallel programming environment will use a variety of tools that support the development of efficient data-parallel programs while insulating the programmer from the intricacies of the explicitly parallel code.
{"title":"Requirements for DataParallel Programming Environments","authors":"Vikram S. Adve, A. Carle, Elana D. Granston, S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, J. Mellor-Crummey, S. Warren, C. Tseng","doi":"10.1109/M-PDT.1994.329801","DOIUrl":"https://doi.org/10.1109/M-PDT.1994.329801","url":null,"abstract":"An effective data-parallel programming environment will use a variety of tools that support the development of efficient data-parallel programs while insulating the programmer from the intricacies of the explicitly parallel code.","PeriodicalId":325213,"journal":{"name":"IEEE Parallel & Distributed Technology: Systems & Applications","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122647401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1994-09-01 | DOI: 10.1109/M-PDT.1994.329796
Automatic Detection of Parallelism: A grand challenge for high performance computing
W. Blume, R. Eigenmann, J. Hoeflinger, D. Padua, Paul Petersen, Lawrence Rauchwerger, P. Tu
The limited ability of compilers to find the parallelism in programs is a significant barrier to the use of high-performance computers. However, a combination of static and runtime techniques can improve compilers to the extent that a significant group of scientific programs can be parallelized automatically.
{"title":"Automatic Detection of Parallelism: A grand challenge for high performance computing","authors":"W. Blume, R. Eigenmann, J. Hoeflinger, D. Padua, Paul Petersen, Lawrence Rauchwerger, P. Tu","doi":"10.1109/M-PDT.1994.329796","DOIUrl":"https://doi.org/10.1109/M-PDT.1994.329796","url":null,"abstract":"The limited ability of compilers to find the parallelism in programs is a significant barrier to the use of high-performance computers.However, a combination of static and runtime techniques can improve compilers to the extent that a significant group of scientific programs can be parallelized automatically.","PeriodicalId":325213,"journal":{"name":"IEEE Parallel & Distributed Technology: Systems & Applications","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1994-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123532905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1994-06-01 | DOI: 10.1109/88.311568
Exploiting data coherence to improve parallel volume rendering
P. Mackerras, B. Corrie
We have implemented a parallel volume renderer that successfully manages work and data distribution by exploiting data coherence (the tendency of neighboring pixels to use the same data during rendering, particularly when rendering volume data). This flexible, powerful renderer uses ray-casting on a Fujitsu AP1000 to generate high-quality images of volume data sets with other geometrically defined structures, such as a set of coordinate axes or a world map. This article focuses on our schemes for work and data distribution. Using image-space work distribution to partition a 2D image among processing nodes, and distributed virtual memory to assign 3D volume data, this renderer effectively and efficiently parallelizes volume rendering.
Pub Date: 1994-06-01 | DOI: 10.1109/88.311573
Achieving superlinear speedup on a heterogeneous, distributed system
C. Mechoso, J. Farrara, J. A. Spahr
The CASA Gigabit Network Testbed, part of NSF and ARPA's Gigabit Project, is investigating whether a metacomputer consisting of widely distributed, heterogeneous supercomputers connected by a high-speed network is viable for large scientific applications. A particular challenge is to determine whether such a metacomputer can produce superlinear speedup despite latency and communication overheads. One of the applications in the CASA testbed is a model we developed that couples a global atmosphere model to a world ocean model. Simulations using such coupled general circulation models for climate studies demand considerable computer resources. When distributing such a model, we need to consider the methods for masking latency with computation, the communications bandwidth requirements for different decomposition strategies, the optimal computer architecture for each major phase of the computation, and the effects of latency and communication costs for different decomposition strategies. Here we focus on the last two issues, and demonstrate that choosing the appropriate computer architectures and masking communication with computation can produce superlinear speedup.
Pub Date: 1994-06-01 | DOI: 10.1109/88.311574
ickp: a consistent checkpointer for multicomputers
J. Plank, Kai Li
There has been much research on checkpointing algorithms for parallel and distributed systems, but surprisingly few implementations for uniprocessors, multiprocessors, and distributed systems, and none at all for multicomputers. We discuss ickp, our consistent checkpointer for the Intel iPSC/860, which is the first general-purpose checkpointer for a multicomputer. It is a checkpointing library that may be invoked asynchronously from the host processor, at a periodic interval, or by a library call. It implements three consistent checkpointing algorithms, two optimizations to reduce checkpoint time and overhead, and recovery.
Pub Date: 1994-06-01 | DOI: 10.1109/88.311569
Parallel polygon rendering for message-passing architectures
T. Crockett, T. Orloff
Applications such as real-time animation and scientific visualization demand high performance for rendering complex 3D abstract data models into 2D images. As large applications migrate to highly parallel supercomputers, how can we exploit the available parallelism to keep the rendering on the supercomputer? To answer this question, we developed a parallel polygon renderer for general-purpose MIMD distributed-memory message-passing systems. It exploits object-level and image-level parallelism, and can run on systems containing from one processor to a number bounded by the number of scan lines in the resulting image. Unlike earlier approaches, ours multiplexes the transformation and rasterization phases on the same machine. This reduces memory usage and network contention, and overlaps computation and communication.