Scalable Performance Environments for Parallel Systems
Pub Date: 1991-10-18 | DOI: 10.1109/DMCC.1991.633315
Daniel A. Reed, R. D. Olson, R. Aydt, Tara M. Madhyastha, T. Birkett, David W. Jensen, B. Nazief, B. K. Totty
As parallel systems expand in size and complexity, the absence of performance tools for these parallel systems exacerbates the already difficult problems of application program and system software performance tuning. Moreover, given the pace of technological change, we can no longer afford to develop ad hoc, one-of-a-kind performance instrumentation software; we need scalable, portable performance analysis tools. We describe an environment prototype based on the lessons learned from two previous generations of performance data analysis software. Our environment prototype contains a set of performance data transformation modules that can be interconnected in user-specified ways. It is the responsibility of the environment infrastructure to hide details of module interconnection and data sharing. The environment is written in C++ with the graphical displays based on the X Window System and the Motif toolkit. It allows users to interconnect and configure modules graphically to form a directed, acyclic data analysis graph. Performance trace data are represented in a self-documenting stream format that includes internal definitions of data types, sizes, and names. The environment prototype supports the use of head-mounted displays and sonic data presentation in addition to the traditional use of visual techniques.
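A minimal sketch of what such a self-documenting stream might look like, with record descriptors carried in-band so any consumer can decode the records that follow; the record and field names below are invented for illustration and are not the paper's actual format:

```cpp
// Hypothetical sketch of a self-documenting trace stream: record
// descriptors travel in-band, so a consumer can decode record types it
// has never seen. Field and record names are invented for illustration.
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct FieldDescriptor {
    std::string name;   // e.g. "timestamp"
    std::string type;   // e.g. "uint64"
    uint32_t    size;   // size in bytes
};

struct RecordDescriptor {
    uint32_t tag;                         // identifies records of this type
    std::string name;                     // e.g. "MessageSend"
    std::vector<FieldDescriptor> fields;  // self-describing layout
};

int main() {
    RecordDescriptor send{1, "MessageSend",
                          {{"timestamp", "uint64", 8},
                           {"source",    "uint32", 4},
                           {"dest",      "uint32", 4},
                           {"bytes",     "uint32", 4}}};
    // A generic consumer needs only the descriptor to interpret the stream.
    for (const auto& f : send.fields)
        std::cout << send.name << "." << f.name << " : " << f.type
                  << " (" << f.size << " bytes)\n";
}
```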
{"title":"Scalable Performance Environments for Parallel Systems","authors":"Daniel A. Reed, R. D. Olson, R. Aydt, Tara M. Madhyastha, T. Birkett, David W. Jensen, B. Nazief, B. K. Totty","doi":"10.1109/DMCC.1991.633315","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633315","url":null,"abstract":"As parallel systems expand in size and complexity, the absence of performance tools for these parallel systems exacerbates the already difficult problems of application program and system software performance tuning. Moreover, given the pace of technological change, we can no longer afford to develop ad hoc, one-of-a-kind performance instrumentation software; we need scalable, portable performance analysis tools. We describe an environment prototype based on the lessons learned from two previous generations of performance data analysis software. Our environment prototype contains a set of performance data transformation modules that can be interconnected in user-specified ways. It is the responsibility of the environment infrastructure to hide details of module interconnection and data sharing. The environment is written in C++ with the graphical displays based on X windows and the Motif toolkit. It allows users to interconnect and configure modules graphically to form an acyclic, directed data analysis graph. Performance trace data are represented in a self-documenting stream format that includes internal definitions of data types, sizes, and names. The environment prototype supports the use of head-mounted displays and sonic data presentation in addition to the traditional use of visual techniques.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122370339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design and Evaluation of Communication Processors supporting Message Passing in Distributed Memory Systems
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633352
H. Corporaal, J. Olk
The number of design decisions for connecting processor nodes within MIMD systems is rather large. This paper systematically introduces the most important design parameters for communication processors in MIMD systems. Together, these parameters span a multidimensional design space. Points in this space are clarified through classification of a number of existing communication processors. The design choices made for these processors are reviewed and their performance is evaluated. Suitable choices of the design parameters are highly influenced by application behavior. Ideally, one would like to design processors which cover a whole area in this design space. A companion paper describes a scalable and flexible design currently being realized at our laboratory.
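To make the design-space idea concrete, a point in the space can be pictured as one choice per dimension; the dimensions named here are common ones from the literature, not necessarily the paper's taxonomy:

```cpp
// Illustrative only: a point in a communication-processor design space,
// expressed as one choice per dimension. The dimensions named here are
// common in the literature, not necessarily the paper's taxonomy.
enum class Switching { StoreAndForward, VirtualCutThrough, Wormhole };
enum class Routing   { Deterministic, Adaptive };
enum class Buffering { InputQueued, OutputQueued, CentralPool };

struct DesignPoint {
    Switching switching;
    Routing   routing;
    Buffering buffering;
    int       channelsPerNode;   // node degree available for interconnection
    int       channelWidthBits;  // physical width of each link
};
```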
{"title":"Design and Evaluation of Communication Processors supporting Message Passing in Distributed Memory Systems","authors":"H. Corporaal, J. Olk","doi":"10.1109/DMCC.1991.633352","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633352","url":null,"abstract":"The number of design decisions for connecting processor nodes within MIMD systems is rather large. This paper systematically introduces the most important design parameters for communication processors in MIMD systems. Together, these parameters span a multidimensional design space. Points in this space are clarijied through classijication of a number of existing communication processors. The design choices made for these processors are reviewed and their performance is evaluated. Suitable choices of the design parameters are highly influenced by application behavior. Ideally one would like to design processors which cover a whole area in this design space. A companion paper describes a scalable andflexible design currently being realized at our laboratory.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115183275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Temporal Communication Graphs: A New Graph Theoretic Model for Mapping and Scheduling in Distributed Memory Systems
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633137
V. Lo
The temporal communication graph (TCG) is a new graph-theoretic model of parallel computation that we have developed for the mapping of parallel computations to message-passing parallel architectures. The TCG integrates the two dominant models currently in use in the areas of mapping, task assignment, partitioning, and scheduling: the static task graph and the DAG. The TCG augments these models with the capability to identify logically synchronous phases of communication and computation, and to describe the temporal behavior of a parallel algorithm in terms of these phases. This paper defines the TCG, illustrates its use for mapping and scheduling, and discusses a wide range of potential uses for the TCG in parallel programming environments.
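One plausible way to encode a TCG, assuming vertices are (process, phase) pairs with message edges within a phase and temporal edges linking successive phases; this encoding is our illustration, not the paper's formal definition:

```cpp
// Our illustration of a TCG encoding, not the paper's formal definition:
// vertices are (process, phase) pairs; message edges connect vertices in
// the same phase, temporal edges link successive phases of one process.
#include <vector>

struct TcgVertex {
    int process;  // logical process identifier
    int phase;    // synchronous phase of the computation
};

struct TcgEdge {
    int  from, to;  // indices into the vertex array
    bool temporal;  // true: phase-ordering edge; false: message edge
};

struct Tcg {
    std::vector<TcgVertex> vertices;
    std::vector<TcgEdge>   edges;
};
```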
{"title":"Temporal Communication Graphs: A New Graph Theoretic Model Mapping and Scheduling in Distributed Memory Systems","authors":"V. Lo","doi":"10.1109/DMCC.1991.633137","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633137","url":null,"abstract":"The temporal communication graph is a new graph theoretic model of parallel computation that we have developed for the mapping of parallel computations to message-passing parallel architectures. The TCG integrates the two dominant models currently in use in the area of mapping, task assignment, partitioning, and scheduling: the static task graph and the DAG. The TCG augments these models with the capability to identify logically synchronous phases of communication and computation, and to describe the temporal behavior of a parallel algorithm in terms of these phases. This paper defines the TCG, illustrates its use for mapping and scheduling, and discusses a wide range of potentials uses for the TCG in the area of parallel programming environments.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122756296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Apply: A Parallel Compiler on iWarp for Image-Processing Applications
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633119
B. Baxter, B. Greer
Local operator computations used in 2-dimensional image processing can be applied to individual pixels independently, making it easy to perform this class of problems in parallel. The Apply language is designed to exploit this opportunity while hiding most parallel programming details from the programmer and retaining the look and feel of a conventional sequential style.
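The flavor of this per-pixel style can be sketched in plain C++ (Apply has its own syntax; this stand-in only shows the idea that the programmer writes one local operator and the compiler parallelizes the surrounding loops):

```cpp
// Plain C++ stand-in for the per-pixel style Apply supports (Apply has
// its own syntax): each output pixel depends only on a fixed input
// neighborhood, so the loops over pixels can be run in parallel.
#include <vector>

using Image = std::vector<std::vector<float>>;

// 3x3 smoothing operator for a single interior pixel (x, y).
float smooth3x3(const Image& in, int x, int y) {
    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            sum += in[y + dy][x + dx];
    return sum / 9.0f;
}

// The outer loops can be partitioned across processors without changing
// the operator itself.
void apply(const Image& in, Image& out) {
    for (int y = 1; y + 1 < (int)in.size(); ++y)
        for (int x = 1; x + 1 < (int)in[y].size(); ++x)
            out[y][x] = smooth3x3(in, x, y);
}
```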
{"title":"Apply: A Parallel Compiler on iWarp for Image-Processing Applications","authors":"B. Baxter, B. Greer","doi":"10.1109/DMCC.1991.633119","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633119","url":null,"abstract":"Local operator compurafions used in 2-dimensional image processing can be applied TO individual pixels independenr ly. making ir easy ro perform This class ofproblems in paral lel. The Apply language is designed to exploir rhis OppOrlU niry while hiding mosr parallel programming dewilsfrom the programmer andrewining rhe lookandfeel ofa conven fional sequential style.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121214046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using Spanning-Trees for Balancing Dynamic Load on Multiprocessors
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633134
R. Melhem, K. Pruhs, T. Znati
We consider the problem of load balancing to minimize the cost of dynamic computations, including the cost of migrations. We analyze the costs associated with diffusion-based algorithms for several common architectures. We introduce the Ripple load-balancing paradigm, which has several advantages over diffusion methods, including flexibility and faster convergence.
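For contrast, a minimal sketch of the diffusion baseline the paper improves on: each node shifts a fraction alpha of every pairwise load imbalance across its edges per step (Ripple itself is the paper's contribution and is not reproduced here):

```cpp
// Baseline diffusion step (the Ripple paradigm itself is the paper's
// contribution and is not reproduced here): each node moves a fraction
// alpha of every pairwise load imbalance across the corresponding edge.
#include <cstddef>
#include <vector>

void diffusionStep(std::vector<double>& load,
                   const std::vector<std::vector<int>>& neighbors,
                   double alpha) {
    std::vector<double> next = load;
    for (std::size_t i = 0; i < load.size(); ++i)
        for (int j : neighbors[i])
            next[i] += alpha * (load[j] - load[i]);  // flow along edge (i, j)
    load = next;
}
```

For convergence, alpha must be small relative to the node degree (a standard safe choice is alpha <= 1/(1 + maximum degree)); this conservative step size is one reason diffusion can converge slowly.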
{"title":"Using Spanning-Trees for Balancing Dynamic Load on Multiprocessors","authors":"R. Melhem, K. Pruhs, T. Znati","doi":"10.1109/DMCC.1991.633134","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633134","url":null,"abstract":"We consider the problem of load balancing to minimize the cost of dynamic computations, including the cost of migrations. We analyze the costs associated with diffusion based algorithms for several common architectures. We introduce the Ripple load balancing paradigm, which has several advantages over diffusion methods, including flexibility and faster convergence.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114073030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Executing Synchronous Data Flow Graphs on Multicomputers
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633128
Zhiwei Xu
This paper presents a technique for programming distributed memory multicomputers by automatically generating parallel programs from parallel computations specified as synchronous data flow graphs or recurrence equations.
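A minimal sketch of the input such a generator might consume, assuming the standard synchronous data flow formalism in which each actor produces and consumes fixed token counts per firing, so a compile-time repetition vector balances every edge; the names are ours:

```cpp
// Assumed standard SDF formalism, with invented names: each actor produces
// and consumes a fixed number of tokens per firing, so a repetition vector
// r with produce * r[src] == consume * r[dst] on every edge exists and a
// static schedule can be generated at compile time.
#include <vector>

struct SdfEdge {
    int src, dst;  // actor indices
    int produce;   // tokens written per firing of src
    int consume;   // tokens read per firing of dst
};

struct SdfGraph {
    int numActors;
    std::vector<SdfEdge> edges;
};

// True if r balances every edge of g.
bool balanced(const SdfGraph& g, const std::vector<int>& r) {
    for (const SdfEdge& e : g.edges)
        if (e.produce * r[e.src] != e.consume * r[e.dst]) return false;
    return true;
}
```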
{"title":"Executing Synchronous Data Flow Graphs on Multicomputers","authors":"Zhiwei Xu","doi":"10.1109/DMCC.1991.633128","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633128","url":null,"abstract":"This paper presents a technique for programming distributed memory multicomputers by automatically generating parallel programs from parallel computations specified as synchronous data flow graphs or recurrence equations.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126469715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reduction operations on a distributed memory machine with a reconfigurable interconnection network
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633354
S. Miguet, Yves Robert
Performing reduction operations on distributed memory machines whose interconnection networks are reconfigurable is considered. The focus is on machines whose interconnection graph can be configured as any graph of maximum degree d. The best way of interconnecting the p processors, as a function of p, d, and some problem- and machine-dependent parameters that characterize the communication-to-arithmetic ratio of the reduction operation, is discussed. Experiments on transputer-based networks agree well with the theoretical results.
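As a rough illustration of the trade-off involved (our simplification, not the paper's exact model): organizing the p processors as a spanning tree of maximum degree d >= 3 yields depth about log_{d-1} p, and if a parent combines the values of its d-1 children one at a time, a single reduction costs roughly

```latex
T(p,d) \;\approx\; \bigl\lceil \log_{d-1} p \bigr\rceil \,(d-1)\,(\beta + \gamma)
```

where beta is the cost of one message hop and gamma the cost of one combining operation; the measured beta/gamma ratio then determines whether a flatter (large d) or deeper (small d) configuration minimizes T.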
{"title":"Reduction operations on a distributed memory machine with a reconfigurable interconnection network","authors":"S. Miguet, Yves Robert","doi":"10.1109/DMCC.1991.633354","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633354","url":null,"abstract":"Performing reduction operations with distributed memory machines whose interconnection networks are reconfigurable is considered. The focus is on machines whose interconnection graph can be configured as any graph of maximum degree d. The best way of interconnecting the p processors as a function of p,d and some problem- and machine-dependent parameters that characterize the ratio communication/arithmetic for the reduction operation are discussed. Experiments on transputer-based networks are in good accordance with the theoretical results. >","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125722372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Real-Time Parallel Algorithm Animation System
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633314
E. Williams, G. B. Lamont
Algorithm animation is a visualization method used to enhance understanding of the functioning of an algorithm or program. Visualization is used for many purposes, including education, algorithm research, performance analysis, and program debugging. This research applies algorithm animation techniques to programs developed for parallel architectures, with specific emphasis on the Intel iPSC/2 hypercube. Current investigations focus on two different areas: performance data display and animations of specific algorithms or classes of algorithms. This research builds on these efforts to provide a system that is able to both display performance data from parallel programs and support the creation of animations for specific algorithms. There are three goals for this visualization system: data should be displayed as it is generated; the interface to the target program should be transparent, allowing the animation of existing programs; and the system must be flexible enough to animate any algorithm. The resulting system incorporates, integrates, and extends two systems: the AFIT Algorithm Animation Research Facility (AAARF) and the Parallel Resource Analysis Software Environment (PRASE). Since performance data is an essential part of analyzing any parallel program, multiple views of the performance data are provided as an elementary part of the system. In addition to the animation system, a method for developing the animations is discussed. This method is applicable to animating any type of program, sequential or parallel. While both P-time and NP-time algorithms can potentially benefit from using visualization techniques, the set of NP-complete problems provides fertile ground for developing parallel applications. The methods discussed in this paper were used to animate a parallel implementation of a general Set Covering Problem (SCP).
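The transparency goal suggests an event interface along these lines: the instrumented program pushes trace events through a narrow hook, and display modules consume them as they arrive. All names below are ours, for illustration; the actual AAARF/PRASE interfaces may differ:

```cpp
// Our illustration of the kind of narrow, transparent hook the three goals
// imply; the actual AAARF/PRASE interfaces may differ. The program emits
// events as they occur, and display modules consume them immediately.
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

struct TraceEvent {
    std::uint64_t timestamp;
    int node;  // which hypercube node produced the event
    int kind;  // send, receive, state change, ...
};

class EventBus {
public:
    void subscribe(std::function<void(const TraceEvent&)> view) {
        views_.push_back(std::move(view));
    }
    void emit(const TraceEvent& e) {  // called as the data is generated
        for (auto& view : views_) view(e);
    }
private:
    std::vector<std::function<void(const TraceEvent&)>> views_;
};
```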
{"title":"A Real-Time Parallel Algorithm Animation System","authors":"E. Williams, G.B. Lament","doi":"10.1109/DMCC.1991.633314","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633314","url":null,"abstract":"Algorithm animation is a visualization method used to enhance understanding of the functioning of an algorithm or program. Visualization is used for many purposes, including education, algorithm research, performance analysis, and program debugging. This research applies algorithm animation techniques to programs developed for parallel architectures, with specific emphasis on the Intel iPSC/2 hypercube. Current, investigations focus in two different areas: performance data display and animations of specific algorithms or classes of algorithms. This research builds on these efforts to provide a system that is able to both display performance data from parallel programs and support the creation of animations for specific algorithms. There are three goals for this visualization system. Data should be displayed as it is generated. The inteiface to the target program should be transparent, allowing the animation of existing programs. The system must be flexible enough to animate any algorithm. The resulting system incorporates, integrates and extends two systems: the AFIT Algorithm Animation Research Facility (AAARF) and the Parallel Resource Analysis Software Environment (PRASE). Since performance data is an essential part of analyzing any parallel program, multiple views of the performance data are provided as an elementary part of the system. In addition to the animation system, a method for developing the animations is discussed. This method is arpplicable to animating any type of program, sequential or parallel. Whilc: both P-time and NP-ttme algorithms can potentially benefit from using visualization techniques, the set of NP .complete problems provides fertile ground for developing parallel atpplications. The methods discussed in this paper were used to animate a parallel implementation of a general Set Covering Problem (SCP).","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131952211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Structured Decompositions for Solving Sparse Nonlinear Systems of Equations on Parallel Computers
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633304
Xiaodong Zhang
Structured decompositions for solving sparse nonlinear systems of equations transform the sparse systems into special structures so that the computations can be decomposed efficiently for parallel processing. A group of nonlinear systems structured by such decompositions, and parallel methods for their solution, are overviewed.
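One common special structure of this kind (our illustration; the paper surveys several) is a bordered block-diagonal Newton system, in which every diagonal block can be factored in parallel and only the border couples the subproblems:

```latex
J\,\Delta x = -F(x),
\qquad
J = \begin{pmatrix}
A_1 &     &        & B_1 \\
    & A_2 &        & B_2 \\
    &     & \ddots & \vdots \\
C_1 & C_2 & \cdots & D
\end{pmatrix}
```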
{"title":"Structured Decompositions for Solving Sparse Nonlinear Systems of Equations on Parallel Computers","authors":"Xiaodong Zhang","doi":"10.1109/DMCC.1991.633304","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633304","url":null,"abstract":"The structured decompositions for solving sparse nonlinear systems of equations are to transform the sparse systems into some special structures so that the computations can be decomposed efficiently for parallel processing. A group of structured nonlinear systems by decoinpositions and their parallel methods for solutions are overviewed.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132216992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effective Storage and Communication Schemes for Implementation of the Conjugate Gradient Method on an Intel iPSC/860
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633309
D. Anderson, L. Sattler
The conjugate gradient method for solving the system of linear equations arising during a finite element analysis has gained renewed interest with the advent of distributed memory computers. This paper describes a method that minimizes storage by taking advantage of symmetry and sparsity, and minimizes communication overhead by using asynchronous message passing. The data structure necessary to implement this procedure follows naturally from the finite element mesh. Test results show near-linear speedup for a sufficiently large matrix.
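A sketch of the storage side of such a scheme, assuming each node keeps its rows of the matrix in compressed sparse row form; the asynchronous overlap then amounts to computing the interior rows between posting and completing the boundary exchange (generic C++, not the iPSC/860 message-passing calls):

```cpp
// Generic C++ sketch, not the iPSC/860 calls: each node stores only its
// rows of A in compressed sparse row form. Overlap is obtained by computing
// interior rows (which need no remote x entries) between posting and
// completing the asynchronous boundary exchange.
#include <vector>

struct CsrMatrix {
    std::vector<int>    rowPtr;  // size = local rows + 1
    std::vector<int>    col;     // column index of each stored entry
    std::vector<double> val;     // value of each stored entry
};

// y[i] = sum_k A(i, k) * x[k] for local rows in [rowBegin, rowEnd).
void spmvRows(const CsrMatrix& A, const std::vector<double>& x,
              std::vector<double>& y, int rowBegin, int rowEnd) {
    for (int i = rowBegin; i < rowEnd; ++i) {
        double s = 0.0;
        for (int k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k)
            s += A.val[k] * x[A.col[k]];
        y[i] = s;
    }
}
```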
{"title":"Effective Storage and Communication Schemes for Implementation of the Conjugate Gradient Method on an Intel iPSC/860","authors":"D. Anderson, L. Sattler","doi":"10.1109/DMCC.1991.633309","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633309","url":null,"abstract":"The conjugate gradient method for solving the system of linear equations arising during a finite element analysis has gained renewed interest with the advent of distributed memory computers. In this paper a method will be described which minimizes storage by taking advantage of symmetry and sparsity and minimizes communication overhead by using asynchronous message passing. The data structure necessary to implement this procedure follows naturally frotn the finite element mesh. Test results show near linear speedup for a suflciently large matrix.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133894358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}