Pub Date: 1990-04-08 DOI: 10.1109/DMCC.1990.556303
P. M. Campbell, Edward A. Carmona, D. Walker
This paper presents a new approach to parallelizing particle-in-cell (PIC) algorithms used in the numerical simulation of three-dimensional plasmas on MIMD multicomputers. Two new concepts are introduced: unitary load balance and hierarchical decomposition. The combined load for particle and field calculations over the time step is balanced together to form a single spatial decomposition. The unitary load scheme permits the load to be approximately balanced while requiring less communication. Decomposition and dynamic balancing are performed in each of the coordinate directions independently (hierarchical), which is particularly efficient when load imbalance propagates preferentially in a given direction. The hierarchical decomposition also minimizes the number of particles that cross boundary regions, thereby decreasing communication. A local load balancing method is also introduced which allows rows or columns of processors to perform dynamic load balancing locally and in parallel.
Title: Hierarchical Domain Decomposition With Unitary Load Balancing For Electromagnetic Particle-In-Cell Codes. In: Proceedings of the Fifth Distributed Memory Computing Conference, 1990.
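As a rough illustration of the idea in the abstract above, the following sketch splits a 2-D grid of per-cell loads first along one coordinate direction and then, independently inside each slab, along the other. It is not the authors' algorithm: the function names, the prefix-sum cut rule, and the assumption that each cell already carries a single combined (particle plus field) load value are all hypothetical choices made for this example.

```python
from bisect import bisect_left
from itertools import accumulate


def balanced_cuts(loads, parts):
    """Split a 1-D list of non-negative loads into `parts` contiguous chunks
    with roughly equal sums. Returns chunk boundaries as a list of indices.
    Assumes len(loads) >= parts (hypothetical helper, not from the paper)."""
    prefix = list(accumulate(loads))
    total = prefix[-1]
    cuts = [0]
    for p in range(1, parts):
        target = total * p / parts
        # First position whose running sum reaches the target share.
        i = bisect_left(prefix, target) + 1
        # Keep every chunk non-empty and leave room for the remaining chunks.
        i = max(i, cuts[-1] + 1)
        i = min(i, len(loads) - (parts - p))
        cuts.append(i)
    cuts.append(len(loads))
    return cuts


def hierarchical_decomposition(grid, px, py):
    """grid[i][j] is the combined load of cell (i, j).
    First cut the rows into px slabs, then cut each slab's columns into
    py blocks independently of the other slabs."""
    row_loads = [sum(row) for row in grid]
    x_cuts = balanced_cuts(row_loads, px)
    blocks = []
    for a, b in zip(x_cuts, x_cuts[1:]):
        col_loads = [sum(grid[i][j] for i in range(a, b))
                     for j in range(len(grid[0]))]
        blocks.append((a, b, balanced_cuts(col_loads, py)))
    return blocks


if __name__ == "__main__":
    uniform = [[1] * 4 for _ in range(4)]
    print(hierarchical_decomposition(uniform, 2, 2))
```

Because each slab recomputes only its own column cuts, a load shift along one direction can be absorbed by moving the cuts in that direction alone, which is the property the abstract attributes to the hierarchical scheme.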
Pub Date: 1990-04-08 DOI: 10.1109/DMCC.1990.556331
V. Balasundaram, Geoffrey C. Fox, K. Kennedy, U. Kremer
An approach to distributed memory parallel programming that has recently become popular is one where the programmer explicitly specifies the data decomposition using language extensions, and a compiler generates all the communication. While this frees the programmer from the tedium of thinking about message-passing, no assistance is provided in determining the data decomposition scheme that gives the best performance on the target machine. In this paper, we propose an interactive software tool that provides assistance for this very task. The proposed tool also computes performance estimates for any chosen data partitioning scheme, allowing the programmer to experiment with several different strategies without ever running the program on the machine.
Title: An Interactive Environment for Data Partitioning and Distribution. In: Proceedings of the Fifth Distributed Memory Computing Conference, 1990.
Pub Date: 1990-04-08 DOI: 10.1109/DMCC.1990.555380
T. Huntsberger, B.A. Huntsberger
The tremendous amount of data contained in an image often precludes the extraction of useful information in real-time environments. A multiresolution representation can be used to obtain structural properties of a single image or sequences of images [8]. These structural properties are useful for such operations as texture analysis, image segmentation, object identification, and stereo matching [7,12,11].
Title: Hypercube Algorithm for Image Decomposition and Analysis in the Wavelet Representation. In: Proceedings of the Fifth Distributed Memory Computing Conference, 1990.
Pub Date: 1990-04-08 DOI: 10.1109/DMCC.1990.556409
Jin-Kun Wang, F. Ozguner
In many parallel algorithms in hypercubes, linear arrays and 2-D meshes are embedded for the computations that require local communication while the hypercube topology is used for global communication. In this paper, embedding and global data communication schemes are developed for faulty hypercubes and studied in the context of algorithms. The schemes are also applicable to incomplete hypercubes resulting from the allocation of subcubes to different users.
Title: Embeddings, Communication and Performance of Algorithms in Faulty Hypercubes. In: Proceedings of the Fifth Distributed Memory Computing Conference, 1990.
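For readers unfamiliar with the embeddings mentioned in the abstract above, the sketch below shows the standard binary-reflected Gray code embedding of a linear array (ring) and a 2-D mesh into a fault-free hypercube. This is textbook background only; it is not the faulty-hypercube scheme developed in the paper, and the function names are chosen here for illustration.

```python
def gray(i):
    """Binary-reflected Gray code of i: consecutive values differ in one bit."""
    return i ^ (i >> 1)


def embed_ring(dim):
    """Map positions 0 .. 2**dim - 1 of a ring onto hypercube node labels so
    that neighbouring positions land on neighbouring hypercube nodes."""
    return [gray(i) for i in range(1 << dim)]


def embed_mesh(row_bits, col_bits):
    """Map a (2**row_bits) x (2**col_bits) mesh onto a hypercube of dimension
    row_bits + col_bits by Gray-coding the row and column indices separately
    and concatenating the two bit fields."""
    return [[(gray(r) << col_bits) | gray(c) for c in range(1 << col_bits)]
            for r in range(1 << row_bits)]


def hamming(a, b):
    """Number of differing bits, i.e. hypercube distance between two nodes."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    ring = embed_ring(3)
    print(ring)
    # Every ring neighbour pair, including the wrap-around, is one hop apart.
    print(all(hamming(ring[i], ring[(i + 1) % len(ring)]) == 1
              for i in range(len(ring))))
```

When a node or link fails, some of these one-hop neighbour relations are lost, which is the situation the paper's embedding and communication schemes are designed to handle.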
Pub Date: 1990-04-08 DOI: 10.1109/DMCC.1990.556309
D. Grunwald, B. Nazief, D. Reed
The study compared several load placement algorithms using instrumented programs and synthetic program models. Salient characteristics of these program traces (total computation time, total number of messages sent, and average message time) span two orders of magnitude. Load distribution algorithms determine the initial placement for processes, a precursor to the more general problem of load redistribution. It is found that desirable workload distribution strategies will place new processes globally, rather than locally, to spread processes rapidly, but that local information should be used to refine global placement.
Title: Empirical Comparison of Heuristic Load Distribution in Point-to-Point Multicomputer Networks. In: Proceedings of the Fifth Distributed Memory Computing Conference, 1990.
Pub Date: 1990-04-08 DOI: 10.1109/DMCC.1990.556405
W. Thacker, O.E. Katter
To compete successfully in a world that is becoming more international and more competitive, we must become more effective in using technology, especially in our educational institutions. There is a strong parallel between the way current developments in information technology are significantly affecting our business organizations and the way machines helped to transform our society during the Industrial Revolution. Academic organizations must accelerate the integration of new information technology, such as parallel processing methodology, into the classroom for our economic growth plans to be achieved. This paper discusses the Winthrop College Computer Science Department's methodology to increase the parallel processing content and experience in our educational offerings.
Title: Transferring Parallel Processing Technology To Undergraduate Computer Science Students. In: Proceedings of the Fifth Distributed Memory Computing Conference, 1990.
Pub Date: 1990-04-08 DOI: 10.1109/DMCC.1990.555422
Roy Williams, B. Rasnow, Christopher Assad
We present a simulation of the electrosensory input of the weakly electric fish Apteronotus leptorhynchus. This fish senses its environment by producing a sinusoidal voltage difference between its body and tail sections, causing an electric field and a current distribution in the surrounding water. If an object is nearby which has different electrical conductivity from the surrounding water, the current distribution is disturbed on the skin of the fish. The fish senses this difference from the usual current distribution, and infers the presence and location of the object. Mathematically, the problem is to solve a potential equation in the domain exterior to the fish with Cauchy boundary conditions, in the presence of an induced dipole arising from the object, and extract the potential difference across the fish skin. We have created an unstructured triangular mesh covering the two-dimensional manifold of the fish skin, using the distributed Irregular Mesh Environment (DIME), then used the Boundary Element Method to solve for the potential derivative at the fish skin. The computational problem is the solution of a full set of simultaneous linear equations, where there is an equation for each node of the boundary mesh, typically about 100 - 200. We have used an NCUBE hypercube to calculate the matrix elements and solve these equations, once for each relative position of the fish and the test object. We present some early results from the simulation.
Title: Hypercube Simulation of Electric Fish Potentials. In: Proceedings of the Fifth Distributed Memory Computing Conference, 1990.
Pub Date: 1990-04-08 DOI: 10.1109/DMCC.1990.555426
P. Hipes, C. Winstead, M. Lima, V. McKoy
We report on a distributed memory implementation and initial applications of a program for calculating electron-molecule collision cross sections. Runs on the Mark IIIfp hypercube show that large-grain MIMD machines are well suited for these applications. Some results of studies of $e^-$-Si$_2$H$_6$ and $e^-$-SiF$_4$ collisions will be discussed.
Title: Studies of Electron-Molecule Collisions on the Mark IIIfp Hypercube. In: Proceedings of the Fifth Distributed Memory Computing Conference, 1990.
Pub Date: 1990-04-08 DOI: 10.1109/DMCC.1990.555372
D. Meier, K. Cloud, J. C. Horvath, L.D. Allan, W. Hammond, H. Maxfield
We describe a general framework for building and running complex time-driven simulations with several levels of concurrency. The framework has been implemented on the Caltech/JPL Mark IIIfp hypercube using the Centaur communications protocol. Our framework allows the programmer to break the hypercube up into one or more subcubes of arbitrary size (task parallelism). Each subcube runs a separate application using data parallelism and synchronous communications internal to the subcube. Communications between subcubes are performed with asynchronous messages. Subcubes can each define their own parameters and commands which drive their particular application. These are collected and organized by the Control Processor (CP) in order that the entire simulation can be driven from a single command-driven shell. This system allows several programmers to develop disjoint pieces of a large simulation in parallel and to then integrate them with little effort. Each programmer is, of course, also able to take advantage of the separate data and I/O processors on each hypercube node in order to overlap calculation and communication (on-board parallelism) as well as the pipelined floating point processor on each node (pipelined processor parallelism). We show, as an example of the framework, a large space defense simulation. Functions (sensing, tracking, etc.) each comprise a subcube; functions are collected into defense platforms (satellites); and many platforms comprise the defense architecture. Software in the CP uses simple input to determine the node allocation to each function based on the desired defense architecture and number of platforms simulated in the hypercube. This allows many different architectures to be simulated. The set of simulated platforms, the results, and the messages between them are shown on color graphics displays. The methods used herein can be generalized to other simulations of a similar nature in a straightforward manner.
Title: A General Framework for Complex Time-Driven Simulations on Hypercubes. In: Proceedings of the Fifth Distributed Memory Computing Conference, 1990.
Pub Date: 1990-04-08 DOI: 10.1109/DMCC.1990.556387
Xiaojun Guan, M. Langston
A distributed algorithm is time-space optimal if it achieves optimal speedup and if it uses only a constant amount of extra space when the number of processors is fixed. In this brief paper, we outline a distributed algorithm for merging that, given a multi-channel broadcast network with k processors, merges two sorted lists of total length n in O(n/k + log k) time and O(k) extra space, and is thus time-space optimal for any fixed value of k that satisfies n ≥ k log k.
Title: Distributed Algorithms for Multi-Channel Broadcast Networks. In: Proceedings of the Fifth Distributed Memory Computing Conference, 1990.
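To make the O(n/k) term in the abstract above concrete, here is a small sequential simulation of how a merge can be cut into k independent pieces of about n/k output elements each, using a binary search (often called co-ranking) to find where each piece starts in the two inputs. This is a generic partitioned-merge sketch, not the paper's algorithm: it does not model the broadcast channels, and it does not achieve the O(k) extra-space bound. All names are hypothetical.

```python
from heapq import merge


def co_rank(t, a, b):
    """Return i such that a[:i] and b[:t - i] together hold the t smallest
    elements of the sorted lists a and b (binary search over i)."""
    lo = max(0, t - len(b))
    hi = min(t, len(a))
    while lo < hi:
        i = (lo + hi) // 2
        j = t - i
        if a[i] < b[j - 1]:
            lo = i + 1   # a[i] must also belong to the first t elements
        else:
            hi = i
    return lo


def partitioned_merge(a, b, k):
    """Merge sorted lists a and b by giving each of k simulated processors
    one contiguous slice of the output, roughly (len(a) + len(b)) / k long."""
    n = len(a) + len(b)
    out = []
    for p in range(k):
        t0, t1 = p * n // k, (p + 1) * n // k
        i0, i1 = co_rank(t0, a, b), co_rank(t1, a, b)
        j0, j1 = t0 - i0, t1 - i1
        # Each slice depends only on its own sub-ranges, so on a real machine
        # the k slices could be merged at the same time.
        out.extend(merge(a[i0:i1], b[j0:j1]))
    return out


if __name__ == "__main__":
    print(partitioned_merge([1, 3, 5, 7], [2, 4, 6, 8], 3))
```

Each simulated processor performs a logarithmic-time search followed by a merge of about n/k elements, which is where a bound of the form n/k plus a logarithmic term comes from in schemes of this kind.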