"An efficient group communication architecture over ATM networks"
Sung-Yong Park, Joohan Lee, S. Hariri
Proceedings Seventh Heterogeneous Computing Workshop (HCW'98), 30 March 1998. DOI: 10.1109/HCW.1998.666551

The NYNET Communication System (NCS), named for the ATM wide-area network testbed in New York state, is a multithreaded message-passing tool developed at Syracuse University that provides low-latency, high-throughput communication services over Asynchronous Transfer Mode (ATM)-based high-performance distributed computing (HPDC) environments. NCS provides flexible and scalable group communication services based on dynamic grouping and tree-based multicasting. The NCS architecture separates the data and control functions, allowing group operations to be implemented efficiently by using the control connections to transfer status information (e.g. topology and routing information). Furthermore, NCS provides several different algorithms for group communication and allows programmers to select an appropriate algorithm at runtime. The authors present an overview of the general architecture of NCS and the multicasting services it provides, and they analyze and compare the performance of NCS with that of other message-passing tools such as p4, PVM, and MPI in terms of both primitive and application-level performance.
"The relative performance of various mapping algorithms is independent of sizable variances in run-time predictions"
R. Armstrong, D. Hensgen, T. Kidd
DOI: 10.1109/HCW.1998.666547

The authors study the performance of four mapping algorithms: two naive ones, opportunistic load balancing (OLB) and limited best assignment (LBA), and two intelligent greedy algorithms, one running in O(nm) time and one in O(n²m) time. All of these algorithms except OLB use expected run-times to assign jobs to machines. Because expected run-times are rarely deterministic in modern networked and server-based systems, the authors first use experimentation to determine plausible run-time distributions. Using these distributions, they then run simulations to determine how the mapping algorithms perform. The comparisons show that the greedy algorithms produce schedules that, when executed, outperform those of the naive algorithms, even though exact run-times are not available to the schedulers. The authors conclude that intelligent mapping algorithms are beneficial even when the expected completion time of a job is not deterministic.
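The abstract contrasts OLB, which ignores run-time estimates, with greedy heuristics that use them. The sketch below is a minimal illustration of that contrast, not the paper's code: it assumes the O(nm) greedy rule is "assign each job, in order, to the machine with the minimum expected completion time", and all function names and data are hypothetical.

```python
def olb(jobs, n_machines):
    """Opportunistic load balancing: each job goes to the machine that
    becomes available soonest, without consulting expected run-times."""
    ready = [0.0] * n_machines
    mapping = []
    for runtimes in jobs:          # runtimes[m] = this job's time on machine m
        m = min(range(n_machines), key=lambda i: ready[i])
        mapping.append(m)
        ready[m] += runtimes[m]    # the actual run-time accrues regardless
    return mapping

def greedy_mct(jobs, n_machines):
    """O(nm) greedy: each job goes to the machine with the minimum
    expected completion time (current ready time + expected run-time)."""
    ready = [0.0] * n_machines
    mapping = []
    for runtimes in jobs:
        m = min(range(n_machines), key=lambda i: ready[i] + runtimes[i])
        mapping.append(m)
        ready[m] += runtimes[m]
    return mapping

# jobs[j][m] = expected run-time of job j on machine m
jobs = [[3.0, 6.0], [2.0, 1.0], [4.0, 4.0]]
print(greedy_mct(jobs, 2))  # → [0, 1, 1]
```

In the paper's setting, the run-times fed to the mapper are draws from a distribution rather than exact values; the study's point is that greedy mappers of this style still beat the naive ones under that uncertainty.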
"Dynamic, competitive scheduling of multiple DAGs in a distributed heterogeneous environment"
Michael A. Iverson, F. Özgüner
DOI: 10.1109/HCW.1998.666546

With the advent of large-scale heterogeneous environments, there is a need for matching and scheduling algorithms that allow multiple DAG-structured applications to share the computational resources of a network. The paper presents a matching and scheduling framework in which multiple applications compete for the computational resources on the network. In this environment, each application makes its own scheduling decisions, so no centralized scheduling resource is required. Applications need no direct knowledge of one another; the only knowledge of other applications arrives indirectly through load estimates (such as queue lengths). The paper also presents algorithms for each portion of the scheduling framework. One is a modification of a static scheduling algorithm, the DLS algorithm, first presented by Sih and Lee (1993). Others attempt to predict future task arrivals by modeling them as Poisson random processes. A series of simulations examines the performance of these algorithms in this environment and compares it with that of a more conventional, single-user environment.
"Modeling the slowdown of data-parallel applications in homogeneous and heterogeneous clusters of workstations"
S. Figueira, F. Berman
DOI: 10.1109/HCW.1998.666548

Data-parallel applications executing in multi-user clustered environments share resources with other applications. Since this sharing dramatically affects the performance of individual applications, it is critical to estimate its effect, i.e. the application slowdown, in order to predict application behavior. The authors develop a new approach for predicting the slowdown imposed on data-parallel applications executing on homogeneous and heterogeneous clusters of workstations. The model synthesizes the slowdown on each machine used by an application into a contention measure, the aggregate slowdown factor, which is used to adjust the application's execution time to account for the aggregate load. The model is parameterized by the work (or data) partitioning policy employed by the application, the local slowdown (due to contention from other users) present on each node of the cluster, and the relative weight (capacity) associated with each node. This model provides a basis for predicting realistic execution times for distributed data-parallel applications in production clustered environments.
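The abstract describes the model only at a high level, and the paper's actual formula for the aggregate slowdown factor is not reproduced here. The sketch below illustrates the general idea under our own simplifying assumption that, for a fixed work partition, the most-delayed node determines the finish time; all names are hypothetical.

```python
def aggregate_slowdown(local_slowdown, work_share):
    """Combine per-node local slowdowns into a single contention factor.
    Each node's slowdown is weighted by its work share relative to an
    even partition; the slowest weighted node dominates, since a
    data-parallel run finishes only when its last node finishes."""
    n = len(local_slowdown)
    return max(s * w * n for s, w in zip(local_slowdown, work_share))

def predicted_time(dedicated_time, local_slowdown, work_share):
    """Adjust the dedicated-cluster execution time for the aggregate load."""
    return dedicated_time * aggregate_slowdown(local_slowdown, work_share)

# Two nodes with equal work shares; the second sees a 1.5x contention slowdown:
print(predicted_time(10.0, [1.0, 1.5], [0.5, 0.5]))  # → 15.0
```

The parameters mirror the abstract's list: the partitioning policy enters through `work_share`, per-node contention through `local_slowdown`, and node capacity could be folded into either; the max-based combination is an assumption, not the paper's formula.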
"A dynamic matching and scheduling algorithm for heterogeneous computing systems"
Muthucumaru Maheswaran, H. Siegel
DOI: 10.1109/HCW.1998.666545

A heterogeneous computing system provides a variety of different machines, orchestrated to perform an application whose subtasks have diverse execution requirements. The subtasks must be assigned to machines (matching) and ordered for execution (scheduling) such that the overall application execution time is minimized. A new dynamic mapping (matching and scheduling) heuristic called the hybrid remapper is presented. The hybrid remapper is based on a centralized policy and improves a statically obtained initial matching and scheduling by remapping to reduce the overall execution time. The remapping is non-preemptive, and the execution of the hybrid remapper can be overlapped with the execution of the subtasks. During application execution, the hybrid remapper uses run-time values for subtask completion times and machine availability times whenever possible; it therefore bases its decisions on a mixture of run-time and expected values. The potential of the hybrid remapper to improve the performance of initial static mappings is demonstrated through simulation studies.
"Specification and control of cooperative work in a heterogeneous computing environment"
G. Hoyos-Rivera, Esther Martínez González, H. Rios-Figueroa, V. G. Sánchez-Arias, H. Acosta-Mesa, N. Lopez-Benitez
DOI: 10.1109/HCW.1998.666549

The implementation of an interface to support cooperative work in a heterogeneous computing environment is based on previously proposed definitions referred to as the cooperative work model (CWM) and the cooperative work language (CWL). The interface for cooperative work (ICW) and the graphical interface for cooperative work (GICW) are the two main components of a tool for setting up and controlling a cooperative working environment on a general-purpose heterogeneous computing platform. This tool is described, along with some desired characteristics that would improve its effectiveness. The specification and control of a virtual parallel machine are illustrated with an algorithm for 3D reconstruction from two stereoscopic images, and test results for this application are reported.
"Implementing distributed synthetic forces simulations in metacomputing environments"
S. Brunett, D. Davis, T. D. Gottschalk, P. Messina, C. Kesselman
DOI: 10.1109/HCW.1998.666543

A distributed, parallel implementation of the widely used Modular Semi-Automated Forces (ModSAF) Distributed Interactive Simulation (DIS) is presented, with scalable parallel processors (SPPs) used to simulate more than 50,000 individual vehicles. The single-SPP code is portable and has been used on a variety of SPP architectures for simulations with up to 15,000 vehicles. A general metacomputing framework for DIS on multiple SPPs is discussed, and results are presented for an initial system that uses explicit Gateway processes to manage communications among the SPPs. The 50,000-vehicle simulations utilized 1,904 processors at six sites across seven time zones, including platforms from three manufacturers. Ongoing activities to both simplify and enhance the metacomputing system using Globus are described.
"On the interaction between mobile processes and objects"
S. Jagannathan, R. Kelsey
DOI: 10.1109/HCW.1998.666555

Java's remote method invocation mechanism provides a number of features that extend the functionality of traditional client/server-based distributed systems. However, several characteristics of the language limit its utility as a vehicle for expressing lightweight mobile processes. Among these are its highly imperative sequential core, the close coupling of control and state that follows from its object model, and the fact that remote method calls are not properly tail-recursive. These features reduce the likelihood that Java can easily support process and object mobility for programs that exhibit complex communication and distribution patterns.