Pub Date: 1997-04-01; DOI: 10.1109/IPPS.1997.580894
A. Downey
We present statistical techniques for predicting the queue times experienced by jobs submitted to a space-sharing parallel machine with first-come-first-served (FCFS) scheduling. We apply these techniques to trace data from the Intel Paragon at the San Diego Supercomputer Center and the IBM SP2 at the Cornell Theory Center. We show that it is possible to predict queue times with accuracy that is acceptable for several intended applications. The coefficient of correlation between our predicted queue times and the actual queue times from simulated schedules is between 0.65 and 0.72.
Title: Predicting queue times on space-sharing parallel computers. Published in: Proceedings 11th International Parallel Processing Symposium.
Pub Date: 1997-04-01; DOI: 10.1109/IPPS.1997.580930
Stefanos N. Damianakis, Yuqun Chen, E. Felten
We describe a mechanism for reducing the cost of waiting for messages in architectures that allow user-level communication libraries. We reduce waiting costs in two ways: by reducing the cost of servicing interrupts, and by carefully controlling when the system uses interrupts and when it uses polling. We have implemented our mechanism on the SHRIMP multicomputer and integrated it with our user-level sockets library. Experiments show that a hybrid spin-then-block strategy offers good performance in a wide variety of situations, and that speeding up the interrupt path significantly improves performance.
Title: Reducing waiting costs in user-level communication. Published in: Proceedings 11th International Parallel Processing Symposium.
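The hybrid spin-then-block idea named in the abstract above can be illustrated with a minimal Python sketch. It is not the SHRIMP implementation: the function name `spin_then_block`, the `spin_seconds` parameter, and the use of `threading.Event` in place of a message arrival are all assumptions made for illustration. The waiter polls for a bounded time (cheap if the message is about to arrive) and only then falls back to a blocking wait (which stands in for the interrupt path).

```python
import threading
import time


def spin_then_block(event: threading.Event, spin_seconds: float = 0.001) -> str:
    """Wait for `event`: poll for up to `spin_seconds`, then block.

    Returns "spin" if the event was observed while polling, otherwise
    "block". Hypothetical helper; the event stands in for a message arrival.
    """
    deadline = time.monotonic() + spin_seconds
    # Polling phase: repeatedly check the flag without giving up the CPU.
    while time.monotonic() < deadline:
        if event.is_set():
            return "spin"
    # Blocking phase: sleep until another thread signals the event.
    event.wait()
    return "block"
```

The choice of `spin_seconds` carries the tradeoff: a longer polling window avoids the cost of blocking when messages arrive quickly, while a shorter one wastes fewer cycles when they do not.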
Pub Date: 1997-04-01; DOI: 10.1109/IPPS.1997.580927
C. Srinilta, D. Jadav, A. Choudhary
High-performance servers and high-speed networks will form the backbone of the infrastructure required for distributed multimedia information systems. Given that the goal of such a server is to support hundreds of interactive data streams simultaneously, various tradeoffs are possible with respect to the storage of data on secondary memory and its retrieval. In this paper, we identify and evaluate these tradeoffs. We evaluate the effect of varying the stripe factor and also the performance of batched retrieval of disk-resident data. We develop a methodology to predict the stream capacity of such a server. The evaluation is done for both uniform and skewed access patterns. Experimental results on the Intel Paragon computer are presented.
Title: Design and evaluation of data storage and retrieval strategies in a distributed memory continuous media server. Published in: Proceedings 11th International Parallel Processing Symposium.
Pub Date: 1997-04-01; DOI: 10.1109/IPPS.1997.580908
Rajeev Sivaram, C. Stunkel, D. Panda
Barrier synchronization is a crucial operation for parallel systems. Many schemes have been proposed in the literature to achieve fast barrier synchronization through software, hardware, or a combination of these mechanisms. However, few of these schemes emphasize fault-tolerant barrier operations. In this paper, we describe inexpensive support that can be added to network switches for achieving reliable hardware-based barrier synchronization while recovering from lost or corrupted messages. Necessary modifications to the switch architecture and the associated fault-tolerant message-passing protocols are presented. The protocols are optimized for the no-fault case while providing means to detect the failure of any step of the operation and to recover from it. The proposed scheme shows significant potential for use in parallel systems, especially the emerging systems based on networks of workstations.
Title: A reliable hardware barrier synchronization scheme. Published in: Proceedings 11th International Parallel Processing Symposium.
Pub Date: 1997-04-01; DOI: 10.1109/IPPS.1997.580913
Tatsuya Hayashi, K. Nakano, S. Olariu
The k-merge problem, given a collection of k (2 ≤ k ≤ n) sorted sequences of total length n, asks to merge them into a new sorted sequence. The main contribution of the work is to propose simple and intuitive work-time optimal algorithms for the k-merge problem on two PRAM models. Specifically, the k-merge algorithms perform O(n log k) work and run in O(log n) time on the EREW-PRAM and in O(log log n + log k) time on the CREW-PRAM, respectively.
Title: Work-time optimal k-merge algorithms on the PRAM. Published in: Proceedings 11th International Parallel Processing Symposium.
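To make the k-merge problem statement above concrete, here is a sequential sketch in Python. It is not the PRAM algorithm from the paper: it is the standard heap-based k-way merge, shown only because it performs the same O(n log k) total work that the abstract cites as the work bound. The function name `k_merge` is an assumption made for illustration.

```python
import heapq
from typing import List, Sequence


def k_merge(sequences: Sequence[Sequence[int]]) -> List[int]:
    """Merge k sorted sequences into one sorted list (sequential sketch)."""
    # Heap entries are (value, sequence index, position in that sequence).
    # The heap never holds more than k entries, so each push or pop
    # costs O(log k), giving O(n log k) work over n elements.
    heap = [(seq[0], i, 0) for i, seq in enumerate(sequences) if seq]
    heapq.heapify(heap)
    merged: List[int] = []
    while heap:
        value, i, j = heapq.heappop(heap)
        merged.append(value)
        # Replace the popped element with its successor from the same sequence.
        if j + 1 < len(sequences[i]):
            heapq.heappush(heap, (sequences[i][j + 1], i, j + 1))
    return merged
```

For example, `k_merge([[1, 4, 9], [2, 3], [], [0, 10]])` returns `[0, 1, 2, 3, 4, 9, 10]`.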
Pub Date: 1997-04-01; DOI: 10.1109/IPPS.1997.580864
Delbert Hart, Eileen T. Kraemer
Program understanding is central to the development of distributed computations, from the initial coding phase, through testing and debugging, to maintenance and support. Our goal is to support programmers unfamiliar with a particular distributed computation in developing a reasonable understanding of the workings of a program, without requiring that they examine the details of the code itself. Toward this goal, we propose query-based visualization, a novel exploratory approach to understanding distributed computations. The key features of the approach are the use of queries as a device for searching the state space, visual presentation techniques adapted from program animation, and the ability to navigate through the state space using visual interactions. All views correspond to globally consistent snapshots of the computation. A working prototype demonstrates the technical feasibility of the approach.
Title: Interactive visual exploration of distributed computations. Published in: Proceedings 11th International Parallel Processing Symposium.
Pub Date: 1997-04-01; DOI: 10.1109/IPPS.1997.580889
A. Bilas, Jason E. Fritts, J. Singh
The growing demand for high-quality compressed video has led to an increasing need for real-time MPEG decoding at greater resolutions and picture sizes. With the widespread availability of small-scale multiprocessors, a parallel software implementation may provide an effective solution to the decoding problem. We present a parallel decoder for the MPEG standard, implemented on a shared memory multiprocessor. The goal of this work is to provide an all-software solution for real-time, high-quality video decoding and to investigate the important properties of this application as they pertain to multiprocessor systems. Both coarse- and fine-grained implementations are considered for parallelizing the decoder. The coarse-grained approach exploits parallelism at the group-of-pictures level, while the fine-grained approach parallelizes within pictures, at the slice level. A comparative evaluation of these methods is made, with results presented in terms of speedup, memory requirements, load balance, synchronization time, and temporal and spatial locality. Both methods demonstrate very good speedups and locality properties.
Title: Real-time parallel MPEG-2 decoding in software. Published in: Proceedings 11th International Parallel Processing Symposium.
Pub Date: 1997-04-01; DOI: 10.1109/IPPS.1997.580862
Eileen T. Kraemer
Interactive program steering is a promising technique for improving the performance of parallel and distributed applications. Steering decisions are typically based on visual presentations of some subset of the computation's current state, a historical view of the computation's behavior, or views of metrics based on the program's performance. As in any endeavor, good decisions require accurate information. However, the distributed nature of the collection process may result in distortions in the portrayal of the program's execution. These distortions stem from the merging of streams of information from distributed collection points into a single stream without enforcing the ordering relationships that held among the program components that produced the information. An ordering filter placed at the point at which the streams are merged can ensure a valid ordering, leading to more accurate visualizations and better informed steering decisions. In this paper we describe the implementation of such filters in the Falcon interactive steering toolkit, and present a methodology for their specification for automated generation.
Title: Causality filters: a tool for the online visualization and steering of parallel and distributed programs. Published in: Proceedings 11th International Parallel Processing Symposium.
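One way to picture an ordering filter at the merge point described in the abstract above is the sketch below. It is not Falcon's implementation or its specification language. It assumes, purely for illustration, that every event carries a vector timestamp, that each process increments its own component on every event, and that all events are reported to the collector. The class name `OrderingFilter` and the event layout are hypothetical.

```python
from typing import List, Tuple

# (process id, vector timestamp, label); an assumed event layout.
Event = Tuple[int, Tuple[int, ...], str]


class OrderingFilter:
    """Hold back events until everything they causally depend on is released."""

    def __init__(self, num_processes: int) -> None:
        self.released = [0] * num_processes  # events released so far, per process
        self.pending: List[Event] = []

    def _ready(self, event: Event) -> bool:
        pid, clock, _ = event
        # Must be the next event of its own process, and must not have
        # observed any event of another process that is still unreleased.
        return all(
            c == self.released[q] + 1 if q == pid else c <= self.released[q]
            for q, c in enumerate(clock)
        )

    def push(self, event: Event) -> List[str]:
        """Accept one event; return labels of all events that can now be released."""
        self.pending.append(event)
        out: List[str] = []
        progress = True
        while progress:
            progress = False
            for ev in list(self.pending):
                if self._ready(ev):
                    self.pending.remove(ev)
                    self.released[ev[0]] += 1
                    out.append(ev[2])
                    progress = True
        return out
```

If a receive event reaches the collector before the matching send event, the filter buffers the receive and releases both, in causal order, once the send arrives.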
Pub Date: 1997-04-01; DOI: 10.1109/IPPS.1997.580991
W. Hahn, K. Rim, Soo-Won Kim
A new parallel processing system for commercial applications, called SPAX, is described. SPAX cost-effectively overcomes the SMP limitation by providing scalability of the parallel processing system and application portability of the SMP. We also describe a new system network, called Xcent-Net, which interconnects hundreds of multiprocessor PC boards in SPAX. It is a hierarchical network that provides incremental scalability with minimum re-wiring when the user's requirements change. This is based on the low-latency crossbar routers on each hierarchy, which form a router-cloud and provide up to 2.67 Gbytes/sec/router-cloud of bandwidth. We briefly describe the preliminary evaluation result, which shows that Xcent-Net will not be the bottleneck in a system running a typical commercial application.
Title: SPAX: a new parallel processing system for commercial applications. Published in: Proceedings 11th International Parallel Processing Symposium.
Pub Date: 1997-04-01; DOI: 10.1109/IPPS.1997.580925
Y. Yasuda, Hiroaki Fujii, Hideya Akashi, Y. Inagami, Teruo Tanaka, Junji Nakagoshi, Hideo Wada, Tsutomu Sumimoto
We have developed a hardware detour path selection facility for the Hitachi SR2201 parallel computer, which uses a multi-dimensional crossbar as an inter-processor network to ensure operating efficiency and high reliability when a part of the network is faulty. When this hardware facility is used, packets are transmitted to their destination along alternative paths to avoid the fault. However, changing the routing may cause deadlock. This paper describes a deadlock-free fault-tolerant routing scheme that can be used by the detour path selection facility to avoid deadlock, and its implementation for the SR2201.
Title: Deadlock-free fault-tolerant routing in the multi-dimensional crossbar network and its implementation for the Hitachi SR2201. Published in: Proceedings 11th International Parallel Processing Symposium.