{"title":"Approximation algorithms for time constrained scheduling","authors":"K. Jansen, Sabine R. Öhring","doi":"10.1109/ICAPP.1995.472254","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472254","url":null,"abstract":"In this paper we consider the following time constrained scheduling problem. Given a set of jobs J with execution times e(j)/spl isin/(0, .<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116598723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1995-06-01 DOI: 10.1109/ICAPP.1995.472201
J. Brzeziński, J. Hélary, M. Raynal
This paper is about the definition of deadlocks in asynchronous message communication systems. The considered system model covers unspecified receptions, non-FIFO channels, and general resource (message) requests including, among others, AND, OR, AND-OR and k-out-of-n requests.
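As a rough illustration of the k-out-of-n request model mentioned above, the following is a minimal sketch and not the paper's definition. It assumes that no messages are in transit and that every blocked process waits for messages from at least `k` processes out of a given set (AND is `k` equal to the set size, OR is `k = 1`). The function name `deadlocked` and the data layout are hypothetical.

```python
def deadlocked(requests, active):
    """Return the blocked processes that can never be unblocked.

    requests: dict mapping a blocked process to (k, senders), meaning it
              needs messages from at least k of the processes in `senders`.
    active:   set of processes that are currently not blocked.
    Assumption: no messages are in transit.
    """
    live = set(active)  # processes that may still send a message
    changed = True
    while changed:
        changed = False
        for proc, (k, senders) in requests.items():
            # A blocked process may wake up once k of its senders can still send.
            if proc not in live and len(senders & live) >= k:
                live.add(proc)
                changed = True
    return set(requests) - live


requests = {
    "a": (2, {"b", "c"}),  # AND request: needs both b and c
    "b": (1, {"a", "d"}),  # OR request: needs a or d
    "c": (1, {"a"}),       # waits for a only
}
print(deadlocked(requests, active={"d"}))
```

In this example `b` can be released by the active process `d`, but `a` and `c` wait on each other, so they form the deadlocked set.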
{"title":"A general definition of deadlocks for distributed systems","authors":"J. Brzeziński, J. Hélary, M. Raynal","doi":"10.1109/ICAPP.1995.472201","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472201","url":null,"abstract":"This paper is about the definition of deadlocks in asynchronous messages communication systems. The considered system model covers unspecified receptions, not FIFO channels, and general resource (message) requests including, among others, AND, OR, AND-OR and k-out-of-n requests.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131452892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472173
Y. M. Teo, S. Tay
Multistage interconnection networks are used in a number of application areas such as parallel computers and high-speed communication systems. As the performance of these systems depends on an efficient design of the interconnection network, a thorough analysis of the network's performance is important. Mathematical analysis has so far provided inadequate results, and simulation on a uniprocessor usually requires extremely long run times to evaluate large networks. This paper addresses the use of parallel simulation techniques to speed up the simulation of multistage interconnection networks. The conventional null-message approach for resolving the deadlock problem in conservative simulation may cause livelock if lookahead is not guaranteed. We propose a deadlock- and livelock-free scheme that uses null messages, but without guaranteed lookahead, to coordinate the simulation, together with different partitioning techniques for mapping the simulation program onto multicomputers. A flushing mechanism is also used to resolve the null-message explosion problem. Our analysis shows that the proposed flushing mechanism effectively reduces the number of null messages from exponential to linear.
{"title":"Modeling and efficient distributed simulation of multistage interconnection networks","authors":"Y. M. Teo, S. Tay","doi":"10.1109/ICAPP.1995.472173","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472173","url":null,"abstract":"Multistage interconnection networks are used in a number of application areas such as parallel computers and high-speed communication systems. As the performance of these systems lies on an efficient design of the interconnection network, a thorough analysis of the network's performance is important. Mathematical analysis so far provides inadequate results and simulation analysis using a uniprocessor usually requires extremely long run time to evaluate large networks. This paper addresses the use of parallel simulation techniques to speedup the simulation of multistage interconnection networks. The conventional null-message approach for resolving deadlock problem in conservative simulation may cause livelock if lookahead is not guaranteed. We propose a deadlock/livelock free scheme using null messages, but without the guaranteed lookahead, to coordinate the simulation, and different partitioning techniques for mapping of the simulation program onto multicomputers. A flushing mechanism is also used to resolve the null-message explosion problem. 
Our analysis shows that the proposed flushing mechanism effectively reduces the number of null messages from exponential to linear.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127462605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472249
A. Symons, V. Narasimhan
Predicting the performance of an application on a new platform, given its performance on a particular computer system, is fairly difficult. A significant degree of exhaustive modeling of both algorithms and architectures is required before a reasonable prediction can be attempted. This paper describes a general-purpose simulator for message-passing multiprocessors (Parsim), which facilitates system modeling. Besides monitoring a number of system parameters, Parsim permits easy reconfiguration of topology and algorithm mappings. Parsim has been used to predict the performance of a number of different algorithms, such as the Fast Fourier Transform, Livermore loops and integer programming, on a variety of topologies, such as transputer hypercubes, transputer meshes and a cluster of workstations on an Ethernet backbone.
{"title":"Parsim-message PAssing computeR SIMulator","authors":"A. Symons, V. Narasimhan","doi":"10.1109/ICAPP.1995.472249","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472249","url":null,"abstract":"The task of predicting the performance of an application on a new platform given its performance on a particular computer system is fairly difficult. Significant degree of exhaustive modeling of both algorithms and architectures is required before a reasonable prediction can be attempted. This paper describes a general purpose simulator for message passing multiprocessors (Parsim), which facilitates system modeling. Besides monitoring a number of system parameters, Parsim permits easy reconfiguration of topology and algorithm mappings. Parsim has been used to predict the performance of a number of different algorithms such as Fast Fourier Transform, Livermore loops and integer programming, on a variety of topologies such as transputer hypercubes, transputer meshes and a cluster of workstations on an ethernet backbone.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126164482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472250
K. M. Poon, N. Yung
This paper presents a performance analysis of median filtering on a distributed multiprocessor system. The results give a good indication of the performance gain of a multiprocessor over a uniprocessor for median filtering. This gain is proportional to the problem size, as shown by varying the size of the image. Furthermore, the analysis makes clear that the computation time and inter-processor communications scale well with the number of processors in the system. However, the overall system performance does not behave this way, because the initialization overhead dominates the computation time once the number of processors increases beyond a certain point. Because of this relationship, optimal performance is achieved with a certain number of processors, and this number is found to vary with the problem size. In addition, the subimage model is found to be an acceptable approach for this type of processing, as only the necessary parts of the image are sent to the other processors. The master and slave scheme proves to be easy for programming, control and data manipulation. As a whole, this type of non-linear processing seems to fit well into the MIMD architecture.
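The subimage idea described above can be sketched as follows. This is a sequential toy version under stated assumptions, not the paper's Meiko implementation: the image is cut into row strips, each strip is handed over with one extra "halo" row on each interior side, and a 3x3 window with border replication is used. The names `filter_rows`, `split_with_halo` and `master_filter` are hypothetical, and the loop in `master_filter` stands in for work that would be sent to slave processors.

```python
from statistics import median_low


def filter_rows(sub, first, last):
    """Slave side: 3x3 median of rows [first, last) of the subimage `sub`.

    Indices outside the subimage are clamped, which replicates border pixels.
    """
    h, w = len(sub), len(sub[0])

    def clamp(v, hi):
        return min(max(v, 0), hi)

    return [
        [
            median_low([sub[clamp(r + dr, h - 1)][clamp(c + dc, w - 1)]
                        for dr in (-1, 0, 1) for dc in (-1, 0, 1)])
            for c in range(w)
        ]
        for r in range(first, last)
    ]


def split_with_halo(img, parts):
    """Master side: yield (subimage, first, last) for each row strip.

    Each subimage carries only the strip plus the neighbouring rows it needs.
    """
    h = len(img)
    bounds = [h * i // parts for i in range(parts + 1)]
    for start, stop in zip(bounds, bounds[1:]):
        lo, hi = max(start - 1, 0), min(stop + 1, h)
        yield img[lo:hi], start - lo, stop - lo


def master_filter(img, parts):
    """Filter the image strip by strip and reassemble the result."""
    out = []
    for sub, first, last in split_with_halo(img, parts):
        out.extend(filter_rows(sub, first, last))
    return out


img = [[(7 * r + 3 * c) % 11 for c in range(4)] for r in range(5)]
print(master_filter(img, 3) == master_filter(img, 1))
```

Because every strip includes its halo rows, the result does not depend on the number of strips, which is what allows the strips to be processed independently.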
{"title":"Performance analysis of median filtering on Meiko-a distributed multiprocessor system","authors":"K. M. Poon, N. Yung","doi":"10.1109/ICAPP.1995.472250","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472250","url":null,"abstract":"This paper presents the performance analysis of realizing median filtering on a distributed multiprocessor system. The results of the performance analysis give a good indication of the performance gain in using multi-processor for median filtering over uni-processor. Such performance gain is proportional to the problem size as shown by varying the size of the image. Furthermore, through the analysis, it is clear that the computation time and inter-processor communications scale well with the number of processors in the system. However, the overall system performance does not have such behavior because of the initialization overhead dominating the computation time as the number of processors increases beyond a certain point. It is because of this relationship that an optimal performance is achievable with a certain number of processors. It is also found that this number varies with the problem size. In addition, the subimage model is found to be an acceptable approach far this type of processing as only the necessary parts of the image are sent to the other processors. The master and slave scheme proves to be easy for programming, control and data manipulation. 
As a whole, this type of non-linear processing seems to fit well into the MIMD architecture.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115066053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472169
A. Radenski
The transition from sequential object-oriented programming (OOP) to parallelism has been the focus of active research. Experimental languages that try to integrate objects and parallelism are often seriously compromised in their capability to provide inheritance for parallel objects. Even languages that permit some amalgamation of parallelism and inheritance tend to support only single-class inheritance. The purpose of this paper is to specify a strongly typed language framework for parallel object-oriented programming which provides easy-to-use multiple inheritance for parallel objects, including inheritance for synchronization code. The proposed approach to parallelism is based on "separate" methods, which generate processes and provide rendezvous-type coordination; it succeeds in cases where known languages fail to combine inheritance with parallelism, or do so inefficiently and inconveniently.
{"title":"Parallel object-oriented programming with multiple inheritance: language design issues","authors":"A. Radenski","doi":"10.1109/ICAPP.1995.472169","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472169","url":null,"abstract":"The transition from sequential object-oriented programming (OOP) to parallelism has been in the focus of active research. Experimental languages that try to integrate objects and parallelism are often seriously compromised in their capability to provide inheritance for parallel objects. Even languages that permit some amalgamation of parallelism and inheritance tend to support only single-class inheritance. The purpose of this paper is to specify a strongly typed language framework for parallel object-oriented programming which provides easy-to-use multiple inheritance for parallel objects, including inheritance for synchronization code. The proposed approach to parallelism is based on \"separate\" methods which generate processes and provide rendezvous-type coordination: it succeeds in cases where known languages fail to combine inheritance with parallelism. Or do it inefficiently and inconveniently.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129556712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472247
C. Izu, R. Beivide, A. Arruabarrena, J. Gregorio
The performance of an interconnection network with adaptive routing is strongly related to the deadlock avoidance method it applies. Virtual channels are normally used for this purpose in mesh and torus networks. This work compares two architectural alternatives in the router design: mapping each virtual channel onto a different physical link, and multiplexing the set of virtual channels onto the same physical link. In addition, multiplexing at the packet level is proposed as an alternative to multiplexing at the flit level, and the advantages of this previously unused approach are shown. The benefits of each multiplexing type, both in message latency and throughput, have been evaluated under several traffic conditions. The node delay for each implementation scheme has also been estimated.
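The difference between flit-level and packet-level multiplexing can be illustrated with a toy model. This is a sketch under strong assumptions, not the router studied in the paper: virtual channels share one physical link that carries one flit per cycle, channels are served round-robin, packets never block downstream, and every packet has at least one flit. The function name `multiplex` is hypothetical.

```python
from collections import deque


def multiplex(channels, packet_level):
    """Return {packet name: cycle in which its last flit crosses the link}.

    channels:     list of virtual channels, each a list of (name, n_flits).
    packet_level: if True, a channel keeps the link until its head packet
                  is finished; if False, the link switches after every flit.
    """
    queues = [deque([name, flits] for name, flits in ch) for ch in channels]
    done, cycle, vc = {}, 0, 0
    while any(queues):
        if not queues[vc]:              # nothing to send on this channel
            vc = (vc + 1) % len(queues)
            continue
        head = queues[vc][0]
        head[1] -= 1                    # send one flit of the head packet
        cycle += 1
        finished = head[1] == 0
        if finished:
            done[head[0]] = cycle
            queues[vc].popleft()
        if finished or not packet_level:
            vc = (vc + 1) % len(queues)  # hand the link to the next channel
    return done


traffic = [[("A", 4)], [("B", 4)]]
print(multiplex(traffic, packet_level=False))
print(multiplex(traffic, packet_level=True))
```

In this model interleaving flits delays the first packet without helping the second, which hints at why packet-level multiplexing can lower average latency. The model deliberately ignores blocked packets, the case in which flit-level interleaving keeps the link busy.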
{"title":"Packet multiplexing: an efficient router implementation for adaptive mesh networks","authors":"C. Izu, R. Beivide, A. Arruabarrena, J. Gregorio","doi":"10.1109/ICAPP.1995.472247","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472247","url":null,"abstract":"The performance of an interconnection network with adaptive routing is strongly related to the deadlock avoidance method it applies. Virtual channels are normally used for this purpose in mesh and torus networks. This work compares true architectural alternatives in the router design: mapping each virtual channel onto a different physical link and multiplexing the set of virtual channels onto the same physical link. Besides, multiplexing at the packet level is proposed as an alternative to multiplexing at the flit level, showing the advantages of this not previously used approach. The benefits of each multiplexing type, both in message latency and throughput, have been evaluated under several traffic conditions. An estimation of the node delay for each implementation scheme has also been calculated.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123927248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472178
L. D. D. Cerio, M. Valero-García, Antonio Gonzalez
The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Different approaches are proposed which differ in the way they use the interconnection network. The first approach is based on the multidimensional index mapping technique for the FFT computation. The second approach starts from a hypercube algorithm and then embeds the hypercube onto the torus. The third approach reduces the communication cost of the hypercube algorithm by pipelining the communication operations. A novel methodology to pipeline the communication operations on a torus is proposed. Analytical models are presented to compare the different approaches. This comparison study shows that the best approach depends on the number of dimensions of the torus and the communication start-up and transfer times. The analytical models allow us to select the most efficient approach for the available machine.
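For readers unfamiliar with multidimensional index mapping, the sketch below shows the standard two-factor (Cooley-Tukey style) mapping for a transform of length N = n1 * n2, using the input index n = n2*a + b and the output index k = k1 + n1*k2. It is a sequential illustration only and may differ from the mapping used in the paper; the sub-transforms are computed with a naive DFT, and the names `dft` and `fft_index_mapped` are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform, used as building block and reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def fft_index_mapped(x, n1, n2):
    """Length n1*n2 DFT computed as n2 DFTs of length n1, twiddles, n1 DFTs of length n2."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: for each residue b = n mod n2, transform the n1 samples x[n2*a + b].
    cols = [dft([x[n2 * a + b] for a in range(n1)]) for b in range(n2)]
    # Step 2: multiply by the twiddle factors exp(-2*pi*i * b*k1 / n).
    for b in range(n2):
        for k1 in range(n1):
            cols[b][k1] *= cmath.exp(-2j * cmath.pi * b * k1 / n)
    # Step 3: for each k1, transform across b and scatter to k = k1 + n1*k2.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([cols[b][k1] for b in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out


x = [float(i % 5) for i in range(12)]
reference = dft(x)
mapped = fft_index_mapped(x, 3, 4)
print(max(abs(a - b) for a, b in zip(reference, mapped)) < 1e-9)
```

On a multicomputer, steps 1 and 3 operate on independent groups of samples, so the data exchange between them is where the communication cost of this approach arises.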
{"title":"A study of the communication cost of the FFT on torus multicomputers","authors":"L. D. D. Cerio, M. Valero-García, Antonio Gonzalez","doi":"10.1109/ICAPP.1995.472178","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472178","url":null,"abstract":"The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Different approaches are proposed which differ in the way they use the interconnection network. The first approach is based on the multidimensional index mapping technique for the FFT computation. The second approach starts from a hypercube algorithm and then embeds the hypercube onto the torus. The third approach reduces the communication cost of the hypercube algorithm by pipelining the communication operations. A novel methodology to pipeline the communication operations on a torus is proposed. Analytical models are presented to compare the different approaches. This comparison study shows that the best approach depends on the number of dimensions of the torus and the communication start-up and transfer times. The analytical models allow us to select the most efficient approach for the available machine.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123928283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472190
T. Ae, T. Tsyosaki, H. Fukumoto, K. Sakai
We propose an objective-neuron-based active memory, ONBAM, and design it in HDL. The ONBAM is an active topological memory, that is, a memory equipped with the equivalent of a topological space of data and a flexible content-addressable function, which plays an important role in the memory-based AI machine.
{"title":"ONBAM: an objective-neuron-based active memory","authors":"T. Ae, T. Tsyosaki, H. Fukumoto, K. Sakai","doi":"10.1109/ICAPP.1995.472190","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472190","url":null,"abstract":"We propose an objective-neuron-based active memory, ONBAM, and design it by HDL. The ONBAM is an active topological memory, which is a memory with equivalently a topological space of data and a flexibly content-addressable function, which plays an important role of the memory-based AI machine.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121648080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472189
S. Beaty, G. Johnson
In the past, vector supercomputers achieved high performance with long arithmetic pipelines coupled with fast scalar processors. Processor speed has increased at a rate greater than memory speed. Indeed, current vector processors have cycle times far faster than the memories they are connected to. When compilers can predict memory access patterns, they vectorize computations and thereby hide the processor/memory disparity. When memory access patterns are not known until run-time, caches can pay large dividends. This paper studies the effects of adding a scalar data cache to a modern vector processor and shows some encouraging results.
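To make the point about caches and access patterns concrete, here is a minimal sketch of a direct-mapped data cache driven by a trace of word addresses. It is a generic textbook model, not the Cray-4 configuration evaluated in the paper; the function name `hit_rate` and the cache sizes in the example are assumptions chosen for illustration.

```python
def hit_rate(addresses, n_lines, line_words):
    """Fraction of accesses that hit in a direct-mapped cache.

    addresses:  sequence of word addresses (the scalar access trace).
    n_lines:    number of cache lines.
    line_words: number of words per cache line.
    """
    tags = [None] * n_lines          # block currently held by each line
    hits = 0
    for addr in addresses:
        block = addr // line_words   # memory block containing the word
        index = block % n_lines      # the single line this block may occupy
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block      # miss: fetch the block, evicting the old one
    return hits / len(addresses)


sequential = list(range(16))
conflicting = [0, 16, 0, 16]
print(hit_rate(sequential, n_lines=4, line_words=4))
print(hit_rate(conflicting, n_lines=4, line_words=4))
```

The sequential trace benefits from spatial locality within each line, while the second trace alternates between two blocks that map to the same line and therefore never hits.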
{"title":"The effect of adding a scalar D-cache to the Cray-4 vector processor","authors":"S. Beaty, G. Johnson","doi":"10.1109/ICAPP.1995.472189","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472189","url":null,"abstract":"In the past, vector supercomputers achieved high performance with long arithmetic pipelines coupled with fast scalar processors. Processor speed has increased at a rate greater than memory speed. Indeed, current vector processors have cycle times far faster than the memories they are connected to. When compilers can predict memory access patterns, they vectorize computations and thereby hide the processor/memory disparity. When memory access patterns are not known until run-time, caches can pay large dividends. This paper studies the effects of adding a scalar data cache to a modern vector processor and shows some encouraging results.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128160620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}