Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472283
Gernot A. Fink, N. Jungclaus, Helge Ritter, G. Sagerer
Unlike traditional approaches to parallel or distributed processing, in which well-structured problems are implemented entirely within some programming environment, we are faced with the problem of integrating existing heterogeneous software systems. Furthermore, pattern analysis places special demands on communication capabilities. We therefore propose a new communication framework dedicated to heterogeneous pattern analysis systems that handles typed structured data, enables completely symmetric interaction, and provides various call semantics. A first prototype evaluating some of these concepts in practical situations is presented.
Title: A communication framework for heterogeneous distributed pattern analysis. In: Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472302
V.L. Varscavsky
From the standpoint of hardware experts, asynchronism is connected with the concept of physical time as an independent physical variable and is determined by the variations of transient-process durations in hardware circuits, modules and blocks, which are physical objects by their nature. Software and architecture experts treat asynchronism as a partial order on events, which are logical objects; that is, they think in terms of logical time. In these terms, asynchronism is the variation of the number of process steps without regard to the real duration of these steps in physical time. The measuring tool for time is a clock, and the attainable precision of the clock (along with the system of signal delivery) determines its area of application (the allowed value of the physical time step). The basic idea of self-timing is to detect the moments when transient processes in physical components are over and to produce the corresponding logical signals that provide the transition to logical time (delay-insensitive design), regardless of the causes of delay variation. Once all the logical signals that are invariant to physical time and represent the events in the system are formed, self-timed methodology offers a number of efficient hardware support methods to coordinate the events of the corresponding concurrent specification.
Title: Asynchronous interaction in massively parallel computing.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472299
G. Manis, K. Voliotis, C. Lekatsas, P. Tsanakas, G. Papakonstantinou
Orchid is a portable software platform that aims to decouple parallel software development from the underlying system. Owing to its layered structure, Orchid can easily be ported to different architectures by rewriting only its lowest layer. It also provides advanced facilities not supported by most operating systems and software platforms.
Title: Orchid: the design of a parallel and portable software platform for local area networks.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472230
V. Varshavsky, V.B. Marakhovsky, R.A. Lashevsky
We discuss the problems that arise when designing massively parallel computer systems. Most of them are resolved by the transition from globally synchronized operation to globally asynchronous behavior. This transition implies that all the local processes in the system should interact with each other through asynchronous interfaces. We consider the problems of asynchronous interaction of local processes, with their global coordination based on handshaking, as well as the problems of self-timed data transmission between processes. If the system modules that realize local processes are not asynchronous and are implemented in CMOS technology, the idea of current indication is used to detect the moments when their transient processes complete. A current-sensor circuit is suggested with a wide range of permissible variation of the measured current and acceptable characteristics. Two ways of organizing the interaction between circuits with current sensors are developed. The principles of self-timed data exchange between local processes of the system, and of data transmission by means of a dual-rail code and a binary code with a handshake for every bit, are considered. The possibility of organizing a single-wire bit handshake is demonstrated, and a self-timed implementation of it is developed whose transmission rate is no worse than that of the two-wire bit handshake.
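The dual-rail code mentioned in the abstract can be illustrated in a few lines: each data bit travels on two wires, with one legal code word per bit value and an all-zero spacer between words, so the receiver can detect completion without a clock. This is a sketch of the encoding convention only (assuming the common true-wire/false-wire assignment), not the paper's current-sensor circuits:

```python
def dual_rail_encode(bits):
    """Dual-rail code: each data bit uses a (true_wire, false_wire) pair.
    (1, 0) encodes 1 and (0, 1) encodes 0; the all-(0, 0) spacer that
    separates code words lets the receiver detect word completion."""
    return [(1, 0) if b else (0, 1) for b in bits]

def dual_rail_decode(pairs):
    """Decode one complete code word; (0, 0) pairs would mean the word
    has not yet fully arrived."""
    assert all(p in ((1, 0), (0, 1)) for p in pairs), "incomplete word"
    return [1 if p == (1, 0) else 0 for p in pairs]
```

Because exactly one wire of each pair rises per word, the receiver knows a word is complete when every pair is non-zero, which is the completion-detection property the self-timed transmission schemes above rely on.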
Title: Asynchronous interaction in massively parallel computing systems.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472263
M. Valerio, L. Moser, P. Melliar-Smith
Orthogonal fat-trees are a type of interconnection network with several desirable characteristics: short distance between processors, constant degree of the switching elements, uniform traffic load, symmetry, and recursive scalability. We first show how to build two-level orthogonal fat-trees, where each node has a fixed degree and there is a maximum distance of two between any two leaves. We then show how to provide fault tolerance by including redundant paths at the cost of reducing the number of leaves. Finally, we show how to construct large orthogonal fat-trees from two-level fat-trees recursively.
Title: Fault-tolerant orthogonal fat-trees as interconnection networks.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472212
R. Neogi
DCT/IDCT-based source coding and decoding techniques are widely accepted in HDTV systems and other MPEG-based applications. In this paper, we propose a new direct 2-D IDCT algorithm based on the parallel divide-and-conquer approach. The algorithm distributes computation by considering one transformed coefficient at a time and performing partial computation and updating as each coefficient arrives. A novel parallel, fully pipelined architecture with an effective processing time of one cycle per pixel for an N×N block is designed to implement the algorithm. A unique feature of this architecture is that it integrates inverse shuffling, inverse quantization, inverse source coding, and motion compensation into a single compact datapath. We avoid inserting a FIFO between the bit-stream decoder and the decompression engine. The entire block of pixel values is sampled in a single cycle for post-processing after decompression. Moreover, we use only (N/2(N/2+1))/2 multipliers and N² adders.
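For reference, the transform that the architecture above computes is the standard N×N inverse DCT-II. The following is a naive O(N⁴) definition-by-the-formula sketch for checking results against, not the paper's direct parallel divide-and-conquer algorithm:

```python
import math

def idct2(F):
    """Naive N x N 2-D inverse DCT-II, straight from the definition:
    f(x,y) = sum_{u,v} c(u) c(v) F(u,v) cos((2x+1)u*pi/2N) cos((2y+1)v*pi/2N),
    with c(0) = sqrt(1/N) and c(k) = sqrt(2/N) otherwise."""
    N = len(F)
    c = lambda k: math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += (c(u) * c(v) * F[u][v]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = s
    return out
```

A quick sanity check: a block whose only non-zero coefficient is the DC term F[0][0] = 8 for N = 8 must decode to a flat block of 1.0, since the DC basis function is the constant 1/N.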
Title: Embedded real-time video decompression algorithm and architecture for HDTV applications.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472166
G. Luce, J. Myoupo
This paper presents an implementable linear systolic array of m cells which computes both a longest common subsequence (LCS) and its length in time n+3m+p-1, where m ≤ n and p is the length of the LCS. Our algorithm can be extended to recover more than one LCS. Another important property of our algorithm is that each element of an LCS is extracted together with its ranks in A and B, respectively; thus we can localize precisely the elements of A and B that match each other. In practice, this information is essential in some situations.
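As a sequential reference for the computation the systolic array parallelizes, here is the classic O(mn) dynamic-programming LCS with a traceback that also reports the ranks of each matched element in A and B, the property the abstract highlights. This is a sketch of the problem, not the authors' systolic algorithm:

```python
def lcs_with_ranks(a, b):
    """Classic dynamic-programming LCS. Returns a list of
    (element, rank_in_a, rank_in_b) triples (ranks are 1-based),
    so matching positions in both sequences can be localized."""
    m, n = len(a), len(b)
    # dp[i][j] = length of an LCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Trace back to recover one LCS together with the matched ranks.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append((a[i - 1], i, j))
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return list(reversed(out))
```

Enumerating alternative tie-breaks in the traceback yields the "more than one LCS" extension mentioned above.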
Title: An efficient linear systolic algorithm for recovering longest common subsequences.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472207
A. Roberts, A. Symvonis
In this paper, we consider the deflection worm routing problem on two-dimensional n×n meshes. Our results include: (i) an off-line algorithm for routing permutations in O(kn) steps, and (ii) a general method for obtaining deflection worm routing algorithms from packet routing algorithms.
Title: On deflection worm routing on meshes.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472234
K. Miyashita, Y. Tsujino, N. Tokura
The Processor Array with Reconfigurable Bus System (PARBS) and the Reconfigurable Multiple Bus Machine (RMBM) are models of parallel computation based on reconfigurable buses and processor arrays. The PARBS is a processor array whose processors are arranged in a two-dimensional grid with a reconfigurable bus system. The RMBM is also made up of processors and a reconfigurable bus system, but its processors are located in a row, and the number of processors and the number of buses are independent of each other. In this paper, we show that the computational power of the PARBS is equal to that of the RMBM, provided that both models are polynomially bounded, because each model can simulate the other in constant time.
Title: A comparison between the powers of the PARBS and the RMBM.
Pub Date: 1995-04-19. DOI: 10.1109/ICAPP.1995.472196
Z. Leyk, M. Dow
We are implementing iterative methods on the VPP500 parallel computer. During this process we have met different kinds of problems. Performance on the VPP500 depends critically on the type of matrices used in the computations. In sparse computations, it is important to take advantage of the structure of the matrix: there can be a big difference between the performance obtained from a matrix stored in diagonal format and one stored in a more general format, so it is necessary to choose an appropriate format for the matrix used in the computations. Preliminary tests show that the implementation of the package is scalable with respect to the number of processors, especially for large problems. It is becoming clear to us that the traditional efficient preconditioning techniques yield a speedup factor of only 2 at best; we need to look for new preconditioners better suited to parallel computation. The polynomial preconditioning approach is attractive because of the negligible preprocessing cost involved. We favour a reverse communication interface for the added flexibility necessary for testing different storage formats and preconditioners. We conclude that it is crucial to experiment with existing parallel machines to better understand effects that are difficult to derive from theory, such as the impact of communication costs or of the way data are stored.
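The reverse communication interface favoured above can be sketched concisely: the solver never sees the matrix; it hands a vector back to the caller and waits for the matrix-vector product, so any storage format or preconditioner stays outside the solver. The names `cg_reverse` and `solve` are hypothetical illustrations (a plain conjugate-gradient skeleton using a Python generator), not the API of the VPP500 package:

```python
def cg_reverse(n, b, tol=1e-10, maxit=1000):
    """Conjugate gradient for SPD systems, reverse-communication style:
    yields ("matvec", v) requests and is sent back A @ v by the caller."""
    x = [0.0] * n
    r = list(b)                      # residual for the initial guess x = 0
    p = list(r)
    rs = sum(ri * ri for ri in r)
    for _ in range(maxit):
        Ap = yield ("matvec", p)     # caller computes A @ p any way it likes
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def solve(A, b):
    """Driver: answers the solver's matvec requests with a dense product;
    a sparse or distributed format would slot in here unchanged."""
    gen = cg_reverse(len(b), b)
    request = next(gen)
    try:
        while True:
            _, v = request
            Av = [sum(aij * vj for aij, vj in zip(row, v)) for row in A]
            request = gen.send(Av)
    except StopIteration as stop:
        return stop.value
```

Swapping the dense product in `solve` for a diagonal-format or preconditioned product changes nothing inside the solver, which is exactly the flexibility the abstract is after.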
Title: Package of iterative solvers for the Fujitsu VPP500 parallel supercomputer.