Pub Date: 1995-05-01 | DOI: 10.1016/0165-6074(94)00090-W
Constantinos V. Papadopoulos
Wormhole message routing is supported by the communication hardware of several distributed memory machines. This method of message routing has numerous advantages but creates the problem of a routing deadlock. When long messages compete for the same channels in the network, some messages are blocked until the first message is fully consumed by the processor at its destination. A deadlock occurs if a set of messages mutually block one another so that no message can progress towards its destination. Most previously known deadlock-free routing schemes are designed to work on regular binary hypercubes, a very special case of multicomputer interconnection networks. However, these routing schemes do not provide enough flexibility to deal with the irregular 2-D tori and attached auxiliary cells found on many newer parallel systems.
To handle irregular topologies elegantly, a simple proof is necessary to verify the router code. The new proof given in this report is carried out directly on the network graph. It is constructive in the sense that it reveals the design options to deal with irregularities and shows how additional flexibility can be used to achieve better load balancing.
Based on the modified routing model, a set of deadlock-free router functions relevant to the iWarp system configurations is described and proven correct.
"On the routing of signals in parallel processor meshes", Microprocessing and Microprogramming 41(2), pp. 171-189.
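The deadlock argument above can be made concrete with the textbook case: on a regular 2-D mesh, dimension-order (X-then-Y) routing is deadlock-free because channels are acquired in one fixed global order. The sketch below is this standard illustration, not the paper's construction for irregular tori:

```python
def xy_route(src, dst):
    """Channel sequence for dimension-order (X-then-Y) routing on a 2-D mesh.

    Every message corrects its X coordinate before touching any Y channel,
    so channel acquisition follows one global order and no cyclic wait
    (routing deadlock) can arise among blocked wormhole messages.
    """
    (x, y), (dx, dy) = src, dst
    path = []
    step = 1 if dx > x else -1
    while x != dx:                        # X phase first
        path.append(((x, y), (x + step, y)))
        x += step
    step = 1 if dy > y else -1
    while y != dy:                        # then Y phase
        path.append(((x, y), (x, y + step)))
        y += step
    return path

# three hops: east, east, north
hops = xy_route((0, 0), (2, 1))
```

The paper's point is precisely that such a fixed-order argument is too rigid for irregular 2-D tori with auxiliary cells, which is why its proof is carried out directly on the network graph instead.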
Pub Date: 1995-05-01 | DOI: 10.1016/0165-6074(95)00007-B
C.P. Ravikumar, Naresh Vedi
We consider the problem of mapping tasks onto processors in a reconfigurable array architecture. We assume a directed acyclic task graph as input. The node weights in the task graph represent the tasks' computational requirements; the weight on an edge (i, j) is an estimate of the communication requirement between tasks i and j. The problem is to (a) estimate the minimum number of processors p to execute all the tasks with the highest possible efficiency, (b) bind each task to a processor, (c) schedule the tasks within each processor, and (d) carry out link allocation among processors. We assume a realistic model of reconfigurable parallel processors, where each processor can be connected to at most d other processors through bidirectional links. The objective is to minimize the overall execution time, which includes the time spent by the processors in computation, communication, and idling. The mapping problem is computationally hard, and we present two algorithms for obtaining near-optimal solutions. The first is a heuristic algorithm based on the critical path method and as-soon-as-possible (ASAP) scheduling. The second uses the Boltzmann machine model of artificial neural networks to solve the mapping problem. We have implemented both algorithms on a Sun/SPARC workstation. Experimental results on a set of benchmark problems indicate that the neural algorithm generates better solutions than the heuristic algorithm but requires significantly more time. The number of neurons required is n·p, and hence the connection matrix is np × np; the neural algorithm is therefore also memory- and I/O-intensive due to swapping. We have devised a parallel divide-and-conquer algorithm which decomposes a large mapping problem into several smaller ones and solves the subproblems concurrently on a network of Sun workstations.
"Heuristic and neural algorithms for mapping tasks to a reconfigurable array", Microprocessing and Microprogramming 41(2), pp. 137-151.
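Of the two components named for the heuristic, ASAP scheduling is the simpler one: each task starts as soon as its last predecessor finishes. A generic sketch (the task names and durations are hypothetical, not from the paper's benchmarks):

```python
def asap_schedule(tasks, deps):
    """As-soon-as-possible start times for a directed acyclic task graph.

    tasks: {name: duration}; deps: {name: [predecessor names]}.
    A task starts the moment its latest-finishing predecessor completes.
    """
    start = {}

    def finish(t):
        # compute (and cache) the start time, then return the finish time
        if t not in start:
            start[t] = max((finish(p) for p in deps.get(t, [])), default=0)
        return start[t] + tasks[t]

    for t in tasks:
        finish(t)
    return start

# diamond-shaped graph: a feeds b and c, both feed d
s = asap_schedule({'a': 2, 'b': 3, 'c': 1, 'd': 2},
                  {'b': ['a'], 'c': ['a'], 'd': ['b', 'c']})
```

The critical-path component would additionally prioritize tasks on the longest path through the graph when processors are contended.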
This paper presents a new global scheduling algorithm for automatic synthesis of the control blocks of special-purpose microprocessors. The main distinction of the proposed algorithm is that it exploits the inherent properties of structured programs. The optimization goal is to maximize the speedup of the processor and minimize the size of the control block. Compared with existing global scheduling algorithms such as Trace scheduling, Tree compaction, and Percolation scheduling, the proposed algorithm consistently achieves better results in terms of processor speedup and control block size.
"A new approach to schedule operations across nested-ifs and nested-loops", by Shih-Hsu Huang, Cheng-Tsung Hwang, Yu-Chin Hsu and Yen-Jen Oyang. Microprocessing and Microprogramming 41(1), pp. 37-52, April 1995. DOI: 10.1016/0165-6074(94)00024-5.
Pub Date: 1995-04-01 | DOI: 10.1016/0165-6074(95)90006-3
Mariagiovanna Sami (Editor-in-Chief), Lutz Richter (Editor-in-Chief)
"Letter from the editors-in-chief", Microprocessing and Microprogramming 41(1), pp. 1-3.
Pub Date: 1995-04-01 | DOI: 10.1016/0165-6074(94)00025-6
Robert Manger, Mladen Grbić, Vito Leonardo Plantamura, Branko Souček
In this paper we describe a parallel version of the Hestenes algorithm for computing the singular value decomposition. We also describe a corresponding implementation on a transputer network. The implementation has been used to accelerate some programs for financial ratio analysis. Empirical results regarding the efficiency of our implementation are also presented.
"A parallel SVD algorithm and its application to financial ratio analysis", Microprocessing and Microprogramming 41(1), pp. 97-106.
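Hestenes' method is the one-sided Jacobi SVD: plane rotations repeatedly orthogonalize pairs of columns, and once all pairs are orthogonal the singular values are the column norms. A minimal serial sketch follows; the paper's contribution is parallelizing the independent column-pair rotations across a transputer network, which this sketch does not attempt:

```python
import math

def hestenes_svd_values(A, sweeps=10, eps=1e-12):
    """Singular values of A via Hestenes' one-sided Jacobi method.

    Each sweep rotates every column pair (p, q) by the angle that zeroes
    their inner product; at convergence the columns are orthogonal and the
    singular values are their Euclidean norms.
    """
    A = [row[:] for row in A]            # work on a copy
    m, n = len(A), len(A[0])
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                app = sum(A[i][p] * A[i][p] for i in range(m))
                aqq = sum(A[i][q] * A[i][q] for i in range(m))
                apq = sum(A[i][p] * A[i][q] for i in range(m))
                if abs(apq) < eps:       # pair already orthogonal
                    continue
                # angle solving tan(2*theta) = 2*apq / (app - aqq)
                theta = 0.5 * math.atan2(2 * apq, app - aqq)
                c, s = math.cos(theta), math.sin(theta)
                for i in range(m):       # rotate columns p and q in place
                    A[i][p], A[i][q] = (c * A[i][p] + s * A[i][q],
                                        -s * A[i][p] + c * A[i][q])
    return sorted((math.sqrt(sum(A[i][j] ** 2 for i in range(m)))
                   for j in range(n)), reverse=True)
```

Rotations on disjoint column pairs commute, which is what makes the method attractive on parallel machines: up to n/2 pairs can be rotated simultaneously.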
Pub Date: 1995-04-01 | DOI: 10.1016/0165-6074(94)00017-5
K.W. Ng, C.K. Luk
Functional, object-oriented and logic programming are widely regarded as today's three dominant programming paradigms. For the past decade, many attempts have been made to integrate these three paradigms into a single language. This paper is a survey of this new breed of multiparadigm languages. First we give a succinct introduction to the three paradigms. Then we discuss a variety of approaches to their integration through an overview of some of the existing multiparadigm languages. All possible combinations of the three paradigms, namely logic + object-oriented, functional + logic, functional + object-oriented, and object-oriented + logic + functional, are considered separately. For the purpose of classification, we propose a design space of programming languages called the FOOL-space.
"A survey of languages integrating functional, object-oriented and logic programming", Microprocessing and Microprogramming 41(1), pp. 5-36.
Pub Date: 1995-04-01 | DOI: 10.1016/0165-6074(95)90007-1
"Calendar of forthcoming conference and events", Microprocessing and Microprogramming 41(1), pp. 107-109.
Pub Date: 1995-04-01 | DOI: 10.1016/0165-6074(94)00089-S
Ananta K. Majhi, L.M. Patnaik, Srilata Raman
Multichip Modules (MCMs) are a packaging technology of growing importance because they reduce interconnect delays across chips, bringing those delays closer in magnitude to on-chip delays. The problem here is to partition a circuit across multiple chips, producing MCMs. Partitioning is a combinatorial optimization problem. One method of solving it is with Genetic Algorithms (GAs), search techniques inspired by natural genetics. GAs can be used to solve both combinatorial and functional optimization problems. This paper solves the partitioning problem using the GA approach. The performance of GAs is compared with that of Simulated Annealing (SA) by executing both algorithms on three benchmark circuits. The effect of varying the algorithm's parameters on the performance of GAs is also studied.
"A genetic algorithm-based circuit partitioner for MCMs", Microprocessing and Microprogramming 41(1), pp. 83-96.
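As a hedged illustration of the GA approach (the encoding, operators, and cost weights below are generic textbook choices, not necessarily the authors'), a 2-way partitioner can encode each cell's chip assignment as one bit and penalize cut nets plus imbalance:

```python
import random

def ga_bipartition(edges, n, pop=30, gens=60, pmut=0.05, seed=1):
    """Toy genetic algorithm for 2-way circuit partitioning.

    A chromosome assigns each of n cells to chip 0 or 1; the cost is the
    number of cut edges plus a balance penalty, mirroring the MCM goal of
    few inter-chip wires and evenly loaded chips.
    """
    rng = random.Random(seed)

    def cost(ch):
        cut = sum(ch[u] != ch[v] for u, v in edges)
        return cut + abs(sum(ch) - n / 2)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=cost)
        parents = population[:pop // 2]       # truncation selection (elitist)
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)
            cx = rng.randrange(1, n)          # one-point crossover
            child = a[:cx] + b[cx:]
            for i in range(n):                # bit-flip mutation
                if rng.random() < pmut:
                    child[i] ^= 1
            children.append(child)
        population = parents + children
    best = min(population, key=cost)
    return best, cost(best)

# two 3-cell cliques joined by one wire; the minimum cut is the bridge edge
best, c = ga_bipartition([(0, 1), (0, 2), (1, 2),
                          (3, 4), (3, 5), (4, 5), (2, 3)], 6)
```

Simulated annealing, the comparison point in the paper, would instead perturb a single solution and accept worsening moves with a temperature-controlled probability.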
Pub Date: 1995-04-01 | DOI: 10.1016/0165-6074(94)00088-R
Philippe O.A. Navaux, César A.F. De Rose, Gerson G.H. Cavalheiro
Parallelism is a natural route to real-time image processing. The data parallelism found in array processors simplifies the mapping of such problems, as each processing element works on part of the image. The GAPP board (Geometric Arithmetic Parallel Processor) is a near-neighbor mesh architecture with 144 processors interconnected as a 12 × 12 two-dimensional array. This work analyzes the performance of the GAPP board on image processing. The implementation of two image convolution algorithms is described, and the results obtained, the suitability of the GAPP board for this kind of application, and the performance achieved are discussed. The results of this work are also compared to those presented in [11]. Some ways to achieve better performance from the GAPP array in this kind of application are presented as well.
"Performance evaluation in image processing with GAPP array processor", Microprocessing and Microprogramming 41(1), pp. 71-82.
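Image convolution is the data-parallel kernel in question: each output pixel is a weighted sum over its 3 × 3 neighbourhood, so on the GAPP mesh every processing element can compute its own pixel using only near-neighbour links. A sequential sketch of the same per-pixel computation:

```python
def convolve3x3(image, kernel):
    """3x3 convolution over a 2-D image with zero-padded borders.

    On a GAPP-style mesh each processing element would hold one pixel and
    gather the eight neighbour values over near-neighbour links in lock-step;
    here the identical neighbourhood sum is computed pixel by pixel.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    iy, ix = y + ky - 1, x + kx - 1
                    if 0 <= iy < h and 0 <= ix < w:   # zero padding
                        acc += kernel[ky][kx] * image[iy][ix]
            out[y][x] = acc
    return out
```

On the mesh the outer two loops disappear: all 144 pixels of a 12 × 12 tile are computed simultaneously, with the nine multiply-accumulate steps done in sequence.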
Pub Date: 1995-04-01 | DOI: 10.1016/0165-6074(95)90627-O
B.A. Coghlan, J.O. Jones
Lack of I/O performance is fast becoming a limiting factor in many computing systems. The yearly doubling of CPU speeds is not being matched by corresponding gains in I/O performance. This paper explores one aspect of the architecture of a high performance fault-tolerant cached RAID subsystem for a multiprocessor. The disk write cache is implemented as a memory-mapped stable memory. The features of a VRAM-based stable memory and its associated RAID controller are discussed.
"Stable memory for a disk write cache", Microprocessing and Microprogramming 41(1), pp. 53-70.
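The core protocol behind such a write cache can be sketched abstractly: acknowledge a write once it is captured in stable (battery-backed or VRAM) memory, destage to disk lazily, and replay surviving log entries after a crash. The class and method names below are hypothetical, and real stable memory is hardware rather than a Python dict:

```python
class StableWriteCache:
    """Illustrative sketch of a stable-memory disk write cache protocol.

    A write is acknowledged as soon as it is recorded in the stable log,
    so it survives a power failure; dirty blocks are destaged to the disk
    lazily and a log entry is retired only after its destage completes.
    """
    def __init__(self):
        self.stable_log = {}   # stands in for battery-backed / VRAM store
        self.disk = {}         # stands in for the RAID array

    def write(self, block, data):
        self.stable_log[block] = data   # durable before acknowledgement
        return True                     # ack: the writer may proceed

    def flush(self):
        for block, data in list(self.stable_log.items()):
            self.disk[block] = data     # destage to disk
            del self.stable_log[block]  # retire entry only after destage

    def recover(self):
        """After a crash, replay whatever survived in stable memory."""
        self.flush()
```

The design point the paper explores is making this log a memory-mapped VRAM region, so the acknowledgement step costs a memory write rather than an I/O operation.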