Pub Date: 1995-11-01 · DOI: 10.1016/0165-6074(95)00017-I
Myung-Kyun Kim, Hyunsoo Yoon, S.R. Maeng
In this paper, we consider a class of log N stage interconnection networks called Bit-Permute Multistage Interconnection Networks (BPMINs), in which the ports of each switch of a stage differ in only one bit position of their labels. We describe the decomposition structure of the BPMINs and prove that all BPMINs are topologically equivalent and that some of them are functionally equivalent. We also identify a class of 2 log N stage rearrangeable networks, called symmetric BPMINs, in which two log N stage BPMINs are connected in sequence. The symmetric BPMINs may be either symmetric or asymmetric, and regular or irregular, in their inter-stage connections, and can be reduced to 2 log N − 1 stages by combining the two center stages. We show that the symmetric BPMINs constitute a larger class of rearrangeable networks than any previously known. We also propose a general routing algorithm for the symmetric BPMINs, obtained by slightly modifying the looping algorithm of the Benes network.
"Bit-permute multistage interconnection networks", Microprocessing and Microprogramming 41(7), pp. 449-468.
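The defining BPMIN property above (switch ports whose labels differ in exactly one bit position) amounts to a Hamming-distance-1 test on the labels. A minimal sketch in Python; the function name is ours, not from the paper:

```python
def differ_in_one_bit(a: int, b: int) -> bool:
    """True iff the binary labels a and b differ in exactly one bit position."""
    x = a ^ b
    # x has exactly one set bit iff it is nonzero and a power of two
    return x != 0 and (x & (x - 1)) == 0
```

For an N = 8 network, port labels 0b010 and 0b110 differ only in the high bit, so the test succeeds; labels 0b011 and 0b000 differ in two bits and fail.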
Pub Date: 1995-11-01 · DOI: 10.1016/0165-6074(95)00016-H
Shih-Hsu Huang , Yu-Chin Hsu , Yen-Jen Oyang
This paper describes a new scheduling algorithm for the automatic synthesis of the control blocks of control-dominated circuits. The proposed algorithm is distinctive in that it partitions a control/data flow graph (CDFG) into an equivalent state transition graph. It works on the CDFG to exploit operation relocation, chaining, duplication, and unification. The optimization goal is to schedule each execution path as fast as possible. Benchmark data show that this approach achieves better results than previous ones in terms of circuit speedup and the number of states and transitions.
"A new scheduling algorithm for synthesizing the control blocks of control-dominated circuits", Microprocessing and Microprogramming 41(7), pp. 501-519.
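The goal of scheduling each execution path as fast as possible builds on classic as-soon-as-possible (ASAP) scheduling over the dependence graph. The sketch below shows plain ASAP scheduling of a DAG, not the paper's full algorithm (which adds relocation, chaining, duplication and unification); names and interface are ours:

```python
from collections import defaultdict, deque

def asap_schedule(ops, deps):
    """Assign each operation the earliest control step its data
    dependencies allow (ASAP scheduling over a dependence DAG).

    ops  : iterable of operation names
    deps : list of (producer, consumer) edges
    Returns {op: step}, steps numbered from 0.
    """
    succs = defaultdict(list)
    indeg = {op: 0 for op in ops}
    for u, v in deps:
        succs[u].append(v)
        indeg[v] += 1
    step = {op: 0 for op in ops}                 # sources start at step 0
    ready = deque(op for op in ops if indeg[op] == 0)
    done = 0
    while ready:                                 # Kahn topological order
        u = ready.popleft()
        done += 1
        for v in succs[u]:
            step[v] = max(step[v], step[u] + 1)  # v must follow u
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    assert done == len(step), "dependence graph has a cycle"
    return step
```

For a diamond-shaped dependence graph a → {b, c} → d, this yields steps 0, 1, 1, 2.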
Pub Date: 1995-11-01 · DOI: 10.1016/0165-6074(95)90002-0
"Calendar of forthcoming conferences and events", Microprocessing and Microprogramming 41(7), pp. 521-522.
Pub Date: 1995-11-01 · DOI: 10.1016/0165-6074(95)00018-J
Hong Shen
The mutual range-join of k sets S1, S2, …, Sk is the set containing all tuples (s1, s2, …, sk) that satisfy e1 ≤ |si − sj| ≤ e2 for all 1 ≤ i ≠ j ≤ k, where si ∈ Si and e1 ≤ e2 are fixed constants. This paper presents an efficient parallel algorithm for computing the k-set mutual range-join on hypercube computers. The proposed algorithm uses a fast method to determine whether the differences of all pairs among k given numbers lie within a given range, and applies the technique of permutation-based range-join [11]. To compute the mutual range-join of k sets S1, S2, …, Sk on a hypercube of p processors with O((n1 + n2 + ⋯ + nk)/p) local memory, where p ≤ |Si| = ni for 1 ≤ i ≤ k, the algorithm requires at most O((k log k/p) n1n2⋯nk) data comparisons in the worst case. The algorithm is implemented in PVM and its performance is extensively evaluated on various input data.
"Parallel K-set mutual range-join in hypercubes", Microprocessing and Microprogramming 41(7), pp. 443-448.
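The join condition itself is easy to state sequentially. The sketch below is a brute-force reference version of the k-set mutual range-join (not the paper's parallel hypercube algorithm), of the kind one might use to check a parallel implementation against; names are ours:

```python
from itertools import product

def mutual_range_join(sets, e1, e2):
    """All k-tuples (s1, ..., sk), one element per set, whose pairwise
    absolute differences all lie in [e1, e2]."""
    result = []
    for tup in product(*sets):
        # check every unordered pair in the tuple
        if all(e1 <= abs(a - b) <= e2
               for i, a in enumerate(tup)
               for b in tup[i + 1:]):
            result.append(tup)
    return result
```

This costs O(k^2 · n1n2⋯nk) comparisons sequentially, which is what makes the paper's O((k log k/p) n1n2⋯nk) parallel bound interesting.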
Pub Date: 1995-11-01 · DOI: 10.1016/0165-6074(95)00027-L
R.K. Arora , Vimal K. Khanna
We describe the design of a kernel for an inexpensive front-end processor that runs the lower layers of common communication protocols. Implementing a full-fledged kernel on a card requires large memory, expensive hardware and heavy processing overhead. We studied the requirements of the communication protocol layers which generally run as part of the host kernel, and realised that a workable kernel for the front-end processor, offering a subset of the features of the host kernel, could be implemented within a reasonable time. Such a kernel can provide the functions required by the communication protocol layers and run them within the constraints of inexpensive hardware. The aims of this research were manifold:
1. To explore the general set of requirements of common connection-oriented and connectionless protocols.
2. To design and present the algorithms of a kernel which can satisfy such requirements. Since the kernel is based on this general set of requirements, it is generic rather than specific to a single protocol running on the card; it can therefore be used to download different protocols onto the card.
3. To suggest implementation techniques that reduce memory and processing overhead on the host and improve the performance of the protocols running in the kernel.
4. To actually run a protocol on this kernel and compare its performance with an existing design.
"Design of a kernel for implementing communication protocols", Microprocessing and Microprogramming 41(7), pp. 469-485.
Pub Date: 1995-11-01 · DOI: 10.1016/0165-6074(95)00028-M
Enrico Macii, Massimo Poncino
Representing finite state systems by means of finite state machines (FSMs) is a common approach in VLSI circuit design. BDD-based algorithms have made possible the manipulation of FSMs with very large state spaces; however, when the representation of the set of reachable states grows too large, the original FSM is no longer manageable as a whole and needs to be decomposed into smaller sub-machines. Structural analysis of the circuit from which the FSM has been extracted has been shown to be very effective in determining good state variable partitions, which induce FSM decompositions for logic synthesis and formal verification applications. In this paper we propose FSM analysis techniques based on connectivity and spectral characteristics of the state machine which take into account the mutual dependency of the state variables but no longer depend on the structure of the underlying circuit; they may therefore be used in contexts other than sequential logic optimization and FSM verification. Experimental results are presented and discussed for the mcnc'91 FSM benchmarks and for the iscas'89 sequential circuits.
"Using connectivity and spectral methods to characterize the structure of sequential logic circuits", Microprocessing and Microprogramming 41(7), pp. 487-500.
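The connectivity side of such an analysis can be illustrated with a toy partitioner: build an undirected graph in which state variables are adjacent when one appears in the support of the other's next-state function, then take connected components as clusters. This is only in the spirit of the paper's method (names and interface are ours) and omits the spectral part entirely:

```python
def variable_partition(deps):
    """Cluster state variables by support connectivity.

    deps maps each variable to the set of variables its next-state
    function reads. Variables with (transitively) shared support end
    up in one cluster -- a crude connectivity-based partition.
    """
    # Build an undirected adjacency from the support sets.
    adj = {v: set() for v in deps}
    for v, support in deps.items():
        for u in support:
            adj.setdefault(u, set())
            adj[v].add(u)
            adj[u].add(v)
    seen, clusters = set(), []
    for v in adj:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:                      # iterative DFS over one component
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters
```

Two mutually dependent variables land in the same cluster, and an independent variable forms its own, suggesting the FSM can be decomposed along that cut.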
Pub Date: 1995-10-01 · DOI: 10.1016/0165-6074(95)00147-G
K.A. Vissers, G. Essink, P.H.J. van Gerwen, P.J.M. Janssen, O. Popp, E. Riddersma, W.J.M. Smits, H.J.M. Veendrick
Programmable video signal processor ICs (VSPs) and dedicated programming tools have been developed for the real-time processing of digital video signals. A large number of applications have been built with boards containing several of these processors. Two implementations of the general architecture currently exist: VSP1 and VSP2. A single VSP chip contains several arithmetic and logic elements (ALEs) and memory elements; a complete switch matrix implements unconstrained communication between all elements in a single cycle. These processors are programmed with signal flow graphs, which can conveniently express multi-rate algorithms. The algorithms are then mapped onto a network of processors; mapping is decomposed into delay management, partitioning and scheduling, and the solution strategies for the partitioning and scheduling problems are illustrated. The processors have been applied to a number of industrially relevant video algorithms, including the complete processing of next-generation fully digital studio TV cameras and several image-improvement algorithms in medical applications. Results of the mapping are presented for a number of algorithms in the field of TV processing.
"Architecture and programming of two generations video signal processors", Microprocessing and Microprogramming 41(5), pp. 373-390.
Pub Date: 1995-10-01 · DOI: 10.1016/0165-6074(95)00025-J
Paul Feautrier
The problem of automatically generating programs for massively parallel computers is a very complicated one, mainly because there are many architectures, each of them seeming to pose its own particular compilation problem. The purpose of this paper is to propose a framework in which to discuss the compilation process, and to show that the features which affect it are few and generate a small number of combinations. The paper is oriented toward fine-grained parallelization of static control programs, with emphasis on dataflow analysis, scheduling and placement. When going from there to more general programs and to coarser parallelism, one encounters new problems, some of which are discussed in the conclusion.
"Compiling for massively parallel architectures: a perspective", Microprocessing and Microprogramming 41(5), pp. 425-439.
Pub Date: 1995-10-01 · DOI: 10.1016/0165-6074(95)90000-4
"Calendar of forthcoming conferences and events", Microprocessing and Microprogramming 41(5), pp. 441-442.