Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.222985
P. Wilsey, D. Hensgen
This paper proposes a strategy for exploiting massively parallel SIMD computers for general-purpose computation. The approach places compiled programs into the local memory space of each distinct processing element (PE). Within each PE, a local program counter is initialized, and the instructions are interpreted in parallel across all of the PEs by control signals emanating from the central control unit. Initial experiments with randomly generated programs show that a speedup of approximately 700 is attainable on a SIMD processor with 8K processing elements. Furthermore, additional experiments have shown that the speedup increases linearly with the number of processing elements.
{"title":"Exploiting SIMD computers for general purpose computation","authors":"P. Wilsey, D. Hensgen","doi":"10.1109/IPPS.1992.222985","DOIUrl":"https://doi.org/10.1109/IPPS.1992.222985","journal":{"name":"Proceedings Sixth International Parallel Processing Symposium"},"publicationDate":"1992-03-01"}
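The interpretation scheme described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the three-opcode instruction set, the `PE` class, and `run_simd` are hypothetical names invented here, and every program is assumed to end with `HALT`. The control unit broadcasts one opcode at a time, and only the PEs whose local program counter currently points at that opcode execute it.

```python
from dataclasses import dataclass

# Hypothetical three-instruction set, invented for this illustration only.
OPCODES = ("INC", "DEC", "HALT")


@dataclass
class PE:
    program: list         # compiled program held in the PE's local memory
    pc: int = 0           # local program counter
    acc: int = 0          # local accumulator
    halted: bool = False


def run_simd(pes):
    """Interpret every PE's program; return the number of broadcast steps."""
    steps = 0
    while not all(pe.halted for pe in pes):
        # The control unit cycles through the opcodes. On each broadcast,
        # only PEs whose current instruction matches the opcode are enabled.
        for op in OPCODES:
            steps += 1
            for pe in pes:
                if pe.halted or pe.program[pe.pc] != op:
                    continue
                if op == "INC":
                    pe.acc += 1
                elif op == "DEC":
                    pe.acc -= 1
                else:                 # HALT: stop this PE, keep its pc
                    pe.halted = True
                    continue
                pe.pc += 1            # advance the local program counter
    return steps


pes = [
    PE(["INC", "INC", "HALT"]),
    PE(["DEC", "HALT"]),
    PE(["INC", "DEC", "INC", "HALT"]),
]
steps = run_simd(pes)
print([pe.acc for pe in pes], steps)
```

Each PE runs a different program, yet all of them advance under a single broadcast instruction stream, which is the property the abstract relies on.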
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223008
J. Jenq, S. Sahni
The authors develop reconfigurable mesh (RMESH) algorithms for window broadcasting, data shifts, and consecutive sum. These are then used to develop efficient algorithms to compute the histogram of an image and to perform histogram modification. The histogram of an $N \times N$ image is computed by an $N \times N$ RMESH in $O(\sqrt{B}\,\log_{\sqrt{B}}(N/\sqrt{B}))$ time for $B < N$, $O(\sqrt{N})$ for $B = N$, and $O(\sqrt{B})$ for $N < B \le N^2$, where $B$ is the number of gray scale values. Histogram modification is done in $O(\sqrt{N})$ time by an $N \times N$ RMESH.
{"title":"Histogramming on a reconfigurable mesh computer","authors":"J. Jenq, S. Sahni","doi":"10.1109/IPPS.1992.223008","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223008","journal":{"name":"Proceedings Sixth International Parallel Processing Symposium"},"publicationDate":"1992-03-01"}
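As a point of reference for what the RMESH algorithm computes, the sketch below gives a plain sequential histogram and a helper that evaluates the three asymptotic regimes stated in the abstract. It is not the RMESH algorithm itself; `histogram` and `rmesh_bound` are hypothetical names, constants hidden by the O-notation are ignored, and $B \ge 2$ is assumed so that the logarithm base is valid.

```python
import math


def histogram(image, num_gray_levels):
    """Sequential reference: count how often each gray value occurs."""
    counts = [0] * num_gray_levels
    for row in image:
        for pixel in row:
            counts[pixel] += 1
    return counts


def rmesh_bound(n, b):
    """Evaluate the asymptotic term from the abstract (constants ignored).

    n is the image side length (N), b the number of gray values (B).
    """
    if b < 2 or b > n * n:
        raise ValueError("this sketch assumes 2 <= B <= N^2")
    root_b = math.sqrt(b)
    if b < n:
        return root_b * math.log(n / root_b, root_b)
    if b == n:
        return math.sqrt(n)
    return root_b   # remaining case: N < B <= N^2


image = [[0, 1], [1, 3]]
print(histogram(image, 4))
print(rmesh_bound(16, 4), rmesh_bound(16, 16), rmesh_bound(4, 16))
```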
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.222968
A. Khokhar, M. Dubois
The authors map several interprocessor communication and linear algebra algorithms onto a memory-coherent hierarchical shared-memory multiprocessor (HSM) system and evaluate their communication complexities. The results show that the hierarchical architecture is ill-suited to algorithms that exhibit no temporal locality in data accesses and to algorithms with point-to-point communication.
{"title":"Matching algorithms and architecture in hierarchical shared-memory multiprocessor (HSM) systems","authors":"A. Khokhar, M. Dubois","doi":"10.1109/IPPS.1992.222968","DOIUrl":"https://doi.org/10.1109/IPPS.1992.222968","journal":{"name":"Proceedings Sixth International Parallel Processing Symposium"},"publicationDate":"1992-03-01"}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223030
M. Hakami, P. Warter, C. Boncelet, David Nassimi
Based on a recently developed class of sorting networks, new VLSI architectures suitable for order statistic filtering are developed. The major advantage of these architectures is minimal response time regardless of the number of stages in the pipeline, an effective characteristic for implementing recursive order statistic filters. The devised word-parallel architecture is the only one introduced to date that is capable of operating in both recursive and standard modes with optimal throughput. The proposed architectures are also suitable for implementing order statistic filters with multiple overlapping windows.
{"title":"VLSI architectures for recursive and multiple-window order statistic filtering","authors":"M. Hakami, P. Warter, C. Boncelet, David Nassimi","doi":"10.1109/IPPS.1992.223030","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223030","journal":{"name":"Proceedings Sixth International Parallel Processing Symposium"},"publicationDate":"1992-03-01"}
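The difference between the standard and recursive modes mentioned in this abstract can be sketched in software. This is a minimal illustration of the filtering operation only, not of the VLSI architecture: `order_statistic_filter` is a hypothetical name, edge samples are replicated as padding (an assumption), and the input is assumed to be non-empty. In recursive mode each output replaces its input sample, so later windows see already filtered values, which is why a short response time matters.

```python
def order_statistic_filter(x, half_width, rank=None, recursive=False):
    """Order statistic filter over a window of 2*half_width + 1 samples.

    rank selects which sorted element is output (default: the median).
    In recursive mode, each output is fed back in place of its input.
    """
    k = half_width
    rank = k if rank is None else rank
    # Working buffer with replicated edge samples (padding is an assumption).
    buf = [x[0]] * k + list(x) + [x[-1]] * k
    out = []
    for i in range(len(x)):
        window = sorted(buf[i:i + 2 * k + 1])
        y = window[rank]
        out.append(y)
        if recursive:
            buf[i + k] = y   # later windows now see the filtered value
    return out


signal = [1, 9, 2, 8, 3]
print(order_statistic_filter(signal, 1))                  # standard median
print(order_statistic_filter(signal, 1, recursive=True))  # recursive median
```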
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223062
A. Saoudi, M. Nivat, C. Rangan, Ravi Sundaram, G. D. Ramkumar
Presents a parallel algorithm for verifying that a string X is formed by the shuffle of two strings Y and Z. The algorithm runs in $O(\log^2 n)$ time with $O(n^2/\log^2 n)$ processors on the EREW-PRAM model.
{"title":"A parallel algorithm for recognizing the shuffle of two strings","authors":"A. Saoudi, M. Nivat, C. Rangan, Ravi Sundaram, G. D. Ramkumar","doi":"10.1109/IPPS.1992.223062","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223062","journal":{"name":"Proceedings Sixth International Parallel Processing Symposium"},"publicationDate":"1992-03-01"}
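To make the problem statement concrete, the sketch below checks the shuffle property with the standard sequential dynamic program. It is a reference for the problem being solved, not the parallel EREW-PRAM algorithm of the paper; `is_shuffle` is a hypothetical name. X is a shuffle of Y and Z when the characters of Y and Z can be interleaved, each keeping its own order, to give X.

```python
def is_shuffle(x, y, z):
    """Return True if x interleaves y and z while preserving their orders."""
    if len(x) != len(y) + len(z):
        return False
    # reachable[j] is True when x[:i + j] is a shuffle of y[:i] and z[:j];
    # the row index i is implicit because the table is updated in place.
    reachable = [False] * (len(z) + 1)
    for i in range(len(y) + 1):
        for j in range(len(z) + 1):
            if i == 0 and j == 0:
                reachable[j] = True
                continue
            # Last character taken from y: depends on the previous row.
            from_y = i > 0 and reachable[j] and y[i - 1] == x[i + j - 1]
            # Last character taken from z: depends on the current row.
            from_z = j > 0 and reachable[j - 1] and z[j - 1] == x[i + j - 1]
            reachable[j] = from_y or from_z
    return reachable[len(z)]


print(is_shuffle("abcd", "ac", "bd"))
print(is_shuffle("abdc", "ab", "cd"))
```

The table has one entry per pair of prefix lengths, so the sequential work is proportional to the product of the lengths of Y and Z.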
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.222995
L. Lin, J. Antonio
A stochastic model for a class of distributed asynchronous fixed point algorithms is presented and a methodology for optimizing the rate of convergence is introduced. An important parameter in the authors' model, called the degree of synchronization, quantifies the average amount of time each processor is willing to wait for information from other processors (before beginning computation of its update variable based on the available estimates of variables from other processors). The authors analyze the relationship between the convergence rate and the degree of synchronization for a class of iterative fixed point algorithms. Preliminary analysis indicates that significant improvements in convergence rates can be achieved by proper control of the parameters in the authors' model.
{"title":"Modeling and control of distributed asynchronous computations","authors":"L. Lin, J. Antonio","doi":"10.1109/IPPS.1992.222995","DOIUrl":"https://doi.org/10.1109/IPPS.1992.222995","journal":{"name":"Proceedings Sixth International Parallel Processing Symposium"},"publicationDate":"1992-03-01"}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223006
R. Mazzaferri, Heiko Schröder
To achieve fault tolerance through reconfiguration in mesh-connected arrays, many rail networks with vastly different effectiveness and cost have been presented. The authors attempt a unified notation for these networks to allow their comparative evaluation. They further present a method to improve the effectiveness of fault-tolerant networks by combining several small switches into large crossbar switches. This method is applicable to almost all rail networks presented in the literature, leading to a significant improvement in effectiveness, and often also in delay time along the network connections, for little hardware cost. Furthermore, the switches used provide fault tolerance of the network itself, which is usually unrealistically assumed to be always fault free.
{"title":"A superior class of networks for reconfigurable meshes","authors":"R. Mazzaferri, Heiko Schröder","doi":"10.1109/IPPS.1992.223006","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223006","journal":{"name":"Proceedings Sixth International Parallel Processing Symposium"},"publicationDate":"1992-03-01"}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223065
Ajay K. Gupta, Hong Wang
Parallel machines interconnecting up to thousands of processors have been proposed and recently built. One of the earliest and most prominent is the complete binary tree machine. The authors propose a family of tree machines called generalized compressed tree machines. Generalized compressed tree machines may, in general, be viewed as a derivative of the complete binary tree networks.
{"title":"Generalized compressed tree machines","authors":"Ajay K. Gupta, Hong Wang","doi":"10.1109/IPPS.1992.223065","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223065","journal":{"name":"Proceedings Sixth International Parallel Processing Symposium"},"publicationDate":"1992-03-01"}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.222977
R. Podraza, Dariusz Turlej, K. Piorun
The Vesicular Dataflow (VDF) model is presented in the paper. The VDF model has been formulated to introduce a way of storing and retrieving information and hence to reduce the main drawback of the basic dataflow (DF) model. Tokens can be stored in vesicles in the VDF model and then distributed in a non-deterministic way. State-dependent computations and global variables can be expressed in the dataflow manner. An informal definition of the VDF model and some simple applications are covered by the paper.
{"title":"The vesicular dataflow model","authors":"R. Podraza, Dariusz Turlej, K. Piorun","doi":"10.1109/IPPS.1992.222977","DOIUrl":"https://doi.org/10.1109/IPPS.1992.222977","journal":{"name":"Proceedings Sixth International Parallel Processing Symposium"},"publicationDate":"1992-03-01"}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.222975
L. Gravano, G. Pifarré, Gustavo Denicolay, J. Sanz
Two new algorithms for worm-hole routing in the hypercube are presented. The first hypercube algorithm is adaptive, but non-minimal in the sense that some derouting is permitted. Then another deadlock-free adaptive worm-hole routing algorithm for the hypercube interconnection, which is minimal, is presented. Finally, some well-known worm-hole algorithms for the hypercube were evaluated together with the new ones on a hypercube of $2^{10}$ nodes. One oblivious algorithm, the Dimension-Order, or E-Cube, routing algorithm (W. Dally, C. Seitz, 1987), was tried. In addition, three partially adaptive algorithms were considered: the Hanging algorithm (Y. Birk, P. Gibbons, D. Soroker, J. Sanz, 1989 and S. Konstantinidou, 1990), the Zenith algorithm (S. Konstantinidou, 1990), and the Hanging-Order algorithm (G.-M. Chia, S. Chalasani, C.S. Raghavendra, 1991). Finally, a fully adaptive minimal algorithm presented independently by L. Gravano, G. Pifarre, S.A. Felperin and J. Sanz (1991) and J. Duato was tried. This algorithm allows each message to choose adaptively among all the shortest paths from its source to its destination. Only four virtual channels per physical link are needed to achieve this. This technique is referred to as Fully. The results obtained show that the two new algorithms are good candidates for worm-hole routing in the hypercube network.
{"title":"Adaptive deadlock-free worm-hole routing in hypercubes","authors":"L. Gravano, G. Pifarré, Gustavo Denicolay, J. Sanz","doi":"10.1109/IPPS.1992.222975","DOIUrl":"https://doi.org/10.1109/IPPS.1992.222975","journal":{"name":"Proceedings Sixth International Parallel Processing Symposium"},"publicationDate":"1992-03-01"}
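The oblivious baseline named in this abstract, Dimension-Order (E-Cube) routing, can be sketched in a few lines. This is an illustration of path selection only; it does not model worm-hole flow control, virtual channels, or the two new algorithms. The function names are hypothetical, and correcting the lowest dimension first is an assumed convention. The second helper counts the shortest paths a fully adaptive minimal algorithm may choose among: with k differing address bits, the bits can be corrected in any of k! orders.

```python
import math


def ecube_route(src, dst, dims):
    """Return the node sequence of dimension-order routing from src to dst.

    Nodes are integers whose bits are hypercube coordinates. Differing bits
    are corrected lowest dimension first (an assumed convention).
    """
    path = [src]
    node = src
    for d in range(dims):
        if (node ^ dst) & (1 << d):
            node ^= 1 << d       # traverse the link in dimension d
            path.append(node)
    return path


def minimal_path_count(src, dst):
    """Number of distinct shortest paths between two hypercube nodes."""
    k = bin(src ^ dst).count("1")   # Hamming distance
    return math.factorial(k)


print(ecube_route(0b110, 0b011, 3))
print(minimal_path_count(0b110, 0b011))
```

E-Cube always produces the same single path for a given source and destination, whereas a fully adaptive minimal router may pick any of the shortest paths counted by the second helper.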