Pub Date: 1992-03-01; DOI: 10.1109/IPPS.1992.222973
Magali E. Azema-Barac
This paper describes a framework for implementing neural networks on massively parallel machines. The framework is generic and applies to a range of neural networks (Multi-Layer Perceptron, Competitive Learning, Self-Organising Map, etc.) as well as a range of massively parallel machines (Connection Machine, Distributed Array Processor, MasPar). It consists of two phases: an abstract decomposition of neural networks and a machine-specific decomposition. The abstract decomposition identifies the parallelism implemented by neural networks, and provides alternative distribution schemes according to the required exploitation of parallelism. The machine-specific decomposition considers the relevant machine criteria, and integrates these with the result of the abstract decomposition to form a 'decision' system. This system formalises the relative gain of each distribution scheme according to neural network and machine criteria. It then identifies their possible optimisations. Finally, it computes and ranks the absolute speed-up of each distribution scheme.
Title: A conceptual framework for implementing neural networks on massively parallel machines. Published in: Proceedings Sixth International Parallel Processing Symposium.
Pub Date: 1992-03-01; DOI: 10.1109/IPPS.1992.223079
A. W. Kwan, L. Bic
A technique for structuring compute-aggregate-broadcast algorithms on distributed memory computers is presented. The compute-aggregate-broadcast paradigm provides an abstraction of the problem for the programmer, allowing for separation of computation and synchronization. Such algorithms are well suited for application on distributed memory computers. The structuring technique assists the parallel programmer with synchronization, allowing the programmer to concentrate more on developing code for computation. Two examples are presented.
Title: A structuring technique for compute-aggregate-broadcast algorithms on distributed memory computers. Published in: Proceedings Sixth International Parallel Processing Symposium.
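The compute-aggregate-broadcast pattern named in this abstract can be illustrated with a small single-process sketch. This is only an illustration of the general paradigm, not the structuring technique of the paper: the names `run_cab` and `normalise_chunks`, and the choice of normalising by a global maximum, are assumptions made for the example. Each "worker" is simulated by one entry of a list rather than by a real distributed-memory node.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")  # local state held by one (simulated) worker
V = TypeVar("V")  # value a worker produces in the compute phase
G = TypeVar("G")  # global value produced by the aggregate phase


def run_cab(
    states: Sequence[S],
    compute: Callable[[S], V],
    aggregate: Callable[[List[V]], G],
    update: Callable[[S, G], S],
    rounds: int,
) -> List[S]:
    """Run `rounds` iterations of compute, aggregate, broadcast."""
    current = list(states)
    for _ in range(rounds):
        # Compute: every worker derives a value from its own state only.
        local_values = [compute(s) for s in current]
        # Aggregate: the local values are combined into one global value.
        global_value = aggregate(local_values)
        # Broadcast: every worker receives the global value and updates.
        current = [update(s, global_value) for s in current]
    return current


def normalise_chunks(chunks: Sequence[List[float]]) -> List[List[float]]:
    """Hypothetical example: scale all chunks by the global maximum."""
    return run_cab(
        chunks,
        compute=max,        # local maximum of each chunk
        aggregate=max,      # global maximum over the local maxima
        update=lambda chunk, g: [x / g for x in chunk],
        rounds=1,
    )


if __name__ == "__main__":
    print(normalise_chunks([[1.0, 4.0], [2.0, 8.0]]))
```

The point of the structure is that the user-supplied `compute` and `update` functions never coordinate with each other directly; all synchronization is confined to the aggregate step inside `run_cab`.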
Pub Date: 1992-03-01; DOI: 10.1109/IPPS.1992.222974
E. Haddad
Nonreplicated shared data of distributed applications is optimally allocated to pre-specified multilevel memory partitions at the sites of a heterogeneous multicomputer network to minimize a weighted combination of systemwide mean time delay performance and mean communication cost per access request. Greedy and fast optimization algorithms are presented for nonqueueing lightly-loaded as well as heavily-loaded multiqueue system models with channel, I/O, and memory hierarchy queues. Extensions to data exhibiting nonuniform access demand rates and distinct query and update statistics are presented.
Title: Optimal allocation of shared data over distributed memory hierarchies. Published in: Proceedings Sixth International Parallel Processing Symposium.
Pub Date: 1992-03-01; DOI: 10.1109/IPPS.1992.222992
K. Rokusawa, N. Ichiyoshi
This paper proposes a scheme for changing the execution state of a pool of processes in a distributed environment where there may be processes in transit. The scheme can detect the completion of state change using weighted throw counting and detect the termination as well. It works whether the communication channels are synchronous or asynchronous, FIFO or non-FIFO. The message complexity of the scheme is typically O(number of processing elements).
Title: A scheme for state change in a distributed environment using weighted throw counting. Published in: Proceedings Sixth International Parallel Processing Symposium.
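The abstract does not spell out how weights are used, so the following is a simplified, weight-based termination detection sketch in the general spirit of the idea, not the scheme of the paper. The assumptions are: a total weight of 1 is shared among active processes and in-transit messages, a sender gives half of its weight to each message, and an idle process returns its weight to a controller. The class names `Controller` and `Process` and the function `demo` are hypothetical.

```python
from fractions import Fraction
from typing import List


class Controller:
    """Collects returned weight; all work is done when it holds weight 1."""

    def __init__(self) -> None:
        self.returned = Fraction(0)

    def give_back(self, weight: Fraction) -> None:
        self.returned += weight

    def terminated(self) -> bool:
        return self.returned == 1


class Process:
    def __init__(self, controller: Controller) -> None:
        self.controller = controller
        self.weight = Fraction(0)

    def receive(self, message_weight: Fraction) -> None:
        # The weight carried by a message is added to the receiver.
        self.weight += message_weight

    def send(self, channel: List[Fraction]) -> None:
        # Split the current weight: half stays, half travels with the message.
        if self.weight == 0:
            raise RuntimeError("an idle process cannot send")
        half = self.weight / 2
        self.weight -= half
        channel.append(half)

    def become_idle(self) -> None:
        # Return all remaining weight to the controller.
        self.controller.give_back(self.weight)
        self.weight = Fraction(0)


def demo() -> List[bool]:
    """Return the controller's verdict before and after delivery."""
    controller = Controller()
    p, q = Process(controller), Process(controller)
    channel: List[Fraction] = []        # messages currently in transit

    p.receive(Fraction(1))              # initial work carries the full weight
    p.send(channel)                     # a message to q is now in transit
    p.become_idle()
    before = controller.terminated()    # weight 1/2 is still in the channel

    q.receive(channel.pop())
    q.become_idle()
    after = controller.terminated()
    return [before, after]


if __name__ == "__main__":
    print(demo())
```

Exact fractions are used so that repeated halving never loses weight to rounding. Because the weight travels with the message itself, the check does not depend on message ordering, which is consistent with the abstract's remark that the scheme works for FIFO and non-FIFO channels.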
Pub Date: 1992-03-01; DOI: 10.1109/IPPS.1992.223049
D. Menascé, S. Porto, S. Tripathi
It has already been demonstrated that cost-effective multiprocessor designs may be obtained by combining processors of different speeds in the same architecture (a heterogeneous architecture), so that the serial and critical portions of the application may benefit from a fast single processor. The paper presents a systematic way to build static heuristic scheduling algorithms for such environments. Several algorithms are proposed and their performance is compared through simulation. One of the proposed algorithms is shown to achieve substantial performance gains as the degree of heterogeneity of the architecture increases.
Title: Processor assignment in heterogeneous parallel architectures. Published in: Proceedings Sixth International Parallel Processing Symposium.
Pub Date: 1992-03-01; DOI: 10.1109/IPPS.1992.223027
Weixiong Zhang, R. Korf
The authors present parallel algorithms for heap operations on an EREW PRAM. They first present a parallel heap construction algorithm with $p$ processors running in $O(n/p + \log p)$ time. It takes $3.625\,n/p + 4\log p$ time in the worst case. The algorithm is optimal when $p = \Theta(n/\log n)$. They then propose a method to delete the root of a heap in parallel. To facilitate dynamic processor allocation, a data structure is developed in a preparatory step using $O((n/\log n)^{1-1/p})$ processors in $O(\log p)$ time. A sequence of root deletion operations is realized such that each of these operations takes $O((\log n)/p + \log p + \log\log n)$ time using $p$ processors. The authors also suggest an $O((\log n)/p + \log p)$ time optimal parallel insert algorithm using $p$ processors. When $p = \Theta((\log n)/\log\log n)$, both algorithms run in $O(\log\log n)$ time. The algorithms can also be extended to a parallel algorithm for deleting an element from a heap, given the address of the element.
Title: Parallel heap operations on EREW PRAM: summary of results. Published in: Proceedings Sixth International Parallel Processing Symposium.
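The optimality claim for heap construction can be checked with a short calculation. The sketch below assumes that "optimal" is meant in the usual cost sense, that is, the processor-time product matches the $O(n)$ time of sequential bottom-up heap construction; the abstract itself does not state which notion is intended.

```latex
\begin{align*}
T(n,p) &= O\!\left(\frac{n}{p} + \log p\right) \\
\text{with } p = \frac{n}{\log n}: \quad
\frac{n}{p} &= \log n, \qquad \log p = \log n - \log\log n \le \log n \\
T(n,p) &= O(\log n) \\
p \cdot T(n,p) &= \frac{n}{\log n} \cdot O(\log n) = O(n)
\end{align*}
```

With more than $\Theta(n/\log n)$ processors the $\log p$ term dominates, so the running time stops improving while the processor-time product grows beyond $O(n)$.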
Pub Date: 1992-03-01; DOI: 10.1109/IPPS.1992.223052
T. V. Lakshman, A. Bagchi, K. Rastani
The paper describes a scheme to schedule uncoordinated requests for resources that arrive in parallel. The specific application considered is that of scheduling transmission requests in ATM switches. The scheme is capable of handling both unicast and multicast transmission requests. Two implementations of the scheme using photonic devices are described. A novel aspect of the scheme is that it uses photonic devices to implement a heuristic graph-coloring algorithm needed to generate transmission schedules.
Title: A fast parallel scheduler for resource requests implemented using optical devices. Published in: Proceedings Sixth International Parallel Processing Symposium.
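To make the link between graph coloring and transmission schedules concrete, here is a small software sketch. It uses a plain sequential greedy coloring, which is a stand-in chosen for illustration and not the photonic heuristic of the paper. The conflict rule (two requests conflict if they share an input port or any output port) and the names `Request`, `conflicts` and `greedy_schedule` are assumptions of this example.

```python
from typing import FrozenSet, List, Sequence, Tuple

# A request is (input port, set of output ports); one output port means
# unicast, several output ports mean multicast.
Request = Tuple[int, FrozenSet[int]]


def conflicts(a: Request, b: Request) -> bool:
    """Assumed rule: same input port, or at least one shared output port."""
    return a[0] == b[0] or bool(a[1] & b[1])


def greedy_schedule(requests: Sequence[Request]) -> List[int]:
    """Assign each request the lowest time slot (color) that is not used
    by any conflicting request scheduled before it."""
    slots: List[int] = []
    for i, request in enumerate(requests):
        used = {slots[j] for j in range(i) if conflicts(request, requests[j])}
        slot = 0
        while slot in used:
            slot += 1
        slots.append(slot)
    return slots


if __name__ == "__main__":
    requests: List[Request] = [
        (0, frozenset({1})),       # unicast from input 0 to output 1
        (1, frozenset({1, 2})),    # multicast from input 1 to outputs 1 and 2
        (2, frozenset({3})),       # unicast from input 2 to output 3
        (0, frozenset({3})),       # unicast from input 0 to output 3
    ]
    print(greedy_schedule(requests))
```

Requests that receive the same slot number can be transmitted together; the number of distinct slots is the length of the schedule. Greedy coloring does not guarantee the minimum number of slots, which is why it counts as a heuristic.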
Pub Date: 1992-03-01; DOI: 10.1109/IPPS.1992.223078
A. Aggarwal, W. T. Ma, G. Sandri, S. Sarkar
Results from parallel computing on a CM-2 Connection Machine are reported for a variety of graph-theoretic models for fitness optimization in evolutionary biology. These computations are among the most complex ever undertaken in this field and make full use of the internal hypercube architecture of the CM-2.
Title: Adaptive graph computations with a connection machine. Published in: Proceedings Sixth International Parallel Processing Symposium.
Pub Date: 1992-03-01; DOI: 10.1109/IPPS.1992.223037
Myung-Kook Yang, C. Das
The authors propose a parallel 'decomposite best-first' search branch-and-bound algorithm for MIN-based multiprocessors. They start with a new probabilistic model to estimate the number of evaluated nodes for a serial algorithm. The proposed algorithm initially decomposes a problem into several subproblems. Each processor executes the serial best-first search to find a local feasible solution. The local solutions are broadcast through the network to compute the final solution. The speed-up analysis considers both the computation and communication overheads. It is seen that the parallel decomposite best-first search algorithm performs better than other reported schemes when communication overhead is taken into consideration.
Title: Analytical modeling of a parallel branch-and-bound algorithm on MIN-based multiprocessors. Published in: Proceedings Sixth International Parallel Processing Symposium.
Pub Date: 1992-03-01; DOI: 10.1109/IPPS.1992.222969
Zhiyong Liu, Jia-Huai You, Xiaobo Li
The authors present a parallel storage scheme to distribute the elements of an N×N matrix over N memory banks, where N is any (odd or even) power of two, such that any rows, columns, forward and backward diagonals, and square or rectangular blocks can be accessed simultaneously without memory conflict. They present a simple scheme for address generation, which requires only logic operations and can be completed in constant time. They present two network implementation methods for data alignments for this storage scheme. Unlike previously proposed routing algorithms, the algorithms for hypercube routing in this paper are free from network conflict. They do not require buffering and the time length of a 'step' is shorter; they are therefore more efficient in terms of both hardware cost and speed. The authors also present a simple MIN implementation scheme for the realization of the data alignments. Schemes for processing smaller matrices efficiently on larger scale systems are also developed.
Title: The odd-even expansion storage scheme and its implementation issues. Published in: Proceedings Sixth International Parallel Processing Symposium.
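What "access without memory conflict" means can be shown with a simple baseline. The sketch below uses classic linear skewing, bank = (i + j) mod N, which is not the odd-even expansion scheme of the paper; it is a swapped-in textbook mapping chosen to show why a more elaborate scheme is needed when N is a power of two. The function names `bank` and `conflict_free` are assumptions of this example.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row index, column index) of a matrix element


def bank(i: int, j: int, n: int) -> int:
    """Baseline linear skewing: element (i, j) is stored in bank (i + j) mod n."""
    return (i + j) % n


def conflict_free(cells: List[Cell], n: int) -> bool:
    """True if all cells map to distinct banks, so they can be read at once."""
    banks = [bank(i, j, n) for i, j in cells]
    return len(set(banks)) == len(banks)


def access_patterns(n: int) -> dict:
    """Check a sample row, column and both main diagonals of an n x n matrix."""
    return {
        "row": conflict_free([(1, j) for j in range(n)], n),
        "column": conflict_free([(i, 2) for i in range(n)], n),
        "forward diagonal": conflict_free([(i, i) for i in range(n)], n),
        "backward diagonal": conflict_free([(i, n - 1 - i) for i in range(n)], n),
    }


if __name__ == "__main__":
    for pattern, ok in access_patterns(4).items():
        print(pattern, "conflict-free" if ok else "has bank conflicts")
```

For N = 4 the baseline serves rows and columns without conflict, but the forward diagonal maps to banks 0, 2, 0, 2 and the backward diagonal maps entirely to bank 3. A scheme that handles rows, columns, diagonals and blocks simultaneously for power-of-two N, as the abstract claims, therefore needs a different mapping.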