Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472175
Young-Chon Kim, Pal-Jin Lee, D. Choi, Byung-Ok Kim, Sungwan Park, Young-sun Kim
With variable bit rate (VBR) video sources, adjacent slices in a frame are strongly correlated with each other. The same holds for adjacent frames, which is expressed by the frame correlation. VBR video sources can be statistically characterized by the peak rate, average rate, and standard deviation of the rate of generated cells. By taking these correlation and statistical properties into account, the required bandwidth can be estimated and VBR video sources can be transmitted more efficiently. In this paper, we propose a scheme that dynamically predicts and allocates transmission bandwidth for VBR video sources in ATM-based BISDN. The performance of the proposed scheme is evaluated through simulations. Simulation results show that the proposed scheme is superior to conventional ones in terms of bandwidth utilization and cell loss rate.
{"title":"Dynamic bandwidth allocation for VBR video sources in ATM based BISDN","authors":"Young-Chon Kim, Pal-Jin Lee, D. Choi, Byung-Ok Kim, Sungwan Park, Young-sun Kim","doi":"10.1109/ICAPP.1995.472175","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472175","url":null,"abstract":"With variable bit rate (VBR) video sources, adjacent slices in a frame are strongly correlated with each other. This is also the case for the frame represented by frame correlation. VBR video sources can be statistically characterized by peak rate, average rate, and standard deviation of the rate of generated cells. Taking account of each correlative and statistical properties, VBR video sources can be more efficiently transmitted by estimating the required bandwidth. In this paper, we propose a scheme that predicts and allocates dynamically transmission bandwidth for VBR video sources in ATM based BISDN. The performance of the proposed scheme is evaluated through simulations. Simulation results show that the proposed scheme is superior to the conventional ones in terms of bandwidth utilization and cell loss rate.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124094099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
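The three statistics named in the VBR abstract above (peak rate, average rate, and standard deviation of the cell rate) can be computed from a trace of per-frame cell counts. The sketch below is only an illustration: the allocation rule `average + k * std`, capped at the peak, and the names `rate_statistics`, `estimate_bandwidth`, and `k` are assumptions made here, not the prediction scheme proposed in the paper.

```python
from statistics import mean, pstdev


def rate_statistics(cells_per_frame):
    """Return (peak, average, standard deviation) of per-frame cell counts."""
    return max(cells_per_frame), mean(cells_per_frame), pstdev(cells_per_frame)


def estimate_bandwidth(cells_per_frame, k=2.0):
    """Hypothetical rule: allocate average + k * std, never more than the peak."""
    peak, avg, std = rate_statistics(cells_per_frame)
    return min(peak, avg + k * std)


if __name__ == "__main__":
    # Made-up trace of cells generated per frame, with one burst.
    trace = [100, 120, 110, 300, 115, 105]
    print(rate_statistics(trace))
    print(estimate_bandwidth(trace))
```

An allocation between the average and the peak is the general idea behind statistical multiplexing gain: reserving the peak wastes bandwidth, while reserving only the average risks cell loss during bursts.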
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472245
S. Lei, Kang Zhang
A major problem with collecting trace data for performance monitoring is its intrusiveness on the program being monitored. Instrumentation sometimes distorts the run-time behaviour of the program so much that the collected data no longer reflect the original program. We propose a new technique, called the postponing technique, that maintains the original program behaviour in order to collect accurate performance data. It preserves event orders by equalizing the instrumentation delay for each pair of communication events. This technique does not extend the execution time beyond that of the conventional approach and is able to estimate the original event ordering. Our technique was implemented on a Connection Machine CM-5. We find that the technique estimates event ordering information more accurately than the conventional technique.
{"title":"A software instrumentation technique for performance tuning of message-passing programs","authors":"S. Lei, Kang Zhang","doi":"10.1109/ICAPP.1995.472245","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472245","url":null,"abstract":"A major problem with collecting trace data for performance monitoring is its intrusiveness to the program being monitored. It sometimes distorts the run-time behaviour of the program so that the collected data become irrelevant to its original program. We proposed a new technique, called the postponing technique, to maintain the original program behaviour in order to collect accurate performance data. It preserves event orders by equalling the instrumentation delay for each pair of communication events. This technique does not extend the execution time taken by the conventional approach and is able to estimate the original event ordering. Our technique was implemented on a Connection Machine, CM-5. We find that the technique estimates more accurate event ordering information than the conventional technique.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"156 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124378238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472291
Ok-Hyeong Cho, R. Colomb
In massively parallel SIMD machines, communication bottlenecks have been a major problem because of the limited set of available topologies. In particular, these topologies are not well suited to broadcast-type communication. Some suggested approaches are asymptotically fast but impractical because they incur a large minimum latency. This paper presents a simple and practical linear broadcast-type communication algorithm that is based on associative computing and does not use interconnection networks at all.
{"title":"Associative broadcast communication in massively parallel SIMD machines: a practical approach","authors":"Ok-Hyeong Cho, R. Colomb","doi":"10.1109/ICAPP.1995.472291","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472291","url":null,"abstract":"In massively parallel SIMD machines, communication bottlenecks have been a major problem due to the limitation of available topologies. Especially they are not well suited to broadcast-type communications. Some suggested approaches are not practical, even though they are asymptotically fast, because they incur large minimum latency. In this paper, a simple and practical linear broadcast-type communication algorithm which is based on associative computing and does not use interconnection networks at all, is presented.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114490936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472187
P. Lukowicz, W. Tichy
We discuss the results of a feasibility study of an opto-electronic shared memory with concurrent read, concurrent write capability. Unlike previous such work, we consider a true hardware shared memory rather than a simulation on a tightly, optically connected distributed memory computer. We describe a design that could be implemented using compact integrated semiconductor modules and propose ways to solve two major problems faced by such a device: optical system complexity and parallel word-level write consistency. It is shown that, in principle, a memory with GBytes of capacity and a latency of less than 1 ns, accessed by up to 10^5 processors, could be feasible. Using devices currently available as laboratory prototypes, and taking into account energy and crosstalk considerations, a capacity of more than 1 MB and a latency of about 50 ns might be attained for up to 1000 processors.
{"title":"On the feasibility of a scalable opto-electronic CRCW shared memory","authors":"P. Lukowicz, W. Tichy","doi":"10.1109/ICAPP.1995.472187","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472187","url":null,"abstract":"We discuss the results of a feasibility study of an opto-electronic shared memory with concurrent read, concurrent write capability. Unlike previous such work we consider a true hardware shared memory rather then a simulation on a tightly, optically connected distributed memory computer. We describe a design that could be implemented using compact integrated semiconductor modules and propose ways to solve two major problems faced by such a device: optical system complexity and parallel word level write consistency. It is shown that, in principle, a memory with GBytes capacity and a latency of less then 1 ns, accessed by up to 10/sup 5/ processors could be feasible. Using devices currently available as laboratory prototypes and taking into account energy and crosstalk considerations a capacity of more then 1 MB and a latency of about 50 ns might be attained for up to 1000 processors.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120998560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472289
M.F. Ali, M. Guizani
An efficient design methodology for the construction of an optical space-invariant hypercube interconnection network is presented. This network connects a two-dimensional array of input nodes to a two-dimensional array of output nodes. The basis of the design is a 2^6-node hypercube from which hypercubes of higher dimensions can be built. The requirements for the optical implementation of this scheme are also proposed. It is shown that hypercubes of dimension up to 21 can be realized using the given implementation.
{"title":"A new design methodology for optical hypercube interconnection network","authors":"M.F. Ali, M. Guizani","doi":"10.1109/ICAPP.1995.472289","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472289","url":null,"abstract":"An efficient design methodology for the construction of an optical space invariant hypercube interconnection network is presented. This network connects a two-dimensional array of input nodes to a two-dimensional array of output nodes. The basis of the design is a 2/sup 6/ node hypercube from which hypercubes of higher dimensions can be built. The requirements for the optical implementation of this scheme are also proposed. It is shown that hypercubes of dimension up to 21 can be realized using the given implementation.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123051251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
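For readers unfamiliar with the topology in the hypercube abstract above: in a d-dimensional hypercube, two nodes are connected exactly when their binary labels differ in one bit, so a 2^6-node hypercube has 64 nodes with 6 links each. The sketch below shows this, together with one possible row-major placement of the 64 nodes on an 8 x 8 array. The placement and the names `hypercube_neighbors` and `to_grid` are assumptions for illustration and are not taken from the paper's optical design.

```python
def hypercube_neighbors(node, dimension):
    """Return the labels of all nodes that differ from `node` in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(dimension)]


def to_grid(node, columns=8):
    """Hypothetical row-major placement of a node label on a 2-D array."""
    return divmod(node, columns)  # (row, column)


if __name__ == "__main__":
    dimension = 6  # the 2^6-node base hypercube
    node = 5
    for neighbor in hypercube_neighbors(node, dimension):
        print(node, to_grid(node), "->", neighbor, to_grid(neighbor))
```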
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472278
N. Mani, B. Srinivasan
This paper describes a floorplan design approach that combines a heuristic graph bipartitioning procedure with a slicing tree representation in the physical design of VLSI systems. The description of the circuit to be floorplanned contains a set of functional modules, each having a number of possible dimensions, and a net-list containing the connectivity information. The slicing tree representation provides efficient tree traversal operations, using recursion, for obtaining area-efficient floorplans. The slicing paradigm also eliminates cyclical conflicts in module placement and hence ensures better routability.
{"title":"A slicing-floorplan algorithm implementation for VLSI design","authors":"N. Mani, B. Srinivasan","doi":"10.1109/ICAPP.1995.472278","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472278","url":null,"abstract":"This paper describes a floorplan design approach that combines both a heuristic graph bipartitioning procedure and a slicing tree representation in the physical design of VLSI systems. The description of the circuit to be floorplanned contains a set of functional modules each having a number of possible dimensions and a net-list containing the connectivity information. The slicing tree representation provides an efficient free traversal operations using recursion for obtaining area-efficient floorplans. The slicing paradigm also eliminates the cyclical conflicts in module placement and hence ensures better routability.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127440368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
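The recursive traversal of a slicing tree mentioned in the floorplanning abstract above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes a leaf is a list of possible `(width, height)` dimensions for one module, an internal node is a tuple `(cut, left, right)` with `cut` equal to `"V"` (children side by side) or `"H"` (children stacked), and it enumerates all combinations exhaustively, which is only practical for small trees. The names `shapes` and `best_floorplan` are hypothetical.

```python
from itertools import product


def shapes(node):
    """Return the candidate (width, height) shapes of a slicing-tree node."""
    if isinstance(node, list):  # leaf: possible dimensions of one module
        return node
    cut, left, right = node  # internal node: ("H" or "V", left, right)
    result = []
    for (w1, h1), (w2, h2) in product(shapes(left), shapes(right)):
        if cut == "V":  # vertical cut: children placed side by side
            result.append((w1 + w2, max(h1, h2)))
        else:  # horizontal cut: children stacked on top of each other
            result.append((max(w1, w2), h1 + h2))
    return result


def best_floorplan(tree):
    """Pick the root shape with the smallest bounding-box area."""
    return min(shapes(tree), key=lambda s: s[0] * s[1])


if __name__ == "__main__":
    module_a = [(2, 4), (4, 2)]  # module that may be rotated
    module_b = [(2, 4)]
    module_c = [(4, 1), (1, 4)]
    tree = ("H", ("V", module_a, module_b), module_c)
    print(best_floorplan(tree))
```

Because every internal node combines exactly two sub-floorplans with a single straight cut, the bounding box of a node depends only on the bounding boxes of its children, which is what makes the recursion valid.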
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472279
F. Wang, Young-il Choo
We develop an optimized program for the N-body problem on the CM-5 with vector units. The work aims to make full use of the vector pipelines of a CM-5 equipped with vector units to improve computational performance. Some development issues that arise when using the vector units are discussed. The code is written in CDPEAC, an assembly-like language that can be called from C. Performance data and some analysis results are given.
{"title":"Vectoring the N-body problem on the CM-5","authors":"F. Wang, Young-il Choo","doi":"10.1109/ICAPP.1995.472279","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472279","url":null,"abstract":"We develop an optimized program for the N-body problem on the CM-5 with vector units. The work is intended to make full use of the power of the vector pipelines provided by the CM-5 equipped with vector units to improve the computation performance. Some development issues using the vector units are discussed. The code is written in CDPEAC, an assembly-like language which can be called from C. Performance data and some analysis results are given.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"127 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113996850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472253
A. Koseki, H. Komatsu, Y. Fukazawa
For instruction-level parallel machines, it is essential to extract instructions that can be executed in parallel from a program by code scheduling. In this paper, we propose a new code scheduling technique that uses an extension of the program dependence graph (PDG). This technique parallelizes non-numerical programs, producing better machine code than that created by percolation scheduling.
{"title":"A global code scheduling technique using guarded PDG","authors":"A. Koseki, H. Komatsu, Y. Fukazawa","doi":"10.1109/ICAPP.1995.472253","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472253","url":null,"abstract":"For instruction-level parallel machines, it is essential to extract parallelly executable instructions from a program by code scheduling. In this paper, we propose a new code scheduling technique using an extension of PDG. This technique parallelizes non-numerical programs, producing better machine codes than these created by percolation scheduling.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134444531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472188
S. Mahapatra, R. Mahapatra
This paper presents a mapping scheme for parallel pipelined execution of the Backpropagation Learning Algorithm on distributed memory multiprocessors (DMMs). The proposed implementation exhibits training set parallelism that involves batch updating. Simple algorithms are presented that allow the data transfer involved in both the forward and backward execution phases of the backpropagation algorithm to be carried out with a small communication overhead. The effectiveness of our mapping is illustrated by estimating the speedup of a proposed implementation on an array of T-805 transputers.
{"title":"Mapping of backpropagation learning onto distributed memory multiprocessors","authors":"S. Mahapatra, R. Mahapatra","doi":"10.1109/ICAPP.1995.472188","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472188","url":null,"abstract":"This paper presents a mapping scheme for parallel pipelined execution of the Backpropagation Learning Algorithm on distributed memory multiprocessors (DMMs). The proposed implementation exhibits training set parallelism that involves batch updating. Simple algorithms have been presented, which allow the data transfer involved in both forward and backward executions phases of the backpropagation algorithm to be carried out with a small communication overhead. The effectiveness of our mapping has been illustrated, by estimating the speedup of a proposed implementation on an array of T-805 transputers.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130344975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
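Training set parallelism with batch updating, as named in the backpropagation abstract above, rests on the fact that the batch gradient is the sum of the gradients of the individual training samples, so each processor can accumulate the gradient of its own share of the training set before a single combined weight update. The sketch below illustrates only this principle. It assumes a one-weight linear model `y = w * x` with squared error in place of a neural network, runs the "workers" sequentially, and uses hypothetical names (`gradient`, `batch_update`, `workers`, `lr`). It does not reproduce the paper's pipelined mapping onto transputers.

```python
def gradient(w, samples):
    """Sum over samples of d/dw of 0.5 * (w * x - y) ** 2."""
    return sum((w * x - y) * x for x, y in samples)


def batch_update(w, samples, workers=2, lr=0.01):
    """One batch update where each worker handles a shard of the training set."""
    shards = [samples[i::workers] for i in range(workers)]
    # On a real multiprocessor these partial gradients would be computed in parallel.
    partial = [gradient(w, shard) for shard in shards]
    # Combine the partial gradients and apply a single update (batch updating).
    return w - lr * sum(partial)


if __name__ == "__main__":
    data = [(1, 2), (2, 4), (3, 6), (4, 8)]  # samples of y = 2 * x
    w = 0.0
    for _ in range(200):
        w = batch_update(w, data)
    print(w)  # approaches 2.0 for this data
```

Because the partial gradients are simply summed, the result of one update does not depend on how the samples are split among workers, apart from floating-point rounding.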
Pub Date: 1995-04-19 DOI: 10.1109/ICAPP.1995.472271
Bin Zhou, Mohammed Atiquzzaman
Multistage Interconnection Networks (MINs) are used to connect processors and memories in large-scale scalable multiprocessor systems. MINs have also been proposed as switching fabrics for ATM networks in future Broadband ISDN networks. A MIN consists of several stages of small crossbar switching elements (SEs). Buffers are used in the SEs to increase the throughput of the MIN and prevent internal loss of packets. This paper discusses different buffering schemes for the SEs and studies the performance of MINs with these schemes in the presence of uniform and hot spot traffic patterns. The results of the study will help network designers choose appropriate buffering strategies for MINs. Throughput and packet delay are used as the performance measures for comparing the buffering strategies.
{"title":"A performance comparison of buffering schemes for multistage switches","authors":"Bin Zhou, Mohammed Atiquzzaman","doi":"10.1109/ICAPP.1995.472271","DOIUrl":"https://doi.org/10.1109/ICAPP.1995.472271","url":null,"abstract":"Multistage Interconnection Networks (MIN) are used to connect processors and memories in large scale scalable multiprocessor systems. MINs have also been proposed as switching fabrics in ATM networks in the future Broadband ISDN networks. A MIN consists of several stages of small crossbar switching elements (SE). Buffers are used in the SEs to increase the throughput of the MIN and prevent internal loss of packets. Different buffering schemes for the SEs are discussed in this paper. The objective of this paper is to study the performance of MINs with different buffering schemes, in the presence of uniform and hot spot traffic patterns. The results obtained from the study will help the network designers in choosing appropriate buffering strategies for MINs. For comparing different buffering strategies, the throughput and packet delay have been used as the performance measures.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132613045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
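To make the phrase "several stages of small crossbar switching elements" in the MIN abstract above concrete, the sketch below traces a packet through an omega network, one common MIN built from 2 x 2 switching elements. The choice of the omega topology and destination-tag routing is an assumption made here for illustration, since the abstract does not name a specific topology, and `omega_route` is a hypothetical name. Buffering and contention, which are the subject of the paper, are not modelled.

```python
def omega_route(source, destination, stages):
    """Return the line position of a packet after each stage of an omega network.

    The network has 2 ** stages inputs and outputs. Each stage is a perfect
    shuffle followed by a column of 2 x 2 switching elements.
    """
    mask = (1 << stages) - 1
    position, path = source, []
    for stage in range(stages):
        # Perfect shuffle: rotate the address left by one bit.
        position = ((position << 1) | (position >> (stages - 1))) & mask
        # Destination-tag routing: the switching element selects its upper (0)
        # or lower (1) output from the next destination bit, most significant first.
        bit = (destination >> (stages - 1 - stage)) & 1
        position = (position & ~1) | bit
        path.append(position)
    return path


if __name__ == "__main__":
    # 8 x 8 network: 3 stages of 2 x 2 switching elements.
    print(omega_route(source=5, destination=2, stages=3))
```

Under hot spot traffic many packets carry the same destination bits, so they compete for the same switching element outputs in the later stages, which is where the buffering scheme of each SE matters.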