Title: Effective mapping of artificial neural network algorithms onto massively parallel hardware: the REMAP programming environment
Authors: Guang Li, B. Svensson
Pub Date: 1995-04-19  DOI: 10.1109/ICAPP.1995.472292
Abstract: The application of artificial neural networks (ANNs) in real-time embedded systems demands high-performance computers. Miniaturized massively parallel architectures are suitable computation platforms for this task. An important question is how to establish an effective mapping from ANN algorithms to hardware. In this paper, we demonstrate how an effective mapping can be achieved with our programming environment in close combination with an optimized architecture design targeted at neuro-computing.
Title: On the feasibility of a scalable opto-electronic CRCW shared memory
Authors: P. Lukowicz, W. Tichy
Pub Date: 1995-04-19  DOI: 10.1109/ICAPP.1995.472187
Abstract: We discuss the results of a feasibility study of an opto-electronic shared memory with concurrent-read, concurrent-write capability. Unlike previous such work, we consider a true hardware shared memory rather than a simulation on a tightly optically connected distributed-memory computer. We describe a design that could be implemented using compact integrated semiconductor modules and propose ways to solve two major problems faced by such a device: optical system complexity and parallel word-level write consistency. It is shown that, in principle, a memory with gigabyte capacity and a latency of less than 1 ns, accessed by up to 10^5 processors, could be feasible. Using devices currently available as laboratory prototypes, and taking energy and crosstalk considerations into account, a capacity of more than 1 MB and a latency of about 50 ns might be attained for up to 1000 processors.
Title: Asynchronous interaction in massively parallel computing
Authors: V.L. Varscavsky
Pub Date: 1995-04-19  DOI: 10.1109/ICAPP.1995.472302
Abstract: From the standpoint of hardware experts, asynchronism is connected with the concept of physical time as an independent physical variable and is determined by the variations of transient-process durations in hardware circuits, modules and blocks, which are physical objects by their nature. Software and architecture experts treat asynchronism as a partial order on events, which are logical objects; i.e., they think in terms of logical time. In these terms, asynchronism is the variation of the number of process steps without regard to the real duration of these steps in physical time. The measuring tool for time is a clock, and the attainable precision of the clock (along with the signal-delivery system) determines its area of application (the allowed value of the physical time step). The basic idea of self-timing is to detect the moments when transient processes in physical components are over and to produce the corresponding logical signals that provide the transition to logical time (delay-insensitive design), regardless of the causes of delay variation. Once all the logical signals, invariant to physical time and representing the events in the system, are formed, self-timed methodology offers a number of efficient hardware support methods to coordinate the events of the corresponding concurrent specification.
Title: A slicing-floorplan algorithm implementation for VLSI design
Authors: N. Mani, B. Srinivasan
Pub Date: 1995-04-19  DOI: 10.1109/ICAPP.1995.472278
Abstract: This paper describes a floorplan design approach that combines a heuristic graph-bipartitioning procedure with a slicing-tree representation in the physical design of VLSI systems. The description of the circuit to be floorplanned contains a set of functional modules, each having a number of possible dimensions, and a net-list containing the connectivity information. The slicing-tree representation provides efficient recursive tree-traversal operations for obtaining area-efficient floorplans. The slicing paradigm also eliminates cyclical conflicts in module placement and hence ensures better routability.
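The slicing-tree idea above — recursively cutting the chip rectangle horizontally or vertically, with modules at the leaves — can be illustrated with a minimal sketch. This is a generic illustration of slicing-tree area evaluation, not the paper's algorithm; the module dimensions and cut labels are invented:

```python
# A slicing tree: leaves are modules given as (width, height) tuples;
# internal nodes are ('V', left, right) for a vertical cut (children
# side by side) or ('H', left, right) for a horizontal cut (stacked).
def bounding_dims(node):
    if isinstance(node, tuple) and len(node) == 2:
        return node                      # leaf module: (width, height)
    cut, left, right = node
    lw, lh = bounding_dims(left)
    rw, rh = bounding_dims(right)
    if cut == 'V':                       # side by side: widths add
        return (lw + rw, max(lh, rh))
    else:                                # stacked: heights add
        return (max(lw, rw), lh + rh)

# One 2x3 module beside a stack of a 2x1 and a 1x2 module.
tree = ('V', (2, 3), ('H', (2, 1), (1, 2)))
print(bounding_dims(tree))               # (4, 3)
```

A real floorplanner would evaluate such a tree for each candidate set of module dimensions and keep the smallest bounding rectangle.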
Title: A software instrumentation technique for performance tuning of message-passing programs
Authors: S. Lei, Kang Zhang
Pub Date: 1995-04-19  DOI: 10.1109/ICAPP.1995.472245
Abstract: A major problem with collecting trace data for performance monitoring is its intrusiveness to the program being monitored. It can distort the run-time behaviour of the program so that the collected data no longer reflect the original program. We propose a new technique, called the postponing technique, to maintain the original program behaviour in order to collect accurate performance data. It preserves event orders by equalizing the instrumentation delay for each pair of communication events. This technique does not extend the execution time beyond that taken by the conventional approach and is able to estimate the original event ordering. Our technique was implemented on a Connection Machine CM-5. We find that it estimates event-ordering information more accurately than the conventional technique.
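The intuition behind equalizing instrumentation delay can be sketched as follows. The timestamps and probe costs here are invented for illustration and are not taken from the paper; the point is only that a variable per-probe cost can invert the observed order of two nearby events, while a constant (equalized) cost cannot:

```python
# Each monitored event is observed at true_time + probe_cost.
def observed_times(true_times, probe_costs):
    return [t + c for t, c in zip(true_times, probe_costs)]

true_times = [10.0, 10.4]                 # event e1 truly precedes e2

variable = observed_times(true_times, [1.0, 0.2])
print(variable)                            # [11.0, 10.6] -> order inverted

equalized = observed_times(true_times, [0.6, 0.6])
print(equalized)                           # [10.6, 11.0] -> order preserved
```

With equal probe costs, every observed timestamp is shifted by the same constant, so the relative order of events survives the instrumentation.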
Title: An L1 Voronoi diagram algorithm for a reconfigurable mesh
Authors: H. ElGindy, L. Wetherall
Pub Date: 1995-04-19  DOI: 10.1109/ICAPP.1995.472216
Abstract: In this paper we introduce an algorithm for computing the Voronoi diagram under the L1 metric for n planar points on the reconfigurable mesh model of computation. The algorithm contains a new technique for embedding a planar graph on the mesh using the reconfigurable nature of the architecture.
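As a reminder of what the L1 (Manhattan) metric means for Voronoi assignment — this shows only the standard distance definition and a brute-force nearest-site lookup, not the paper's reconfigurable-mesh algorithm:

```python
def l1(p, q):
    # Manhattan distance: sum of absolute coordinate differences
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def l1_voronoi_label(point, sites):
    # index of the site whose L1 Voronoi cell contains the point
    return min(range(len(sites)), key=lambda i: l1(point, sites[i]))

sites = [(0, 0), (4, 0), (0, 4)]
print(l1_voronoi_label((1, 1), sites))   # 0: distances are 2, 4, 4
print(l1_voronoi_label((3, 0), sites))   # 1: distances are 3, 1, 7
```

Under L1, cell boundaries are piecewise axis-parallel and diagonal segments rather than the straight bisectors of the Euclidean case, which is why L1 Voronoi diagrams get dedicated algorithms.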
Title: Associative broadcast communication in massively parallel SIMD machines: a practical approach
Authors: Ok-Hyeong Cho, R. Colomb
Pub Date: 1995-04-19  DOI: 10.1109/ICAPP.1995.472291
Abstract: In massively parallel SIMD machines, communication bottlenecks have been a major problem due to the limitations of the available topologies. In particular, these topologies are not well suited to broadcast-type communications. Some suggested approaches are not practical, even though they are asymptotically fast, because they incur a large minimum latency. In this paper, a simple and practical linear broadcast-type communication algorithm is presented; it is based on associative computing and does not use interconnection networks at all.
Title: A new design methodology for optical hypercube interconnection network
Authors: M.F. Ali, M. Guizani
Pub Date: 1995-04-19  DOI: 10.1109/ICAPP.1995.472289
Abstract: An efficient design methodology for the construction of an optical space-invariant hypercube interconnection network is presented. This network connects a two-dimensional array of input nodes to a two-dimensional array of output nodes. The basis of the design is a 2^6-node hypercube from which hypercubes of higher dimensions can be built. The requirements for the optical implementation of this scheme are also proposed. It is shown that hypercubes of dimension up to 21 can be realized using the given implementation.
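For context on the topology itself: in a binary hypercube, two nodes are adjacent exactly when their addresses differ in one bit, so a dimension-21 hypercube has 2^21 nodes, each with 21 neighbours. A small sketch of this standard addressing scheme (the addressing is textbook material, not the paper's optical design):

```python
def hypercube_neighbors(node, dim):
    # neighbours of a node in a dim-dimensional hypercube:
    # flip each of the dim address bits in turn
    return [node ^ (1 << i) for i in range(dim)]

print(sorted(hypercube_neighbors(0b000, 3)))   # [1, 2, 4] in a 2^3-node cube
print(len(hypercube_neighbors(0, 21)))         # 21 links per node at dimension 21
```

The base design's 2^6-node cube corresponds to 6-bit addresses; composing such cubes extends the address width toward the 21-bit limit reported.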
Title: A communication framework for heterogeneous distributed pattern analysis
Authors: Gernot A. Fink, N. Jungclaus, Helge Ritter, G. Sagerer
Pub Date: 1995-04-19  DOI: 10.1109/ICAPP.1995.472283
Abstract: Unlike in traditional approaches to parallel or distributed processing, where well-structured problems are normally implemented completely within a single programming environment, we are faced with the problem of integrating existing heterogeneous software systems. Furthermore, pattern analysis stresses special aspects of communication capabilities. We therefore propose a new communication framework dedicated to heterogeneous pattern analysis systems that handles typed structured data, enables completely symmetric interaction, and provides various call semantics. A first prototype evaluating some of the concepts in practical situations is presented.
Title: A global code scheduling technique using guarded PDG
Authors: A. Koseki, H. Komatsu, Y. Fukazawa
Pub Date: 1995-04-19  DOI: 10.1109/ICAPP.1995.472253
Abstract: For instruction-level parallel machines, it is essential for code scheduling to extract instructions that can be executed in parallel from a program. In this paper, we propose a new code scheduling technique using an extension of the program dependence graph (PDG). This technique parallelizes non-numerical programs, producing better machine code than that created by percolation scheduling.