Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633316
The Sounds of Parallel Programs
J. Francioni, J. A. Jackson, L. Albright
Portraying the behavior of parallel programs is useful in program debugging and performance tuning. For the most part, researchers have focused on finding ways to visualize what happens during a program's execution. As an alternative to visualization, auralization can also be used to portray the behavior of parallel programs. This paper investigates whether or not sound can be used effectively to depict different events that take place during a parallel program's execution. In particular, we focus this discussion on distributed-memory parallel programs. Three mappings of execution behavior to sound were studied. The first mapping tracks the load balance of the processors of a system. In the second mapping, the flows-of-control of the parallel processes are mapped to related sounds. The third mapping is related to process communication in a distributed-memory parallel program.
{"title":"The Sounds of Parallel Programs","authors":"J. Francioni, J. A. Jackson, L. Albright","doi":"10.1109/DMCC.1991.633316","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633316","url":null,"abstract":"Portraying the behavior of parallel programs is useful in pro8ram debuming and performance tuning. For the most part, researchers have focused on finding ways to visualize what happens during a program's execution. As an alternative to visualization, auralization can also be used to portray the behavior of parallel programs. This paper investigates whether or not sound can be used effectively t o depict dzferent events that take place during a parallel proBram's execution. In particular, we focus this discussion on distributedmemory parallel programs. Three mappings of execution behavior to sound were studied. ?'he first mapping tracks the load balance of the processors of a system. In the second mapping, the jlows-of-control of the parallel processes are mapped to related sounds. The third mapping is related t o process communication in a distributed-memory parallel program.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131090668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633174
Efficient All-to-All Communication Patterns in Hypercube and Mesh Topologies
D. Scott
Some application programs on distributed-memory parallel computers occasionally require an "all-to-all" communication pattern, where each compute node must send a distinct message to each other compute node. Assuming that each node can send and receive only one message at a time, the all-to-all pattern must be implemented as a sequence of phases in which certain nodes send and receive messages. If there are p compute nodes, then at least p-1 phases are needed to complete the operation. A proof of a schedule achieving this lower bound on a circuit-switched hypercube with fixed routing is given. This lower bound cannot be achieved on a 2-dimensional mesh. On an a×a mesh, a³/4 is shown to be a lower bound and a schedule with this number of phases is given. Whether hypercubes or meshes are better for this algorithm depends on the relative bandwidths of the communication channels.
{"title":"Efficient All-to-All Communication Patterns in Hypercube and Mesh Topologies","authors":"D. Scott","doi":"10.1109/DMCC.1991.633174","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633174","url":null,"abstract":"Some application programs on distributed memory parallel computers occasionally require an \"all-to-all\" communication pattern, where each compute node must send a distinct message to each other compute node. Assuming that each node can send and receive only one message at a t ime, the all-to-all pattern must be implemented as a sequence of phases in which certain nodes send and receive messages. r f there are p compute nodes, then at least p-1 phases are needed to complete the operation. A proof of a schedule achieving this lower bound on a circuit switched hypercube with fuced routing is given. This lower bound cannot be achieved on a 2 dimensional mesh. On an axa mesh, dl4 is shown to be a lower bound and a schedule with this number of phases is given. Whether hypercubes or meshes are better for this algorithm depends on the relative bandwidths of the communication channels.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132329335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633140
A Symmetrical Communication Interface for Distributed-Memory Computers
Peter Steenkiste
Applications have very diverse communication requirements. Although individual algorithms often use regular communication patterns, there is little regularity across applications or even across different phases of the same application. For this reason, a low-level communication interface should support the unrestricted, reliable exchange of variable-length messages. For example, both sends and receives can operate on both local and remote buffers. Although this communication model does not correspond directly to the low-level communication primitives supported by the hardware, it can be implemented efficiently, and it gives the users more control over how and when transfers over the network take place. The interface is the lowest-level communication interface for the Nectar multicomputer.
{"title":"A Symmetrical Communication Interface for Distributed-Memory Computers","authors":"Peter Steenkiste","doi":"10.1109/DMCC.1991.633140","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633140","url":null,"abstract":"example, both sends and receives can operate on both Applications have very diverse communication local and remote buffers. Although this communication requirements. Although individual algorithms often use model does not correspond directly to the low-level regular communication patterns, there is little regularity communication primitives supported by the hardware, it across applications or even across different phases of the can be implemented efficiently, and it gives the users same application. For this reason, a low-level more control over how and when transfers over the communication interface should support the unrestricted, network takes place. The interface is the lowest-level reliable exchange of variable-length messages. communication interface for the Nectar multicomputer.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131637154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633086
Hypertasking Support for Dynamically Redistributable and Resizeable Arrays on the iPSC
M. Baber
Static allocations of arrays on multicomputers have two major shortcomings. First, algorithms often employ more than one reference pattern for a given array, resulting in the need for more than one mapping between the array elements and the multicomputer nodes. Secondly, it is desirable to provide easily resizeable arrays, especially for multigrid algorithms. This paper describes extensions to the hypertasking paracompiler which provide both dynamically resizeable and redistributable arrays. Hypertasking is a parallel programming tool that transforms C programs containing comment-directives into SPMD C programs that can be run on any size hypercube without recompilation for each cube size.

Introduction. This paper describes extensions to hypertasking [1], a domain decomposition tool that operates on comment-directives inserted into ordinary sequential C source code. The extensions support run-time redistribution and resizing of arrays. Hypertasking is one of several projects [4,5,6,8] that have proposed or produced source-to-source compilers for parallel architectures. I refer to this class of software tools as paracompilers to distinguish them from the sequential source-to-object compilers they are built upon. A fundamental question for paracompiler designers is whether to make decisions about data and control decomposition at compile-time or at run-time. If decisions are made at compile-time, the logic does not have to be repeated every time the program is executed and it is possible to optimize the code for known parameters. Unfortunately, compile-time decisions are also inflexible. Hypertasking makes all significant decisions about decomposition at run-time. A run-time initialization routine is called by each node to assign values to the members of an array definition structure. The C code generated by the paracompiler references the values in the structure instead of constants chosen at compile-time. The resulting code is surprisingly efficient. Furthermore, because it is relatively straightforward to change the decomposition variables in the array definition structure, run-time decomposition greatly facilitates the implementation of dynamic array resizing and redistribution features such as those described in this paper. This paper will begin with an overview of the hypertasking programming model to provide a framework for the new features. Beginning with redistributable arrays, the purpose and performance of the new features are discussed with reference to example programs. Finally, conclusions and goals for future research are presented.

Hypertasking overview. Hypertasking is designed to make it easy for software developers to port their existing data parallel applications to a m

* Supported in part by: Defense Advanced Research Projects Agency, Information Science and Technology Office, Research in Concurrent Computing Systems, ARPA Order No. 6402, 6402-1; Program Code No. 8E20 & 9E20; issued by DARPA/CMO under Contract #MDA-972-89-C-0034.
{"title":"Hypertasking Support for Dynamically Redistributable and Resizeable Arrays on the iPSC","authors":"M. Baber","doi":"10.1109/DMCC.1991.633086","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633086","url":null,"abstract":"Static allocations of arrays on multicomputers have two major shortcomings. First, algorithms often employ more than one referencepattern for a given array, resulting in the need for more than one mapping between the array elements and the multicomputer nodes. Secondly, it is desirable to provide easily resizeable arrays, especially for multigrid algorithms. This paper describes extensions to the hypertasking paracompiler which provide both dynamically resizeable and redistributable arrays. Hypertasking is a parallel programming tool that transforms C programs containing comment-directives into SPMD Cprogirams that can be run on any size hypercube without recompilation for each cube size. Introduction This paper describes extensions tc~ hypertasking [ 11, a domain decomposition tool that operates on commentdirectives inserted into ordinary sequential C source code. The extensions support run-time redistribution and resizing of arrays. Hypertasking is one of seveial projects [4,5,6,8] that have proposed or produced sourceto-source compilers for parallel architectures. I refer to this class of software tools as paracompilers to distinguish them from the sequential source-to-object compilers they are built upon. A fundamental question for paracompiler designers is whether to make decisions about data and control decomposition at compile-time or at ruin-time. If decisions are made at compile-time, the logic does not have to be repeated every time the program is executed and it is possible to optimize the code for known parameters. * Supported in part by: Defense Advanced Research Projects Agency Information Science and Technology Office Research in Concurrent Computing Systems ARPA Order No. 6402.6402-1; Program Code No. 8E20 & 9E20 Issued by DARPAKMO under Contract #&IDA-972-89-C-0034 Unfortunately, compile-time decisions are also inflexible. Hypertasking nnakes all significant decisions about decomposition at ]run-time. A run-time initialization routine is called by each node to assign values to the members of an amay definition structure. The C code generated by the paracompiler references the values in the structure instead of constants chosen at compile-time. The resulting code is surprisingly efficient. Furthermore, because it is relatively straightforward to change the decomposition variables in the array definition structure, run -ti me decomposition great 1 y facilitates the implementation of dynamic array resizing and redistribution features such as those described in this paper. This paper will begin with an overview of the Hypertasking programming model to provide a framework for the new features. Beginning with redistributable arrays, the purpose and performance of the new features are discussed with reference to example programs. Finally, conclusions and goals for future research are presented. Hypertasking overview Hypertasking is; designed to make it easy for software developers to port their existing data parallel applications to a m","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. 
Proceedings","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133137719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633344
Mapping Techniques for Parallel 3D Coronary Arteriography
A. Sarwal, F. Ozguner, J. Ramanathan
The paper investigates schemes for implementing the 3D reconstruction of the coronary arteries on a MIMD system. The performance of the system for calculating the 3D description of the arterial tree is related to the mapping strategy selected. The image processing algorithms can be parallelized to provide favorable performance for the complete computation cycle. Results are provided for two mapping approaches for an X-ray image, and an extension is proposed for the multiview case.
{"title":"Mapping Techniques for Parallel 3D Coronary Arteriography","authors":"A. Sarwal, F. Ozguner, J. Ramanathan","doi":"10.1109/DMCC.1991.633344","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633344","url":null,"abstract":"The paper investigates schemes f o r implementing the 3 0 reconstruction of the Coronary Ar te r i e s o n a! MIMD sys t em. The performance of ihe: s y s t em f o r calculating the 3 0 descript ion of the uri'erial tree i s redated t o the mapping strategy selecte,d. The image processing algorithms can be parallelized t o provide fa-, vorable performance for the complete computat ion cy-, d e . Results are provided for t w o mappzng approaches; for an X r a y image, and an extension is proposed f o r the mult iv iew case.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134078324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633200
Implementing the Perfect ARC2D Benchmark on the BBN TC2000 Parallel Supercomputer
S. Breit
The TC2000 is a MIMD parallel processor with memory that is physically distributed but logically shared. Interprocessor communication, and therefore access to shared memory, is sufficiently fast that most applications can be ported to the TC2000 without rewriting the code from scratch. This paper shows how this was done for the Perfect ARC2D benchmark. The code was first restructured by changing the order of subroutine calls so that interprocessor communication would be reduced to the equivalent of three full transposes of the data per iteration. The parallel implementation was then completed by inserting shared data declarations and parallel extensions provided by the TC2000 Fortran language. This approach was easier to implement than a domain decomposition technique, but requires more interprocessor communication. It is feasible only because of the TC2000's high-speed interprocessor communication network. References to shared memory take about 25% of the total execution time for the parallel version of ARC2D, an acceptable amount considering the code did not have to be completely rewritten. High parallel efficiency was obtained using up
{"title":"Implementing the Perfect ARC2D Benchmark on the BBN TC2000 Parallel Supercomputer","authors":"S. Breit","doi":"10.1109/DMCC.1991.633200","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633200","url":null,"abstract":"The TC.2000 is a MIMD parallel processor wi,th memory that is physically distributed memory, but logically shared. Interprocessor covnmunication, and therefore access to shared memory, is sufficiently fast that most applications can be ported to the TC.2000 without rewriting the code from scratch. This paper shows how this was done for the Perfect ARC'2D benchmark. The code was first restructured by changing the order of subroutine calls so that interprocessor communication would be reduced to the equivalent of three full transposes ofthe data per iteration. The parallel implementation was then completed by inserting shared data declarations and parallel extensions provided by the TC.2000 Fortran language. Thi:F approach was easier to implement than a domain decomposition technique, but requires more interprocessor communication. It is feasible only (because of the TC.2000'~ highspeed interprocessor communications network. References to shared memory take about 25% of the totai execution time for the parallel version of ARC2D. an acceptable amount considering the code did not have to be completely rewritten. High parallel efficiency was obtained using up","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130116534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633150
Optimal All-to-All Personalized Communication with Minimum Span on Boolean Cubes
S. Johnsson, Ching-Tien Ho
All-to-all personalized communication is a class of permutations in which each processor sends a unique message to every other processor. We present optimal algorithms for concurrent communication on all channels in Boolean cube networks, both for the case with a single permutation, and the case where multiple permutations shall be performed on the same local data set, but on different sets of processors. For K elements per processor our algorithms give the optimal number of element transfers, K/2. For a succession of all-to-all personalized communications on disjoint subcubes of p dimensions each, our best algorithm yields K/2 + σ − p element exchanges in sequence, where σ is the total number of processor dimensions in the permutation. An implementation on the Connection Machine of one of the algorithms offers a maximum speed-up of 50% compared to the previously best known algorithm.
{"title":"Optimal All-to-All Personalized Communication with Minimum Span on Boolean Cubes","authors":"S. Johnsson, Ching-Tien Ho","doi":"10.1109/DMCC.1991.633150","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633150","url":null,"abstract":"All-to-all personalized communication is a class, of permutations in which each processor sends a unique message to every other processor. We present optimal algorithms for concurrent communication on all channels in Boolean cube networks, both for the case with a single permutation, and the case where multiple permutations shall be performed on the same local data set, but on different sets of processors. For K elements per processor our algorithms give the optimal number of elements transfer, K/2. For a succession of all-to-all personalized communications on disjoint subcubes of p dimensions each, our best algorithm yields $.+c-p element exchanges in sequence, where cr is the total number of processor dimensions in the permutation. An implementation on the Connection Machine of one of the algorithms offers a maximum speed-up of 50% compared to the previously best known algorithm.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128596445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633313
Performance Visualization of SLALOM
D. Rover, M. B. Carter, J. Gustafson
Performance visualization provides insights about the complex operation of concurrent computer systems. SLALOM is a scalable, fixed-time computer benchmark. Each corresponds to a method of computer performance evaluation: monitoring and benchmarking, respectively. Whereas benchmark programs typically report single-number performance metrics for ease of comparison among different machines, a performance monitor (via instrumentation and visualization) gives a detailed account of the dynamics of program execution. Using software tools developed for the nCUBE 2 and the MasPar MP-1 distributed memory machines and applied to the SLALOM program, we demonstrate the utility of performance visualization for fine-tuning algorithms and understanding phenomena. The tools include PICL and ParaGraph and custom VISTA components.
{"title":"Performance Visualization of SLALOM","authors":"D. Rover, M. B. Carter, J. Gustafson","doi":"10.1109/DMCC.1991.633313","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633313","url":null,"abstract":"Performance visua1,ization provides insights about the complex operation of concurrent computer systems. SLAL O W M is a scalable, fuced-time coinputer benchmark. Each corresponds to U method of computer performance evaluation: monitoring and benchmarking, respectively. Whereas benchmark programs typically report singlenumber performance naetrics for ease of comparison among different machines, a perforfinance monitor (via instrumentation and visualization) gives (a detailed account of the dynamks of program execution. Using sofrware tools developed for the nCCBE 2 and the MasPar MP-1 distributed memory machines and applied to the SLALOM program, we demonstrate the utility of performance visualization for fine-tuning algorithms and understanding phenomena. The tools include PICL and ParaGraph and custom VISTA components.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129138244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633162
Efficient Parallel Execution of IDA on Shared and Distributed Memory Multiprocessors
V. Saletore, L. Kalé
{"title":"Efficient Parallel Execution of IDA on Shared and Distributed Memory Multiprocessors","authors":"V. Saletore, L. Kalé","doi":"10.1109/DMCC.1991.633162","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633162","url":null,"abstract":"","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116563839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633358
Hypercube Vs Cube-Connected Cycles: A Topological Evaluation
S. Kambhatla
Hypercubes and cube-connected cycles differ in the number of links per node, which has fundamental implications for several issues including performance and ease of implementation. In this paper, we evaluate these networks with respect to a number of parameters, including several topological characterizations, fault tolerance, and various broadcast and point-to-point communication primitives. In the process we also derive several lower-bound figures and describe algorithms for communication in cube-connected cycles. We conclude that while having a lower number of links per node in a CCC might not degrade performance drastically (especially for lower dimensions) compared to a hypercube of a similar size, this feature has several consequences which substantially aid its (VLSI and non-VLSI) implementation.
{"title":"Hypercube Vs Cube-Connected Cycles: A Topological Evaluation","authors":"S. Kambhatla","doi":"10.1109/DMCC.1991.633358","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633358","url":null,"abstract":"Hypercubes and cube-connected cycles di'er in the number of links per node which has fundamental implications on several issues including performance and ease of implementation. In this paper, we evaluate these networks with respect to a number of parameters including several topological characterizations, fault-tolerance, various broadcast and point-to-point communication primitives. In the process we also derive several lower bound figures and describe algorithms for communication in cube-connected cycles. We conclude that while having lower number of links per node in a CCC might not degrade performance drastically (especially for lowe,r dimensions) as compared to a hypercube of a similar size, this feature has several consequences which substantially aid its (VLSI and non- VLSI) implementation.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122289023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}