Title: Parallel sorting of large arrays on the MasPar MP-1
Authors: J. Prins, J.A. Smith
Published in: [1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation
Publication date: 1990-10-08
DOI: https://doi.org/10.1109/FMPC.1990.89439
Citation count: 16
Abstract
The problem of sorting a collection of values on a mesh-connected, distributed-memory, SIMD (single-instruction-stream, multiple-data-stream) computer using variants of Batcher's bitonic sort algorithm is considered for the case in which the number of values exceeds the number of processors in the machine. In this setting the number of comparisons can be reduced asymptotically if the processors have addressing autonomy (locally indirect addressing), and communication costs can be reduced by judicious domain decomposition. The implementation of several related adaptations of bitonic sort on a MasPar MP-1 is reported. Performance is analyzed in relation to the virtualization ratio VPR. It is concluded that the most reasonable large-array sort for this machine will combine hypercube virtualization with the processor axes transposed dynamically within an xnet embedding.
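
To make the setting concrete, the following is a minimal sequential sketch of one textbook way to run bitonic sort when values outnumber processors: each simulated processor keeps a sorted block, and every compare-exchange of the bitonic network becomes a merge-split between two blocks. It is not the implementation reported in the paper, and it models neither the MasPar mesh, the xnet, nor the domain decompositions the abstract mentions. The names `merge_split`, `block_bitonic_sort`, and `num_procs` are hypothetical, and reading VPR as "values per processor" is an assumption.

```python
def merge_split(a, b):
    """Combine two sorted blocks and return (low half, high half).

    sorted() is used for brevity; a linear-time merge would also work.
    """
    merged = sorted(a + b)
    return merged[:len(a)], merged[len(a):]


def block_bitonic_sort(values, num_procs):
    """Simulate a block bitonic sort over num_procs processors (sequentially).

    Assumes num_procs is a power of two and divides len(values) evenly.
    """
    assert num_procs > 0 and num_procs & (num_procs - 1) == 0
    assert len(values) % num_procs == 0
    vpr = len(values) // num_procs  # assumed meaning: values per processor

    # Step 1: every processor sorts its own block locally.
    blocks = [sorted(values[p * vpr:(p + 1) * vpr]) for p in range(num_procs)]

    # Step 2: run the bitonic network over processor indices, replacing
    # each compare-exchange with a merge-split of two whole blocks.
    k = 2
    while k <= num_procs:
        j = k // 2
        while j >= 1:
            for p in range(num_procs):
                partner = p ^ j
                if partner > p:
                    low, high = merge_split(blocks[p], blocks[partner])
                    if (p & k) == 0:   # ascending region of the network
                        blocks[p], blocks[partner] = low, high
                    else:              # descending region of the network
                        blocks[p], blocks[partner] = high, low
            j //= 2
        k *= 2

    # Concatenating the blocks in processor order gives the sorted array.
    return [v for block in blocks for v in block]
```

For example, `block_bitonic_sort(data, 8)` on a 64-element list simulates 8 processors holding 8 values each. On a real SIMD machine the inner `for p` loop would run in lockstep across all processors, and the cost of each merge-split would depend on how partner blocks are mapped onto the interconnect.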