{"title":"Multi-Assignment Single Joins for Parallel Cross-Match of Astronomic Catalogs on Heterogeneous Clusters","authors":"Xiaoying Jia, Qiong Luo","doi":"10.1145/2949689.2949705","DOIUrl":null,"url":null,"abstract":"Cross-match is a central operation in astronomic databases to integrate multiple catalogs of celestial objects. With the rapid development of new astronomy projects, large amounts of astronomic catalogs are generated and require fast cross-match with existing databases. In this paper, we propose to adopt a Multi-Assignment Single Join (MASJ) method for cross-match on heterogeneous clusters that consist of both CPUs and GPUs. We chose MASJ for cross-match, because (1) cross-matching records from astronomic catalogs is essentially a spatial distance join on two sets of points, and (2) each reference point is mapped to only a small number of search intervals. As a result, the MASJ cross-match, or MASJ-CM algorithm is feasible and highly efficient in a heterogeneous cluster environment. We have implemented MASJ-CM in two packages: one is an MPI-CUDA implementation, which fully utilizes the multi-core CPUs, GPUs, and InfiniBand communications; the other is on top of the popular distributed computing platform Spark, which greatly simplifies the programming. Our results on a six-node CPU-GPU cluster show that the MPI-CUDA implementation achieved a speedup of 2.69 times over a previous indexed nested-loop join algorithm. The Spark-based implementation was an order of magnitude slower than the MPI-CUDA; nevertheless, it is widely applicable and its source code much simpler.","PeriodicalId":254803,"journal":{"name":"Proceedings of the 28th International Conference on Scientific and Statistical Database Management","volume":"41 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 28th International Conference on Scientific and Statistical Database Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2949689.2949705","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9
Abstract
Cross-match is a central operation in astronomic databases to integrate multiple catalogs of celestial objects. With the rapid development of new astronomy projects, large amounts of astronomic catalogs are generated and require fast cross-match with existing databases. In this paper, we propose to adopt a Multi-Assignment Single Join (MASJ) method for cross-match on heterogeneous clusters that consist of both CPUs and GPUs. We chose MASJ for cross-match, because (1) cross-matching records from astronomic catalogs is essentially a spatial distance join on two sets of points, and (2) each reference point is mapped to only a small number of search intervals. As a result, the MASJ cross-match, or MASJ-CM algorithm is feasible and highly efficient in a heterogeneous cluster environment. We have implemented MASJ-CM in two packages: one is an MPI-CUDA implementation, which fully utilizes the multi-core CPUs, GPUs, and InfiniBand communications; the other is on top of the popular distributed computing platform Spark, which greatly simplifies the programming. Our results on a six-node CPU-GPU cluster show that the MPI-CUDA implementation achieved a speedup of 2.69 times over a previous indexed nested-loop join algorithm. The Spark-based implementation was an order of magnitude slower than the MPI-CUDA; nevertheless, it is widely applicable and its source code much simpler.