{"title":"Parallelizing a multi-frame blind deconvolution algorithm on clusters of multicore processors","authors":"R. Linderman, S. Spetka, S. Emeny, D. Fitzgerald","doi":"10.1109/AERO.2009.4839545","DOIUrl":null,"url":null,"abstract":"The parallelization strategy of the Physically-Constrained Iterative Deconvolution (PCID) algorithm is being altered and optimized to enhance performance on emerging multi-core architectures. This paper reports results from porting PCID to multi-core architectures including the JAWS supercomputer at the Maui HPC Center (60 TFLOPS of dual-dual Xeon® nodes) and the Cell Cluster at AFRL in Rome, NY (52 TFLOPS of Playstation 3® nodes with IBM Cell Broadband Engine® multi-cores and 14 dual-quad Xeon headnodes). For 512×512 image sizes FFT performance exceeding 60 GFLOPS has been observed on dual-quad Xeon nodes. Multi-core architectures programmed with multiple threads delivered significantly better performance for parallelization of the low level image convolution operations compared to earlier parallelization across cluster nodes with MPI. Another focus of the PCID multi-core effort was to move from MPI message passing to a publish-subscribe-query approach to information management. The publish, subscribe and query infrastructure was optimized for large scale machines, such as JAWS, and features a “loose coupling“ of publishers to subscribers through intervening brokers. 
This change makes runs on large HPCs with thousands of intercommunicating cores more flexible and more fault tolerant.","PeriodicalId":117250,"journal":{"name":"2009 IEEE Aerospace conference","volume":"131 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE Aerospace conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AERO.2009.4839545","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
The parallelization strategy of the Physically-Constrained Iterative Deconvolution (PCID) algorithm is being altered and optimized to enhance performance on emerging multi-core architectures. This paper reports results from porting PCID to multi-core architectures, including the JAWS supercomputer at the Maui HPC Center (60 TFLOPS of dual-dual Xeon® nodes) and the Cell Cluster at AFRL in Rome, NY (52 TFLOPS of PlayStation 3® nodes with IBM Cell Broadband Engine® multi-cores and 14 dual-quad Xeon headnodes). For 512×512 images, FFT performance exceeding 60 GFLOPS has been observed on dual-quad Xeon nodes. Multi-core architectures programmed with multiple threads delivered significantly better performance for parallelizing the low-level image convolution operations than the earlier parallelization across cluster nodes with MPI. Another focus of the PCID multi-core effort was to move from MPI message passing to a publish-subscribe-query approach to information management. The publish, subscribe, and query infrastructure was optimized for large-scale machines, such as JAWS, and features a “loose coupling” of publishers to subscribers through intervening brokers. This change makes runs on large HPC systems with thousands of intercommunicating cores more flexible and more fault tolerant.
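The “loose coupling” through intervening brokers described above can be illustrated with a minimal in-process sketch. This is not the infrastructure used in the paper: the `Broker` class, its `publish`, `subscribe`, and `query` methods, and the topic names `frames` and `estimates` are all hypothetical names chosen for illustration. The sketch only shows the general pattern in which a publisher addresses a topic rather than a specific receiver.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class Broker:
    """Hypothetical in-process broker that routes messages by topic name."""

    def __init__(self) -> None:
        # topic -> callbacks registered by subscribers
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)
        # topic -> every message published so far, kept so it can be queried later
        self._history: DefaultDict[str, List[Any]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        """Register a callback to receive future messages on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        """Store the message and deliver it to the current subscribers, if any."""
        self._history[topic].append(message)
        for callback in list(self._subscribers[topic]):
            callback(message)

    def query(self, topic: str) -> List[Any]:
        """Return a copy of all messages published on a topic so far."""
        return list(self._history[topic])


broker = Broker()
received: List[Any] = []

# The subscriber names a topic, not a publisher.
broker.subscribe("frames", received.append)

# The publisher names a topic, not a receiver.
broker.publish("frames", {"frame_id": 0})

# Publishing with no subscriber present does not fail; a late consumer can query.
broker.publish("estimates", "no subscriber yet")
late_results = broker.query("estimates")
```

Because neither side holds a reference to the other, a consumer that starts late or restarts can recover earlier messages with `query`, which is one plausible reading of how such a design could support the flexibility and fault tolerance the abstract mentions.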