Parallelizing a multi-frame blind deconvolution algorithm on clusters of multicore processors

2009 IEEE Aerospace Conference. Pub Date: 2009-03-07. DOI: 10.1109/AERO.2009.4839545

R. Linderman, S. Spetka, S. Emeny, D. Fitzgerald


Citations: 3

Abstract




The parallelization strategy of the Physically-Constrained Iterative Deconvolution (PCID) algorithm is being altered and optimized to improve performance on emerging multi-core architectures. This paper reports results from porting PCID to multi-core systems, including the JAWS supercomputer at the Maui HPC Center (60 TFLOPS of dual-dual Xeon® nodes) and the Cell Cluster at AFRL in Rome, NY (52 TFLOPS of PlayStation 3® nodes with IBM Cell Broadband Engine® multi-core processors and 14 dual-quad Xeon head nodes). For 512×512 images, FFT performance exceeding 60 GFLOPS has been observed on dual-quad Xeon nodes. Parallelizing the low-level image convolution operations with multiple threads on multi-core nodes delivered significantly better performance than the earlier approach of parallelizing them across cluster nodes with MPI. Another focus of the PCID multi-core effort was the move from MPI message passing to a publish-subscribe-query approach to information management. The publish, subscribe, and query infrastructure was optimized for large-scale machines such as JAWS and features a "loose coupling" of publishers to subscribers through intervening brokers. This change makes runs on large HPC systems with thousands of intercommunicating cores more flexible and more fault tolerant.
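The "loose coupling" through brokers mentioned in the abstract can be illustrated with a minimal single-process sketch. This is not the paper's infrastructure, which the abstract does not describe in detail: the `Broker` class, its method names, and the topic strings below are all hypothetical, and a real deployment would sit on a network transport. The sketch only shows the general pattern, in which publishers and subscribers know the broker but not each other, a failing subscriber does not stop delivery to the rest, and a late joiner can query for earlier messages.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class Broker:
    """Hypothetical in-process broker (illustrative only, not the paper's API)."""

    def __init__(self) -> None:
        # topic -> callbacks registered by subscribers
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)
        # topic -> every message published so far, kept so it can be queried later
        self._store: Dict[str, List[Any]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        """Register interest in a topic; the subscriber never sees the publisher."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> int:
        """Store the message, then deliver it; return the number of successful deliveries."""
        self._store[topic].append(message)
        delivered = 0
        for callback in list(self._subscribers[topic]):
            try:
                callback(message)
                delivered += 1
            except Exception:
                # A failing subscriber is skipped so the others still receive the message.
                continue
        return delivered

    def query(self, topic: str) -> List[Any]:
        """Return a copy of all messages published on a topic so far."""
        return list(self._store[topic])


if __name__ == "__main__":
    broker = Broker()
    received: List[Any] = []
    broker.subscribe("pcid/estimate", received.append)     # hypothetical topic name
    broker.publish("pcid/estimate", {"iteration": 1})
    # A consumer that joins late can still retrieve what was published earlier.
    print(received, broker.query("pcid/estimate"))
```

Because each delivery is isolated in its own `try` block and published messages are retained for later queries, a consumer that fails or restarts does not stall the publisher, which is one plausible reading of the flexibility and fault-tolerance benefit the abstract attributes to moving away from tightly coupled MPI message passing.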
