Parallelizing a multi-frame blind deconvolution algorithm on clusters of multicore processors

R. Linderman, S. Spetka, S. Emeny, D. Fitzgerald
2009 IEEE Aerospace Conference. Published 2009-03-07. DOI: 10.1109/AERO.2009.4839545 (https://doi.org/10.1109/AERO.2009.4839545)
Citations: 3

Abstract

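The parallelization strategy of the Physically-Constrained Iterative Deconvolution (PCID) algorithm is being altered and optimized to improve performance on emerging multi-core architectures. This paper reports results from porting PCID to multi-core architectures, including the JAWS supercomputer at the Maui HPC Center (60 TFLOPS of dual-dual Xeon® nodes) and the Cell Cluster at AFRL in Rome, NY (52 TFLOPS of PlayStation 3® nodes with IBM Cell Broadband Engine® multi-cores and 14 dual-quad Xeon head nodes). For 512×512 images, FFT performance exceeding 60 GFLOPS has been observed on dual-quad Xeon nodes. For the low-level image convolution operations, multi-core architectures programmed with multiple threads delivered significantly better performance than the earlier parallelization across cluster nodes with MPI. Another focus of the PCID multi-core effort was to move from MPI message passing to a publish-subscribe-query approach to information management. The publish, subscribe, and query infrastructure was optimized for large-scale machines such as JAWS, and features a "loose coupling" of publishers to subscribers through intervening brokers. This change makes runs on large HPC systems with thousands of intercommunicating cores more flexible and more fault tolerant.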
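The abstract does not describe the publish-subscribe-query infrastructure in detail, so the following is only a minimal single-process sketch of the general pattern, not the system used for PCID. The `Broker` class, its method names, and the topic string are all hypothetical. It illustrates the loose coupling mentioned above: publishers and subscribers know only the broker and a topic name, and a subscriber that fails does not interrupt the publisher or other subscribers.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Hypothetical callback type: receives the topic name and the message.
Callback = Callable[[str, Any], None]


class Broker:
    """Hypothetical in-process broker (an illustration, not the paper's code)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)
        self._history: Dict[str, List[Any]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register interest in a topic; the subscriber never sees the publisher."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> int:
        """Store the message, forward it, and return the number of deliveries."""
        self._history[topic].append(message)
        delivered = 0
        for callback in list(self._subscribers.get(topic, [])):
            try:
                callback(topic, message)
                delivered += 1
            except Exception:
                # A failing subscriber is skipped; the publisher and the
                # remaining subscribers are unaffected.
                continue
        return delivered

    def query(self, topic: str) -> List[Any]:
        """Return everything published so far on a topic (for late joiners)."""
        return list(self._history.get(topic, []))


if __name__ == "__main__":
    broker = Broker()
    received: List[Any] = []
    broker.subscribe("pcid/estimate", lambda topic, msg: received.append(msg))
    broker.publish("pcid/estimate", {"iteration": 1})
    print(received)                       # what the live subscriber got
    print(broker.query("pcid/estimate"))  # what a late joiner can retrieve
```

In this sketch the query operation is what lets a process that starts late, or restarts after a failure, catch up on earlier messages without the publisher being involved. A production system would distribute brokers across nodes and persist messages, which this example does not attempt.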