Data Distribution Method for Fast Giga-scale Hologram Generation on a Multi-GPU Cluster

T. Baba, Shinpei Watanabe, B. Jackin, K. Ootsu, Takeshi Ohkawa, T. Yokota, Y. Hayasaki, T. Yatagai
{"title":"Data Distribution Method for Fast Giga-scale Hologram Generation on a Multi-GPU Cluster","authors":"T. Baba, Shinpei Watanabe, B. Jackin, K. Ootsu, Takeshi Ohkawa, T. Yokota, Y. Hayasaki, T. Yatagai","doi":"10.1145/3231104.3231105","DOIUrl":null,"url":null,"abstract":"The 3D holographic display has long been expected as a future human interface as it does not require users to wear special devices. However, in addition to the delay of display device technology, its heavy computation requirement prevents the realization of such displays. A recent study says that objects and holograms with several giga-pixels should be processed in real time for the realization of high resolution and wide view angle. To this problem, first, we have proposed a new data distribution method that utilizes a basic FFT-based O(N log N) computation but does not need any inter-node communications during the computation on a multi-GPU cluster. Then, we have implemented the method on a multi-GPU cluster, applying several single-node and multi-node optimization and parallelization techniques. The experimental results show that the intra-node optimizations attain 11.52 times speed-up from the original single node code. Further, multi-node optimizations using 8 nodes, 2 GPUs per node, attain the execution time of 4.28 sec. for generating 1.6 giga-pixel hologram from 3.2 giga-pixel object. It means 237.92 times speed-up of the sequential processing by CPU using a conventional FFT-based algorithm.","PeriodicalId":164914,"journal":{"name":"Proceedings of the 2018 Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating Algorithms for Distributed systems","volume":"118 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2018 Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating Algorithms for Distributed systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3231104.3231105","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

The 3D holographic display has long been expected as a future human interface as it does not require users to wear special devices. However, in addition to the delay of display device technology, its heavy computation requirement prevents the realization of such displays. A recent study says that objects and holograms with several giga-pixels should be processed in real time for the realization of high resolution and wide view angle. To this problem, first, we have proposed a new data distribution method that utilizes a basic FFT-based O(N log N) computation but does not need any inter-node communications during the computation on a multi-GPU cluster. Then, we have implemented the method on a multi-GPU cluster, applying several single-node and multi-node optimization and parallelization techniques. The experimental results show that the intra-node optimizations attain 11.52 times speed-up from the original single node code. Further, multi-node optimizations using 8 nodes, 2 GPUs per node, attain the execution time of 4.28 sec. for generating 1.6 giga-pixel hologram from 3.2 giga-pixel object. It means 237.92 times speed-up of the sequential processing by CPU using a conventional FFT-based algorithm.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
多gpu集群上快速生成千兆级全息图的数据分布方法
长期以来,人们一直期待3D全息显示器成为未来的人机界面,因为它不需要用户佩戴特殊的设备。然而,除了显示设备技术的延迟外,其繁重的计算需求也阻碍了这种显示的实现。最近的一项研究表明,为了实现高分辨率和宽视角,需要实时处理数十亿像素的物体和全息图。针对这一问题,我们首先提出了一种新的数据分布方法,该方法利用基于fft的基本O(N log N)计算,但在多gpu集群的计算过程中不需要任何节点间通信。然后,我们在一个多gpu集群上实现了该方法,应用了几种单节点和多节点优化和并行化技术。实验结果表明,节点内优化比原单节点代码提高了11.52倍的速度。此外,使用8个节点,每个节点2个gpu的多节点优化,从3.2千兆像素对象生成1.6千兆像素全息图的执行时间为4.28秒。这意味着使用传统的基于fft的算法,CPU的顺序处理速度提高了237.92倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Proceedings of the 2018 Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating Algorithms for Distributed systems Language Semantics Driven Design and Formal Analysis for Distributed Cyber-Physical Systems: [Extended Abstract] Towards a More Reliable Store-and-forward Protocol for Mobile Text Messages Partisan Logical Clocks Are Not Fair: What Is Fair?
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1