Data Distribution Method for Fast Giga-scale Hologram Generation on a Multi-GPU Cluster

Proceedings of the 2018 Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating Algorithms for Distributed systems
Pub Date: 2018-07-23
DOI: 10.1145/3231104.3231105

T. Baba, Shinpei Watanabe, B. Jackin, K. Ootsu, Takeshi Ohkawa, T. Yokota, Y. Hayasaki, T. Yatagai



Abstract

Three-dimensional (3D) holographic displays have long been anticipated as a future human interface because they do not require users to wear special devices. However, in addition to the slow progress of display device technology, their heavy computational requirements have prevented such displays from being realized. A recent study indicates that objects and holograms of several gigapixels must be processed in real time to achieve high resolution and a wide viewing angle. To address this problem, we first propose a new data distribution method that uses a basic FFT-based O(N log N) computation but requires no inter-node communication during the computation on a multi-GPU cluster. We then implement the method on a multi-GPU cluster, applying several single-node and multi-node optimization and parallelization techniques. The experimental results show that the intra-node optimizations achieve an 11.52-fold speed-up over the original single-node code. Furthermore, with the multi-node optimizations, 8 nodes with 2 GPUs each generate a 1.6-gigapixel hologram from a 3.2-gigapixel object in 4.28 seconds. This corresponds to a 237.92-fold speed-up over sequential CPU processing using a conventional FFT-based algorithm.
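To put the reported figures in perspective, the short sketch below derives a few quantities that the abstract implies but does not state. It rests on assumptions that are not confirmed by the source: that "giga" means 10^9 pixels, and that the 237.92-fold speed-up is measured against the 4.28-second multi-node run. The function and constant names are illustrative only.

```python
# Figures quoted in the abstract.
NODES = 8
GPUS_PER_NODE = 2
EXEC_TIME_S = 4.28          # multi-node execution time in seconds
SPEEDUP_VS_CPU = 237.92     # speed-up over sequential CPU processing

# Assumption: "1.6 giga-pixel" means 1.6 * 10**9 pixels.
HOLOGRAM_PIXELS = 1.6e9


def derived_figures():
    """Return quantities implied by the abstract's reported numbers."""
    total_gpus = NODES * GPUS_PER_NODE
    # Assumption: the speed-up is relative to the 4.28 s run, so the
    # sequential CPU time is the product of the two figures.
    implied_cpu_time_s = EXEC_TIME_S * SPEEDUP_VS_CPU
    # Hologram pixels produced per second of wall-clock time.
    throughput_px_per_s = HOLOGRAM_PIXELS / EXEC_TIME_S
    return {
        "total_gpus": total_gpus,
        "implied_cpu_time_s": implied_cpu_time_s,
        "throughput_px_per_s": throughput_px_per_s,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions, the run uses 16 GPUs, the implied sequential CPU time is about 1018 seconds (roughly 17 minutes), and the cluster produces about 0.37 gigapixels of hologram per second. A throughput of that order is still well short of video-rate generation, which is consistent with the abstract framing real-time processing as the eventual goal.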


