16-bit FP sub-word parallelism to facilitate compiler vectorization and improve performance of image and media processing

International Conference on Parallel Processing, 2004 (ICPP 2004). Pub. date: 2004-08-15. DOI: 10.1109/ICPP.2004.1327964

D. Etiemble, L. Lacassagne

Citations: 5

Abstract

We consider the implementation of 16-bit floating-point instructions on a Pentium 4 and a PowerPC G5 for image and media processing. By measuring the execution time of benchmarks with these new simulated instructions, we show that a significant speed-up is obtained compared to 32-bit FP versions. For image processing, the speed-up comes both from doubling the number of operations per SIMD instruction and from the better cache behavior with byte storage. For data stream processing with arrays of structures, the speed-up mainly comes from the wider SIMD instructions.
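The "doubling" argument in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it assumes 128-bit SIMD registers (the width of SSE2 on the Pentium 4 and AltiVec on the PowerPC G5) and uses the IEEE 754 binary16 layout as a stand-in for the 16-bit format, which the abstract does not specify. The helper names `lanes_per_register`, `footprint_bytes`, and `round_trip_half` are hypothetical.

```python
import struct

# Assumption: 128-bit registers, as in SSE2 (Pentium 4) and AltiVec (PowerPC G5).
SIMD_REGISTER_BITS = 128


def lanes_per_register(element_bits: int, register_bits: int = SIMD_REGISTER_BITS) -> int:
    """Number of elements a single SIMD instruction operates on."""
    return register_bits // element_bits


def footprint_bytes(num_elements: int, element_bits: int) -> int:
    """Memory occupied by an array of num_elements values of the given width."""
    return num_elements * element_bits // 8


def round_trip_half(x: float) -> float:
    """Store x in IEEE 754 binary16 (an assumed format) and read it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


if __name__ == "__main__":
    pixels = 512 * 512  # hypothetical image size, for illustration only
    for bits in (32, 16):
        print(
            f"{bits}-bit FP: {lanes_per_register(bits)} lanes per register, "
            f"{footprint_bytes(pixels, bits)} bytes for {pixels} pixels"
        )
    # Narrower storage costs precision: 0.1 is not exactly representable.
    print("0.1 stored as 16-bit FP:", round_trip_half(0.1))
```

Halving the element width doubles the lane count and halves the memory footprint of the same data, which corresponds to the SIMD-width effect the abstract describes; the measured speed-ups themselves come from the paper's benchmarks, not from this arithmetic.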


