A Case Study on Optimizing Accurate Half Precision Average

2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). Pub Date: 2018-09-01. DOI: 10.1109/CAHPC.2018.8645923

K. Peou, A. Kelly, J. Falcou, Cécile Germain

Citations: 1

Abstract

In this work, we study the numerical performance of various common algorithms used to calculate the average of an array of half precision (FP16) floating point values. While the current generation of CPUs does not support native FP16 arithmetic, it is a planned feature in a number of next-generation CPUs. FP16 arithmetic was emulated via the half software library. Due to the limitations of the FP16 data type, some algorithms proved insufficient for arrays as small as 100 elements. We propose an algorithm that allows numerically stable FP16 computation of the average and compare it to the naive floating point (FP32) algorithm in terms of both numerical precision and runtime performance. We find that our algorithm offers comparable robustness, numerical precision, and SIMD performance to the higher precision computation.
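The abstract does not spell out the algorithms it compares, so the following is only an illustrative sketch of why a plain FP16 accumulator can fail on an array of 100 elements. It assumes FP16 rounding can be emulated with Python's `struct` format code `e` (IEEE 754 binary16) rather than the half library the authors used. The helper names `to_fp16`, `naive_mean_fp16`, and `running_mean_fp16` and the input data are hypothetical. The incremental running mean stands in as one well-known alternative and is not claimed to be the algorithm proposed in the paper.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    if math.isinf(x) or math.isnan(x):
        return x
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # Magnitude exceeds the largest finite FP16 value (65504).
        return math.copysign(math.inf, x)


def naive_mean_fp16(values):
    """Accumulate the whole sum in FP16, then divide once."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + to_fp16(v))
    return to_fp16(acc / len(values))


def running_mean_fp16(values):
    """Incremental update mean += (x - mean) / k, rounded to FP16 at each step.

    Assumption: the counter k is kept as an exact integer, not as an FP16 value.
    """
    mean = 0.0
    for k, v in enumerate(values, start=1):
        delta = to_fp16(to_fp16(v) - mean)
        mean = to_fp16(mean + to_fp16(delta / k))
    return mean


data = [1000.0] * 100            # hypothetical input; the true mean is 1000
print(naive_mean_fp16(data))     # prints inf: the accumulator exceeds 65504
print(running_mean_fp16(data))   # prints 1000.0
```

The naive version overflows because the intermediate sum must reach 100000, which FP16 cannot represent, even though every input and the final mean fit comfortably. The running mean never forms that large sum, at the cost of a division per element.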
