Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs

arXiv - CS - Distributed, Parallel, and Cluster Computing. Pub Date: 2024-09-13. DOI: arxiv-2409.08729

Andreas Plesner, Hans Henrik Brandenborg Sørensen, Søren Hauberg



Abstract

Bessel functions are critical in scientific computing for applications such as machine learning, protein structure modeling, and robotics. However, currently available routines lack precision or fail for certain input ranges, such as when the order $v$ is large, and GPU-specific implementations are limited. We address the precision limitations of current numerical implementations while dramatically improving the runtime. We propose two novel algorithms for computing the logarithm of modified Bessel functions of the first and second kinds by computing intermediate values on a logarithmic scale. Our algorithms are robust: they never suffer from underflow or overflow, and their relative errors are on the order of machine precision, even for inputs where existing libraries fail. In C++/CUDA, our algorithms have median and maximum speedups of 45x and 6150x for GPU and 17x and 3403x for CPU, respectively, over the ranges of inputs and third-party libraries tested. Compared to SciPy, the algorithms have median and maximum speedups of 77x and 300x for GPU and 35x and 98x for CPU, respectively, over the tested inputs. The ability to robustly compute a solution and the low relative errors allow us to fit von Mises-Fisher (vMF) distributions to high-dimensional neural network features. This is relevant, for example, for uncertainty quantification in metric learning. We obtain image feature data by processing CIFAR10 training images with the convolutional layers of a pre-trained ResNet50. We successfully fit vMF distributions to 2048-, 8192-, and 32768-dimensional image feature data using our algorithms. Our approach provides fast and accurate results, while existing implementations in SciPy and mpmath fail to fit successfully. Our approach is readily implementable on GPUs, and we provide a fast open-source implementation alongside this paper.
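To illustrate what "computing intermediate values on a logarithmic scale" can look like, here is a minimal Python sketch. It is not the paper's algorithm: it is the textbook power series for $I_v(x)$, with each term evaluated as a logarithm and the terms combined by a log-sum-exp. The function names `log_bessel_i_series` and `vmf_log_normalizer` and the `max_terms` parameter are hypothetical, introduced only for this example. The sketch assumes $x > 0$, $v \ge 0$, and that $x$ is not so large that more than `max_terms` terms are needed. It also shows how such a value enters the standard vMF log-normalizer, $\log C_p(\kappa) = (p/2 - 1)\log\kappa - (p/2)\log(2\pi) - \log I_{p/2-1}(\kappa)$.

```python
import math


def log_bessel_i_series(v, x, max_terms=500):
    """Approximate log I_v(x) for x > 0 and v >= 0.

    Illustrative only: uses the power series
        I_v(x) = sum_k (x/2)^(2k+v) / (k! * Gamma(k+v+1)),
    evaluating every term in log space so that no intermediate value
    underflows or overflows. The truncation at max_terms is an assumption
    and is only adequate when x is moderate relative to max_terms.
    """
    if x <= 0 or v < 0:
        raise ValueError("this sketch assumes x > 0 and v >= 0")
    log_half_x = math.log(x / 2.0)
    # Logarithm of each series term; lgamma is log(Gamma(.)).
    log_terms = [
        (2 * k + v) * log_half_x - math.lgamma(k + 1) - math.lgamma(k + v + 1)
        for k in range(max_terms)
    ]
    # Log-sum-exp: factor out the largest term before exponentiating.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def vmf_log_normalizer(p, kappa):
    """log C_p(kappa) of a von Mises-Fisher density on the unit sphere in R^p."""
    v = p / 2.0 - 1.0
    return (
        v * math.log(kappa)
        - (p / 2.0) * math.log(2.0 * math.pi)
        - log_bessel_i_series(v, kappa)
    )


if __name__ == "__main__":
    # For p = 32768 the Bessel value itself is far below the smallest
    # double, but its logarithm is an ordinary finite number.
    print(log_bessel_i_series(32768 / 2 - 1, 100.0))
    print(vmf_log_normalizer(32768, 100.0))
```

Working in log space is what keeps the high-dimensional case representable: the function value underflows in double precision, while its logarithm does not. A production routine would also need expansions suited to large $x$ and an accuracy analysis, which this sketch does not attempt.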


