Towards a GPU accelerated selective sparsity multilayer perceptron algorithm using K-Nearest Neighbors search

B. H. Meyer, Wagner M. Nunan Zola
{"title":"Towards a GPU accelerated selective sparsity multilayer perceptron algorithm using K-Nearest Neighbors search","authors":"B. H. Meyer, Wagner M. Nunan Zola","doi":"10.1145/3547276.3548634","DOIUrl":null,"url":null,"abstract":"The use of artificial neural networks and deep learning is common in several areas of knowledge. In many situations, it is necessary to use neural networks with many neurons. For example, the Extreme Classification problems can use neural networks that process more than 500,000 classes and inputs with more than 100,000 dimensions, which can make the training process unfeasible due to the high computational cost required. To overcome this limitation, several techniques were proposed in past works, such as the SLIDE algorithm, whose implementation is based on the construction of hash tables and on CPU parallelism. This work proposes the SLIDE-GPU, which replaces the use of hash tables by algorithms that use GPU to search for approximate neighbors, or approximate nearest neighbors (ANN) search. In addition, SLIDE-GPU also proposes the use of GPU to accelerate the activation step of neural networks. Among the experiments carried out, it was possible to notice a training process acceleration of up to 268% in execution time considering the inference accuracy, although currently maintaining the backpropagation phase with CPU processing. This suggests that further acceleration can be obtained in future work, by using massive parallelism in the entire process. The ANN-based technique provides better inference accuracy at each epoch, which helps producing the global acceleration, besides using the GPU in the neuron activation step. The GPU neuron activation acceleration reached a 28.09 times shorter execution time compared to the CPU implementation on this step alone.","PeriodicalId":255540,"journal":{"name":"Workshop Proceedings of the 51st International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop Proceedings of the 51st International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3547276.3548634","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The use of artificial neural networks and deep learning is common in many areas of knowledge. In many situations it is necessary to use neural networks with very large numbers of neurons. For example, Extreme Classification problems can involve neural networks with more than 500,000 output classes and inputs with more than 100,000 dimensions, which can make training unfeasible due to the high computational cost required. To overcome this limitation, several techniques have been proposed in past works, such as the SLIDE algorithm, whose implementation is based on the construction of hash tables and on CPU parallelism. This work proposes SLIDE-GPU, which replaces the hash tables with GPU-based algorithms for approximate nearest neighbor (ANN) search. In addition, SLIDE-GPU uses the GPU to accelerate the neuron activation step. In the experiments carried out, training speedups of up to 268% in execution time were observed, considering the inference accuracy achieved, even though the backpropagation phase currently remains on the CPU. This suggests that further acceleration can be obtained in future work by applying massive parallelism to the entire process. The ANN-based technique provides better inference accuracy at each epoch, which, together with the use of the GPU in the neuron activation step, produces the overall speedup. On the activation step alone, the GPU implementation ran 28.09 times faster than the CPU implementation.
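To make the selective-sparsity idea concrete, the sketch below illustrates one way a layer can restrict computation to the k output neurons whose weight vectors are most similar to the current input. This is a minimal, hypothetical illustration, not the authors' implementation: the function name sparse_forward, the parameter k, and the exact inner-product top-k are all illustrative, and the dense top-k here merely stands in for the hash-table lookup (SLIDE) or GPU ANN search (SLIDE-GPU) that makes the selection cheap in practice.

```python
# Minimal sketch of a selectively sparse layer (assumed, not the paper's code):
# for each input, only the k output neurons whose weight vectors score highest
# against the input are activated. The exact top-k below is a placeholder for
# the approximate nearest-neighbor search described in the abstract.
import torch

def sparse_forward(x, W, b, k=64):
    """x: (batch, d_in) inputs; W: (n_out, d_in) weights; b: (n_out,) biases.
    Returns the indices of the selected neurons and their activations."""
    # Similarity of each input to every neuron's weight vector.
    scores = x @ W.t()                        # (batch, n_out)
    # Keep only the k most similar neurons per sample (the "active set").
    _, active = torch.topk(scores, k, dim=1)  # (batch, k)
    # Gather the selected weights/biases and compute only those activations.
    W_sel = W[active]                         # (batch, k, d_in)
    b_sel = b[active]                         # (batch, k)
    z = torch.einsum('bkd,bd->bk', W_sel, x) + b_sel
    return active, torch.relu(z)

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(32, 1024, device=device)          # batch of inputs
    W = torch.randn(100_000, 1024, device=device)     # wide output layer
    b = torch.zeros(100_000, device=device)
    idx, act = sparse_forward(x, W, b, k=128)
    print(idx.shape, act.shape)  # (32, 128) and (32, 128)
```

The payoff of this scheme comes from the fact that only the selected neurons participate in the forward pass (and, in SLIDE-style training, in backpropagation as well); the neuron-selection step itself must therefore be sublinear in the number of neurons, which is exactly what the hash tables or GPU ANN search are for.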