Towards a GPU accelerated selective sparsity multilayer perceptron algorithm using K-Nearest Neighbors search

B. H. Meyer, Wagner M. Nunan Zola
{"title":"Towards a GPU accelerated selective sparsity multilayer perceptron algorithm using K-Nearest Neighbors search","authors":"B. H. Meyer, Wagner M. Nunan Zola","doi":"10.1145/3547276.3548634","DOIUrl":null,"url":null,"abstract":"The use of artificial neural networks and deep learning is common in several areas of knowledge. In many situations, it is necessary to use neural networks with many neurons. For example, the Extreme Classification problems can use neural networks that process more than 500,000 classes and inputs with more than 100,000 dimensions, which can make the training process unfeasible due to the high computational cost required. To overcome this limitation, several techniques were proposed in past works, such as the SLIDE algorithm, whose implementation is based on the construction of hash tables and on CPU parallelism. This work proposes the SLIDE-GPU, which replaces the use of hash tables by algorithms that use GPU to search for approximate neighbors, or approximate nearest neighbors (ANN) search. In addition, SLIDE-GPU also proposes the use of GPU to accelerate the activation step of neural networks. Among the experiments carried out, it was possible to notice a training process acceleration of up to 268% in execution time considering the inference accuracy, although currently maintaining the backpropagation phase with CPU processing. This suggests that further acceleration can be obtained in future work, by using massive parallelism in the entire process. The ANN-based technique provides better inference accuracy at each epoch, which helps producing the global acceleration, besides using the GPU in the neuron activation step. The GPU neuron activation acceleration reached a 28.09 times shorter execution time compared to the CPU implementation on this step alone.","PeriodicalId":255540,"journal":{"name":"Workshop Proceedings of the 51st International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop Proceedings of the 51st International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3547276.3548634","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The use of artificial neural networks and deep learning is common in many areas of knowledge. In many situations it is necessary to use neural networks with very large numbers of neurons. For example, Extreme Classification problems can involve neural networks with more than 500,000 output classes and inputs with more than 100,000 dimensions, which can make training unfeasible due to the high computational cost required. To overcome this limitation, several techniques have been proposed in past works, such as the SLIDE algorithm, whose implementation is based on the construction of hash tables and on CPU parallelism. This work proposes SLIDE-GPU, which replaces the hash tables with GPU-based algorithms for approximate nearest neighbor (ANN) search. In addition, SLIDE-GPU uses the GPU to accelerate the neuron activation step. In the experiments carried out, training speedups of up to 268% in execution time were observed, considering the inference accuracy achieved, even though the backpropagation phase currently remains on the CPU. This suggests that further acceleration can be obtained in future work by applying massive parallelism to the entire process. The ANN-based technique provides better inference accuracy at each epoch, which, together with the use of the GPU in the neuron activation step, produces the overall speedup. On the activation step alone, the GPU implementation ran 28.09 times faster than the CPU implementation.
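To make the selective-sparsity idea concrete, the sketch below illustrates one way a layer can restrict computation to the k output neurons whose weight vectors are most similar to the current input. This is a minimal, hypothetical illustration, not the authors' implementation: the function name sparse_forward, the parameter k, and the exact inner-product top-k are all illustrative, and the dense top-k here merely stands in for the hash-table lookup (SLIDE) or GPU ANN search (SLIDE-GPU) that makes the selection cheap in practice.

```python
# Minimal sketch of a selectively sparse layer (assumed, not the paper's code):
# for each input, only the k output neurons whose weight vectors score highest
# against the input are activated. The exact top-k below is a placeholder for
# the approximate nearest-neighbor search described in the abstract.
import torch

def sparse_forward(x, W, b, k=64):
    """x: (batch, d_in) inputs; W: (n_out, d_in) weights; b: (n_out,) biases.
    Returns the indices of the selected neurons and their activations."""
    # Similarity of each input to every neuron's weight vector.
    scores = x @ W.t()                        # (batch, n_out)
    # Keep only the k most similar neurons per sample (the "active set").
    _, active = torch.topk(scores, k, dim=1)  # (batch, k)
    # Gather the selected weights/biases and compute only those activations.
    W_sel = W[active]                         # (batch, k, d_in)
    b_sel = b[active]                         # (batch, k)
    z = torch.einsum('bkd,bd->bk', W_sel, x) + b_sel
    return active, torch.relu(z)

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(32, 1024, device=device)          # batch of inputs
    W = torch.randn(100_000, 1024, device=device)     # wide output layer
    b = torch.zeros(100_000, device=device)
    idx, act = sparse_forward(x, W, b, k=128)
    print(idx.shape, act.shape)  # (32, 128) and (32, 128)
```

The payoff of this scheme comes from the fact that only the selected neurons participate in the forward pass (and, in SLIDE-style training, in backpropagation as well); the neuron-selection step itself must therefore be sublinear in the number of neurons, which is exactly what the hash tables or GPU ANN search are for.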