基于Nvidia CUDA架构的显卡上三对角矩阵的并行SVD算法

NaUKMA Research Papers. Computer Science Pub Date : 2021-12-10 DOI:10.18523/2617-3808.2021.4.16-22

Mykola Semylitko, G. Malaschonok

{"title":"基于Nvidia CUDA架构的显卡上三对角矩阵的并行SVD算法","authors":"Mykola Semylitko, G. Malaschonok","doi":"10.18523/2617-3808.2021.4.16-22","DOIUrl":null,"url":null,"abstract":"SVD (Singular Value Decomposition) algorithm is used in recommendation systems, machine learning, image processing, and in various algorithms for working with matrices which can be very large and Big Data, so, given the peculiarities of this algorithm, it can be performed on a large number of computing threads that have only video cards.CUDA is a parallel computing platform and application programming interface model created by Nvidia. It allows software developers and software engineers to use a CUDA-enabled graphics processing unit for general purpose processing – an approach termed GPGPU (general-purpose computing on graphics processing units). The GPU provides much higher instruction throughput and memory bandwidth than the CPU within a similar price and power envelope. Many applications leverage these higher capabilities to run faster on the GPU than on the CPU. Other computing devices, like FPGAs, are also very energy efficient, but they offer much less programming flexibility than GPUs.The developed modification uses the CUDA architecture, which is intended for a large number of simultaneous calculations, which allows to quickly process matrices of very large sizes. The algorithm of parallel SVD for a three-diagonal matrix based on the Givents rotation provides a high accuracy of calculations. Also the algorithm has a number of optimizations to work with memory and multiplication algorithms that can significantly reduce the computation time discarding empty iterations.This article proposes an approach that will reduce the computation time and, consequently, resources and costs. The developed algorithm can be used with the help of a simple and convenient API in C ++ and Java, as well as will be improved by using dynamic parallelism or parallelization of multiplication operations. Also the obtained results can be used by other developers for comparison, as all conditions of the research are described in detail, and the code is in free access.","PeriodicalId":433538,"journal":{"name":"NaUKMA Research Papers. Computer Science","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Parallel SVD Algorithm for a Three-Diagonal Matrix on a Video Card Using the Nvidia CUDA Architecture\",\"authors\":\"Mykola Semylitko, G. Malaschonok\",\"doi\":\"10.18523/2617-3808.2021.4.16-22\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"SVD (Singular Value Decomposition) algorithm is used in recommendation systems, machine learning, image processing, and in various algorithms for working with matrices which can be very large and Big Data, so, given the peculiarities of this algorithm, it can be performed on a large number of computing threads that have only video cards.CUDA is a parallel computing platform and application programming interface model created by Nvidia. It allows software developers and software engineers to use a CUDA-enabled graphics processing unit for general purpose processing – an approach termed GPGPU (general-purpose computing on graphics processing units). The GPU provides much higher instruction throughput and memory bandwidth than the CPU within a similar price and power envelope. Many applications leverage these higher capabilities to run faster on the GPU than on the CPU. Other computing devices, like FPGAs, are also very energy efficient, but they offer much less programming flexibility than GPUs.The developed modification uses the CUDA architecture, which is intended for a large number of simultaneous calculations, which allows to quickly process matrices of very large sizes. The algorithm of parallel SVD for a three-diagonal matrix based on the Givents rotation provides a high accuracy of calculations. Also the algorithm has a number of optimizations to work with memory and multiplication algorithms that can significantly reduce the computation time discarding empty iterations.This article proposes an approach that will reduce the computation time and, consequently, resources and costs. The developed algorithm can be used with the help of a simple and convenient API in C ++ and Java, as well as will be improved by using dynamic parallelism or parallelization of multiplication operations. Also the obtained results can be used by other developers for comparison, as all conditions of the research are described in detail, and the code is in free access.\",\"PeriodicalId\":433538,\"journal\":{\"name\":\"NaUKMA Research Papers. Computer Science\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"NaUKMA Research Papers. Computer Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18523/2617-3808.2021.4.16-22\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"NaUKMA Research Papers. Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18523/2617-3808.2021.4.16-22","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

SVD(奇异值分解)算法被用于推荐系统、机器学习、图像处理，以及各种处理矩阵的算法，这些矩阵可能非常大，数据也很大，因此，考虑到该算法的特殊性，它可以在只有显卡的大量计算线程上执行。CUDA是Nvidia创建的并行计算平台和应用程序编程接口模型。它允许软件开发人员和软件工程师使用支持cuda的图形处理单元进行通用处理——一种称为GPGPU(图形处理单元上的通用计算)的方法。在类似的价格和功率范围内，GPU提供比CPU更高的指令吞吐量和内存带宽。许多应用程序利用这些更高的功能在GPU上比在CPU上运行得更快。其他计算设备，如fpga，也非常节能，但它们提供的编程灵活性远不如gpu。开发的修改使用CUDA架构，用于大量同时计算，允许快速处理非常大尺寸的矩阵。基于Givents旋转的三对角矩阵并行奇异值分解算法提供了较高的计算精度。此外，该算法还对内存和乘法算法进行了许多优化，可以显著减少丢弃空迭代的计算时间。本文提出了一种方法，可以减少计算时间，从而减少资源和成本。所开发的算法可以在c++和Java中使用一个简单方便的API，并且可以通过使用乘法运算的动态并行化或并行化来改进。由于详细描述了研究的所有条件，并且代码是免费访问的，因此获得的结果可以被其他开发人员用于比较。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Parallel SVD Algorithm for a Three-Diagonal Matrix on a Video Card Using the Nvidia CUDA Architecture

SVD (Singular Value Decomposition) algorithm is used in recommendation systems, machine learning, image processing, and in various algorithms for working with matrices which can be very large and Big Data, so, given the peculiarities of this algorithm, it can be performed on a large number of computing threads that have only video cards.CUDA is a parallel computing platform and application programming interface model created by Nvidia. It allows software developers and software engineers to use a CUDA-enabled graphics processing unit for general purpose processing – an approach termed GPGPU (general-purpose computing on graphics processing units). The GPU provides much higher instruction throughput and memory bandwidth than the CPU within a similar price and power envelope. Many applications leverage these higher capabilities to run faster on the GPU than on the CPU. Other computing devices, like FPGAs, are also very energy efficient, but they offer much less programming flexibility than GPUs.The developed modification uses the CUDA architecture, which is intended for a large number of simultaneous calculations, which allows to quickly process matrices of very large sizes. The algorithm of parallel SVD for a three-diagonal matrix based on the Givents rotation provides a high accuracy of calculations. Also the algorithm has a number of optimizations to work with memory and multiplication algorithms that can significantly reduce the computation time discarding empty iterations.This article proposes an approach that will reduce the computation time and, consequently, resources and costs. The developed algorithm can be used with the help of a simple and convenient API in C ++ and Java, as well as will be improved by using dynamic parallelism or parallelization of multiplication operations. Also the obtained results can be used by other developers for comparison, as all conditions of the research are described in detail, and the code is in free access.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

NaUKMA Research Papers. Computer Science

自引率

0.00%

发文量

期刊最新文献

Bicycle Protection System Using GPS/GSM Modules аnd Radio Protocol Parking Spot Occupancy Classification Using Deep Learning Information System Assessment of the Creditworthiness of an Individual Transdisciplinary Information and Analytical Platform Supporting Evaluation Processes Two-Stage Transportation Problem with Unknown Consumer Demands