Advancing Matrix Decomposition Efficiency: A Study on FT-Matrix DSP Based SVD Optimization

IF 3.7 · Zone 3 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Journal of Cloud Computing-Advances Systems and Applications · Pub Date: 2023-07-01 · DOI: 10.1109/CSCloud-EdgeCom58631.2023.00085
Anxing Xie, Yonghua Hu, Aobo Cheng, Zhuoyou Tang, P. Liu, Xin Zhang
Volume 97 1, pp. 464-469
Citations: 0

Abstract

Matrix decomposition is a fundamental operation in linear algebra, with applications in machine learning, signal processing, edge computing, and many other fields. Singular Value Decomposition (SVD) factors a matrix into three matrices: two orthogonal matrices and a diagonal matrix. With the development of domestic high-performance Digital Signal Processors (DSPs), demand for matrix computation on DSP platforms is increasing, which makes DSP-based SVD implementations an important research topic. However, achieving a high-performance implementation requires developers who are familiar with the hardware characteristics, so that the features of the algorithm can be matched to the limited hardware resources. To reduce the cost of computing the SVD of a matrix, we implement a vectorization mapping method for the SVD algorithm on the FT-M7002. The single instruction, multiple data (SIMD) instructions of the FT-M7002 processor are used to exploit the data-level parallelism in the SVD algorithm. Instead of using data movement and a scalar processing unit (SPU), we compute with a single vector processing element (VPE). Additionally, a DMA transfer algorithm is designed to implement matrix transposition and resolve the issue of discontinuous data access. Experimental results show that the optimized SVD algorithm improves execution performance by up to 5.0× relative to the original SVD algorithm on the FT platform. Furthermore, we demonstrate that the optimized SVD algorithm on the FT-M7002 performs 1.0-2.0× faster than the optimized SVD algorithm on the TMS320C6678 processor.
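
The abstract does not say which SVD variant is vectorized, so the following is only an illustrative sketch under an assumption: it uses the one-sided (Hestenes) Jacobi method, a common choice on vector hardware because its inner loops are column inner products and column updates. It is plain Python with no FT-M7002 intrinsics; the names `jacobi_svd`, `dot`, and `transpose` are hypothetical. Storing the matrix by columns (that is, working on the transpose) stands in for the role the abstract assigns to the DMA transposition, namely making each column a contiguous run of data.

```python
import math


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def dot(x, y):
    """Inner product; the kind of loop a SIMD unit would process in lanes."""
    return sum(xi * yi for xi, yi in zip(x, y))


def jacobi_svd(a, tol=1e-12, max_sweeps=60):
    """One-sided Jacobi SVD of an m x n matrix (m >= n), given as a list of rows.

    Returns (U, s, V) with A ~= U * diag(s) * V^T, where U is m x n and V is
    n x n, both as lists of rows. Singular values are NOT sorted.
    """
    n = len(a[0])
    # Assumption: column-contiguous storage, mimicking a transposed layout.
    cols = transpose(a)  # cols[j] is column j of A
    v = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(n)]  # v[j] is column j of V

    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = dot(cols[p], cols[p])
                beta = dot(cols[q], cols[q])
                gamma = dot(cols[p], cols[q])
                # Skip pairs that are already orthogonal to working precision.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Plane rotation that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                cs = 1.0 / math.sqrt(1.0 + t * t)
                sn = cs * t
                # Apply the same rotation to the columns of A and of V.
                for mat in (cols, v):
                    x, y = mat[p], mat[q]
                    mat[p] = [cs * xi - sn * yi for xi, yi in zip(x, y)]
                    mat[q] = [sn * xi + cs * yi for xi, yi in zip(x, y)]
        if not rotated:
            break

    # Singular values are the column norms; U holds the normalized columns.
    s = [math.sqrt(dot(c, c)) for c in cols]
    u_cols = [[x / sj for x in c] if sj > 0.0 else list(c) for c, sj in zip(cols, s)]
    return transpose(u_cols), s, transpose(v)
```

Calling `jacobi_svd([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])` returns the three factors, and multiplying them back together should reproduce the input up to rounding. In this sketch the three `dot` calls and the two column updates per rotation are the data-parallel parts that a SIMD mapping would target.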