TC-GVF: Tensor Core GPU-Based Vector Fitting via Accelerated Tall-Skinny QR Solvers
Authors: Vinay Kukutla; Ramachandra Achar; Wai-Kong Lee
Journal: IEEE Transactions on Components, Packaging and Manufacturing Technology, vol. 15, no. 1, pp. 54-63
DOI: 10.1109/TCPMT.2024.3410298
Publication date: 2024-06-11
Publication type: Journal Article
Open access: no
URL: https://ieeexplore.ieee.org/document/10553395/
Impact factor: 3.0
JCR: Q2 (ENGINEERING, ELECTRICAL & ELECTRONIC)
Citation count: 0
Abstract
QR decomposition and the solution of large linear least-squares systems form the computational backbone of many scientific applications and usually account for the bulk of their cost. One example is vector fitting (VF), which is widely used for system identification via rational function approximation from tabulated data of high-speed modules. Because the VF algorithm is iterative, minimizing its computational cost and increasing its parallel efficiency on mixed CPU and GPU environments are critical to reducing the time needed for each iteration. This article introduces a novel tensor core-based QR (TC-QR) decomposition method and a tensor core-based linear least-squares solver (TC-LLS) to speed up the computationally expensive steps of QR factorization and the solution of a set of linear least-squares equations, exploiting emerging GPU platforms with tensor core (TC) architectures. These modules are used to develop the TC GPU-based VF (TC-GVF) algorithm, which provides significant speedup compared with state-of-the-art GVF implementations in the literature.
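As a rough illustration of the QR-based least-squares step mentioned in the abstract, the sketch below solves a tall-skinny system of the kind that arises when fitting the residues of a rational function with fixed poles. It is a plain CPU example in NumPy, not the TC-QR or TC-LLS method of the article. The function name `solve_lls_qr` and the poles, residues, and frequency grid are hypothetical values chosen only for illustration.

```python
import numpy as np

def solve_lls_qr(A, b):
    """Solve min ||A x - b||_2 for a tall-skinny A via reduced QR."""
    # A = Q R with Q of shape (m, n) and R upper triangular of shape (n, n)
    Q, R = np.linalg.qr(A, mode="reduced")
    # Solve R x = Q^H b. A general solver is used here for brevity;
    # a dedicated triangular solve would exploit the structure of R.
    return np.linalg.solve(R, Q.conj().T @ b)

# Hypothetical test data: a rational function with known poles and residues
poles = np.array([-1.0 + 5.0j, -1.0 - 5.0j, -3.0 + 0.0j])
residues = np.array([2.0 - 1.0j, 2.0 + 1.0j, 0.5 + 0.0j])
s = 1j * np.linspace(0.1, 20.0, 200)      # frequency samples s = j*omega

# Tall-skinny system: one row per frequency sample, one column per pole
A = 1.0 / (s[:, None] - poles[None, :])   # shape (200, 3)
b = A @ residues                          # synthetic tabulated response

# Recover the residues from the samples
x = solve_lls_qr(A, b)
print(np.allclose(x, residues))
```

In a full VF iteration a system of this shape is rebuilt and solved each time the poles are relocated, which is why the abstract singles out QR factorization and the least-squares solve as the steps worth accelerating.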
About the journal:
IEEE Transactions on Components, Packaging, and Manufacturing Technology publishes research and application articles on modeling, design, building blocks, technical infrastructure, and analysis underpinning electronic, photonic and MEMS packaging, in addition to new developments in passive components, electrical contacts and connectors, thermal management, and device reliability; as well as the manufacture of electronics parts and assemblies, with broad coverage of design, factory modeling, assembly methods, quality, product robustness, and design-for-environment.