76.5-Gb/s Viterbi Decoder for Convolutional Codes on GPU

IF 2.0 | CAS Zone 4 (Computer Science) | JCR Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | IEEE Embedded Systems Letters | Pub Date: 2024-06-18 | DOI: 10.1109/LES.2024.3416401

Zhanxian Liu, Chufan Liu, Haijun Zhang, Ling Zhao

IEEE Embedded Systems Letters, vol. 17, no. 1, pp. 22-25. https://ieeexplore.ieee.org/document/10561537/

Citations: 0

Abstract

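This letter presents an optimized Viterbi decoder for convolutional codes on a graphics processing unit (GPU), targeting software-defined radio (SDR) platforms. Before the forward process, channel messages are interleaved using coalesced global memory accesses. The interleaved messages are then represented with 4 bits to improve shared memory efficiency. We also optimize the on-chip memory allocation of the forward process to accelerate instruction execution. Excluding the data transfer latency between host and device, the proposed Viterbi decoder achieves a throughput of 22.2 Gb/s on a Tesla V100 and 76.5 Gb/s on an RTX4090. Compared with related works, the proposed decoder achieves throughput speedups ranging from 2.06× to 2.93×.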
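The abstract does not give the code parameters or the CUDA kernels. The following is therefore a minimal CPU-side Python sketch of the ideas it names, not the authors' implementation. It covers three things:

- a frame-interleaved (time-major) layout, so that one-thread-per-frame reads would hit consecutive addresses;
- signed 4-bit soft values packed two per byte;
- a Viterbi forward add-compare-select (ACS) pass followed by traceback.

It rests on several assumptions. The code is the rate-1/2, K = 7 code with generators 171/133 (octal), which is common in SDR standards but not stated in the letter. Frames are zero-terminated. A positive soft value favors bit 0. All function names are hypothetical.

```python
"""Hypothetical sketch of a soft-input Viterbi decoder with 4-bit inputs.

Assumptions (not from the letter): rate-1/2, K=7 code with generators
0o171/0o133, zero-terminated frames, soft values quantized to signed
4-bit integers in [-8, 7], where a positive value means bit 0 is more likely.
"""

K = 7                      # constraint length (assumed)
N_STATES = 1 << (K - 1)    # 64 trellis states
G = (0o171, 0o133)         # generator polynomials (assumed)


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def encode(bits):
    """Convolutional encoder; appends K-1 zero tail bits to end in state 0."""
    state, out = 0, []
    for b in list(bits) + [0] * (K - 1):
        reg = (b << (K - 1)) | state          # newest input bit in the MSB
        out.extend(parity(reg & g) for g in G)
        state = reg >> 1
    return out


def interleave(frames):
    """Frame-major -> time-major: element j of frame f goes to index j*F + f,
    so F threads (one per frame) would read consecutive addresses per step."""
    F, L = len(frames), len(frames[0])
    return [frames[f][j] for j in range(L) for f in range(F)]


def quantize4(x: float) -> int:
    """Round and clamp a soft value to a signed 4-bit integer in [-8, 7]."""
    return max(-8, min(7, round(x)))


def pack4(vals):
    """Pack signed 4-bit values two per byte (low nibble first)."""
    vals = list(vals) + [0] * (len(vals) % 2)
    return bytes((a & 0xF) | ((b & 0xF) << 4) for a, b in zip(vals[0::2], vals[1::2]))


def unpack4(data, n):
    """Inverse of pack4: recover the first n signed 4-bit values."""
    out = []
    for byte in data:
        for nib in (byte & 0xF, byte >> 4):
            out.append(nib - 16 if nib >= 8 else nib)
    return out[:n]


def viterbi_decode(soft, n_bits):
    """Forward ACS pass over the trellis, then traceback from state 0."""
    INF = float("inf")
    metric = [0.0] + [INF] * (N_STATES - 1)   # encoder starts in state 0
    decisions = []                            # survivor decision per state per step
    for t in range(len(soft) // 2):
        r0, r1 = soft[2 * t], soft[2 * t + 1]
        new, dec = [INF] * N_STATES, [0] * N_STATES
        for s in range(N_STATES):
            if metric[s] == INF:
                continue
            for b in (0, 1):
                reg = (b << (K - 1)) | s
                c0, c1 = (parity(reg & g) for g in G)
                # Branch metric (lower is better): penalize disagreement.
                bm = (r0 if c0 else -r0) + (r1 if c1 else -r1)
                ns, m = reg >> 1, metric[s] + bm
                if m < new[ns]:               # compare-select
                    new[ns], dec[ns] = m, s & 1
        metric = new
        decisions.append(dec)
    state, bits = 0, []                       # tail bits force final state 0
    for dec in reversed(decisions):
        bits.append(state >> (K - 2))         # input bit that entered this state
        state = ((state << 1) & (N_STATES - 1)) | dec[state]
    bits.reverse()
    return bits[:n_bits]


if __name__ == "__main__":
    msg = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
    soft = [quantize4(7.0 if c == 0 else -7.0) for c in encode(msg)]
    soft[3] = -soft[3]                        # inject one channel error
    packed = pack4(soft)                      # two 4-bit values per byte
    print(viterbi_decode(unpack4(packed, len(soft)), len(msg)) == msg)
```

Packing two 4-bit values per byte halves the storage of 8-bit soft inputs. That is the kind of saving the abstract attributes to its 4-bit representation in shared memory. On a GPU, the ACS loop over states would be spread across threads rather than run sequentially as here.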




Source journal

IEEE Embedded Systems Letters (Engineering: Control and Systems Engineering)

CiteScore: 3.30

Self-citation rate: 0.00%

Journal description: The IEEE Embedded Systems Letters (ESL) provides a forum for the rapid dissemination of the latest technical advances in embedded systems and related areas in embedded software. The emphasis is on models, methods, and tools that ensure the secure, correct, efficient, and robust design of embedded systems and their applications.

Latest articles in this journal

Table of Contents

Compatibility Analysis and Smooth Transition of Heterogeneous Controllers in Longitudinal Merging Platoons

Compressing Runtime Memory Usage via Activation Remapping for Deploying Deep Neural Networks on MCUs

IEEE Embedded Systems Letters Publication Information

FPGA-Based Real-Time Multi-Class Vehicle Classification Using mmWave Radar