High-throughput decoding of block turbo codes on graphics processing units
Junhee Cho, Wonyong Sung
2017 IEEE International Workshop on Signal Processing Systems (SiPS), 2017-10-01
DOI: 10.1109/SiPS.2017.8109996
Block turbo codes (BTCs) provide powerful forward error correction (FEC) for applications such as optical networks and NAND flash memory devices. These applications require soft-decision FEC codes that guarantee a bit error rate (BER) below 10^-12, which is very difficult to verify with a CPU simulator. In this paper, we present high-throughput, graphics processing unit (GPU) based turbo decoding software to aid the development of BTCs with very low error rates. To utilize the GPU effectively, the software processes multiple BTC frames simultaneously and minimizes global memory access latency. In particular, the Chase-Pyndiah algorithm is parallelized efficiently to decode every row and column of a BTC word. The GPU-based simulator achieved throughputs of about 80 and 150 Mb/s when decoding BTCs composed of Hamming and BCH codes, respectively. These throughputs are up to 124 times higher than the CPU-based results.
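
To make the row and column decoding step concrete, the following is a minimal sequential sketch of one Chase-Pyndiah soft-in/soft-out update in Python. It is not the authors' GPU implementation. The (7,4) Hamming component code, the function names (hard_decode, chase_pyndiah_row, half_iteration) and the values chosen for p, alpha and beta are all assumptions made for illustration, and the soft-output rule follows one common textbook formulation of Pyndiah's method. Bits map to symbols as 0 -> -1 and 1 -> +1.

```python
from itertools import product

# Parity-check matrix of the (7,4) Hamming code (toy choice, assumed here).
# Column j is the binary form of j + 1, so a nonzero syndrome names the
# 1-based position of a single bit error.
H = [[((j + 1) >> b) & 1 for j in range(7)] for b in range(3)]


def hard_decode(bits):
    """Syndrome decoder: corrects at most one bit error, returns a codeword."""
    syndrome = sum((sum(h * x for h, x in zip(row, bits)) % 2) << b
                   for b, row in enumerate(H))
    out = list(bits)
    if syndrome:
        out[syndrome - 1] ^= 1
    return tuple(out)


def chase_pyndiah_row(r, p=3, beta=1.0):
    """Return (decision, extrinsic) for one row of soft inputs r."""
    n = len(r)
    hard = [1 if x > 0 else 0 for x in r]
    # Chase step: pick the p least reliable positions and try all 2^p flips.
    weak = sorted(range(n), key=lambda j: abs(r[j]))[:p]
    metric = {}  # candidate codeword -> squared Euclidean distance to r
    for flips in product((0, 1), repeat=p):
        trial = list(hard)
        for pos, f in zip(weak, flips):
            trial[pos] ^= f
        cw = hard_decode(trial)
        metric[cw] = sum((x - (2 * c - 1)) ** 2 for x, c in zip(r, cw))
    d = min(metric, key=metric.get)  # decision codeword D
    # Pyndiah step: soft output from the closest competitor that disagrees
    # with D in position j; fall back to beta when no competitor exists.
    extrinsic = []
    for j in range(n):
        sign = 2 * d[j] - 1
        rivals = [m for cw, m in metric.items() if cw[j] != d[j]]
        if rivals:
            soft = sign * (min(rivals) - metric[d]) / 4
        else:
            soft = sign * beta
        extrinsic.append(soft - r[j])
    return d, extrinsic


def half_iteration(channel, extrinsic, alpha=0.5):
    """Decode every row of a 2-D word. Rows do not depend on each other,
    which is the property a parallel implementation can exploit."""
    result = []
    for ch_row, ex_row in zip(channel, extrinsic):
        soft_in = [c + alpha * e for c, e in zip(ch_row, ex_row)]
        result.append(chase_pyndiah_row(soft_in)[1])
    return result


# All-zero codeword sent (all symbols -1); position 2 was pushed positive.
row = [-1.0, -0.9, 0.2, -1.1, -0.8, -1.2, -0.7]
decision, w = chase_pyndiah_row(row)
print(decision, w[2])
```

A column pass can reuse half_iteration on the transposed arrays, for example with list(zip(*channel)). In this sketch the independent row loop is the natural place to introduce parallelism, and several frames could be stacked along an extra outer loop in the same way.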