Odd type DCT/DST for video coding: Relationships and low-complexity implementations
Venue: 2017 IEEE International Workshop on Signal Processing Systems (SiPS)
Pub Date: 2017-10-01; DOI: 10.1109/SiPS.2017.8110009
M. Masera, M. Martina, G. Masera
In this paper, we show a class of relationships linking the Discrete Cosine Transforms (DCT) and Discrete Sine Transforms (DST) of types V, VI, VII and VIII, which have recently been considered for inclusion in future video coding technology. In particular, the proposed relationships allow the DCT-V and the DCT-VIII to be computed from the DCT-VI and the DST-VII, respectively, using only simple reordering and sign inversion. Moreover, this paper exploits the proposed relationships and the Winograd factorization of the Discrete Fourier Transform to construct low-complexity factorizations for computing the DCT-V and the DCT-VIII of lengths 4 and 8. Finally, the proposed signal-flow graphs have been implemented on an FPGA, showing reduced hardware utilization with respect to a direct implementation of the matrix-vector multiplication.
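As an illustration of the reordering-and-sign-inversion idea, the sketch below checks one such relationship numerically. It assumes the DST-VII and DCT-VIII definitions with the common scale factor sqrt(4/(2N+1)); the abstract does not give the paper's exact definitions or normalization. From these definitions, cos(π(2i+1)(2j+1)/(4N+2)) = (−1)^i · sin(π(2i+1)(N−j)/(2N+1)), which is the DST-VII entry at column N−1−j. In other words, the DCT-VIII equals the DST-VII applied to the reversed input, with odd-indexed outputs negated. The function names are hypothetical, and the code uses direct matrix-vector products rather than the paper's Winograd-based factorizations.

```python
import math

def dst7_matrix(n):
    """DST-VII basis (rows: frequency index i, columns: sample index j)."""
    scale = math.sqrt(4.0 / (2 * n + 1))
    return [[scale * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for i in range(n)]

def dct8_matrix(n):
    """DCT-VIII basis with the same (assumed) normalization as dst7_matrix."""
    scale = math.sqrt(4.0 / (2 * n + 1))
    return [[scale * math.cos(math.pi * (2 * i + 1) * (2 * j + 1) / (4 * n + 2))
             for j in range(n)] for i in range(n)]

def matvec(m, x):
    """Direct matrix-vector product, i.e. the unfactorized baseline."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def dct8_via_dst7(x):
    """DCT-VIII of x from a DST-VII: reverse the input, transform,
    then flip the sign of every odd-indexed output coefficient."""
    y = matvec(dst7_matrix(len(x)), x[::-1])
    return [v if i % 2 == 0 else -v for i, v in enumerate(y)]

x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]
err = max(abs(a - b) for a, b in zip(matvec(dct8_matrix(8), x), dct8_via_dst7(x)))
print(f"max abs difference: {err:.1e}")  # tiny, at floating-point rounding level
```

Because the two transforms share the same scale factor, the identity holds regardless of the normalization chosen.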
Reduced-memory high-throughput fast-SSC polar code decoder architecture
Venue: 2017 IEEE International Workshop on Signal Processing Systems (SiPS)
Pub Date: 2017-10-01; DOI: 10.1109/SiPS.2017.8110014
Furkan Ercan, C. Condo, W. Gross
Polar codes have been selected for use within 5G networks, and are being considered for the data and control channels of additional 5G scenarios, such as the next-generation ultra-reliable low-latency channel. As a result, efficient and fast polar code decoder implementations are essential. In this work, we present a new fast simplified successive cancellation (Fast-SSC) decoder architecture. The proposed solution reduces memory requirements and improves throughput with respect to state-of-the-art Fast-SSC decoders. We achieve these objectives through more efficient memory utilization than that of Fast-SSC, which also enables multiple instructions to be executed in a single clock cycle. Compared to the state of the art, memory requirements are reduced by 22.2% while throughput improves by 11.6% for (1024, 512) polar codes. At equal throughput, memory requirements are reduced by up to 60.4%.
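For readers unfamiliar with SSC decoding, here is a minimal software sketch of plain SSC with min-sum updates, assuming natural-order polar encoding with F = [[1, 0], [1, 1]]. It only shortcuts Rate-0 and Rate-1 nodes; Fast-SSC additionally handles repetition and single-parity-check nodes, and nothing here models the proposed memory organization or multi-instruction scheduling. All names are illustrative.

```python
def f_minsum(a, b):
    """Check-node update: min-sum approximation of the SC f function."""
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    return sign * min(abs(a), abs(b))

def g_update(a, b, bit):
    """Variable-node update, given the partial-sum bit from the left subtree."""
    return b + (1 - 2 * bit) * a

def polar_encode(u):
    """x = u * F^{(x)n} with F = [[1, 0], [1, 1]], natural bit order."""
    if len(u) == 1:
        return list(u)
    half = len(u) // 2
    left, right = polar_encode(u[:half]), polar_encode(u[half:])
    return [l ^ r for l, r in zip(left, right)] + right

def ssc_decode(llr, frozen):
    """Return (u_hat, x_hat) for the subtree described by its LLRs and frozen mask."""
    n = len(llr)
    if all(frozen):                        # Rate-0 node: all bits known to be 0
        return [0] * n, [0] * n
    if not any(frozen):                    # Rate-1 node: hard decision, then re-encode
        x_hat = [1 if v < 0 else 0 for v in llr]
        return polar_encode(x_hat), x_hat  # the polar transform is its own inverse
    half = n // 2
    a, b = llr[:half], llr[half:]
    u_l, x_l = ssc_decode([f_minsum(p, q) for p, q in zip(a, b)], frozen[:half])
    u_r, x_r = ssc_decode([g_update(p, q, s) for p, q, s in zip(a, b, x_l)],
                          frozen[half:])
    return u_l + u_r, [p ^ q for p, q in zip(x_l, x_r)] + x_r
```

Each recursive call returns both the decoded bits and the re-encoded partial sums, which is the data that a hardware decoder has to keep in memory between tree levels.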
Structured sparse ternary weight coding of deep neural networks for efficient hardware implementations
Venue: 2017 IEEE International Workshop on Signal Processing Systems (SiPS)
Pub Date: 2017-07-01; DOI: 10.1109/SiPS.2017.8110021
Yoonho Boo, Wonyong Sung
Deep neural networks (DNNs) usually demand a large number of operations for real-time inference. Fully-connected layers in particular contain a large number of weights, so they usually require many off-chip memory accesses during inference. We propose a weight compression method for DNNs that allows the values +1 or −1 only at predetermined weight positions, so that decoding can easily be performed with a lookup table. For example, the structured sparse (8,2) coding allows at most two non-zero values among eight weights. This method not only enables multiplication-free DNN implementations but also compresses weight storage by up to 32× compared to floating-point networks. Weight distribution normalization and gradual pruning are applied to mitigate the performance degradation. Experiments are conducted with fully-connected deep neural networks and convolutional neural networks.
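To make the (8,2) example concrete, the sketch below enumerates every group of eight ternary weights with at most two non-zero entries and uses that enumeration as the decoding table. The table layout and the top-2 projection in quantize_group are assumptions for illustration, not the paper's scheme. Counting gives 1 + 8·2 + 28·4 = 129 patterns, so one index fits in 8 bits, i.e. 1 bit per weight, which is consistent with the 32× figure relative to 32-bit floats.

```python
import itertools
import math

GROUP, MAX_NONZERO = 8, 2

# Decoding table: every length-8 ternary pattern with at most two +1/-1 entries.
CODEBOOK = [w for w in itertools.product((-1, 0, 1), repeat=GROUP)
            if sum(v != 0 for v in w) <= MAX_NONZERO]
INDEX = {w: k for k, w in enumerate(CODEBOOK)}
BITS_PER_GROUP = math.ceil(math.log2(len(CODEBOOK)))

def quantize_group(weights):
    """Hypothetical projection: keep only the signs of the two largest-magnitude weights."""
    keep = sorted(range(GROUP), key=lambda k: abs(weights[k]), reverse=True)[:MAX_NONZERO]
    return tuple((1 if weights[k] > 0 else -1) if k in keep and weights[k] != 0 else 0
                 for k in range(GROUP))

def encode(pattern):
    """Store a group as its table index."""
    return INDEX[pattern]

def decode(index):
    """Table lookup recovers the ternary pattern."""
    return CODEBOOK[index]

def sparse_dot(index, activations):
    """Multiplication-free dot product: add or subtract the selected activations."""
    acc = 0.0
    for w, a in zip(decode(index), activations):
        if w == 1:
            acc += a
        elif w == -1:
            acc -= a
    return acc
```

In hardware, the same idea replaces each multiplier with at most two add/subtract operations per group of eight weights.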
Efficient bit-channel reliability computation for multi-mode polar code encoders and decoders
Venue: 2017 IEEE International Workshop on Signal Processing Systems (SiPS)
Pub Date: 2017-05-16; DOI: 10.1109/SiPS.2017.8109987
C. Condo, Seyyed Ali Hashemi, W. Gross
Polar codes are a family of capacity-achieving error-correcting codes, and they have been selected as part of the next-generation wireless communication standard. Each polar code bit-channel is assigned a reliability value, which is used to determine which bits carry information and which are frozen. Relative reliabilities must be known by both encoders and decoders: in multi-mode systems, where multiple code lengths and code rates are supported, storing the relative reliabilities can lead to high implementation complexity. In this work, we observe patterns among code reliabilities and propose an approximate computation technique that represents the reliabilities of multiple codes through a limited set of variables and update rules. The proposed method allows the trade-off between reliability accuracy and implementation complexity to be tuned. An approximate computation architecture for encoders and decoders is designed and implemented; it occupies 50.7% less area than storage-based solutions, with less than 0.05 dB of error-correction performance degradation. Used within a standard successive-cancellation list (SCL) decoder, the proposed architecture results in up to 17.0% less area occupation.
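The following sketch shows why a formula can replace stored reliability tables in a multi-mode system. It uses the polarization-weight (β-expansion) heuristic with β = 2^(1/4), which is a different, well-known closed-form estimate and not the approximate computation proposed in the paper. Because the weight of index i does not depend on the code length, one function serves every length and rate.

```python
BETA = 2 ** 0.25  # expansion base of the polarization-weight heuristic

def polarization_weight(i):
    """PW(i) = sum_j b_j * BETA**j, where b_j are the binary digits of index i."""
    w, j = 0.0, 0
    while i:
        if i & 1:
            w += BETA ** j
        i >>= 1
        j += 1
    return w

def reliability_order(n):
    """Bit-channel indices of a length-n code, least reliable first."""
    return sorted(range(n), key=polarization_weight)

def information_set(n, k):
    """The k most reliable bit-channels carry information; the others are frozen."""
    return sorted(reliability_order(n)[-k:])
```

For N = 8 and K = 4, this selects bit-channels 3, 5, 6 and 7 as information bits, and the ordering of the first eight indices inside a length-16 code matches the length-8 ordering.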
Blind detection of polar codes
Venue: 2017 IEEE International Workshop on Signal Processing Systems (SiPS)
Pub Date: 2017-05-05; DOI: 10.1109/SiPS.2017.8109977
P. Giard, Alexios Balatsoukas-Stimming, A. Burg
Polar codes were recently chosen to protect the control channel information in the next-generation mobile communication standard (5G) defined by the 3GPP. As a result, receivers will have to implement blind detection of polar-coded frames in order to keep complexity, latency, and power consumption tractable. Since polar codes are a newly proposed class of block codes, their blind detection has so far received very little attention. In this work, we propose a low-complexity blind-detection algorithm for polar-encoded frames. We base this algorithm on a novel detection metric with update rules that leverage a priori knowledge of the frozen-bit locations, exploiting the inherent structure that these locations impose on a polar-encoded block of data. Using the cumulative distribution functions of the detection metric and the receiver operating characteristic, we show that the proposed metric clearly distinguishes polar-encoded frames from other types of data. The presented results are tailored to the 5G standardization discussions, i.e., we consider a short, low-rate polar code concatenated with a CRC.
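The abstract does not define the proposed detection metric, so the sketch below shows only a naive hard-decision baseline that relies on the same side information. A valid polar codeword, mapped back through the self-inverse polar transform, has zeros at all frozen positions, whereas arbitrary data generally does not. The metric, the threshold, and the function names are hypothetical.

```python
def polar_transform(bits):
    """x = u * F^{(x)n}, F = [[1, 0], [1, 1]]; over GF(2) this map is its own inverse."""
    if len(bits) == 1:
        return list(bits)
    half = len(bits) // 2
    left, right = polar_transform(bits[:half]), polar_transform(bits[half:])
    return [l ^ r for l, r in zip(left, right)] + right

def frozen_zero_fraction(llr, frozen_positions):
    """Hypothetical baseline metric: fraction of frozen positions that are 0
    after hard decisions are mapped back to the u domain."""
    x_hat = [1 if v < 0 else 0 for v in llr]
    u_hat = polar_transform(x_hat)
    return sum(u_hat[i] == 0 for i in frozen_positions) / len(frozen_positions)

def looks_polar(llr, frozen_positions, threshold=1.0):
    """Declare a frame polar-encoded if the metric reaches a (hypothetical) threshold."""
    return frozen_zero_fraction(llr, frozen_positions) >= threshold
```

A single hard-decision error can break this baseline, which is one reason a practical detector, such as the one proposed in the paper, works with soft information and an appropriately tuned threshold.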
Energy efficient norm-and-correlation-based antenna selection algorithm in spatially correlated massive multi-user MIMO systems
Venue: 2017 IEEE International Workshop on Signal Processing Systems (SiPS)
Pub Date: 1900-01-01; DOI: 10.1109/SiPS.2017.8109989
Tzu-Hao Tai, Hsin-Jung Chen, W. Chung, Ta-Sung Lee
Massive multiuser multiple-input multiple-output (MU-MIMO) systems, which employ a large number of antennas, are a promising technique for improving spectral and energy efficiency in next-generation wireless communication systems. In practice, the channels between pairs of transmit and receive antennas are often correlated, and computational complexity is a critical implementation concern. Hence, antenna selection techniques can be adopted to improve system performance. In this paper, we propose a norm-and-correlation-based antenna selection algorithm that maximizes energy efficiency by deciding the transmit RF chain configuration under a total power constraint in massive MU-MIMO systems.
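As a rough illustration of norm-and-correlation-based selection, the greedy sketch below first picks the strongest antenna and then favors antennas that are strong but weakly correlated with those already chosen. The scoring rule is an assumption, not the paper's algorithm. The number of RF chains is also taken as an input here, instead of being chosen by energy-efficiency maximization under a total power constraint.

```python
def inner(a, b):
    """Complex inner product <a, b> = sum a_k * conj(b_k)."""
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))

def norm2(a):
    """Squared Euclidean norm of a channel vector."""
    return sum(abs(x) ** 2 for x in a)

def score(channels, k, selected):
    """Hypothetical score: channel power discounted by the worst normalized
    correlation with any antenna that is already selected."""
    h = channels[k]
    if norm2(h) == 0:
        return 0.0
    worst = max((abs(inner(h, channels[s])) ** 2 / (norm2(h) * norm2(channels[s]))
                 for s in selected if norm2(channels[s]) > 0), default=0.0)
    return norm2(h) * (1.0 - worst)

def greedy_select(channels, num_rf_chains):
    """channels[k] is the channel vector from transmit antenna k to all users."""
    candidates = list(range(len(channels)))
    first = max(candidates, key=lambda k: norm2(channels[k]))
    selected = [first]
    candidates.remove(first)
    while len(selected) < num_rf_chains and candidates:
        best = max(candidates, key=lambda k: score(channels, k, selected))
        selected.append(best)
        candidates.remove(best)
    return selected
```

In this toy rule, a fully correlated duplicate of a selected antenna scores zero, so a weaker but orthogonal antenna is preferred over it.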