Pub Date: 2024-10-25, DOI: 10.1109/TBC.2024.3475808
Lizhi Hou;Linyao Gao;Qian Zhang;Yiling Xu;Jenq-Neng Hwang;Dong Wang
The Geometry-based Point Cloud Compression (G-PCC) standard developed by the Moving Picture Experts Group has shown promise for compressing the extremely sparse point clouds captured by Light Detection And Ranging (LiDAR) equipment. However, rate control for Geometry-based LiDAR Point Cloud Compression (G-LPCC), an essential functionality for low-delay transmission over limited bandwidth, has not been fully studied. In this paper, we propose a rate control scheme for G-LPCC. We first adopt the best G-PCC configuration for LiDAR point clouds in terms of Rate-Distortion (R-D) performance as our basis: the predictive tree (PT) for geometry compression and the Region Adaptive Haar Transform (RAHT) for attribute compression. The common challenge in designing rate control algorithms for PT and RAHT is that their rates are determined by multiple factors. To address this, we propose an $l$-domain rate control algorithm for PT that unifies the various geometry-influencing factors in the expression of the minimum arc length $\mathrm{d}l$ to determine the final rate. A power-style geometry rate curve characterized by $\mathrm{d}l$ is modeled. By analyzing the distortion behavior of different quantization parameters, an adaptive bitrate control method is proposed to improve R-D performance. In addition, we borrow the $\rho$ factor from 2D video rate control and successfully apply it to RAHT rate control. A simple linear attribute rate curve characterized by $\rho$ is modeled, and a corresponding parameter estimation method based on the cumulative distribution function is proposed for bitrate control. Experimental results demonstrate that the proposed rate control algorithm achieves accurate rate control with additional Bjontegaard-Delta-rate (BD-rate) gains.
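As a rough illustration of the two rate models named in the abstract, the snippet below fits a power-style curve $R = \alpha\,(\mathrm{d}l)^{\beta}$ for geometry and a linear curve $R = \theta\rho$ for attributes. The fitting procedures, function names, and sample data are illustrative assumptions for the sketch, not the paper's implementation.

```python
import numpy as np

def fit_power_rate_curve(dl, rate):
    """Fit R(dl) = alpha * dl**beta by least squares in log-log space."""
    beta, log_alpha = np.polyfit(np.log(dl), np.log(rate), 1)
    return np.exp(log_alpha), beta

def fit_rho_rate_line(rho, rate):
    """Fit the linear attribute rate model R(rho) = theta * rho (through the origin)."""
    return float(np.dot(rho, rate) / np.dot(rho, rho))

# Illustrative synthetic measurements: geometry rate falls as dl grows,
# attribute rate grows linearly with rho (fraction of non-zero coefficients).
dl = np.array([0.5, 1.0, 2.0, 4.0])
geom_rate = 3.0 * dl ** -0.8
alpha, beta = fit_power_rate_curve(dl, geom_rate)

rho = np.array([0.1, 0.2, 0.4, 0.8])
attr_rate = 5.0 * rho
theta = fit_rho_rate_line(rho, attr_rate)
print(alpha, beta, theta)
```

Once fitted, inverting either curve gives the encoder parameter that meets a target bitrate, which is the core of any model-based rate control loop.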
Title: "Rate Control for Geometry-Based LiDAR Point Cloud Compression via Multi-Factor Modeling" (IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 167-179)
Pub Date: 2024-10-16, DOI: 10.1109/TBC.2024.3475748
The Khai Nguyen;Ebrahim Bedeer;Ha H. Nguyen;J. Eric Salt;Colin Howlett
This paper introduces a novel blind partial transmission sequence (PTS) scheme to lower the peak-to-average-power ratio (PAPR) of orthogonal frequency division multiplexing (OFDM) systems. Unlike existing PTS schemes, in which the first sub-block (SB) is preserved as a phase reference for the other SBs, we propose to add an optimized canceling signal (CS) to the first SB to further reduce the PAPR. The CS is designed such that it can be reconstructed by the receiver and subtracted from the received signal before demodulation, without requiring side information (SI). Since errors in reproducing the CS at the receiver can degrade the error performance, we design a novel CS protection mechanism specifically to protect the reconstruction of the CS. The proposed method is shown to significantly reduce the PAPR and symbol error rate (SER) without sacrificing data rate to SI, as many other existing PTS schemes do.
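For context, PAPR and a conventional PTS phase search can be sketched as follows; this exhaustive baseline is what PTS schemes build on and is not the paper's optimized canceling-signal method. Function names, the phase set, and the sub-block layout are illustrative assumptions.

```python
from itertools import product
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a complex baseband signal, in dB."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

def pts_min_papr(symbols, n_subblocks=4, phases=(1, -1, 1j, -1j)):
    """Exhaustive PTS search: partition subcarriers into sub-blocks, keep the
    first sub-block as phase reference, rotate the others, and keep the
    combination whose time-domain signal has the lowest PAPR."""
    blocks = np.array_split(np.arange(len(symbols)), n_subblocks)
    best_papr, best_rot = None, None
    for rot in product(phases, repeat=n_subblocks - 1):
        s = np.array(symbols, dtype=complex)
        for idx, ph in zip(blocks[1:], rot):
            s[idx] *= ph
        val = papr_db(np.fft.ifft(s))
        if best_papr is None or val < best_papr:
            best_papr, best_rot = val, rot
    return best_papr, best_rot

rng = np.random.default_rng(0)
sym = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=64)  # QPSK subcarriers
baseline = papr_db(np.fft.ifft(sym))
best_papr, best_rot = pts_min_papr(sym)
print(baseline, best_papr)
```

Because the identity rotation is in the search set, the chosen combination never does worse than the unmodified symbol; the cost of conventional PTS is that the chosen rotation must be signaled to the receiver as side information, which is exactly what the paper's blind scheme avoids.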
Title: "Optimized Canceling Signals for PTS Schemes to Improve the PAPR of OFDM Systems Without Side Information" (IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 360-370)
Pub Date: 2024-10-15, DOI: 10.1109/TBC.2024.3475811
Yan Zhao;Chen Zhu;Jun Xu;Guo Lu;Li Song;Siwei Ma
Recently, numerous complexity control approaches have been proposed to achieve a target encoding complexity. However, only a few of them were developed for VVC encoders. This paper fills this gap by proposing an efficient and flexible complexity control approach for VVC. Support for both Acceleration Ratio Control (ARC) and Encoding Time Control (ETC) makes our method highly versatile for various applications. First, we introduce a sequence-level complexity estimation model to merge the ARC and ETC tasks. Then, four key modules perform complexity control: complexity allocation, complexity estimation, encoding configuration decision, and feedback. Specifically, we hierarchically allocate the complexity budget to three coding levels: GOP, frame, and Basic Unit (BU). Each BU's allocation weight is decided by its SSIM distortion, whereby perceptual quality can be ensured. Multi-complexity configurations are established by altering the partition depth and the number of reference frames. By tuning each BU's configuration according to its target acceleration ratio and adaptively updating the control strategies based on feedback, our scheme can precisely realize any achievable acceleration target within one-pass encoding. Moreover, each BU's un-accelerated reference encoding time, which is used to calculate its target acceleration ratio, is estimated by SVR models. Experiments prove that for both the ARC and ETC tasks, our scheme can precisely achieve a wide range of complexity targets (30% $\sim$ 100%) with negligible R-D loss in PSNR and SSIM, outperforming other state-of-the-art methods.
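The hierarchical allocation and feedback steps can be illustrated with a minimal sketch; the proportional rule, function names, and timing numbers are assumptions, not the paper's exact models.

```python
def allocate_budget(total_budget, weights):
    """Split a complexity (time) budget proportionally to per-unit weights,
    mirroring the GOP -> frame -> BU hierarchical allocation."""
    total_w = sum(weights)
    return [total_budget * w / total_w for w in weights]

def feedback_update(target, spent, remaining_budget):
    """Fold one unit's over/under-spend back into the remaining budget so that
    later units compensate and the overall target is still met."""
    return remaining_budget + (target - spent)

# Hypothetical numbers: a 100 ms budget over 4 frames weighted by SSIM distortion.
budgets = allocate_budget(100.0, [1.0, 2.0, 1.0, 1.0])
remaining = sum(budgets[1:])
remaining = feedback_update(budgets[0], 25.0, remaining)  # frame 0 overspent 5 ms
print(budgets, remaining)
```

The same two operations, applied recursively at the GOP, frame, and BU levels, are what lets a one-pass encoder hit a global time target despite per-unit estimation errors.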
Title: "An Efficient and Flexible Complexity Control Method for Versatile Video Coding" (IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 96-110)
Pub Date: 2024-10-15, DOI: 10.1109/TBC.2024.3475743
Xinze Lyu;Sundar Aditya;Bruno Clerckx
Multi-group multicast (MGM) is an increasingly important form of multi-user wireless communications with several potential applications, such as video streaming, federated learning, and safety-critical vehicular communications. Rate-Splitting Multiple Access (RSMA) is a powerful interference management technique that can, in principle, achieve higher data rates and greater fairness for all types of multi-user wireless communications, including MGM. This paper presents the first experimental evaluation of RSMA-based MGM, as well as the first three-way comparison of RSMA-based, Space Division Multiple Access (SDMA)-based, and Non-Orthogonal Multiple Access (NOMA)-based MGM. Using a measurement setup involving a two-antenna transmitter and two groups of two single-antenna users per group, we consider the problem of realizing throughput (max-min) fairness across groups for each of the three multiple access schemes, over nine experimental cases in a line-of-sight environment capturing varying levels of pathloss difference and channel correlation across the groups. Over these cases, we observe that RSMA-based MGM achieves fairness at a higher throughput for each group than SDMA- and NOMA-based MGM. These findings validate RSMA-based MGM's promised gains from the theoretical literature.
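The throughput (max-min) fairness objective used in the comparison can be written down directly: in multicast, every user in a group must decode the same stream, so a group's rate is capped by its worst user, and the schemes are compared on the minimum rate across groups. The rate numbers below are hypothetical, not the measured ones.

```python
def max_min_group_rates(group_rates):
    """Max-min fairness for multi-group multicast: a group's rate is capped by
    its worst user, and the objective is the minimum rate across groups."""
    per_group = {g: min(users) for g, users in group_rates.items()}
    return min(per_group.values()), per_group

# Hypothetical per-user achievable rates (bit/s/Hz), two groups of two users.
rates = {"g1": [2.1, 1.8], "g2": [1.5, 1.9]}
fair_rate, per_group = max_min_group_rates(rates)
print(fair_rate, per_group)
```

A scheme "achieves fairness at a higher throughput" when this minimum group rate is larger, which is the metric on which RSMA outperforms SDMA and NOMA in the paper's experiments.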
Title: "Rate-Splitting Multiple Access for Overloaded Multi-Group Multicast: A First Experimental Study" (IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 30-41)
Pub Date: 2024-10-08, DOI: 10.1109/TBC.2024.3465838
Jihang Yin;Honggang Qi;Liang Zhong;Zhiyuan Zhao;Qiang Wang;Jingran Wu;Xianguo Zhang
In the third generation of the Audio Video Coding Standard (AVS3), the size of Coding Tree Units (CTUs) has been expanded to four times that of the previous generation, and more Coding Unit (CU) partition modes have been introduced, enhancing adaptability and efficiency in video encoding. CU partitioning in AVS3 improves encoding performance but also significantly increases computational complexity, posing substantial challenges to real-time encoding. We propose a fast CU partition algorithm featuring adaptive tree search and pruning optimization. First, it adjusts the tree search order based on neighbor-CU and lookahead information. Specifically, the analysis order of sub-blocks and parent blocks is adjusted adaptively: the potentially optimal partition is prioritized, non-optimal partitions are deferred, and an optimized order of first-full-then-sub or first-sub-then-full is selected. Second, the pruning optimization algorithm uses the analyzed information to skip non-optimal partitions and reduce computational complexity. Because the tree search order is adjusted and potentially optimal partitions are prioritized, more analyzed information is available when evaluating non-optimal partitions, improving the recall and precision of non-optimal partition detection, saving more time, and introducing negligible loss in coding performance. The proposed algorithm has been implemented in the open-source encoder uavs3e. Experimental results indicate that under the AI, LD B, and RA encoding configurations, the algorithm achieves significant time savings of 51.41%, 40.57%, and 40.57%, with BDBR increases of 0.64%, 1.61%, and 1.04%, respectively. These results outperform state-of-the-art fast CU partition algorithms.
Title: "A Fast CU Partition Algorithm for AVS3 Based on Adaptive Tree Search and Pruning Optimization" (IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 125-141)
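The two ideas, adaptive search ordering from neighbor information and cost-bound pruning, can be caricatured in a few lines; the threshold rule and function names here are illustrative assumptions, not the paper's decision rules.

```python
def choose_search_order(neighbor_depths, current_depth):
    """Pick first-full-then-sub vs first-sub-then-full from neighbor CU depths:
    if neighbors split deeper than the current depth, sub-blocks are likely
    optimal and are analyzed first. The threshold is an illustrative heuristic."""
    if not neighbor_depths:
        return "first-full-then-sub"
    avg_depth = sum(neighbor_depths) / len(neighbor_depths)
    return "first-sub-then-full" if avg_depth > current_depth else "first-full-then-sub"

def should_prune(best_cost, candidate_lower_bound):
    """Skip a partition mode whose rate-distortion cost lower bound already
    exceeds the best cost found so far."""
    return candidate_lower_bound >= best_cost

order = choose_search_order([3, 3, 2], current_depth=2)
pruned = should_prune(best_cost=100.0, candidate_lower_bound=120.0)
print(order, pruned)
```

Evaluating the likely winner first makes the running best cost tight early, which is what lets the pruning test discard the deferred partitions cheaply.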
Pub Date: 2024-10-07, DOI: 10.1109/TBC.2024.3464418
Tian He;Lin Shi;Wenjia Xu;Yu Wang;Weijie Qiu;Houbang Guo;Zhuqing Jiang
Blind image quality assessment (BIQA) is a subjective, perception-driven task that requires assessment results consistent with human cognition. The human cognitive system inherently involves both separation and integration mechanisms. Recent works have witnessed the success of deep learning methods in separating distortion features. Nonetheless, traditional deep-learning-based BIQA methods predominantly depend on a fixed topology to mimic information integration in the brain, which gives rise to scale sensitivity and low flexibility. To handle this challenge, we delve into the dynamic interactions among neurons and propose a cognition-inspired BIQA model. Drawing insights from the rich-club structure in network neuroscience, a graph-inspired feature integrator is devised to reconstruct the network topology. Specifically, we argue that the activity of individual neurons (pixels) tends to exhibit random fluctuation with ambiguous meaning, while clear and coherent cognition arises from neurons with high connectivity (rich-nodes). Therefore, a self-attention mechanism is employed to establish strong semantic associations between pixels and rich-nodes. Subsequently, we design intra- and inter-layer graph structures to promote feature interaction across spatial and scale dimensions. Such dynamic circuits endow the BIQA method with efficient, flexible, and robust information processing capabilities, so as to achieve assessment results closer to human subjectivity. Moreover, since the limited samples in existing IQA datasets are prone to cause model overfitting, we devise two prior hypotheses: a frequency prior and a ranking prior. The former stepwise augments high-frequency components that reflect the distortion degree during multilevel feature extraction, while the latter seeks to deepen the model's comprehension of differences in sample quality. Extensive experiments on five public datasets show that the proposed algorithm achieves competitive results.
Title: "From Pixels to Rich-Nodes: A Cognition-Inspired Framework for Blind Image Quality Assessment" (IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 229-239)
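The pixel-to-rich-node association via self-attention can be sketched as a plain cross-attention step: each pixel feature queries a small set of highly connected node features and aggregates them by softmax similarity. Shapes, names, and data are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def rich_node_attention(pixels, rich_nodes):
    """Cross-attention from pixel features (N, d) to rich-node features (M, d):
    each pixel aggregates rich-node information by softmax similarity."""
    scores = pixels @ rich_nodes.T / np.sqrt(pixels.shape[1])   # (N, M)
    scores -= scores.max(axis=1, keepdims=True)                 # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)                     # rows sum to 1
    return attn @ rich_nodes                                    # (N, d)

rng = np.random.default_rng(1)
pixels = rng.normal(size=(6, 4))      # 6 pixel features, dim 4
rich_nodes = rng.normal(size=(2, 4))  # 2 highly connected "rich" nodes
out = rich_node_attention(pixels, rich_nodes)
print(out.shape)
```

Because M is much smaller than N, routing integration through rich-nodes is far cheaper than full pixel-to-pixel attention while still giving every pixel access to global context.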
Pub Date: 2024-09-27, DOI: 10.1109/TBC.2024.3464413
Zhaoqing Pan;Guoyu Zhang;Bo Peng;Jianjun Lei;Haoran Xie;Fu Lee Wang;Nam Ling
Existing human visual perception-oriented image compression methods maintain the perceptual quality of compressed images well, but they may introduce fake details into the compressed images and cannot dynamically improve perceptual rate-distortion performance at the pixel level. To address these issues, this paper proposes a just noticeable difference (JND)-based learned image compression (JND-LIC) method for human visual perception, in which a weight-shared model extracts image features and JND features, and the learned JND features serve as perceptual prior knowledge to assist the image coding process. To generate a highly compact image feature representation, a JND-based feature transform module is proposed to model the pixel-to-pixel masking correlation between the image features and the JND features. Furthermore, inspired by eye-movement research showing that the human visual system perceives image degradation unevenly, a JND-guided quantization mechanism is proposed for entropy coding, which adjusts the quantization step of each pixel to further eliminate perceptual redundancies. Extensive experimental results show that the proposed JND-LIC significantly improves the perceptual quality of compressed images with fewer coding bits compared to state-of-the-art learned image compression methods. Additionally, the proposed method can be flexibly integrated with various advanced learned image compression methods and has robust generalization capabilities to improve the efficiency of perceptual coding.
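A JND-guided per-pixel quantization step can be sketched as scaling a base step by a JND map: pixels with higher masking tolerance get coarser steps and hence fewer bits. The scaling rule and parameter names are assumptions for illustration, not the paper's exact mechanism.

```python
import numpy as np

def jnd_guided_quantize(latent, base_step, jnd_map, strength=0.5):
    """Quantize a latent with a per-pixel step scaled by a JND map: pixels with
    higher masking tolerance (larger JND) get coarser steps and fewer bits."""
    step = base_step * (1.0 + strength * jnd_map)
    return np.round(latent / step) * step, step

latent = np.array([0.9, 0.9, 0.9])
jnd = np.array([0.0, 1.0, 2.0])   # increasing tolerance to distortion
quantized, step = jnd_guided_quantize(latent, base_step=0.5, jnd_map=jnd)
print(step.tolist(), quantized.tolist())
```

The same latent value lands on progressively coarser grids as the JND grows, which is exactly where perceptual redundancy can be removed without visible degradation.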
Title: "JND-LIC: Learned Image Compression via Just Noticeable Difference for Human Visual Perception" (IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 217-228)
Point cloud has been the mainstream representation for advanced 3D applications, such as virtual reality and augmented reality. However, the massive data amounts of point clouds is one of the most challenging issues for transmission and storage. In this paper, we propose an end-to-end voxel Transformer and Sparse Convolution based Point Cloud Attribute Compression (TSC-PCAC) for 3D broadcasting. Firstly, we present a framework of the TSC-PCAC, which includes Transformer and Sparse Convolutional Module (TSCM) based variational autoencoder and channel context module. Secondly, we propose a two-stage TSCM, where the first stage focuses on modeling local dependencies and feature representations of the point clouds, and the second stage captures global features through spatial and channel pooling encompassing larger receptive fields. This module effectively extracts global and local inter-point relevance to reduce informational redundancy. Thirdly, we design a TSCM based channel context module to exploit inter-channel correlations, which improves the predicted probability distribution of quantized latent representations and thus reduces the bitrate. Experimental results indicate that the proposed TSC-PCAC method achieves an average of 38.53%, 21.30%, and 11.19% bitrate reductions on datasets 8iVFB, Owlii, 8iVSLF, Volograms, and MVUB compared to the Sparse-PCAC, NF-PCAC, and G-PCC v23 methods, respectively. The encoding/decoding time costs are reduced 97.68%/98.78% on average compared to the Sparse-PCAC. The source code and the trained TSC-PCAC models are available at https://github.com/igizuxo/TSC-PCAC.
{"title":"TSC-PCAC: Voxel Transformer and Sparse Convolution-Based Point Cloud Attribute Compression for 3D Broadcasting","authors":"Zixi Guo;Yun Zhang;Linwei Zhu;Hanli Wang;Gangyi Jiang","doi":"10.1109/TBC.2024.3464417","DOIUrl":"https://doi.org/10.1109/TBC.2024.3464417","url":null,"abstract":"Point cloud has been the mainstream representation for advanced 3D applications, such as virtual reality and augmented reality. However, the massive data amounts of point clouds is one of the most challenging issues for transmission and storage. In this paper, we propose an end-to-end voxel Transformer and Sparse Convolution based Point Cloud Attribute Compression (TSC-PCAC) for 3D broadcasting. Firstly, we present a framework of the TSC-PCAC, which includes Transformer and Sparse Convolutional Module (TSCM) based variational autoencoder and channel context module. Secondly, we propose a two-stage TSCM, where the first stage focuses on modeling local dependencies and feature representations of the point clouds, and the second stage captures global features through spatial and channel pooling encompassing larger receptive fields. This module effectively extracts global and local inter-point relevance to reduce informational redundancy. Thirdly, we design a TSCM based channel context module to exploit inter-channel correlations, which improves the predicted probability distribution of quantized latent representations and thus reduces the bitrate. Experimental results indicate that the proposed TSC-PCAC method achieves an average of 38.53%, 21.30%, and 11.19% bitrate reductions on datasets 8iVFB, Owlii, 8iVSLF, Volograms, and MVUB compared to the Sparse-PCAC, NF-PCAC, and G-PCC v23 methods, respectively. The encoding/decoding time costs are reduced 97.68%/98.78% on average compared to the Sparse-PCAC. 
The source code and the trained TSC-PCAC models are available at <uri>https://github.com/igizuxo/TSC-PCAC</uri>.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 1","pages":"154-166"},"PeriodicalIF":3.2,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
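The bitrate reduction credited to the channel context module follows from basic entropy coding: the expected code length of the quantized latents is their cross-entropy under the predicted probability model, so a model that predicts the actual symbol statistics more sharply spends fewer bits. A minimal numerical sketch of that relationship (illustrative only; the symbol alphabet, distributions, and function names below are assumptions, not the TSC-PCAC implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_bits(symbols, probs):
    # Ideal entropy-coded length in bits: -sum log2 p(actual symbol),
    # where probs[i] is the model's distribution for symbol i.
    p = np.clip(probs[np.arange(symbols.size), symbols], 1e-12, 1.0)
    return float(-np.log2(p).sum())

# 1000 quantized latent symbols drawn from a peaked distribution
# (quantized latents in learned codecs concentrate near zero).
true_p = np.array([0.05, 0.15, 0.6, 0.15, 0.05])  # 5-symbol alphabet
symbols = rng.choice(5, size=1000, p=true_p)

# Context-free (uniform) model vs. a context model matching the statistics.
uniform = np.full((1000, 5), 0.2)
context = np.tile(true_p, (1000, 1))

bits_u = expected_bits(symbols, uniform)
bits_c = expected_bits(symbols, context)
print(bits_u)  # exactly 1000 * log2(5) ≈ 2321.9 bits
print(bits_c)  # lower: close to the empirical entropy of the symbols
```

The same accounting explains why improving the predicted probability distribution of the latents (here, via inter-channel context) translates directly into bitrate savings.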
Pub Date : 2024-09-19DOI: 10.1109/TBC.2024.3417228
Iñigo Bilbao;Eneko Iradier;Jon Montalban;Pablo Angueira;Sung-Ik Park
Artificial Intelligence (AI) and Machine Learning (ML) approaches have emerged as viable alternatives to conventional Physical Layer (PHY) signal processing methods. Specifically, in any wireless point-to-multipoint communication, accurate channel estimation plays a pivotal role in improving spectral efficiency through functionalities such as higher-order modulation or full-duplex communication. This paper proposes leveraging ML solutions, including Convolutional Neural Networks (CNNs) and Multilayer Perceptrons (MLPs), to enhance channel estimation in broadcast environments. Each architecture is instantiated using distinct procedures, focusing on two fundamental approaches: channel estimation denoising and ML-assisted pilot interpolation. Rigorous evaluations are conducted across diverse configurations and conditions, spanning rural areas and co-channel interference scenarios. The results demonstrate that MLP and CNN architectures consistently outperform classical methods, yielding 10 and 20 dB performance improvements, respectively. These results underscore the efficacy of ML-driven approaches in advancing channel estimation capabilities for broadcast communication systems.
{"title":"Enhancing Channel Estimation in Terrestrial Broadcast Communications Using Machine Learning","authors":"Iñigo Bilbao;Eneko Iradier;Jon Montalban;Pablo Angueira;Sung-Ik Park","doi":"10.1109/TBC.2024.3417228","DOIUrl":"https://doi.org/10.1109/TBC.2024.3417228","url":null,"abstract":"Artificial Intelligence (AI) and Machine Learning (ML) approaches have emerged as viable alternatives to conventional Physical Layer (PHY) signal processing methods. Specifically, in any wireless point-to-multipoint communication, accurate channel estimation plays a pivotal role in improving spectral efficiency through functionalities such as higher-order modulation or full-duplex communication. This paper proposes leveraging ML solutions, including Convolutional Neural Networks (CNNs) and Multilayer Perceptrons (MLPs), to enhance channel estimation in broadcast environments. Each architecture is instantiated using distinct procedures, focusing on two fundamental approaches: channel estimation denoising and ML-assisted pilot interpolation. Rigorous evaluations are conducted across diverse configurations and conditions, spanning rural areas and co-channel interference scenarios. The results demonstrate that MLP and CNN architectures consistently outperform classical methods, yielding 10 and 20 dB performance improvements, respectively. 
These results underscore the efficacy of ML-driven approaches in advancing channel estimation capabilities for broadcast communication systems.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 4","pages":"1181-1191"},"PeriodicalIF":3.2,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
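For context, the classical pilot-based baseline that ML-assisted interpolation improves upon can be sketched as least-squares (LS) channel estimation at the pilot subcarriers followed by linear interpolation across the remaining carriers. A minimal sketch under assumed toy parameters (the 64-carrier OFDM symbol, pilot spacing, and channel model below are illustrative, not the paper's simulation setup):

```python
import numpy as np

rng = np.random.default_rng(1)

def ls_pilot_estimate(rx_pilots, tx_pilots):
    # Least-squares channel estimate at pilot subcarriers: H_hat = Y / X.
    return rx_pilots / tx_pilots

def interpolate_channel(pilot_idx, h_pilots, n_subcarriers):
    # Classical linear interpolation across subcarriers, done separately on
    # real and imaginary parts; this is the step an ML model would replace.
    k = np.arange(n_subcarriers)
    return (np.interp(k, pilot_idx, h_pilots.real)
            + 1j * np.interp(k, pilot_idx, h_pilots.imag))

# Toy OFDM symbol: 64 subcarriers, pilots every 8th carrier plus the last one.
N = 64
pilot_idx = np.r_[np.arange(0, N, 8), N - 1]
h_true = np.exp(-1j * 2 * np.pi * 0.01 * np.arange(N))  # slowly varying channel
tx_pilots = np.ones(pilot_idx.size, dtype=complex)       # known pilot symbols
noise = 0.01 * (rng.standard_normal(pilot_idx.size)
                + 1j * rng.standard_normal(pilot_idx.size))
rx_pilots = h_true[pilot_idx] * tx_pilots + noise

h_hat = interpolate_channel(pilot_idx, ls_pilot_estimate(rx_pilots, tx_pilots), N)
mse = np.mean(np.abs(h_hat - h_true) ** 2)
print(mse)  # small for a smooth channel at low noise
```

Denoising approaches post-process an estimate like `h_hat` to suppress the residual noise, while ML-assisted interpolation learns a mapping from the pilot estimates to the full channel response in place of `np.interp`.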
Pub Date : 2024-09-16DOI: 10.1109/TBC.2024.3453631
{"title":"IEEE Transactions on Broadcasting Information for Authors","authors":"","doi":"10.1109/TBC.2024.3453631","DOIUrl":"https://doi.org/10.1109/TBC.2024.3453631","url":null,"abstract":"","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 3","pages":"C3-C4"},"PeriodicalIF":3.2,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10680489","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142235706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}