
Latest Publications in IEEE Transactions on Broadcasting

Rate Control for Geometry-Based LiDAR Point Cloud Compression via Multi-Factor Modeling
IF 3.2, CAS Tier 1 (Computer Science), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-10-25. DOI: 10.1109/TBC.2024.3475808
Lizhi Hou;Linyao Gao;Qian Zhang;Yiling Xu;Jenq-Neng Hwang;Dong Wang
The Geometry-based Point Cloud Compression (G-PCC) standard developed by the Moving Picture Experts Group has shown promise for compressing the extremely sparse point clouds captured by Light Detection and Ranging (LiDAR) equipment. However, rate control, an essential functionality for low-delay transmission over limited-bandwidth links, has not been fully studied for Geometry-based LiDAR Point Cloud Compression (G-LPCC). In this paper, we propose a rate control scheme for G-LPCC. We first adopt the configuration of G-PCC with the best Rate-Distortion (R-D) performance for LiDAR point clouds as the basis: the predictive tree (PT) for geometry compression and the Region Adaptive Haar Transform (RAHT) for attribute compression. The common challenge in designing rate control algorithms for PT and RAHT is that their rates are determined by multiple factors. To address this, we propose an $l$-domain rate control algorithm for PT that unifies the various influential geometry factors in the expression of the minimum arc length $\mathrm{d}l$ to determine the final rate. A power-style geometry rate curve characterized by $\mathrm{d}l$ is modeled. By analyzing the distortion behavior under different quantization parameters, an adaptive bitrate control method is proposed to improve the R-D performance. In addition, we borrow the $\rho$ factor from 2D video rate control and successfully apply it to RAHT rate control. A simple linear attribute rate curve characterized by $\rho$ is modeled, and a corresponding parameter estimation method based on the cumulative distribution function is proposed for bitrate control. Experimental results demonstrate that the proposed rate control algorithm achieves accurate rate control with additional Bjontegaard-Delta-rate (BD-rate) gains.
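The two rate models described above, a power-style geometry rate curve in $\mathrm{d}l$ and a linear attribute rate curve in $\rho$, lend themselves to a simple fit-and-invert control step. The sketch below is illustrative only; the fitting procedure and function names are assumptions, not the paper's implementation:

```python
import math

def fit_power_curve(samples):
    """Fit R = a * dl**b by least squares in log-log space.
    samples: (dl, rate) pairs gathered from trial encodings."""
    xs = [math.log(dl) for dl, _ in samples]
    ys = [math.log(r) for _, r in samples]
    n = len(samples)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

def dl_for_target_rate(a, b, target_rate):
    """Invert R = a * dl**b: the minimum arc length dl hitting the target rate."""
    return (target_rate / a) ** (1.0 / b)

def fit_linear_rho(samples):
    """Fit R = c * rho (a line through the origin) for the attribute rate model."""
    num = sum(rho * rate for rho, rate in samples)
    den = sum(rho * rho for rho, _ in samples)
    return num / den
```

Once the coefficients are fitted, the encoder can solve for the $\mathrm{d}l$ (or $\rho$) that meets the bit budget before encoding each unit.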
IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 167-179.
Citations: 0
Optimized Canceling Signals for PTS Schemes to Improve the PAPR of OFDM Systems Without Side Information
IF 3.2, CAS Tier 1 (Computer Science), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-10-16. DOI: 10.1109/TBC.2024.3475748
The Khai Nguyen;Ebrahim Bedeer;Ha H. Nguyen;J. Eric Salt;Colin Howlett
This paper introduces a novel blind partial transmission sequence (PTS) scheme to lower the peak-to-average-power ratio (PAPR) of orthogonal frequency division multiplexing (OFDM) systems. Unlike existing PTS schemes, in which the first sub-block (SB) is preserved as a phase reference for the other SBs, we propose to add an optimized canceling signal (CS) to the first SB to further reduce the PAPR. The CS is designed such that it can be reconstructed by the receiver and subtracted from the received signal before demodulation, without requiring side information (SI). Since errors in reproducing the CS at the receiver can degrade the error performance, we design a novel protection mechanism specifically to safeguard the reconstruction of the CS. The proposed method is shown to significantly reduce the PAPR and symbol error rate (SER) without sacrificing data rate to SI, as many other existing PTS schemes do.
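For reference, the PAPR that PTS schemes target is the ratio of peak to mean power of the time-domain OFDM signal. A minimal stdlib sketch of that metric (oversampled inverse DFT; this illustrates the quantity being minimized, not the paper's scheme):

```python
import cmath
import math

def ofdm_time_signal(symbols, oversample=4):
    """Oversampled inverse DFT of one OFDM symbol (zero padding in frequency)."""
    n = len(symbols)
    m = n * oversample
    return [sum(symbols[k] * cmath.exp(2j * cmath.pi * k * t / m)
                for k in range(n)) / n
            for t in range(m)]

def papr_db(symbols, oversample=4):
    """Peak-to-average power ratio of the time-domain signal, in dB."""
    power = [abs(v) ** 2 for v in ofdm_time_signal(symbols, oversample)]
    return 10.0 * math.log10(max(power) / (sum(power) / len(power)))
```

Identical subcarrier symbols add coherently at one instant, giving the worst case: for N subcarriers the PAPR reaches 10·log10(N) dB, which is what phase-rotation schemes such as PTS break up.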
IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 360-370.
Citations: 0
An Efficient and Flexible Complexity Control Method for Versatile Video Coding
IF 3.2, CAS Tier 1 (Computer Science), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-10-15. DOI: 10.1109/TBC.2024.3475811
Yan Zhao;Chen Zhu;Jun Xu;Guo Lu;Li Song;Siwei Ma
Recently, numerous complexity control approaches have been proposed to achieve a target encoding complexity. However, only a few of them were developed for VVC encoders. This paper fills this gap by proposing an efficient and flexible complexity control approach for VVC. Support for both Acceleration Ratio Control (ARC) and Encoding Time Control (ETC) makes our method highly versatile for various applications. First, we introduce a sequence-level complexity estimation model to merge the ARC and ETC tasks. Then, four key modules carry out complexity control: complexity allocation, complexity estimation, encoding configuration decision, and feedback. Specifically, we hierarchically allocate the complexity budget to three coding levels: GOP, frame, and Basic Unit (BU). Each BU's allocation weight is decided by its SSIM distortion, whereby perceptual quality can be ensured. Multi-complexity configurations are established by altering the partition depth and the number of reference frames. By tuning each BU's configuration according to its target acceleration ratio and adaptively updating the control strategies based on feedback, our scheme can precisely realize any achievable acceleration target within one-pass encoding. Moreover, each BU's un-accelerated reference encoding time, which is used to calculate its target acceleration ratio, is estimated by SVR models. Experiments show that for both the ARC and ETC tasks, our scheme can precisely achieve a wide range of complexity targets (30%-100%) with negligible RD loss in PSNR and SSIM, outperforming other state-of-the-art methods.
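The allocation and feedback steps can be illustrated with a toy sketch: the frame budget is split across BUs in proportion to their SSIM distortion, and each finished unit's over- or under-spend is folded back into the remaining budget. The function names and the additive feedback rule are assumptions, not the paper's exact scheme:

```python
def allocate_bu_budget(frame_budget, bu_distortions):
    """Split a frame's complexity budget across Basic Units in proportion
    to their SSIM distortion, so perceptually critical BUs get more time."""
    total = sum(bu_distortions)
    return [frame_budget * d / total for d in bu_distortions]

def feedback_update(target, spent, remaining_budget):
    """Fold the over/under-spend of a finished unit back into the budget,
    so later units compensate and the sequence-level target is met."""
    return remaining_budget + (target - spent)
```

The same proportional split applies one level up (sequence to GOPs, GOP to frames), which is what makes the control hierarchical.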
IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 96-110.
Citations: 0
Rate-Splitting Multiple Access for Overloaded Multi-Group Multicast: A First Experimental Study
IF 3.2, CAS Tier 1 (Computer Science), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-10-15. DOI: 10.1109/TBC.2024.3475743
Xinze Lyu;Sundar Aditya;Bruno Clerckx
Multi-group multicast (MGM) is an increasingly important form of multi-user wireless communications with several potential applications, such as video streaming, federated learning, safety-critical vehicular communications, etc. Rate-Splitting Multiple Access (RSMA) is a powerful interference management technique that can, in principle, achieve higher data rates and greater fairness for all types of multi-user wireless communications, including MGM. This paper presents the first-ever experimental evaluation of RSMA-based MGM, as well as the first-ever three-way comparison of RSMA-based, Space Division Multiple Access (SDMA)-based and Non-Orthogonal Multiple Access (NOMA)-based MGM. Using a measurement setup involving a two-antenna transmitter and two groups of two single-antenna users per group, we consider the problem of realizing throughput (max-min) fairness across groups for each of three multiple access schemes, over nine experimental cases in a line-of-sight environment capturing varying levels of pathloss difference and channel correlation across the groups. Over these cases, we observe that RSMA-based MGM achieves fairness at a higher throughput for each group than SDMA- and NOMA-based MGM. These findings validate RSMA-based MGM’s promised gains from the theoretical literature.
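The max-min (throughput) fairness criterion used in the comparison simply prefers the configuration whose worst-served group does best. A one-line illustration over hypothetical candidate rate tuples:

```python
def max_min_fair_choice(candidates):
    """Among candidate configurations (each a tuple of per-group throughputs),
    pick the one whose worst-off group is best served."""
    return max(candidates, key=min)
```

With candidates (5, 1), (3, 3), and (2, 4) Mbps, the criterion picks (3, 3): a higher sum rate elsewhere does not compensate for starving one group.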
IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 30-41.
Citations: 0
A Fast CU Partition Algorithm for AVS3 Based on Adaptive Tree Search and Pruning Optimization
IF 3.2, CAS Tier 1 (Computer Science), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-10-08. DOI: 10.1109/TBC.2024.3465838
Jihang Yin;Honggang Qi;Liang Zhong;Zhiyuan Zhao;Qiang Wang;Jingran Wu;Xianguo Zhang
In the third generation of the Audio Video Coding Standard (AVS3), the size of Coding Tree Units (CTUs) has been expanded to four times that of the previous generation, and more Coding Unit (CU) partition modes have been introduced, enhancing adaptability and efficiency in video encoding. CU partitioning in AVS3 not only improves encoding performance but also significantly increases computational complexity, posing substantial challenges to real-time encoding. We propose a fast algorithm for CU partitioning featuring adaptive tree search and pruning optimization. First, it adjusts the tree search order based on neighbor-CU and lookahead information. Specifically, the analysis order of sub-blocks and parent blocks is adaptively adjusted: the potentially optimal partition is prioritized, non-optimal partitions are deferred, and an optimized order of first-full-then-sub or first-sub-then-full is selected. Second, the pruning optimization algorithm uses the already analyzed information to skip non-optimal partitions and reduce computational complexity. Because the tree search order is adjusted and potentially optimal partitions are prioritized, more analyzed information is available when evaluating non-optimal partitions, improving the recall and precision of non-optimal partition detection, saving more time, and introducing negligible loss in coding performance. The proposed algorithm has been implemented in the open-source encoder uavs3e. Experimental results indicate that under the three encoding configurations of AI, LD B, and RA, the algorithm achieves significant time savings of 51.41%, 40.57%, and 40.57%, with BDBR increases of 0.64%, 1.61%, and 1.04%, respectively. These results outperform state-of-the-art fast CU partition algorithms.
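The two ideas above, reordering candidate partitions by predicted cost and pruning the unlikely ones, can be sketched as follows. The cost predictor and the fixed pruning margin are hypothetical stand-ins for the paper's neighbor/lookahead-derived criteria:

```python
def order_partitions(modes, predicted_cost):
    """Adaptive search order: analyze the likely-optimal partition modes first."""
    return sorted(modes, key=lambda m: predicted_cost[m])

def prune_partitions(ordered_modes, predicted_cost, best_cost, margin=1.2):
    """Skip modes whose predicted RD cost already exceeds the best cost
    found so far by more than a safety margin."""
    return [m for m in ordered_modes
            if predicted_cost[m] <= best_cost * margin]
```

Evaluating the likely winner first tightens `best_cost` early, so the pruning test rejects more of the deferred modes, which is exactly why the reorder and the pruning reinforce each other.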
IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 125-141.
Citations: 0
From Pixels to Rich-Nodes: A Cognition-Inspired Framework for Blind Image Quality Assessment
IF 3.2, CAS Tier 1 (Computer Science), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-10-07. DOI: 10.1109/TBC.2024.3464418
Tian He;Lin Shi;Wenjia Xu;Yu Wang;Weijie Qiu;Houbang Guo;Zhuqing Jiang
Blind image quality assessment (BIQA) is a subjective, perception-driven task that necessitates assessment results consistent with human cognition. The human cognitive system inherently involves both separation and integration mechanisms. Recent works have witnessed the success of deep learning methods in separating distortion features. Nonetheless, traditional deep-learning-based BIQA methods predominantly depend on a fixed topology to mimic information integration in the brain, which gives rise to scale sensitivity and low flexibility. To handle this challenge, we delve into the dynamic interactions among neurons and propose a cognition-inspired BIQA model. Drawing insights from the rich-club structure in network neuroscience, a graph-inspired feature integrator is devised to reconstruct the network topology. Specifically, we argue that the activity of individual neurons (pixels) tends to exhibit random fluctuation with ambiguous meaning, while clear and coherent cognition arises from neurons with high connectivity (rich-nodes). Therefore, a self-attention mechanism is employed to establish strong semantic associations between pixels and rich-nodes. Subsequently, we design intra- and inter-layer graph structures to promote feature interaction across spatial and scale dimensions. Such dynamic circuits endow the BIQA method with efficient, flexible, and robust information processing capabilities, achieving assessment results closer to human subjective judgment. Moreover, since the limited samples in existing IQA datasets are prone to cause model overfitting, we devise two prior hypotheses: a frequency prior and a ranking prior. The former stepwise augments high-frequency components that reflect the degree of distortion during multilevel feature extraction, while the latter motivates the model's in-depth comprehension of differences in sample quality. Extensive experiments on five public datasets show that the proposed algorithm achieves competitive results.
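The pixel-to-rich-node association can be read as dot-product cross-attention: each pixel feature queries the rich-node features. A small stdlib sketch under that assumption (feature dimensions and the absence of learned projections are illustrative simplifications):

```python
import math

def _softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def pixel_to_richnode_attention(pixels, rich_nodes):
    """Each pixel feature attends over the rich-node features with
    dot-product attention, yielding a globally informed representation."""
    dim = len(rich_nodes[0])
    out = []
    for q in pixels:
        weights = _softmax([sum(qi * ki for qi, ki in zip(q, k))
                            for k in rich_nodes])
        out.append([sum(w * k[d] for w, k in zip(weights, rich_nodes))
                    for d in range(dim)])
    return out
```

A pixel strongly aligned with one rich-node ends up dominated by that node's feature, which is how the highly connected hubs impose coherent structure on noisy per-pixel activity.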
IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 229-239. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10706639
Citations: 0
JND-LIC: Learned Image Compression via Just Noticeable Difference for Human Visual Perception
IF 3.2, CAS Tier 1 (Computer Science), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-09-27. DOI: 10.1109/TBC.2024.3464413
Zhaoqing Pan;Guoyu Zhang;Bo Peng;Jianjun Lei;Haoran Xie;Fu Lee Wang;Nam Ling
Existing human visual perception-oriented image compression methods maintain the perceptual quality of compressed images well, but they may introduce fake details into the compressed images and cannot dynamically improve the perceptual rate-distortion performance at the pixel level. To address these issues, a just noticeable difference (JND)-based learned image compression (JND-LIC) method for human visual perception is proposed in this paper, in which a weight-shared model is used to extract image features and JND features, and the learned JND features are utilized as perceptual prior knowledge to assist the image coding process. To generate a highly compact image feature representation, a JND-based feature transform module is proposed to model the pixel-to-pixel masking correlation between the image features and the JND features. Furthermore, inspired by eye-movement research showing that the human visual system perceives image degradation unevenly, a JND-guided quantization mechanism is proposed for the entropy coding, which adjusts the quantization step of each pixel to further eliminate perceptual redundancies. Extensive experimental results show that the proposed JND-LIC significantly improves the perceptual quality of compressed images with fewer coding bits compared to state-of-the-art learned image compression methods. Additionally, the proposed method can be flexibly integrated with various advanced learned image compression methods and has robust generalization capabilities to improve the efficiency of perceptual coding.
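The JND-guided quantization idea, a coarser step where errors are less visible, can be sketched as a per-value quantization step scaled by the local JND. The linear scaling rule here is an illustrative assumption, not the paper's learned mechanism:

```python
def jnd_guided_quantize(latent, jnd, base_step=1.0):
    """Quantize each latent value with a step scaled by its JND:
    a larger JND (error less visible) gives a coarser step, hence fewer bits."""
    steps = [base_step * (1.0 + j) for j in jnd]
    quantized = [round(v / s) for v, s in zip(latent, steps)]
    return quantized, steps

def dequantize(quantized, steps):
    """Reconstruct the latent values from indices and their per-value steps."""
    return [q * s for q, s in zip(quantized, steps)]
```

Values in high-JND regions map to smaller index magnitudes, which an entropy coder spends fewer bits on, while the reconstruction error stays below the visibility threshold by construction.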
IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 217-228.
Citations: 0
TSC-PCAC: Voxel Transformer and Sparse Convolution-Based Point Cloud Attribute Compression for 3D Broadcasting
IF 3.2, CAS Tier 1 (Computer Science), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC. Pub Date: 2024-09-25. DOI: 10.1109/TBC.2024.3464417
Zixi Guo;Yun Zhang;Linwei Zhu;Hanli Wang;Gangyi Jiang
Point clouds have become the mainstream representation for advanced 3D applications such as virtual reality and augmented reality. However, their massive data volume is one of the most challenging issues for transmission and storage. In this paper, we propose an end-to-end voxel Transformer and Sparse Convolution based Point Cloud Attribute Compression (TSC-PCAC) method for 3D broadcasting. First, we present the TSC-PCAC framework, which includes a Transformer and Sparse Convolutional Module (TSCM) based variational autoencoder and a channel context module. Second, we propose a two-stage TSCM, where the first stage models local dependencies and feature representations of the point clouds, and the second stage captures global features through spatial and channel pooling with larger receptive fields. This module effectively extracts global and local inter-point relevance to reduce informational redundancy. Third, we design a TSCM-based channel context module to exploit inter-channel correlations, which improves the predicted probability distribution of the quantized latent representations and thus reduces the bitrate. Experimental results indicate that the proposed TSC-PCAC method achieves average bitrate reductions of 38.53%, 21.30%, and 11.19% on the 8iVFB, Owlii, 8iVSLF, Volograms, and MVUB datasets compared to the Sparse-PCAC, NF-PCAC, and G-PCC v23 methods, respectively. Encoding/decoding time costs are reduced by 97.68%/98.78% on average compared to Sparse-PCAC. The source code and the trained TSC-PCAC models are available at https://github.com/igizuxo/TSC-PCAC.
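Why a better predicted distribution reduces the bitrate follows from the ideal entropy-coding cost of -log2 p(symbol) per symbol: the sharper and more accurate the context model's prediction, the cheaper the actual symbols. A minimal illustration:

```python
import math

def ideal_bits(symbols, pmf):
    """Ideal entropy-coding cost in bits: sum of -log2 p(s) over the symbols.
    A sharper predicted distribution over the quantized latents lowers this."""
    return sum(-math.log2(pmf[s]) for s in symbols)
```

Coding four symbols under a uniform 4-way model costs exactly 8 bits, while a context model that concentrates mass on the symbols that actually occur costs strictly less; this gap is the rate saving the channel context module aims for.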
IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 154-166.
Citations: 0
Enhancing Channel Estimation in Terrestrial Broadcast Communications Using Machine Learning
IF 3.2 Zone 1 Computer Science Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2024-09-19 DOI: 10.1109/TBC.2024.3417228
Iñigo Bilbao;Eneko Iradier;Jon Montalban;Pablo Angueira;Sung-Ik Park
Artificial Intelligence (AI) and Machine Learning (ML) approaches have emerged as viable alternatives to conventional Physical Layer (PHY) signal processing methods. Specifically, in any wireless point-to-multipoint communication, accurate channel estimation plays a pivotal role in exploiting spectral efficiency through functionalities such as higher-order modulation or full-duplex communication. This paper proposes leveraging ML solutions, including Convolutional Neural Networks (CNNs) and Multilayer Perceptrons (MLPs), to enhance channel estimation in broadcast environments. Each architecture is instantiated using distinct procedures, focusing on two fundamental approaches: channel estimation denoising and ML-assisted pilot interpolation. Rigorous evaluations are conducted across diverse configurations and conditions, spanning rural areas and co-channel interference scenarios. The results demonstrate that the MLP and CNN architectures consistently outperform classical methods, yielding 10 and 20 dB performance improvements, respectively. These results underscore the efficacy of ML-driven approaches in advancing channel estimation capabilities for broadcast communication systems.
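The abstract contrasts ML estimators with classical pilot-based processing. As a minimal point of reference, the following NumPy sketch implements one such classical baseline: least-squares (LS) estimation at pilot subcarriers followed by linear interpolation. All names and parameter values are our own illustrative choices, not taken from the paper, and the ML denoising/interpolation networks themselves are not reproduced here.

```python
import numpy as np

# Toy baseline for OFDM channel estimation: LS estimates at pilot
# subcarriers, linearly interpolated over the remaining subcarriers.
rng = np.random.default_rng(1)
n_sc, pilot_step, snr_db = 64, 8, 20

# Smooth frequency response from a short time-domain channel (4 taps).
taps = (rng.normal(size=4) + 1j * rng.normal(size=4)) / np.sqrt(8)
h_true = np.fft.fft(taps, n_sc)

pilots_tx = np.ones(n_sc, dtype=complex)              # known pilot symbols
noise_scale = 10 ** (-snr_db / 20) / np.sqrt(2)
noise = (rng.normal(size=n_sc) + 1j * rng.normal(size=n_sc)) * noise_scale
rx = h_true * pilots_tx + noise

pilot_idx = np.arange(0, n_sc, pilot_step)
h_ls = rx[pilot_idx] / pilots_tx[pilot_idx]           # LS estimate at pilots
# Interpolate real and imaginary parts separately across all subcarriers.
k = np.arange(n_sc)
h_hat = np.interp(k, pilot_idx, h_ls.real) + 1j * np.interp(k, pilot_idx, h_ls.imag)

mse = np.mean(np.abs(h_hat - h_true) ** 2)
print(f"interpolated-LS channel MSE: {mse:.4f}")
```

An ML-assisted scheme of the kind described would replace either the interpolation step (pilot interpolation) or post-process `h_hat` (denoising), learning the channel's structure instead of assuming piecewise linearity.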
IEEE Transactions on Broadcasting, vol. 70, no. 4, pp. 1181-1191.
Citations: 0
IEEE Transactions on Broadcasting Information for Authors
IF 3.2 Zone 1 Computer Science Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2024-09-16 DOI: 10.1109/TBC.2024.3453631
IEEE Transactions on Broadcasting, vol. 70, no. 3, pp. C3-C4. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10680489
Citations: 0