Due to the complex underwater imaging environment, existing Underwater Image Enhancement (UIE) techniques are unable to handle the increasing demand for high-quality underwater content in broadcasting systems. Thus, a robust quality assessment method is needed to effectively compare the quality of different enhanced underwater images. To this end, we propose a novel quality assessment method for enhanced underwater images by utilizing multiple levels of features at various stages of the network’s depth. We first select underwater images with different distortions to analyze the characteristics of different UIE results at various feature levels. We find that low-level features are more sensitive to color information, while mid-level features are more indicative of structural differences. Based on this, a Channel-Spatial-Pixel Attention Module (CSPAM) is designed for low-level perception to capture color characteristics, utilizing channel, spatial, and pixel dimensions. To capture structural variations, a Parallel Structural Perception Module (PSPM) with convolutional kernels of different scales is introduced for mid-level perception. For high-level perception, due to the accumulation of noise, an Adaptive Weighted Downsampling (AWD) layer is employed to restore the semantic information. Furthermore, a new top-down multi-level feature fusion method is designed. Information from different levels is integrated through a Selective Feature Fusion (SFF) mechanism, which produces semantically rich features and enhances the model’s feature representation capability. Experimental results demonstrate the superior performance of the proposed method over competing image quality evaluation methods.
Title: "Multi-Level Perception Assessment for Underwater Image Enhancement." Authors: Yiwen Xu;Yuxiang Lin;Nian He;Xuejin Wang;Tiesong Zhao. Pub Date: 2025-01-29. DOI: 10.1109/TBC.2025.3525972. IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 606-615.
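To make the attention idea above more concrete, the following PyTorch sketch chains channel, spatial, and pixel gating over a low-level feature map. It is a minimal illustration under assumed layer sizes and gating order; the class name CSPAMSketch and all hyperparameters are placeholders, not the authors' CSPAM implementation.

```python
import torch
import torch.nn as nn

class CSPAMSketch(nn.Module):
    """Hypothetical channel-spatial-pixel attention block (illustrative only)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        # Channel attention: squeeze spatial dims, excite per channel.
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: one map over H x W from pooled channel statistics.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Pixel attention: a per-element gate over the full C x H x W tensor.
        self.pixel_conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_fc(x)                      # channel dimension
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        x = x * self.spatial_conv(pooled)               # spatial dimension
        return x * self.pixel_conv(x)                   # pixel dimension

feat = torch.randn(1, 64, 56, 56)    # a low-level feature map
out = CSPAMSketch(64)(feat)          # same shape, attention-reweighted
```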
Pub Date: 2025-01-22. DOI: 10.1109/TBC.2025.3525976
Yongli Chang;Guanghui Yue;Bo Zhao
Recently, convolutional neural network (CNN) based stereo image quality assessment (SIQA) has been extensively researched, achieving impressive performance. However, most SIQA methods tend to only mine features from the distorted stereo images, neglecting the exploitation of valuable features present in other image domains. Moreover, simple fusion strategies like addition and concatenation for binocular fusion further limit the network’s prediction performance. Therefore, we design a cross-domain feature interaction network (CDFINet) for SIQA in this paper, which considers the complementarity between different domain features and realizes binocular fusion between the left and right monocular features based on difference information. Specifically, to boost the prediction ability, we design a dual-branch network with image and gradient feature extraction branches, extracting hierarchical features from both domains. Moreover, to exploit binocular information more appropriately, we propose a difference information guidance based binocular fusion (DIGBF) module to achieve the binocular fusion. Furthermore, to better achieve information compensation between the image and gradient domains, binocular features obtained from the image domain and gradient domain are fused in the proposed cross-domain feature fusion (CDFF) module. In addition, considering the feedback mechanism of the visual cortex, higher-level features are fed back to lower-level regions, and the proposed cross-layer feature interaction (CLFI) module realizes the guidance of higher-level features to lower-level features. Finally, to obtain the perceptual quality more effectively, a hierarchical multi-score quality aggregation method is proposed. The experimental results on four SIQA databases show that our CDFINet outperforms the compared mainstream metrics.
Title: "Cross-Domain Feature Interaction Network for Stereo Image Quality Assessment Considering Difference Information Guiding Binocular Fusion." IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 593-605.
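The abstract above does not specify how difference information guides the binocular fusion, so the sketch below shows one plausible form: a gate computed from the absolute left-right feature difference modulates a concatenation-based fusion. All layer shapes and the gating formula are assumptions for illustration, not the paper's DIGBF module.

```python
import torch
import torch.nn as nn

class DifferenceGuidedFusion(nn.Module):
    """Illustrative binocular fusion gated by left/right difference information."""
    def __init__(self, channels):
        super().__init__()
        self.diff_gate = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Sigmoid(),
        )
        self.merge = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, left, right):
        gate = self.diff_gate(torch.abs(left - right))   # where the two views disagree
        fused = self.merge(torch.cat([left, right], dim=1))
        # Emphasize regions flagged by the difference gate (binocular rivalry areas).
        return fused * (1.0 + gate)

l = torch.randn(2, 32, 28, 28)
r = torch.randn(2, 32, 28, 28)
print(DifferenceGuidedFusion(32)(l, r).shape)  # torch.Size([2, 32, 28, 28])
```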
Pub Date: 2025-01-20. DOI: 10.1109/TBC.2024.3519909
Hongwei Guo;Ce Zhu;Junjie Chen;Lei Luo;Yongkai Huo;Yutian Liu
Previous studies have shown that temporally dependent rate-distortion optimization (RDO) methods can enhance the compression performance of video encoders. However, accurately quantifying temporal rate-distortion dependencies in the latest video coding standard, Versatile Video Coding (VVC), remains a significant challenge. To address this issue, this paper proposes a distortion propagation factor (DPF) estimation method tailored for VVC low-delay hierarchical coding, aiming to achieve temporally dependent RDO. Specifically, we first derive a formula for calculating the DPF based on coding distortion and motion-compensated prediction (MCP) errors. Building on this, we present several pre-encoding-based DPF estimation schemes designed for the VVC low-delay hierarchical coding structure. These schemes have very low computational complexity and do not require buffering subsequent unencoded frames for pre-analysis, thereby avoiding additional encoding delays. Finally, the estimated DPFs are used to adaptively adjust the Lagrange multipliers and quantization parameters of each coding tree unit, optimizing the allocation of coding bit resources. After integrating the proposed method into the VVC test model VTM-23.0, experimental results show that one of the proposed DPF estimation schemes achieves average bit rate savings of 4.25% for low-delay B slices and 4.12% for low-delay P slices, with only a 1% increase in computational complexity. The proposed method offers an effective solution for enhancing the compression performance of VVC encoders. Consequently, the proposed DPF estimation approaches have already been adopted by the Joint Video Experts Team (JVET) and officially integrated into the VVC reference software.
Title: "Distortion Propagation Factor Estimation for VVC Low-Delay Hierarchical Coding." IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 492-505.
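For readers unfamiliar with temporally dependent RDO, the snippet below illustrates how an estimated distortion propagation factor can be used to scale a CTU's Lagrange multiplier and quantization parameter. The scaling form, the strength constant, and the 3*log2 lambda-to-QP step are generic conventions from the rate-control literature, not the exact formulas derived in this paper.

```python
import math

def adjust_ctu_parameters(base_lambda, base_qp, dpf, strength=1.0):
    """Illustrative temporally dependent RDO adjustment for one CTU.

    A CTU whose distortion propagates strongly to later frames (large dpf)
    gets a smaller Lagrange multiplier, i.e. more bits. The scaling form and
    the 3*log2(.) lambda-to-QP step are common conventions, not this paper's
    derivation.
    """
    ctu_lambda = base_lambda / (1.0 + strength * dpf)
    ctu_qp = base_qp + round(3.0 * math.log2(ctu_lambda / base_lambda))
    return ctu_lambda, ctu_qp

# Example: a CTU estimated to propagate twice its own distortion downstream.
print(adjust_ctu_parameters(base_lambda=57.0, base_qp=32, dpf=2.0))
```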
Pub Date: 2025-01-01. DOI: 10.1109/TBC.2024.3511928
Allan Seiti Sassaqui Chaubet;Rodrigo Admir Vaz;George Henrique Maranhão Garcia de Oliveira;Ricardo Seriacopi Rabaça;Isabela Coelho Dourado;Gustavo de Melo Valeira;Cristiano Akamine
A new Digital Terrestrial Television Broadcasting (DTTB) system, called Television (TV) 3.0, is being developed in Brazil and is expected to be on air by 2025 under the commercial name DTV+. It started with a Call for Proposals (CfP) for its system components, to which organizations worldwide submitted candidate technologies. After two testing and evaluation phases, the technologies for all layers were selected, the TV 3.0 architecture was completely defined, and the standards were written. It consists of modern Modulation and Coding (MODCOD) techniques, mandatory transmission and reception in Multiple-Input Multiple-Output (MIMO) with cross-polarized antennas, an app-oriented interface, an Internet-based Transport Layer (TL), and state-of-the-art efficient coding for audio, video, and captions. This set of technologies will allow for several new use cases that change the user experience with TV, such as Geographically Segmented Broadcasting (GSB), targeted advertising, sensory effects, and interactivity. This paper reviews the phases already concluded for the TV 3.0 project and presents its potential and the current developments in its final stage.
Title: "TV 3.0: An Overview." IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 11-18.
Pub Date: 2025-01-01. DOI: 10.1109/TBC.2024.3517138
Yegi Lee;Myung-Sun Baek;Kyoungro Yoon
Many efforts to achieve cost savings through simulations have been ongoing in the cyber-physical system (CPS) industry and manufacturing field. Recently, the concept of digital twins has emerged as a promising solution for cost reduction in various fields, such as smart cities, factory optimization, architecture, and manufacturing. Digital twins offer enormous potential by continuously monitoring and updating data to study a wide range of issues and improve products and processes. However, the practical implementation of digital twins presents significant challenges. Additionally, while various studies have introduced the concepts and roles of digital twin systems and digital components, further research is needed to explore efficient operation and management strategies. This paper presents a digital entity management methodology for the efficient implementation of digital twin systems. Our proposed class-level digital entity management methodology organizes complex and repeatedly used digital entities into digital entity classes. This approach facilitates the abstraction, inheritance, and upcasting of digital entity classes. By leveraging class-level management and easily reusable and modifiable digital entities, the implementation of low-complexity digital twin systems becomes feasible. The proposed methodology aims to streamline the digital twin implementation process, addressing complex technical integration and practical implementation challenges.
Title: "Digital Entity Management Methodology for Digital Twin Implementation: Concept, Definition, and Examples." IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 19-29.
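A minimal Python sketch of the class-level idea described above: a base digital entity class is specialized through inheritance and managed through upcasting. The entity names, fields, and update rule are hypothetical placeholders, not part of the proposed methodology.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalEntity:
    """Hypothetical base class for a digital twin entity (illustrative only)."""
    entity_id: str
    state: dict = field(default_factory=dict)

    def update(self, sensor_data: dict) -> None:
        # Continuously synchronize the twin with monitored physical data.
        self.state.update(sensor_data)

@dataclass
class ConveyorEntity(DigitalEntity):
    """A reusable, specialized entity class inheriting the common behavior."""
    belt_speed_mps: float = 0.0

    def update(self, sensor_data: dict) -> None:
        super().update(sensor_data)
        self.belt_speed_mps = sensor_data.get("belt_speed_mps", self.belt_speed_mps)

# Upcasting: a heterogeneous scene is managed through the base-class interface.
scene: list[DigitalEntity] = [ConveyorEntity("conv-01"), DigitalEntity("room-01")]
for entity in scene:
    entity.update({"temperature_c": 21.5, "belt_speed_mps": 1.2})
```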
This paper presents an advanced rate control (ARC) algorithm for Versatile Video Coding (VVC). The proposed method is based on a spatial coupling strategy and an improved Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm to achieve high-performance rate control (RC). In this paper, we address the problem that the current coding block does not fully utilize the spatial information during the encoding process. Firstly, a parameter updating strategy at the coding tree unit (CTU) level is constructed based on the spatial coupling strategy. The spatial coupling strategy establishes the relationship between video parameters and video texture, which enables the video parameters at the CTU level to be more closely aligned with the video content. Furthermore, in order to enhance the precision of RC, we propose an improved BFGS algorithm to update video parameters, which utilizes the optimal search direction of the different partial differentials and sets an adaptive speed control factor. The experimental results indicate that the proposed method offers better performance compared to the default RC in the VVC Test Model (VTM) 19.0, with Bjøntegaard Delta Rate (BD-Rate) savings of 6.35%, 5.09% and 5.43% under Low Delay P, Low Delay B and Random Access configurations, respectively. Moreover, the proposed method demonstrates superior performance compared to other state-of-the-art algorithms.
Title: "Spatial Coupling Strategy and Improved BFGS-Based Advanced Rate Control for VVC." Authors: Jiahao Zhang;Shuhua Xiong;Xiaohai He;Zeming Zhao;Hongdong Qin. Pub Date: 2024-12-31. DOI: 10.1109/TBC.2024.3517167. IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 111-124.
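As a rough illustration of a BFGS-style parameter update with a speed-control factor, the sketch below fits a log-domain R-lambda model on toy data. The adaptive rule (halving the speed factor whenever a step would increase the loss) and the toy objective are our own assumptions; the paper's improved BFGS and its partial-differential search directions are not reproduced here.

```python
import numpy as np

# Toy data: fit the log-domain R-lambda model  ln(lambda) = alpha + beta * ln(R),
# a convex least-squares problem that a quasi-Newton update handles well.
R = np.array([0.05, 0.1, 0.2, 0.4])
lam = 30.0 * R ** -1.2

def loss_and_grad(p):
    alpha, beta = p
    err = alpha + beta * np.log(R) - np.log(lam)
    return np.sum(err ** 2), np.array([2 * err.sum(), 2 * (err * np.log(R)).sum()])

def bfgs_fit(p, iters=200, speed=1.0):
    """Generic BFGS with a simple adaptive speed-control factor (halved whenever a
    step would increase the loss). Illustrative only; not the paper's update rule."""
    H = np.eye(len(p))
    f, g = loss_and_grad(p)
    for _ in range(iters):
        d = -H @ g
        while True:                          # adapt the speed factor
            p_new = p + speed * d
            f_new, g_new = loss_and_grad(p_new)
            if f_new <= f or speed < 1e-8:
                break
            speed *= 0.5
        s, y = p_new - p, g_new - g
        sy = float(s @ y)
        if sy > 1e-12:                       # standard BFGS inverse-Hessian update
            rho, I = 1.0 / sy, np.eye(len(p))
            H = ((I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s))
                 + rho * np.outer(s, s))
        p, f, g = p_new, f_new, g_new
    return p

alpha, beta = bfgs_fit(np.array([3.0, -1.0]))   # start near alpha=ln(20), beta=-1
print(np.exp(alpha), beta)                      # approaches the true (30, -1.2)
```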
The complex distortions suffered by real-world underwater images pose urgent demands on accurate underwater image quality assessment (UIQA) approaches that can predict underwater image quality consistently with human perception. Deep learning techniques have achieved great success in many applications, yet they usually require a substantial amount of human-labeled data, which is time-consuming and labor-intensive to collect. Developing a deep learning-based UIQA method that does not rely on any human-labeled underwater images for model training poses a great challenge. In this work, we propose a novel UIQA method based on domain adaptation (DA) from a curriculum learning perspective. The proposed method is called curriculum learning-inspired DA (CLIDA), aiming to learn a robust and generalizable UIQA model by conducting DA between labeled natural images and unlabeled underwater images progressively, i.e., from easy to hard. The key is how to select easy samples from all underwater images in the target domain so that the difficulty of DA can be well-controlled at each stage. To this end, we propose a simple yet effective easy sample selection (ESS) scheme to form an easy sample set at each stage. Then, DA is performed between the entire natural image set in the source domain (with labels) and the selected easy sample set in the target domain (with pseudo labels) at each stage. As only those reliable easy examples are involved in DA at each stage, the difficulty of DA is well-controlled and the capability of the model is expected to be progressively enhanced. We conduct extensive experiments to verify the superiority of the proposed CLIDA method and the effectiveness of each key component involved in our CLIDA framework. The source code will be made available at https://github.com/zzeu001/CLIDA.
Title: "Generalizable Underwater Image Quality Assessment With Curriculum Learning-Inspired Domain Adaption." Authors: Shihui Wu;Qiuping Jiang;Guanghui Yue;Shiqi Wang;Guangtao Zhai. Pub Date: 2024-12-27. DOI: 10.1109/TBC.2024.3511962. IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 252-263.
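The following sketch illustrates the general curriculum shape described above: at each stage a larger fraction of the most reliable target-domain samples is selected and used, with pseudo labels, for adaptation. The uncertainty measure, the stage schedule, and all names are assumptions, not the paper's ESS scheme.

```python
import numpy as np

def select_easy_samples(pseudo_scores, uncertainties, keep_ratio):
    """Hypothetical easy-sample selection: keep the target-domain images whose
    pseudo quality scores are least uncertain."""
    k = max(1, int(keep_ratio * len(pseudo_scores)))
    easy_idx = np.argsort(uncertainties)[:k]          # most reliable first
    return easy_idx, pseudo_scores[easy_idx]

def curriculum_domain_adaptation(stages=(0.2, 0.5, 1.0)):
    # Placeholder target-domain statistics; in practice these would come from
    # the current model's predictions on unlabeled underwater images.
    rng = np.random.default_rng(0)
    pseudo_scores = rng.uniform(0, 1, size=200)       # pseudo MOS in [0, 1]
    uncertainties = rng.uniform(0, 1, size=200)       # e.g., prediction variance

    for stage, ratio in enumerate(stages, start=1):
        easy_idx, easy_labels = select_easy_samples(pseudo_scores, uncertainties, ratio)
        # adapt(model, source_set, target_subset=easy_idx, pseudo=easy_labels)  # hypothetical call
        print(f"stage {stage}: adapting with {len(easy_idx)} easy target samples")

curriculum_domain_adaptation()
```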
Pub Date: 2024-12-27. DOI: 10.1109/TBC.2024.3511927
Fengchuang Xing;Mingjie Li;Yuan-Gen Wang;Guopu Zhu;Xiaochun Cao
In learning vision-language representations from Web-scale data, the contrastive language-image pre-training (CLIP) mechanism has demonstrated remarkable performance in many vision tasks. However, its application to the widely studied video quality assessment (VQA) task remains an open issue. In this paper, we propose an efficient and effective CLIP-based Transformer method for the VQA problem (CLIPVQA). Specifically, we first design an effective video frame perception paradigm with the goal of extracting the rich spatiotemporal quality and content information among video frames. Then, the spatiotemporal quality features are adequately integrated using a self-attention mechanism to yield a video-level quality representation. To utilize the quality language descriptions of videos for supervision, we develop a CLIP-based encoder for language embedding, which is then fully aggregated with the generated content information via a cross-attention module to produce a video-language representation. Finally, the video-level quality and video-language representations are fused for final video quality prediction, where a vectorized regression loss is employed for efficient end-to-end optimization. Comprehensive experiments are conducted on eight in-the-wild video datasets with diverse resolutions to evaluate the performance of CLIPVQA. The experimental results show that the proposed CLIPVQA achieves new state-of-the-art VQA performance and up to 37% better generalizability than existing benchmark VQA methods. A series of ablation studies are also performed to validate the effectiveness of each module in CLIPVQA.
Title: "CLIPVQA: Video Quality Assessment via CLIP." IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 291-306.
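To illustrate the kind of attention-based aggregation and video-language fusion the abstract describes, the PyTorch sketch below applies temporal self-attention to frame features and cross-attention from text embeddings to the aggregated video features before regression. The random tensors stand in for CLIP encoder outputs, and all dimensions are assumed for illustration; this is not the CLIPVQA architecture.

```python
import torch
import torch.nn as nn

class QualityTextFusion(nn.Module):
    """Illustrative fusion of frame-level visual features with quality-text embeddings."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.regressor = nn.Linear(dim, 1)

    def forward(self, frame_feats, text_feats):
        # Aggregate per-frame quality features over time (self-attention).
        video_feat, _ = self.temporal_attn(frame_feats, frame_feats, frame_feats)
        # Let quality-language embeddings attend to the video content (cross-attention).
        fused, _ = self.cross_attn(text_feats, video_feat, video_feat)
        return self.regressor(fused.mean(dim=1)).squeeze(-1)

frames = torch.randn(4, 16, 512)   # 4 videos, 16 frame embeddings each (stand-ins for CLIP features)
texts = torch.randn(4, 5, 512)     # 5 quality-description embeddings per video
print(QualityTextFusion()(frames, texts).shape)   # torch.Size([4]) predicted quality scores
```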
Pub Date: 2024-12-25. DOI: 10.1109/TBC.2024.3517141
Lang Lin;Wensheng Pan;Hongzhi Zhao;Shengfeng Zhang;Shihai Shao;Youxi Tang
The full-duplex (FD) technique provides spectrum-efficient transmission service by supporting uplink and downlink transmission at the same time over the same frequency band. Combining FD with multi-input multi-output (MIMO) antenna systems can improve the capability and efficiency of next-generation broadcasting to provide various services for multiple users. However, the strong self-interference (SI) coupled between the transmit and receive arrays is a significant challenge for simultaneous transmission and reception. This study considers multi-user oriented broadcast systems where the transmit array with subarray division allows simultaneous multi-beam transmission. Our objective is to mitigate SI from two aspects: 1) subarray assignment; 2) adaptive transmit beamforming (TxBF). We propose a min-SI TxBF design jointly optimized with subarray assignment, which determines the best subarray assignment pattern and provides a closed-form solution for the optimal TxBF weight. Theoretical analysis and simulations show that our design has low algorithmic complexity. Based on measured SI channel data collected from a hardware prototype testbed, simulation results verify that the min-SI TxBF design with subarray assignment can effectively enhance SI cancelation.
Title: "Joint Optimization of Beamforming and Subarray Assignment for Full-Duplex Arrays in Next Generation Broadcast Systems." IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 672-679.
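A small NumPy sketch of the min-SI idea: for a candidate subarray, the unit-norm transmit weight that minimizes residual self-interference power is the eigenvector of H^H H with the smallest eigenvalue, and the subarray assignment can then be chosen by searching over candidate element sets. The exhaustive search, array sizes, and random SI channel are illustrative assumptions, not the paper's joint design.

```python
import numpy as np
from itertools import combinations

def min_si_weight(H):
    """Unit-norm transmit weight minimizing ||H w||^2: the eigenvector of H^H H
    with the smallest eigenvalue (a standard result, used as a stand-in here)."""
    eigvals, eigvecs = np.linalg.eigh(H.conj().T @ H)
    return eigvecs[:, 0], eigvals[0]          # eigh sorts eigenvalues ascending

def best_subarray_assignment(H_si, subarray_size):
    """Brute-force search over which transmit elements form the subarray that
    yields the lowest residual self-interference (illustrative only)."""
    n_tx = H_si.shape[1]
    best = None
    for idx in combinations(range(n_tx), subarray_size):
        w, residual = min_si_weight(H_si[:, list(idx)])
        if best is None or residual < best[2]:
            best = (idx, w, residual)
    return best

rng = np.random.default_rng(1)
# Hypothetical 4-rx x 8-tx self-interference channel (Rayleigh, unit variance).
H = (rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))) / np.sqrt(2)
idx, w, residual = best_subarray_assignment(H, subarray_size=4)
print(idx, residual)   # chosen transmit elements and their residual SI power
```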
Pub Date: 2024-12-12. DOI: 10.1109/TBC.2024.3511950
Zhanyuan Cai;Wenxu Gao;Ge Li;Wei Gao
For efficient point cloud broadcasting, point cloud compression technologies serve as the foundation and play a crucial role in immersive media communication and streaming. Video-based point cloud compression (V-PCC) is the standard recently developed by the Moving Picture Experts Group (MPEG) for dynamic point clouds. Its original fixed-ratio bit allocation (FR-BA) method in the unique all-intra (AI) structure leads to a significant rate-distortion performance gap between the rate control manner and the fixed quantization parameters (FixedQP) scheme, as evidenced by significant increases in BD-Rate (Bjøntegaard Delta Rate) for both geometry and attribute. To address this issue, we propose a distortion propagation model-based frame-level bit allocation method specifically tailored for the AI structure in V-PCC. First, the distortion propagation model inside the group of pictures (GOP) is analyzed for the AI configuration. Second, the skip ratio of 4x4 minimum coding units (CUs) is utilized to predict the distortion propagation factor. Third, the occupancy information is employed to refine the distortion propagation model and further enhance compression performance. Finally, experimental results demonstrate the effectiveness of the proposed frame-level bit allocation method: compared to the FR-BA method, it achieves BD-Rate reductions of 0.92% and 4.85% in geometry and attribute, respectively. Furthermore, with the introduction of distortion propagation factor prediction incorporating occupancy correction, the BD-Rate reductions are further extended to 2.16% and 6.13% in geometry and attribute, respectively.
Title: "Distortion Propagation Model-Based V-PCC Rate Control for 3D Point Cloud Broadcasting." IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 180-192.
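As a simple illustration of frame-level bit allocation driven by distortion propagation, the function below weights each frame's share of the GOP budget by its estimated DPF. The linear weighting and the example numbers are assumptions for illustration, not the proposal's allocation formula.

```python
def allocate_frame_bits(gop_budget_bits, dpfs):
    """Illustrative frame-level bit allocation for one all-intra GOP: each frame's
    share is weighted by its estimated distortion propagation factor, so frames
    whose distortion influences more later frames receive more bits."""
    weights = [1.0 + dpf for dpf in dpfs]
    total = sum(weights)
    return [gop_budget_bits * w / total for w in weights]

# Example: a 4-frame GOP where the earlier frames propagate more distortion.
print(allocate_frame_bits(gop_budget_bits=400_000, dpfs=[1.8, 1.2, 0.6, 0.0]))
```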