Currently, screen content video applications are widely used in our daily lives. As the latest Screen Content Coding (SCC) standard, Versatile Video Coding (VVC) SCC employs a quad-tree plus nested multi-type tree (QTMT) coding structure and various screen content coding modes (CMs). This design enhances the coding efficiency of VVC SCC but also results in a highly complex coding process, which significantly hinders the broader adoption of screen content video technology. Consequently, improving the coding speed of VVC SCC is highly desirable. In this paper, we propose a fast CM and transform decision algorithm for Intra prediction in VVC SCC. Specifically, we first use Convolutional Neural Networks (CNNs) to predict the content types of all Coding Units (CUs). We then predict candidate CMs for each CU based on the CM distributions of the different content types. Next, we select the Sum of Absolute Transformed Difference (SATD) as a feature and use a naive Bayes classifier to skip unlikely Intra modes early. Finally, we terminate Block-based Differential Pulse-Code Modulation (BDPCM) early and select the best transform type during Intra mode prediction to improve coding speed. Experimental results demonstrate that the proposed algorithm improves coding speed by an average of 39.28%, with the BDBR increasing by 0.80%.
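The naive Bayes early-skip step can be sketched with a one-feature Gaussian classifier over SATD. All statistics below (priors, means, variances) are invented for illustration; in the paper such statistics would be learned offline from training sequences, and the exact feature model is not reproduced here:

```python
import math

# Hypothetical training statistics for the SATD feature of CUs whose best
# mode was Intra vs. not Intra. These numbers are made up for illustration.
STATS = {
    "intra":     {"prior": 0.4, "mean": 1200.0, "var": 250_000.0},
    "not_intra": {"prior": 0.6, "mean": 2600.0, "var": 640_000.0},
}

def gaussian_log_pdf(x, mean, var):
    """Log of the 1-D Gaussian density."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def skip_intra(satd, stats=STATS):
    """Return True if the naive Bayes log-posterior says Intra is unlikely,
    so the expensive Intra mode search can be skipped early."""
    scores = {
        cls: math.log(p["prior"]) + gaussian_log_pdf(satd, p["mean"], p["var"])
        for cls, p in stats.items()
    }
    return scores["not_intra"] > scores["intra"]
```

Because only the comparison of log-posteriors matters, no normalizing constant is needed; a large SATD pushes the decision toward skipping the Intra search.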
Title: "Fast Coding Mode Decision for Intra Prediction in VVC SCC," by Dayong Wang; Weihong Liu; Zeyu Zhou; Xin Lu; Jinhua Liu; Hui Guo; Ce Zhu (IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 506-516). Pub Date: 2025-03-29. DOI: 10.1109/TBC.2025.3541773
Pub Date: 2025-03-28. DOI: 10.1109/TBC.2025.3570871
Jian Yue;Mao Ye;Luping Ji;Hongwei Guo;Ce Zhu
With the rapid growth of digital media applications, the need for advanced video compression technology has become indispensable, as achieving high compression ratios often leads to quality degradation, making compressed video quality enhancement a crucial research focus. In recent years, deep learning-based approaches have revolutionized compressed video quality enhancement, far surpassing traditional methods and enabling unprecedented high-quality reconstruction. Leveraging data-driven techniques, deep learning has demonstrated remarkable progress in image and video quality enhancement tasks. This study offers a comprehensive review of recent advances in the enhancement of compressed video quality. It focuses on deep learning-based methods, particularly those leveraging convolutional neural networks, and explores their advantages over traditional approaches. The review is structured around key topics, including task definitions and challenges, general-purpose and domain-specific quality enhancement techniques, as well as datasets and metrics. Beyond summarizing the state of the art, this article offers an in-depth analysis of current methods, highlighting their strengths, limitations, and practical application scenarios. Finally, it identifies future research directions and discusses the critical challenges that remain, with the aim of guiding further exploration in the field of compressed video quality enhancement.
Title: "A Survey of Deep-Learning-Based Compressed Video Quality Enhancement" (IEEE Transactions on Broadcasting, vol. 71, no. 4, pp. 977-992).
Pub Date: 2025-03-26. DOI: 10.1109/TBC.2025.3549985
Bo Hu;Wenzhi Chen;Jia Zheng;Leida Li;Wen Lu;Xinbo Gao
Compared with no-reference image quality assessment (IQA), full-reference IQA often achieves higher consistency with human subjective perception because reference information is available for comparison. A natural idea is to design strategies that allow the latter to guide the former's learning to achieve better performance. However, two important issues have not been fully explored: how to construct the reference information and how to transfer the prior knowledge. To this end, a novel method called no-reference IQA via inter-level adaptive knowledge distillation (AKD-IQA) is proposed. The core of AKD-IQA lies in transferring image distribution difference information from the full-reference teacher model to the no-reference student model through inter-level AKD. First, the teacher model is constructed based on a multi-level feature discrepancy extractor and a cross-scale feature integrator. Then, it is trained on a large synthetic distortion dataset to establish a comprehensive difference prior distribution. Finally, the image re-distortion strategy and inter-level AKD are introduced into the student model for effective learning. Experimental results on six standard IQA datasets demonstrate that AKD-IQA achieves state-of-the-art performance. In addition, cross-dataset experiments confirm its superior generalization ability.
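The inter-level transfer idea can be sketched as a weighted feature-matching loss between corresponding teacher and student feature levels. This is a minimal sketch with fixed level weights; the paper's adaptive weighting mechanism and exact loss are not reproduced here:

```python
def distillation_loss(student_feats, teacher_feats, level_weights):
    """Weighted sum of per-level mean-squared errors between the student's
    and teacher's intermediate features (inter-level knowledge transfer).
    Each feature is a flat list of floats; weights are fixed here, whereas
    an adaptive scheme would learn them."""
    assert len(student_feats) == len(teacher_feats) == len(level_weights)
    total = 0.0
    for s, t, w in zip(student_feats, teacher_feats, level_weights):
        mse = sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
        total += w * mse
    return total
```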
Title: "No-Reference Image Quality Assessment via Inter-Level Adaptive Knowledge Distillation" (IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 581-592).
To address the high cost associated with using high-speed, large-acquisition-bandwidth analog-to-digital converters (ADCs) in the feedback path, a new low-sampling-rate digital predistortion (DPD) method is proposed in this paper. To model the analog bandpass filter (BPF) in the feedback path, a training method for the digital finite impulse response (FIR) filter coefficients in a practical band-limited DPD system is proposed, and a filter matrix is constructed in different forms for continuous-signal and cyclic-signal inputs. The filter matrix improves the accuracy and robustness of the band-limited power amplifier (PA) model. Then, an inverse filter signal recovery (IFSR) method is proposed to recover the full-band output signal of the PA, which can be used to train the predistorter with conventional DPD techniques. Simulation results validate the effectiveness of the IFSR method, demonstrating that the IFSR-DPD method can reduce the ADC sampling rate to 1/10 or less of that of full-rate sampling methods and decrease the ADC acquisition bandwidth to about 0.3 times the original input signal bandwidth. The linearization performance of the IFSR-DPD method is also evaluated on an instrument-based test platform. When the passband and transition-band characteristics of the BPF are unsatisfactory, the proposed low-sampling-rate DPD method improves the adjacent channel power ratio (ACPR) by 18.67 dB and the error vector magnitude (EVM) by 1.214% compared to the scenario without DPD.
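The distinction between continuous and cyclic inputs corresponds to two standard forms of the filter matrix built from the FIR coefficients: a Toeplitz matrix for linear convolution and a circulant matrix for circular convolution. A minimal pure-Python sketch (no claim to match the paper's exact construction):

```python
def circulant_matrix(h, n):
    """Filter matrix for a length-n cyclic input: y = H @ x equals the
    circular convolution of x with the FIR taps h."""
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, tap in enumerate(h):
            H[i][(i - k) % n] = tap
    return H

def toeplitz_matrix(h, n):
    """Filter matrix for a continuous (linear-convolution) input of length n;
    taps that would reach before the start of the signal are dropped."""
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, tap in enumerate(h):
            if 0 <= i - k < n:
                H[i][i - k] = tap
    return H
```

The cyclic form is the natural choice when the training signal is repeated periodically, since the circulant structure makes the filtering exactly invertible in the frequency domain.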
Title: "A Low-Sampling-Rate Digital Predistortion Method Based on Inverse Filter Signal Recovery for Wideband Power Amplifiers," by Xiaofang Wu; Jiawen Yan; Dehuang Zhang; Jianyang Zhou (IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 653-665). Pub Date: 2025-03-25. DOI: 10.1109/TBC.2025.3549995
In OFDM-based digital terrestrial broadcasting systems, impulsive noise is a significant factor affecting communication quality. A prominent method to suppress impulsive noise is to incorporate a memoryless nonlinearity at the receiver front-end of the OFDM demodulator, in which the parameter estimation of the memoryless nonlinearity directly impacts the effectiveness of impulsive noise suppression. In this paper, we propose a deep learning-based memoryless nonlinearity approach for impulsive noise suppression. The proposed method can adaptively estimate the parameters of the memoryless nonlinearity in dynamic impulsive noise environments and achieve near-optimal parameter estimation. Specifically, we design a High-Amplitude Priority Downsampling method to extract the key amplitude characteristics from the input signal, which effectively resolves the issue of extracting the amplitude features of impulsive noise. In addition, to address the performance degradation caused by insufficient training samples, we propose a novel training method that integrates progressive fine-tuning to complete training with only a few samples. Furthermore, we conduct experiments on the signal-to-noise ratio (SNR) and bit error rate (BER) of the signal after impulsive noise suppression. The results validate that the parameters estimated by the proposed method approximate the theoretical optimal values and that the proposed method effectively suppresses impulsive noise, outperforming traditional methods in terms of SNR and BER.
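The memoryless nonlinearity placed at the receiver front-end is classically a blanking or clipping device; the paper's contribution is estimating its parameters (e.g., the threshold) adaptively with a deep network. A sketch of the two classical nonlinearities, with the threshold supplied externally rather than learned:

```python
def blanking(samples, threshold):
    """Blanking: zero out any sample whose magnitude exceeds the threshold."""
    return [0.0 if abs(x) > threshold else x for x in samples]

def clipping(samples, threshold):
    """Clipping: limit the magnitude to the threshold, keeping the sign
    (for complex OFDM samples, x / abs(x) * threshold preserves the phase)."""
    return [x if abs(x) <= threshold else x / abs(x) * threshold
            for x in samples]
```

With a well-chosen threshold, both devices pass the OFDM signal largely untouched while removing or limiting the rare high-amplitude impulses.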
Title: "Parameter Estimation for Adaptive Impulsive Noise Suppression: A Deep Learning-Based Memoryless Nonlinearity Approach," by Zhu Xiao; Yiqiu Zhang; Tong Li; Jing Bai; Siwang Zhou; Yonghu Zhang (IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 641-652). Pub Date: 2025-03-24. DOI: 10.1109/TBC.2025.3550016
Pub Date: 2025-03-21. DOI: 10.1109/TBC.2025.3550020
Yang Wang;Chuang Yang;Mugen Peng
Terahertz (THz) communication is considered one of the most critical technologies for 6G broadcasting communications because of its abundant bandwidth. To compensate for the high propagation loss of THz signals, analog/digital hybrid precoding for THz massive multiple-input multiple-output (MIMO) is proposed to focus signals and extend the broadcasting communication range. Notably, considering hardware cost and power consumption, infinite- and high-resolution phase shifters (PSs) are difficult to implement in THz massive MIMO, and low-resolution PSs are typically adopted in practice. However, low-resolution PSs cause severe performance degradation, which also poses challenges for the design of analog precoders in multi-carrier systems. Moreover, broadband THz communication suffers severe frequency-selective fading, further increasing the difficulty of analog precoder design. Motivated by the above factors, in this paper we propose a new heuristic algorithm for both fully-connected (FC) and partially-connected (PC) architectures, which first partially decouples the digital and analog precoders and then optimizes them alternately. To further improve performance, we extend our partial decoupling method to dynamic subarrays, in which each RF chain is connected to a non-overlapping subset of antennas. The numerical results demonstrate that our proposed THz hybrid precoding with low-resolution PSs outperforms the comparison schemes under both the FC and PC structures.
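With B-bit phase shifters, each analog precoder entry is a unit-modulus complex number whose phase is restricted to a 2^B-point grid. A minimal sketch of this quantization step (illustrative only; the paper's heuristic decoupling algorithm is not reproduced):

```python
import cmath
import math

def quantize_phase(theta, bits):
    """Snap a phase to the nearest value realizable by a `bits`-bit PS,
    i.e., to the grid {k * 2*pi / 2**bits}."""
    step = 2 * math.pi / (2 ** bits)
    return round(theta / step) * step

def analog_precoder_entry(target, bits):
    """Constant-modulus analog precoder entry approximating a target
    complex coefficient: keep only its (quantized) phase."""
    return cmath.exp(1j * quantize_phase(cmath.phase(target), bits))
```

With 1-bit PSs the grid collapses to {0, pi}, which is why low resolution costs so much performance and motivates designs that account for the quantization explicitly.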
Title: "Terahertz Hybrid Precoding With Low-Resolution PSs Under Frequency Selective Channel: A Partial Decoupling Method" (IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 453-466).
Pub Date: 2025-03-15. DOI: 10.1109/TBC.2025.3565895
Hui Hu;Yunhui Shi;Jin Wang;Nam Ling;Baocai Yin
Based on the measured latitude and longitude, users can freely view an omnidirectional image from different perspectives. Typically, omnidirectional images are represented in the equirectangular projection (ERP) format. Although ERP images suffer from distortion and redundancy due to oversampling, making traditional codecs inefficient, they maintain visual consistency and enhance compatibility with deep learning-based image processing tools. This has led to the emergence of end-to-end omnidirectional image compression methods based on the ERP format. However, transform coding, a key component in learned planar image compression, has not yet been fully explored in the domain of learned omnidirectional image compression. In this paper, we propose a transform coding method with adaptive latitude-aware and importance-activated features for omnidirectional image compression. Specifically, the adaptive latitude-aware mechanism comprises two modules. The first, termed the Adaptive Latitude-aware Module (ALAM), employs rectangular dilated convolutional kernels of multiple sizes to perceive distortion redundancy across different latitudes, followed by latitude-adaptive weighting to select the optimal features for the respective latitudes. The second, named the Multi-scale Convolutional Gated Feedforward Network (MCGFN), fully exploits local contextual information while suppressing the feature redundancy induced by the diverse dilated convolutions in the first module. Furthermore, to further reduce ERP redundancy, we design an importance-activated spatial feature transform module that regulates latent representations to allocate more bits to significant regions. Experimental results demonstrate that our proposed method outperforms the existing VVC standard and learning-based omnidirectional image compression approaches at medium-to-high bitrates while maintaining low computational complexity.
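The latitude dependence that ALAM exploits stems from ERP oversampling: image rows near the poles cover far less sphere area than equatorial rows. A common way to express this, as in WS-PSNR-style weighting, is a cos(latitude) weight per row (illustration of the geometry only; the paper's modules operate on learned features):

```python
import math

def erp_row_weights(height):
    """Per-row sphere-area weights for an ERP image of the given height.
    Row i maps to latitude (i + 0.5 - height/2) * pi / height, and its
    sphere-area contribution scales with the cosine of that latitude."""
    return [math.cos((i + 0.5 - height / 2) * math.pi / height)
            for i in range(height)]
```

The weights are symmetric about the equator and shrink toward the poles, which is exactly where an ERP-aware transform can afford to spend fewer bits.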
Title: "Adaptive Latitude-Aware and Importance-Activated Transform Coding for Learned Omnidirectional Image Compression" (IEEE Transactions on Broadcasting, vol. 71, no. 3, pp. 874-888).
Pub Date: 2025-03-12. DOI: 10.1109/TBC.2025.3553307
Jian Xiong;Junqi Wu;You Zhou;Shiqing Xu
In recent years, with the advancement of autonomous aerial vehicle (AAV) technologies, small AAVs have been utilized for borderline patrol, especially for uninterrupted real-time video transmission. However, these small AAVs face limitations in conducting long-endurance and long-distance missions relying solely on their initial onboard resources. To address this issue, this paper introduces a novel combined AAV air resupply system based on energy cycle resupply. In this system, when the task AAV's (AAV-T) energy resources are depleted, a ground energy resupply station dispatches a replenishing AAV (AAV-R) to dock with the AAV-T along the border and transfer energy to it, ensuring a continuous energy supply. To tackle the challenge of siting the energy recharge stations, we propose a greedy siting algorithm utilizing Monte Carlo methods and an algorithm based on ant colony optimization and clustering. Simulations demonstrate that the number of energy recharge stations can be reduced to 47.6%-52.9% of that required by the AAV-T autonomous return recharge scheme. Additionally, we present a Q-learning-based energy cycle resupply algorithm for AAV-R path planning, offering practical applications in real-world borderline patrol scenarios.
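A greedy siting strategy of the kind mentioned can be sketched as greedy set cover: repeatedly choose the candidate station that covers the most still-uncovered patrol points. This sketch uses plain Euclidean coverage and omits the Monte Carlo and ant-colony components of the actual algorithms:

```python
def greedy_station_siting(candidates, demand_points, radius):
    """Greedy cover: pick, at each step, the candidate site that covers the
    most still-uncovered demand points within `radius` (Euclidean)."""
    def covers(site, pt):
        return (site[0] - pt[0]) ** 2 + (site[1] - pt[1]) ** 2 <= radius ** 2

    uncovered = set(range(len(demand_points)))
    chosen = []
    while uncovered:
        best = max(candidates,
                   key=lambda s: sum(covers(s, demand_points[i])
                                     for i in uncovered))
        gain = {i for i in uncovered if covers(best, demand_points[i])}
        if not gain:
            raise ValueError("some demand points cannot be covered")
        chosen.append(best)
        uncovered -= gain
    return chosen
```

Greedy set cover carries a well-known logarithmic approximation guarantee, which is why it is a common baseline for facility-siting problems like this one.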
Title: "On Energy Replenishment Station Site Selection and Path Planning for Drone Video Streaming" (IEEE Transactions on Broadcasting, vol. 71, no. 3, pp. 862-873).
Pub Date: 2025-03-11. DOI: 10.1109/TBC.2025.3541860
Chunguang Li;Dayoung Lee;Minseok Song
360-degree videos inherently require significant storage space because each segment consists of many tiles, each of which is further transcoded and stored in multiple versions. It is thus impractical to store all transcoded versions, which makes it essential to use limited storage space effectively. However, existing heuristic-based management schemes are inefficient because of the challenge of incorporating various factors, such as variable bandwidth requirements influenced by network conditions, the tile access distribution, and content-dependent video quality. To address this, we propose a new storage space management scheme that combines a dueling deep Q-network (DQN) algorithm based on the field-of-view (FoV) distribution with a greedy algorithm that considers overall video popularity. We first model an environment in which the agent can determine the versions for each tile to achieve the best video quality under various storage limits. The dueling DQN environment comprises 1) an action space determining version combinations for each tile within specified storage limits, 2) an observation space enabling the agent to learn variable bandwidths and tile access distributions, and 3) a reward model deriving the expected video quality for different actions. Building upon the dueling DQN model correlating storage limits with expected video quality, we present a greedy algorithm that selects versions among multiple videos within storage limits so as to maximize popularity-weighted video quality. Extensive simulations evaluated the proposed scheme under various storage limits, bandwidth changes, and FoV distributions, demonstrating an improvement in overall popularity-weighted video quality ranging from 0.49% to 37.77% (with an average improvement of 13.96%) compared to existing benchmark schemes.
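The dueling DQN named above splits the Q-function into a state value V(s) and per-action advantages A(s,a), recombined with the standard mean-subtracted aggregation. A minimal sketch of that aggregation step (the surrounding network and training loop are omitted):

```python
def dueling_q_values(value, advantages):
    """Combine the state value V(s) and advantages A(s,a) into Q-values
    with the identifiable dueling aggregation:
        Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]
```

Subtracting the mean advantage removes the ambiguity between V and A (any constant shift between them would otherwise leave Q unchanged), which stabilizes learning.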
{"title":"Using Deep Reinforcement Learning (DRL) to Optimize Quality in 360-Degree Video Tile Management","authors":"Chunguang Li;Dayoung Lee;Minseok Song","doi":"10.1109/TBC.2025.3541860","DOIUrl":"https://doi.org/10.1109/TBC.2025.3541860","url":null,"abstract":"360-degree videos inherently require significant storage space because each segment consists of many tiles, each of which is further transcoded and stored in multiple versions. It is thus impractical to store all transcoded versions, which makes it essential to make effective use of limited storage space. However, the inefficiency of existing heuristic-based management schemes arises from the challenge of incorporating various factors, such as variable bandwidth requirements influenced by network conditions, tile access distribution, and video quality dependent on content. To address this, we propose a new storage space management scheme, which combines the dueling deep Q-network (DQN) algorithm based on the field-of-view (FoV) distribution and the greedy algorithm that considers the overall video popularity. We first model an environment in which the agent can determine the versions for each tile to achieve the best video quality under various storage limit conditions. The dueling DQN environment comprises 1) an action space determining version combinations for each tile within specified storage limits, 2) an observation space enabling the agent to learn variable bandwidths and tile access distributions, and 3) a reward model deriving the expected video quality for different actions. Building upon the dueling DQN model correlating storage limits with expected video quality, we present a greedy algorithm that selects versions among multiple videos within storage limits for the purpose of maximizing popularity-weighted video quality. 
Extensive simulations evaluated the proposed scheme under various storage limits, bandwidth changes, and FoV distributions, demonstrating an improvement in overall popularity-weighted video quality ranging from 0.49% to 37.77% (with an average improvement of 13.96%) compared to existing benchmark schemes.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 2","pages":"555-569"},"PeriodicalIF":3.2,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144243592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
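The greedy stage described in the abstract above can be sketched as a marginal-gain selection: given, per video, a table of (storage cost, expected quality) for each version, where in the paper the expected quality would come from the dueling-DQN model, repeatedly upgrade the video whose next version yields the largest popularity-weighted quality gain per unit of extra storage, until the limit is hit. The toy data and the gain-per-byte heuristic below are illustrative assumptions, not the authors' exact procedure.

```python
def greedy_select(videos, limit):
    """videos: {name: {"popularity": p, "versions": [(cost, quality), ...]}}
    with versions sorted by cost ascending. Returns the chosen version index
    per video under the global storage limit."""
    choice = {v: 0 for v in videos}  # start every video at its cheapest version
    used = sum(videos[v]["versions"][0][0] for v in videos)
    while True:
        best, best_ratio = None, 0.0
        for v, info in videos.items():
            i = choice[v]
            if i + 1 < len(info["versions"]):
                c0, q0 = info["versions"][i]
                c1, q1 = info["versions"][i + 1]
                gain = info["popularity"] * (q1 - q0)  # popularity-weighted
                extra = c1 - c0
                if used + extra <= limit and gain / extra > best_ratio:
                    best, best_ratio = v, gain / extra
        if best is None:            # no affordable upgrade remains
            return choice
        c0 = videos[best]["versions"][choice[best]][0]
        choice[best] += 1
        used += videos[best]["versions"][choice[best]][0] - c0

videos = {
    "A": {"popularity": 0.7, "versions": [(10, 30.0), (20, 38.0), (40, 42.0)]},
    "B": {"popularity": 0.3, "versions": [(10, 30.0), (20, 40.0)]},
}
sel = greedy_select(videos, limit=50)
```

With the toy numbers, video A's first upgrade is taken before B's (higher weighted gain per byte), and A's second upgrade is rejected for exceeding the 50-unit limit.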
The evolution of 5G and Beyond 5G (B5G) networks has intensified the demand for efficient Multimedia Broadcast Multicast Services (MBMS), particularly in dynamic edge environments. The frequent alterations in network topology and multicast group configurations in these environments present substantial scalability challenges for traditional IP MultiCast (IPMC) mechanisms. Bit Index Explicit Replication (BIER) offers a stateless IPMC alternative that mitigates the limitations of traditional IPMC mechanisms. However, it still encounters fault tolerance issues in dynamic edge networks, where link faults occur frequently. This paper proposes a Fault-Tolerant BIER Multicast (FTBM) mechanism specifically designed for MBMS in dynamic edge networks. FTBM optimizes BIER multicast paths by employing Multi-Agent Deep Reinforcement Learning (MADRL) to minimize transmission delays while addressing constraints such as random link faults, limited queue capacity, and forwarding restrictions. Extensive simulations demonstrate that FTBM significantly enhances multicast performance under varying traffic loads and dense fault conditions, leading to improved transmission efficiency and network load balancing. This work provides a resilient and scalable solution for next-generation MBMS in dynamic network environments.
{"title":"FTBM: A Fault-Tolerant BIER Multicast for MBMS in 5G/B5G Dynamic Edge Networks","authors":"Honglin Fang;Peng Yu;Xinxiu Liu;Ying Wang;Wenjing Li;Xuesong Qiu;Zhaowei Qu","doi":"10.1109/TBC.2025.3541889","DOIUrl":"https://doi.org/10.1109/TBC.2025.3541889","url":null,"abstract":"The evolution of 5G and Beyond 5G (B5G) networks has intensified the demand for efficient Multimedia Broadcast Multicast Services (MBMS), particularly in dynamic edge environments. The frequent alterations in network topology and multicast group configurations in these environments present substantial scalability challenges for traditional IP MultiCast (IPMC) mechanisms. Bit Index Explicit Replication (BIER) offers a stateless IPMC alternative that mitigates the limitations of traditional IPMC mechanisms. However, it still encounters fault tolerance issues in dynamic edge networks, where link faults occur frequently. This paper propose a Fault-Tolerant BIER Multicast (FTBM) mechanism specifically designed for MBMS in dynamic edge networks. FTBM optimizes BIER multicast paths by employing Multi-Agent Deep Reinforcement Learning (MADRL) to minimize transmission delays while addressing constraints such as random link faults, limited queue capacity, and forwarding restrictions. Extensive simulations demonstrate that FTBM significantly enhances multicast performance under varying traffic loads and dense fault conditions, leading to improved transmission efficiency and network load balancing. 
This work provides a resilient and scalable solution for next-generation MBMS in dynamic network environments.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 2","pages":"411-425"},"PeriodicalIF":3.2,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144243676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
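The stateless forwarding idea that BIER contributes, as referenced in the abstract above, can be illustrated compactly: each egress router owns one bit in a bitstring carried by the packet, and a node emits one copy per next hop, masking the bitstring with that neighbor's Forwarding Bit Mask (F-BM) so every egress is reached exactly once with no per-flow state. The topology and bit assignments below are made-up toy data, not from the paper.

```python
def bier_forward(bitstring, fbm_table):
    """Return {next_hop: masked_bitstring} copies emitted by one BIER node."""
    copies = {}
    remaining = bitstring
    for next_hop, fbm in fbm_table.items():
        masked = remaining & fbm        # egresses this neighbor can reach
        if masked:
            copies[next_hop] = masked
            remaining &= ~fbm           # never send the same egress bit twice
    return copies

# Toy node with two neighbors: R1 reaches the egresses on bits 0-1,
# R2 the egresses on bits 2-3.
fbm_table = {"R1": 0b0011, "R2": 0b1100}
copies = bier_forward(0b1011, fbm_table)  # packet destined to bits 0, 1, 3
```

Here the node replicates the packet once toward R1 (carrying bits 0 and 1) and once toward R2 (carrying only bit 3), which is the replication behavior FTBM's learned paths must preserve under link faults.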