Video compression artifact removal aims to enhance the visual quality of compressed videos by mitigating compression-induced distortions. However, existing methods often struggle to capture spatio-temporal features effectively and to recover high-frequency details, because they are poorly adapted to the characteristics of compression artifacts. To overcome these limitations, we propose a novel Spatio-Temporal and Frequency Fusion (STFF) framework. STFF incorporates three key components: Feature Extraction and Alignment (FEA), which employs SRU for effective spatio-temporal feature extraction; Bidirectional High-Frequency Enhanced Propagation (BHFEP), which integrates HCAB to restore high-frequency details through bidirectional propagation; and Residual High-Frequency Refinement (RHFR), which further enhances high-frequency information. Extensive experiments demonstrate that STFF outperforms state-of-the-art methods in both objective metrics and subjective visual quality, effectively addressing the challenges posed by video compression artifacts. Trained model available: https://github.com/Stars-WMX/STFF.
Mingxing Wang, Yipeng Liao, Weiling Chen, Liqun Lin, and Tiesong Zhao, "STFF: Spatio-Temporal and Frequency Fusion for Video Compression Artifact Removal," IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 542-554, published 2025-04-28. DOI: 10.1109/TBC.2025.3550018 (https://doi.org/10.1109/TBC.2025.3550018).
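The idea of bidirectional propagation behind BHFEP can be illustrated with a toy recurrence. This sketch is not the STFF architecture: it assumes scalar per-frame features, a hypothetical fixed blending factor `alpha` in place of learned modules such as HCAB, and fuses the forward and backward passes by simple averaging. It only shows why a bidirectional pass lets every frame draw on both past and future frames.

```python
from typing import List


def propagate(features: List[float], alpha: float = 0.5) -> List[float]:
    """One-directional recurrence: h_t = alpha * h_{t-1} + (1 - alpha) * x_t."""
    hidden = 0.0
    out = []
    for x in features:
        hidden = alpha * hidden + (1.0 - alpha) * x
        out.append(hidden)
    return out


def bidirectional_propagate(features: List[float], alpha: float = 0.5) -> List[float]:
    """Run the recurrence forward and backward in time, then average the two passes."""
    forward = propagate(features, alpha)
    backward = propagate(features[::-1], alpha)[::-1]  # re-reverse into frame order
    return [(f + b) / 2.0 for f, b in zip(forward, backward)]


frames = [0.0, 0.0, 1.0, 0.0, 0.0]  # hypothetical per-frame feature with an impulse at frame 2
print(propagate(frames))             # forward only: frames 0 and 1 stay at 0.0
print(bidirectional_propagate(frames))
```

With the impulse at frame 2, the forward-only pass cannot inform frames 0 and 1, whereas the bidirectional output spreads the information symmetrically to neighbors on both sides.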
Pub Date: 2025-04-18, DOI: 10.1109/TBC.2025.3553295
Jianjun Lei; Hao Li; Bo Peng; Bo Zhao; Nam Ling
Learning-based light field (LF) image compression methods have recently achieved impressive progress, but end-to-end spatially scalable LF image compression (SS-LFIC) has not yet been explored. To address this gap, this paper proposes an end-to-end spatially scalable LF compression network (SSLFC-Net). In SSLFC-Net, a spatial-angular domain-specific enhancement layer coding strategy is designed to boost the coding performance of the enhancement layers (ELs). Specifically, by referencing domain-specific features, the ELs compress spatial features through predictive coding in the spatial domain, which effectively removes inter-layer spatial redundancy, and reconstruct angular features with a decoder-side generative method in the angular domain, which avoids compressing angular features altogether. To produce accurate spatial predictions and reconstruct high-quality LF images, an inter-layer spatial prediction module and a spatial-angular context-aware reconstruction module are presented that jointly improve EL compression. Experiments show that the proposed SSLFC-Net effectively supports spatial scalability and achieves state-of-the-art rate-distortion performance.
Jianjun Lei, Hao Li, Bo Peng, Bo Zhao, and Nam Ling, "An End-to-End Spatially Scalable Light Field Image Compression Method," IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 570-580, published 2025-04-18. DOI: 10.1109/TBC.2025.3553295 (https://doi.org/10.1109/TBC.2025.3553295).
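Inter-layer predictive coding, as described for the ELs, can be sketched in one dimension. This is a minimal illustration under stated assumptions, not SSLFC-Net: the base layer is simply every second sample, the hypothetical `upsample` uses nearest-neighbor interpolation in place of the learned inter-layer spatial prediction module, and quantization is ignored. The enhancement layer then only needs to code the residual between the full-resolution signal and the upsampled base-layer prediction.

```python
from typing import List


def downsample(signal: List[float]) -> List[float]:
    """Base layer: keep every second sample."""
    return signal[::2]


def upsample(base: List[float], length: int) -> List[float]:
    """Hypothetical inter-layer predictor: nearest-neighbour upsampling."""
    return [base[i // 2] for i in range(length)]


def el_residual(el: List[float]) -> List[float]:
    """What the enhancement layer would code: full-resolution signal minus prediction."""
    prediction = upsample(downsample(el), len(el))
    return [x - p for x, p in zip(el, prediction)]


def energy(values: List[float]) -> float:
    """Sum of squares, a rough proxy for how many bits a signal costs."""
    return sum(v * v for v in values)


row = [10, 11, 12, 13, 14, 15]  # made-up pixel row
residual = el_residual(row)
print(residual, energy(residual), energy(row))
```

For this smooth ramp the residual is `[0, 1, 0, 1, 0, 1]`, with an energy of 3 versus 955 for the raw samples; that gap is the inter-layer redundancy that prediction removes.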
Intercity railways are vital to modern transportation systems, providing fast and efficient connections between cities. With growing demand for onboard entertainment and real-time monitoring, delivering video with high Quality of Experience (QoE) has become a critical challenge. The distinctive characteristics of intercity railways, such as predictable schedules, fixed routes, and passenger-induced tidal effects, offer significant opportunities for optimizing video transmission. However, existing video streaming solutions fail to fully leverage these characteristics, resulting in inefficient bandwidth utilization, unstable video quality, and frequent interruptions caused by high train speed, frequent handovers, and fluctuating network loads. This paper proposes an Environmental Information Enhanced adaptive bitrate video streaming scheme (EIE-ABR) that integrates environmental information into bitrate adaptation to address these challenges. First, the scheme employs Deep Reinforcement Learning (DRL) to model the dynamic relationship between train speed and base station distance, enabling proactive bitrate adjustments in response to fluctuating network conditions. Second, EIE-ABR uses seasonal-trend decomposition using Loess (STL) to capture throughput variations driven by periodic patterns, such as railway schedules and tidal effects, as well as abrupt disruptions from handovers or link failures. By combining DRL with STL, EIE-ABR achieves accurate throughput prediction and adapts effectively to the highly dynamic intercity railway environment. Simulation results show that EIE-ABR outperforms existing ABR algorithms, achieving an 11.22% improvement in average QoE reward.
Liuchang Yang, Guanghua Liu, Shuo Li, Jintang Zhao, and Tao Jiang, "Environment Information Enhanced Neural Adaptive Bitrate Video Streaming for Intercity Railway," IEEE Transactions on Broadcasting, vol. 71, no. 3, pp. 849-861, published 2025-04-15. DOI: 10.1109/TBC.2025.3559002 (https://doi.org/10.1109/TBC.2025.3559002).
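The role of STL in EIE-ABR, separating a periodic component from a slowly varying level and leaving abrupt events in the remainder, can be sketched with classical additive decomposition. Note the swap: this sketch uses a centered moving average instead of the Loess smoothing that STL actually uses, assumes an odd period for simplicity, and uses made-up throughput values. In practice a library implementation such as `STL` in `statsmodels.tsa.seasonal` would be the more likely choice.

```python
from typing import List, Optional, Tuple


def decompose(series: List[float], period: int
              ) -> Tuple[List[Optional[float]], List[float], List[Optional[float]]]:
    """Classical additive decomposition: series = trend + seasonal + residual.

    A stand-in for STL: the trend is a centred moving average (not Loess).
    Trend and residual are None at the edges where the window does not fit.
    """
    if period % 2 == 0 or len(series) < 2 * period:
        raise ValueError("need an odd period and at least two full periods")
    n, half = len(series), period // 2
    trend: List[Optional[float]] = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(series[t - half:t + half + 1]) / period

    # Average the detrended values observed at each phase of the cycle.
    sums, counts = [0.0] * period, [0] * period
    for t, level in enumerate(trend):
        if level is not None:
            sums[t % period] += series[t] - level
            counts[t % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    offset = sum(means) / period  # centre the pattern so it sums to zero
    seasonal = [means[t % period] - offset for t in range(n)]

    residual = [None if trend[t] is None else series[t] - trend[t] - seasonal[t]
                for t in range(n)]
    return trend, seasonal, residual


throughput = [12, 9, 9] * 4  # made-up Mbit/s samples with a period-3 pattern
throughput[6] = 4            # abrupt drop at sample 6, e.g. a handover
trend, seasonal, residual = decompose(throughput, 3)
print([round(s, 2) for s in seasonal[:3]])
print([None if r is None else round(r, 2) for r in residual])
```

In this example the residual at sample 6 has the largest magnitude, which is the kind of abrupt event a throughput predictor would treat separately from the periodic pattern.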
Broadcast and broadband converged transmission has emerged as a prominent research focus in broadcast technology, and many studies have addressed it in traditional terrestrial broadcast and 3GPP unicast systems. However, owing to issues such as system compatibility, traditional terrestrial broadcast systems usually lack flexibility in transmitting broadband services, and conventional unicast systems are typically inefficient at delivering broadcast services in converged transmission scenarios. In addition, because the Non-Orthogonal Multiplexing (NOM) schemes currently used in converged transmission usually do not comply with the Gray-mapping rule, they require a codeword-level Successive Interference Cancellation (SIC) algorithm, which forces the Enhanced Layer (EL) data to share the same processing delay as the Core Layer (CL) data and thus restricts the variety of EL services. This paper focuses on the physical-layer technologies of converged transmission in the 3GPP LTE-based 5G Broadcast system. Owing to its inherent compatibility with both broadcast and broadband systems, LTE-based 5G Broadcast has great potential for realizing converged broadcast and broadband transmission. This paper proposes a novel converged transmission scheme enhanced by Gray-mapped NOM and presents the corresponding networking architecture, frame structure, transmitter processing, and receiver algorithms. By significantly improving the performance of the non-SIC receiving algorithm, the proposed Gray-mapped NOM-enhanced SFN (GNeSFN) scheme allows the processing delays of EL customized services and CL broadcast services to be independent of each other, bringing more flexibility to converged transmission. Link-level simulations with different system configurations and multiple channel scenarios verify the effectiveness and feasibility of the proposed scheme.
Haoyang Li, Dazhi He, Yin Xu, Kewu Peng, Yunfeng Guan, and Wenjun Zhang, "Gray-Mapped NOM-Enhanced SFN: A Broadcast and Broadband Converged Transmission Solution in LTE-Based 5G Broadcast," IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 426-438, published 2025-04-14. DOI: 10.1109/TBC.2025.3553318 (https://doi.org/10.1109/TBC.2025.3553318).
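Why Gray mapping matters for non-SIC reception can be seen in a one-dimensional toy superposition. Assuming, for illustration only, a CL 2-PAM symbol at +/-1 with an EL 2-PAM symbol of hypothetical amplitude `g` superposed on it, plain superposition yields the composite labels 00, 01, 10, 11 in amplitude order, where the middle neighbors differ in two bits. Mirroring the EL mapping on the upper CL level yields a Gray labeling, and the EL bit can then be sliced from the composite amplitude alone, without first decoding and cancelling the CL. This is not the GNeSFN constellation design, which the abstract does not detail.

```python
from typing import List, Tuple


def gray(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two labels."""
    return bin(a ^ b).count("1")


def is_gray_labelling(labels: List[int]) -> bool:
    """True if amplitude-adjacent labels differ in exactly one bit."""
    return all(hamming(a, b) == 1 for a, b in zip(labels, labels[1:]))


def composite_points(gray_mapped: bool, g: float = 0.25) -> List[Tuple[float, int]]:
    """(amplitude, label) pairs of a toy 1-D CL/EL superposition, sorted by amplitude.

    Label bit 1 is the CL bit, bit 0 is the EL bit. CL: 2-PAM at +/-1;
    EL: 2-PAM at +/-g, where g is a hypothetical injection level.
    """
    points = []
    for cl in (0, 1):
        for el in (0, 1):
            el_sign = 1 if el else -1
            if gray_mapped and cl == 1:
                el_sign = -el_sign  # mirror the EL mapping on the upper CL level
            amplitude = (1 if cl else -1) + g * el_sign
            points.append((amplitude, (cl << 1) | el))
    return sorted(points)


def el_bit_without_sic(amplitude: float) -> int:
    """Slice the EL bit directly from |amplitude|; correct only for the mirrored mapping."""
    return 1 if abs(amplitude) < 1.0 else 0


for mapped in (False, True):
    pts = composite_points(mapped)
    labels = [lab for _, lab in pts]
    el_ok = all(el_bit_without_sic(a) == (lab & 1) for a, lab in pts)
    print(mapped, [format(lab, "02b") for lab in labels], is_gray_labelling(labels), el_ok)
```

With the mirrored mapping, the composite labels in amplitude order match `gray(0..3)`, and the EL bit depends only on whether the amplitude magnitude is below 1, so it does not have to wait for CL decoding.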
Pub Date: 2025-04-08, DOI: 10.1109/TBC.2025.3549994
Tao Zhou; Liang Chen; Jing Sun; Zhenghang Jiao
Digital Terrestrial Multimedia Broadcast (DTMB) signals present a potential opportunity for wireless localization. This paper studies time of arrival (TOA) estimation based on the DTMB signal for localization. A theoretical analysis of the autocorrelation of the DTMB signal suggests that it is suitable for localization. We propose a software-defined radio (SDR) receiver that uses the DTMB signal for localization. The key innovations of the proposed SDR receiver are as follows: 1) employing a narrow Early-Minus-Late Power Delay Discriminator (nEML) in the delay-locked loop (DLL) to improve multipath resistance; 2) proposing a multi-state fusion filter to improve the robustness and accuracy of the loop filter; 3) using the carrier-to-noise density ratio (C/N0) to discard range observations corrupted by severe non-line-of-sight (NLOS) conditions, thereby reducing the impact of low-quality observations. Static field experiments show that the TOA ranging accuracy is 1.666 m. Motion experiments show that the root mean square error (RMSE) of the TOA measurements from the DTMB receiver is about 16 m and the RMSE of DTMB localization is about 17.7 m, indicating that the designed receiver can provide relatively reliable localization results with DTMB signals in complex urban environments.
Tao Zhou, Liang Chen, Jing Sun, and Zhenghang Jiao, "Localization With DTMB Signal Under Complex Urban Environments," IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 439-452, published 2025-04-08. DOI: 10.1109/TBC.2025.3549994 (https://doi.org/10.1109/TBC.2025.3549994).
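A normalized early-minus-late power discriminator, together with C/N0-based screening of range observations, can be sketched as follows. This assumes an idealized triangular autocorrelation (offsets in chips) rather than the actual DTMB correlation function, a hypothetical narrow correlator spacing of 0.1 chip, and a hypothetical 35 dB-Hz C/N0 threshold; the multi-state fusion loop filter is not modeled.

```python
from typing import List, Tuple


def triangle_acf(tau: float) -> float:
    """Idealized code autocorrelation; tau is the offset in chips."""
    return max(0.0, 1.0 - abs(tau))


def eml_power(error: float, spacing: float = 0.1) -> float:
    """Normalized early-minus-late power discriminator.

    error: delay estimate minus true delay, in chips; spacing: early-late spacing in chips.
    Positive output means the estimate is late, negative means it is early.
    """
    early = triangle_acf(error - spacing / 2)
    late = triangle_acf(error + spacing / 2)
    e2, l2 = early ** 2, late ** 2
    if e2 + l2 == 0.0:
        return 0.0  # both correlators off the peak: no usable error signal
    return (e2 - l2) / (e2 + l2)


def gate_by_cn0(obs: List[Tuple[float, float]],
                threshold_dbhz: float = 35.0) -> List[Tuple[float, float]]:
    """Keep (range_m, cn0_dbhz) observations whose C/N0 meets the threshold."""
    return [(rng, cn0) for rng, cn0 in obs if cn0 >= threshold_dbhz]


print([round(eml_power(e), 3) for e in (-0.02, 0.0, 0.02)])
print(gate_by_cn0([(1500.2, 42.0), (1537.9, 28.5), (1501.0, 36.1)]))  # made-up observations
```

The discriminator is zero when the local replica is aligned, and its sign tells the DLL which way to move the delay estimate. A narrow spacing keeps both correlators close to the correlation peak, where delayed multipath components typically distort the correlation less.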
As video coding evolves, balancing compression efficiency and quality has become a critical challenge for researchers and industry. The development of next-generation video coding standards such as Versatile Video Coding (VVC) marks a significant leap in support for advanced formats, including 8K, high dynamic range (HDR), and wide color gamut (WCG). Meanwhile, machine vision has emerged as a rising research focus, driven by breakthroughs in Artificial Intelligence and its growing role in content generation, production, distribution, and storage in multimedia applications. This paper presents a comprehensive survey of the coding tools in the VVC standard. We also examine recent research on next-generation video coding, particularly Beyond VVC and end-to-end coding frameworks. Developments in shared human-machine vision systems are also discussed, emphasizing their relevance to evolving multimedia applications. Finally, this paper provides an outlook on video coding standards and their potential to drive next-generation multimedia technologies.
Houbang Guo, Yun Zhou, Hongwei Guo, Zhuqing Jiang, Tian He, and Yiyan Wu, "A Survey on Recent Advances in Video Coding Technologies and Future Research Directions," IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 666-671, published 2025-04-04. DOI: 10.1109/TBC.2025.3553306 (https://doi.org/10.1109/TBC.2025.3553306).
Pub Date: 2025-04-03, DOI: 10.1109/TBC.2025.3553305
Michael Neri; Federica Battisti
During the compression, transmission, and rendering of point clouds, various artifacts are introduced that affect the quality perceived by the end user. Evaluating the impact of these distortions on overall quality, however, is challenging. This study introduces PST-PCQA, a no-reference point cloud quality metric based on a low-complexity, learning-based framework. It evaluates point cloud quality by analyzing individual patches, integrating local and global features to predict the Mean Opinion Score (MOS). In summary, the process extracts features from patches, combines them, and uses correlation weights to predict the overall quality. Because it does not rely on a reference point cloud, the approach is particularly useful when reference data is unavailable. Experiments on three state-of-the-art datasets, including an analysis of different feature pooling strategies, show that PST-PCQA predicts quality well and generalizes across datasets. The ablation study confirms the benefits of evaluating quality on a patch-by-patch basis. Additionally, PST-PCQA's lightweight structure, with a small number of learnable parameters, makes it well suited for real-time applications and devices with limited computational capacity. For reproducibility, the code, model, and pretrained weights are available at https://github.com/michaelneri/PST-PCQA.
Michael Neri and Federica Battisti, "Low-Complexity Patch-Based No-Reference Point Cloud Quality Metric Exploiting Weighted Structure and Texture Features," IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 631-640, published 2025-04-03. DOI: 10.1109/TBC.2025.3553305 (https://doi.org/10.1109/TBC.2025.3553305). Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10948465.
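Patch-level predictions can be combined into a single quality score with different pooling strategies. The sketch below assumes each patch already has a predicted score and a non-negative weight; how PST-PCQA derives its correlation weights is not described in the abstract, so the weights here are hypothetical inputs, and mean, weighted, and min pooling are shown only as common alternatives to compare.

```python
from typing import Callable, Dict, Sequence


def weighted_pool(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-patch scores; weights need not sum to one."""
    if len(scores) != len(weights) or not scores:
        raise ValueError("scores and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


# Each strategy takes (scores, weights); mean and min simply ignore the weights.
POOLING: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
    "mean": lambda s, w: sum(s) / len(s),
    "weighted": weighted_pool,
    "min": lambda s, w: min(s),  # the worst patch dominates the prediction
}

patch_scores = [3.1, 4.2, 2.5, 3.8]   # hypothetical per-patch MOS predictions
patch_weights = [0.9, 0.4, 1.0, 0.7]  # hypothetical per-patch weights
for name, pool in POOLING.items():
    print(name, round(pool(patch_scores, patch_weights), 3))
```

Weighted pooling lets more informative patches dominate without discarding the rest, sitting between mean pooling (all patches equal) and min pooling (only the worst patch counts).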
In this paper, we propose a Frame-Channel Polarization (FCP) technique to enhance wireless transmission reliability for low-latency mobile video in Multiple-Input Multiple-Output Orthogonal Frequency-Division Multiplexing (MIMO-OFDM) systems. We first analyze the reliability of video frame transmission, quantified by the Transmission Success Probability (TSP), and derive closed-form TSP expressions under Maximum Ratio Combining (MRC) for a single subcarrier. We also summarize the corresponding TSP formulation for Zero-Forcing (ZF). To extend the analysis to multiple subcarriers, we introduce a dynamic programming approach that computes the multi-subcarrier TSP from the single-subcarrier results, reducing the computational complexity from exponential to polynomial. Using TSP as a reliability metric, the FCP method dynamically prioritizes subcarrier allocation, assigning more subcarriers to high-priority video frames and fewer to lower-priority frames. As a result, the reliability of the frame channels becomes polarized, with the degree of polarization directly linked to the reliability requirement of each frame. Experimental results validate the accuracy of the derived TSP expressions for both single and multiple subcarriers and show that FCP significantly improves transmission reliability for low-latency video compared with existing methods.
Zhaoyang Wang, Jiaxi Zhou, Guanghua Liu, Yangyang Liu, Ting Bi, and Tao Jiang, "Frame-Channel Polarization for Improved Reliability in Mobile Video Wireless Transmission," IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 467-479, published 2025-04-02. DOI: 10.1109/TBC.2025.3549991 (https://doi.org/10.1109/TBC.2025.3549991).
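The abstract does not give the TSP expressions, so this sketch rests on two labeled assumptions. First, each subcarrier sees MRC over `branches` i.i.d. Rayleigh-faded paths, for which the output SNR is Erlang-distributed and P(SNR > θ) = exp(-θ/γ̄) · Σ_{k=0}^{L-1} (θ/γ̄)^k / k!, with γ̄ the per-branch average SNR and L the number of branches. Second, a frame is hypothetically counted as delivered if at least `k` of its independent subcarriers succeed. The dynamic program then replaces the 2^N enumeration of success patterns with O(N^2) work, mirroring the exponential-to-polynomial reduction described above.

```python
import math
from typing import List


def mrc_success_prob(avg_snr: float, threshold: float, branches: int) -> float:
    """P(SNR_out > threshold) for MRC over i.i.d. Rayleigh branches (linear SNRs).

    The output SNR is the sum of `branches` exponentials with mean avg_snr (Erlang).
    """
    x = threshold / avg_snr
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(branches))


def at_least_k_success(probs: List[float], k: int) -> float:
    """P(at least k of the independent subcarriers succeed) via an O(N^2) DP."""
    dist = [1.0]  # dist[j] = P(exactly j successes among the subcarriers seen so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1.0 - p)  # this subcarrier fails
            new[j + 1] += q * p      # this subcarrier succeeds
        dist = new
    return sum(dist[k:])


# Hypothetical numbers: average SNR 10, threshold 3 (both linear), 2 MRC branches.
p = mrc_success_prob(avg_snr=10.0, threshold=3.0, branches=2)
for n_sub in (4, 6, 8):
    print(n_sub, round(at_least_k_success([p] * n_sub, k=4), 4))
```

Under this criterion, giving a high-priority frame more subcarriers raises its success probability, which is the lever FCP uses to polarize frame reliability.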
Pub Date: 2025-04-02, DOI: 10.1109/TBC.2025.3549983
Yang Zhang; Hanling Wang; Qing Bai; Haifeng Liang; Peican Zhu; Gabriel-Miro Muntean; Qing Li
The recent advancement of Large Language Models (LLMs) with vision capabilities has substantially expanded what video analytics applications can do. To cope with the limited computing and bandwidth resources of edge devices, edge-cloud collaborative video analytics has emerged as a promising paradigm. However, most existing edge-cloud video analytics systems are designed for traditional deep learning models (e.g., image classification and object detection), where each model handles a specific task. In this paper, we introduce VaVLM, a novel edge-cloud collaborative video analytics system tailored for Vision-Language Models (VLMs), which can support multiple tasks with a single model. VaVLM aims to enhance the performance of VLM-powered video analytics systems in three key aspects. First, to reduce bandwidth consumption during video transmission, we propose a novel Region-of-Interest (RoI) generation mechanism based on the VLM's understanding of the task and scene. Second, to lower inference costs, we design a task-oriented inference trigger that processes only a subset of video frames using optimized inference logic. Third, to improve inference accuracy, the model is augmented during inference with additional information from both the environment and auxiliary analytics models. Extensive experiments on real-world datasets demonstrate that VaVLM reduces bandwidth consumption by 80.3% and computational cost by 89.5% compared with baseline methods.
VaVLM: Toward Efficient Edge-Cloud Video Analytics With Vision-Language Models. IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 529-541. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10947590
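The abstract does not describe how VaVLM's task-oriented trigger decides which frames to process. Purely to illustrate the general idea of running inference on only a subset of frames, here is a minimal Python sketch that swaps in a simple, hypothetical pixel-difference criterion: a frame is forwarded to the VLM only if it differs enough from the last analysed frame, or if too many frames have been skipped. This is not VaVLM's method; `select_frames`, `mean_abs_diff`, `threshold`, and `max_gap` are illustrative names, and frames are assumed to be flattened grayscale pixel lists.

```python
from typing import List, Sequence

# Hypothetical frame representation: a flattened grayscale frame (pixel values 0-255).
Frame = Sequence[int]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_frames(frames: List[Frame], threshold: float, max_gap: int) -> List[int]:
    """Return indices of frames that would be sent for VLM inference.

    The first frame is always selected. A later frame is selected when it differs
    from the last *selected* frame by more than `threshold`, or when at least
    `max_gap` frames have passed since the last selection. Both parameters are
    illustrative assumptions, not values from VaVLM.
    """
    if not frames:
        return []
    selected = [0]
    for i in range(1, len(frames)):
        last = selected[-1]
        changed = mean_abs_diff(frames[i], frames[last]) > threshold
        stale = i - last >= max_gap
        if changed or stale:
            selected.append(i)
    return selected


if __name__ == "__main__":
    frames = [
        [0, 0, 0, 0],
        [1, 0, 0, 1],      # small change: skipped
        [50, 50, 50, 50],  # large change: selected
        [50, 51, 50, 50],
        [50, 50, 50, 50],
        [50, 50, 50, 50],  # selected only because max_gap frames have elapsed
    ]
    print(select_frames(frames, threshold=10.0, max_gap=3))  # [0, 2, 5]
```

Comparing each frame against the last selected frame, rather than its immediate predecessor, keeps a slow gradual change from slipping through as a series of small differences, while `max_gap` bounds how stale the analysed view can become.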
Pub Date: 2025-04-01 | DOI: 10.1109/TBC.2025.3553298
Rongxing Guo; Junsheng Mu; Jia Zhu; Lei Liu; Fei Qi; Yi Wang
The coexistence of radar systems and 5G Broadcast/Multicast Service Broadcast (BMSB) networks presents unique challenges in resource allocation. Our study addresses these challenges by developing an approach for joint sub-carrier assignment and power allocation in a scenario where a base station delivers broadcast content to multiple users near a radar installation. For an orthogonal frequency division multiplexing (OFDM) system, we introduce a penalty term to relax the binary sub-carrier assignment constraints and consolidate the power-related variables, transforming the complex non-linear problem into tractable convex subproblems through a quadratic transformation. Our results characterize the trade-off between optimizing 5G BMSB performance and preserving radar functionality, revealing that, in the presence of radar interference, increasing BMSB transmit power beyond a certain point does not improve performance. This insight contributes to the design of energy-efficient 5G BMSB systems that coexist with critical infrastructure.
Advanced Spectrum Sharing Techniques for Coexistence of OFDM Radar and 5G BMSB System. IEEE Transactions on Broadcasting, vol. 71, no. 3, pp. 696-705.
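The abstract names two standard optimization devices, a penalty relaxation of binary constraints and a quadratic transformation, without giving the exact formulation. The block below shows their generic textbook forms as an assumption about what such a formulation typically looks like, not the authors' actual model: $x_{k,n}$ (assignment of sub-carrier $n$ to user $k$), $p_{k,n}$ (its transmit power), the utility $R$, and the weight $\lambda > 0$ are illustrative symbols. The penalty $x_{k,n}(1 - x_{k,n})$ is non-negative on $[0,1]$ and vanishes only at $x_{k,n} \in \{0,1\}$, so a sufficiently large $\lambda$ drives the relaxed solution toward a binary assignment. The quadratic transform rewrites a ratio $A/B$, such as an SINR-type term, as a maximization over an auxiliary variable $y$.

```latex
% Generic sketch, not the paper's formulation; all symbols are illustrative.
\begin{align}
  &\max_{\mathbf{x},\,\mathbf{p}} \; R(\mathbf{x},\mathbf{p})
    - \lambda \sum_{k,n} x_{k,n}\,(1 - x_{k,n}),
  \qquad 0 \le x_{k,n} \le 1, \\
  &\frac{A}{B} = \max_{y \in \mathbb{R}} \left( 2y\sqrt{A} - y^{2} B \right),
  \qquad y^{\star} = \frac{\sqrt{A}}{B}, \quad A \ge 0,\; B > 0.
\end{align}
```

Substituting $y^{\star}$ gives $2A/B - A/B = A/B$, which confirms the identity. In an alternating scheme, $y$ is updated in closed form with $\mathbf{x}, \mathbf{p}$ fixed, and then $\mathbf{x}, \mathbf{p}$ are optimized with $y$ fixed. When $A$ is concave and $B$ is convex in the power variables, that second step is convex apart from the $\lambda x_{k,n}^{2}$ part of the penalty, which is commonly handled by linearizing it around the current iterate.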