Pub Date: 2025-03-31 DOI: 10.1109/TBC.2025.3541875
Rohit Verma; Anderson Augusto Simiscuka; Mohammed Amine Togou; Mikel Zorrilla; Gabriel-Miro Muntean
The collaborative nature of opera production offers a unique opportunity to strengthen societal cohesion and empower marginalized voices through storytelling. However, existing live streaming approaches, such as HTTP-Adaptive Streaming (HAS), are not equipped to handle the complexities of co-created opera content, resulting in suboptimal user experiences. To address these limitations, this article introduces the Live Stream Adaptation for Opera (LSAO), a solution designed as part of the EU Horizon 2020 TRACTION project. LSAO is a network-aware adaptive scheme designed to optimize the delivery of live co-created opera performances by dynamically adjusting audiovisual quality based on varying network conditions. Unlike traditional streaming solutions, LSAO prioritizes the unique demands of opera, ensuring seamless delivery and preserving artistic features. The evaluation of LSAO involved an online live opera show featuring four distinct performances by six artists located in globally distributed locations. Delivered to 35 remote viewers across 12 countries and 3 continents, the LSAO system was evaluated based on user feedback on the quality of their streaming experience. The results demonstrate the effectiveness of LSAO in enhancing audio and video quality levels, leading to heightened user enjoyment during live co-created opera performances. Through its approach and successful evaluation, LSAO represents a significant advancement in the delivery of live co-created opera content.
Title: A Live Adaptive Streaming Solution for Enhancing Quality of Experience in Co-Created Opera
Journal: IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 480-491
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10945661
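The abstract above does not describe LSAO's adaptation logic, so the following is only a generic illustration of network-aware quality selection, the kind of rule a throughput-based HAS client might apply. The bitrate ladder, the `safety` factor, and the function name `pick_bitrate` are all hypothetical and are not taken from the article.

```python
from typing import Sequence

# Hypothetical bitrate ladder in kbit/s (illustrative values, not from the article).
LADDER_KBPS = (400, 1200, 2500, 5000)


def pick_bitrate(throughput_samples_kbps: Sequence[float],
                 ladder_kbps: Sequence[int] = LADDER_KBPS,
                 safety: float = 0.8) -> int:
    """Pick the highest ladder bitrate that fits a discounted throughput estimate.

    Assumes every throughput sample is strictly positive.
    Falls back to the lowest rung when nothing fits or no samples exist.
    """
    if not throughput_samples_kbps:
        return min(ladder_kbps)
    # The harmonic mean is dominated by slow samples, so it reacts to dips.
    harmonic = len(throughput_samples_kbps) / sum(
        1.0 / s for s in throughput_samples_kbps)
    budget = safety * harmonic
    fitting = [b for b in ladder_kbps if b <= budget]
    return max(fitting) if fitting else min(ladder_kbps)
```

With three recent samples of 3000 kbit/s, the budget is about 2400 kbit/s, so the sketch selects the 1200 kbit/s rung and skips the 2500 kbit/s one.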
Accurate spectrum sensing in TV White Space (TVWS) is crucial for enhancing spectral efficiency in 5G Multimedia Broadcast Multicast Services (MBMS) networks. Traditional spectrum sensing techniques suffer from poor performance in low-SNR environments, necessitating a robust, data-driven approach. This study introduces a deep learning-based multi-feature fusion approach that integrates energy detection, cyclostationary analysis, and covariance matrix detection. The proposed model employs an adaptive thresholding mechanism and multi-task learning to enhance detection accuracy while ensuring real-time feasibility in dynamic spectrum environments. Multi-task learning performs primary user detection and MBMS signal classification concurrently, and the adaptive thresholds adjust to signal conditions to improve detection robustness under varying SNR. The model achieves 99% classification accuracy at −10 dB SNR, significantly outperforming traditional methods, and demonstrates practical feasibility for real-time spectrum sensing in 5G-MBMS networks.
Title: Deep Learning-Based Spectrum Sensing for TV White Space in 5G-MBMS Networks
Authors: Fenghua Xu; Yukun Zhu; Hongyuan Zhu; Junsheng Mu; Jie Wang; Bingxin Wang; Jieliang Zheng
Pub Date: 2025-03-31 DOI: 10.1109/TBC.2025.3553296
Journal: IEEE Transactions on Broadcasting, vol. 71, no. 3, pp. 706-716
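Of the three features named in the abstract above, energy detection is the simplest to show. The sketch below is the classical energy detector with a threshold that scales with a noise-power estimate. It is not the article's deep model, and the `margin` parameter and function names are assumptions made for illustration.

```python
from typing import Sequence


def energy_statistic(samples: Sequence[float]) -> float:
    """Average energy of a block of real-valued samples."""
    return sum(x * x for x in samples) / len(samples)


def detect(samples: Sequence[float], noise_power: float,
           margin: float = 1.5) -> bool:
    """Declare the channel occupied when the measured energy exceeds
    a threshold proportional to the current noise-power estimate.

    `margin` is a hypothetical tuning constant; a real detector would
    derive it from a target false-alarm probability.
    """
    threshold = margin * noise_power  # threshold tracks the noise estimate
    return energy_statistic(samples) > threshold
```

At low SNR the signal energy is small relative to the noise energy, so a fixed margin either misses signals or raises false alarms, which is the weakness the abstract attributes to traditional techniques.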
Screen content video applications are now widely used in daily life. As the latest Screen Content Coding (SCC) standard, Versatile Video Coding (VVC) SCC employs a quad-tree plus nested multi-type tree (QTMT) coding structure and various screen content coding modes (CMs). This design enhances the coding efficiency of VVC SCC but also results in a highly complex coding process, which significantly hinders the broader adoption of screen content video technology. Consequently, improving the coding speed of VVC SCC is highly desirable. In this paper, we propose a fast CM and transform decision algorithm for Intra prediction in VVC SCC. Specifically, we first use Convolutional Neural Networks (CNNs) to predict content types for all Coding Units (CUs). Next, we predict candidate CMs for CUs based on the CM distributions of different content types. We then select the Sum of Absolute Transformed Differences (SATD) as a feature and use a naive Bayes classifier to skip unlikely Intra modes early. Finally, we terminate Block-based Differential Pulse-Code Modulation (BDPCM) early and then select the best transform type in Intra mode prediction to improve coding speed. Experimental results demonstrate that the proposed algorithm improves coding speed by an average of 39.28%, with the BDBR increasing by 0.80%.
Title: Fast Coding Mode Decision for Intra Prediction in VVC SCC
Authors: Dayong Wang; Weihong Liu; Zeyu Zhou; Xin Lu; Jinhua Liu; Hui Guo; Ce Zhu
Pub Date: 2025-03-29 DOI: 10.1109/TBC.2025.3541773
Journal: IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 506-516
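The SATD feature mentioned in the abstract above can be made concrete with a small example. The sketch below computes an unnormalized 4x4 SATD using a Hadamard transform of the residual. Block size, the lack of normalization, and the function names are assumptions, since real encoders use several block sizes and apply their own scaling.

```python
from typing import List

Block = List[List[int]]

# 4x4 Hadamard matrix; it is symmetric, so H * D * H^T equals H * D * H.
H4: Block = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def matmul(a: Block, b: Block) -> Block:
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(orig: Block, pred: Block) -> int:
    """Sum of absolute Hadamard-transformed differences (unnormalized)."""
    diff = [[orig[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    transformed = matmul(matmul(H4, diff), H4)
    return sum(abs(v) for row in transformed for v in row)
```

A constant residual of 1 concentrates in the single DC coefficient (value 16), which is why SATD tracks the cost of coding a residual more closely than a plain sum of absolute differences.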
Pub Date: 2025-03-28 DOI: 10.1109/TBC.2025.3570871
Jian Yue; Mao Ye; Luping Ji; Hongwei Guo; Ce Zhu
With the rapid growth of digital media applications, advanced video compression technology has become indispensable. However, achieving high compression ratios often degrades quality, which makes compressed video quality enhancement a crucial research focus. In recent years, deep learning-based approaches have transformed compressed video quality enhancement, far surpassing traditional methods in reconstruction quality. This study offers a comprehensive review of recent advances in the enhancement of compressed video quality. It focuses on deep learning-based methods, particularly those leveraging convolutional neural networks, and explores their advantages over traditional approaches. The review is structured around key topics, including task definitions and challenges, general-purpose and domain-specific quality enhancement techniques, and datasets and metrics. Beyond summarizing the state of the art, this article offers an in-depth analysis of current methods, highlighting their strengths, limitations, and practical application scenarios. Finally, it identifies future research directions and discusses the critical challenges that remain, with the aim of guiding further exploration in the field of compressed video quality enhancement.
Title: A Survey of Deep-Learning-Based Compressed Video Quality Enhancement
Journal: IEEE Transactions on Broadcasting, vol. 71, no. 4, pp. 977-992
Pub Date: 2025-03-26 DOI: 10.1109/TBC.2025.3549985
Bo Hu; Wenzhi Chen; Jia Zheng; Leida Li; Wen Lu; Xinbo Gao
Compared with no-reference image quality assessment (IQA), full-reference IQA often achieves higher consistency with human subjective perception because reference information is available for comparison. A natural idea is to design strategies that allow the latter to guide the former's learning to achieve better performance. However, how to construct the reference information and how to transfer prior knowledge are two important issues that have not been fully explored. To this end, a novel method called no-reference IQA via inter-level adaptive knowledge distillation (AKD-IQA) is proposed. The core of AKD-IQA lies in transferring image distribution difference information from the full-reference teacher model to the no-reference student model through inter-level AKD. First, the teacher model is constructed from a multi-level feature discrepancy extractor and a cross-scale feature integrator. Then, it is trained on a large synthetic distortion dataset to establish a comprehensive difference prior distribution. Finally, the image re-distortion strategy and inter-level AKD are introduced into the student model for effective learning. Experimental results on six standard IQA datasets demonstrate that AKD-IQA achieves state-of-the-art performance. In addition, cross-dataset experiments confirm its superior generalization ability.
Title: No-Reference Image Quality Assessment via Inter-Level Adaptive Knowledge Distillation
Journal: IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 581-592
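The teacher-to-student transfer described in the abstract above follows the general pattern of feature-level knowledge distillation. The sketch below shows only that generic pattern: a quality-regression loss plus a weighted feature-matching term averaged over levels. The inter-level adaptive weighting of AKD-IQA is not specified in the abstract and is not reproduced here. The fixed weight `alpha` and all names are hypothetical.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally long feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_feats: Sequence[Sequence[float]],
                      teacher_feats: Sequence[Sequence[float]],
                      pred_score: float,
                      target_score: float,
                      alpha: float = 0.5) -> float:
    """Task loss plus a feature-matching loss averaged over levels.

    student_feats / teacher_feats hold one feature vector per level.
    `alpha` is a fixed illustrative weight, not the adaptive scheme
    of the article.
    """
    task = (pred_score - target_score) ** 2
    distill = sum(mse(s, t) for s, t in zip(student_feats, teacher_feats))
    distill /= len(student_feats)
    return task + alpha * distill
```

The student sees only the distorted image at inference time, so the feature-matching term is what carries the teacher's reference-based difference information into training.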
To address the high cost of high-speed, large-acquisition-bandwidth analog-to-digital converters (ADCs) in the feedback path, this paper proposes a new low-sampling-rate digital predistortion (DPD) method. To model the analog bandpass filter (BPF) in the feedback path, a training method for digital finite impulse response (FIR) filter coefficients in a practical band-limited DPD system is proposed, and a filter matrix is constructed in different forms for continuous and cyclic signal inputs. The filter matrix improves the accuracy and robustness of the band-limited power amplifier (PA) model. Then, an inverse filter signal recovery (IFSR) method is proposed to recover the full-band output signal of the PA, which can be used to train the predistorter using conventional DPD techniques. Simulation results validate the effectiveness of the IFSR method, demonstrating that the IFSR-DPD method can reduce the ADC sampling rate to 1/10 or less of that of full-rate sampling methods and decrease the ADC acquisition bandwidth to about 0.3 times the original input signal bandwidth. The linearization performance of the IFSR-DPD method is also evaluated on an instrument-based test platform. When the passband and transition band characteristics of the BPF are unsatisfactory, the proposed low-sampling-rate DPD method improves the adjacent channel power ratio (ACPR) by 18.67 dB and the error vector magnitude (EVM) by 1.214% compared with the scenario without DPD.
Title: A Low-Sampling-Rate Digital Predistortion Method Based on Inverse Filter Signal Recovery for Wideband Power Amplifiers
Authors: Xiaofang Wu; Jiawen Yan; Dehuang Zhang; Jianyang Zhou
Pub Date: 2025-03-25 DOI: 10.1109/TBC.2025.3549995
Journal: IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 653-665
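The abstract above says the filter matrix takes different forms for continuous and cyclic inputs but does not give them. One plausible reading, shown below purely as an assumption, is the standard distinction between a lower-triangular Toeplitz matrix (linear convolution with zero initial state) and a circulant matrix (circular convolution). The function names are hypothetical.

```python
from typing import List, Sequence


def filter_matrix(h: Sequence[float], n: int, cyclic: bool) -> List[List[float]]:
    """Build an n x n matrix F so that F applied to x filters x with FIR taps h.

    cyclic=False: lower-triangular Toeplitz form (samples before the
                  block are assumed to be zero).
    cyclic=True:  circulant form (the block is assumed to repeat).
    """
    F = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, tap in enumerate(h):
            j = i - k
            if cyclic:
                F[i][j % n] += tap  # wrap around to the end of the block
            elif j >= 0:
                F[i][j] += tap      # drop taps that reach before the block
    return F


def apply_filter(F: List[List[float]], x: Sequence[float]) -> List[float]:
    """Matrix-vector product, i.e. the filtered block."""
    return [sum(f * v for f, v in zip(row, x)) for row in F]
```

For taps `[1.0, 2.0]` and the block `[0.0, 0.0, 1.0]`, the Toeplitz form leaves the first output at 0, while the circulant form wraps the last sample into it and gives 2.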
In OFDM-based digital terrestrial broadcasting systems, impulsive noise is a significant factor affecting communication quality. A prominent method for suppressing impulsive noise is to incorporate a memoryless nonlinearity at the receiver front-end of the OFDM demodulator, in which the parameter estimation of the memoryless nonlinearity directly impacts the effectiveness of impulsive noise suppression. In this paper, we propose a deep learning-based memoryless nonlinearity approach for impulsive noise suppression. The proposed method can adaptively estimate the parameters of the memoryless nonlinearity in dynamic impulsive noise environments and achieve asymptotically optimal parameter estimation. Specifically, we design a High-Amplitude Priority Downsampling method to extract the key amplitude characteristics from the input signal, which effectively resolves the issue of extracting amplitude features of impulsive noise. In addition, to address performance degradation caused by insufficient training samples, we propose a novel training method that integrates progressive fine-tuning to complete the training using only a few samples. Furthermore, we conduct experiments on the signal-to-noise ratio (SNR) and bit error rate (BER) of the signal after impulsive noise suppression. The results validate that the parameters estimated by the proposed method approximate the theoretical optimal values and that the proposed method effectively suppresses impulsive noise and outperforms traditional methods in terms of SNR and BER.
Title: Parameter Estimation for Adaptive Impulsive Noise Suppression: A Deep Learning-Based Memoryless Nonlinearity Approach
Authors: Zhu Xiao; Yiqiu Zhang; Tong Li; Jing Bai; Siwang Zhou; Yonghu Zhang
Pub Date: 2025-03-24 DOI: 10.1109/TBC.2025.3550016
Journal: IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 641-652
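The abstract above does not say which memoryless nonlinearity is used. Two common choices in the impulsive-noise literature are clipping and blanking, sketched below for complex baseband samples. The threshold `t` plays the role of the parameter that has to be estimated. Here it is simply passed in, and the function names are illustrative.

```python
def clip(r: complex, t: float) -> complex:
    """Clipping: keep the phase, limit the magnitude to t."""
    mag = abs(r)
    return r if mag <= t else t * r / mag


def blank(r: complex, t: float) -> complex:
    """Blanking: zero any sample whose magnitude exceeds t."""
    return r if abs(r) <= t else 0j


def suppress(samples, t, nonlinearity=clip):
    """Apply the chosen nonlinearity sample by sample (no memory)."""
    return [nonlinearity(r, t) for r in samples]
```

A threshold that is too low distorts the OFDM signal itself, and one that is too high lets impulses through, which is why the quality of the threshold estimate matters so much.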
Pub Date: 2025-03-21 DOI: 10.1109/TBC.2025.3550020
Yang Wang; Chuang Yang; Mugen Peng
Terahertz (THz) communication is considered one of the most critical technologies for 6G broadcasting communications because of its abundant bandwidth. To compensate for the high propagation loss of THz signals, analog/digital hybrid precoding for THz massive multiple-input multiple-output (MIMO) has been proposed to focus signals and extend the broadcasting communication range. Because of hardware cost and power consumption, infinite- and high-resolution phase shifters (PSs) are difficult to implement in THz massive MIMO, and low-resolution PSs are typically adopted in practice. However, low-resolution PSs cause severe performance degradation and complicate the design of analog precoders for multi-carrier systems. Moreover, broadband THz communication suffers severe frequency-selective fading, which further increases the difficulty of analog precoder design. Motivated by these factors, this paper proposes a new heuristic algorithm for both fully connected (FC) and partially connected (PC) architectures, which first partially decouples the digital precoder from the analog precoder and then optimizes the two alternately. To further improve performance, the partial decoupling method is extended to dynamic subarrays, in which no antenna is connected to more than one RF chain. Numerical results demonstrate that the proposed THz hybrid precoding with low-resolution PSs outperforms the compared methods under both FC and PC structures.
Title: Terahertz Hybrid Precoding With Low-Resolution PSs Under Frequency Selective Channel: A Partial Decoupling Method
Journal: IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 453-466
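The constraint that low-resolution PSs impose can be illustrated directly: each analog precoder entry keeps a constant modulus, and its phase is restricted to 2^B uniformly spaced values. The sketch below applies nearest-level quantization to an ideal weight vector. It is not the article's partial decoupling algorithm, and the function names and the 1/sqrt(N) normalization are assumptions.

```python
import cmath
import math
from typing import List, Sequence


def quantize_phase(phase: float, bits: int) -> float:
    """Map a phase (radians) to the nearest of 2**bits uniform levels in [0, 2*pi)."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    return (round(phase / step) % levels) * step


def quantize_precoder(weights: Sequence[complex], bits: int) -> List[complex]:
    """Replace each weight by a constant-modulus entry with a quantized phase."""
    scale = 1.0 / math.sqrt(len(weights))  # assumed unit-norm convention
    return [scale * cmath.exp(1j * quantize_phase(cmath.phase(w), bits))
            for w in weights]
```

With `bits=1` every entry collapses to a phase of 0 or pi, which gives a sense of how much beamforming precision is lost and why the precoder design has to account for it.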
Pub Date: 2025-03-15 DOI: 10.1109/TBC.2025.3565895
Hui Hu; Yunhui Shi; Jin Wang; Nam Ling; Baocai Yin
Based on the measured latitude and longitude, users can freely view different perspectives of an omnidirectional image. Typically, omnidirectional images are represented in the equirectangular projection (ERP) format. Although ERP images suffer from distortion and redundancy due to oversampling, which makes traditional codecs inefficient, they maintain visual consistency and enhance compatibility with deep learning-based image processing tools. This has led to the emergence of end-to-end omnidirectional image compression methods based on the ERP format. However, transform coding, a key component in learned planar image compression, has not yet been fully explored in the domain of learned omnidirectional image compression. In this paper, we propose a transform coding method with adaptive latitude-aware and importance-activated features for omnidirectional image compression. Specifically, the adaptive latitude-aware mechanism comprises two modules. The first module, termed Adaptive Latitude-aware Module (ALAM), employs rectangular dilated convolutional kernels of multiple sizes to perceive distortion redundancy across different latitudes, followed by latitude-adaptive weighting to select optimal features for the respective latitudes. The second module, named Multi-scale Convolutional Gated Feedforward Network (MCGFN), fully exploits local contextual information while suppressing the feature redundancy induced by the diverse dilated convolutions in the first module. To further reduce ERP redundancy, we design an importance-activated spatial feature transform module that regulates latent representations to allocate more bits to significant regions. Experimental results demonstrate that our proposed method outperforms the VVC standard and learning-based omnidirectional image compression approaches at medium-to-high bitrates while maintaining low computational complexity.
{"title":"Adaptive Latitude-Aware and Importance-Activated Transform Coding for Learned Omnidirectional Image Compression","authors":"Hui Hu;Yunhui Shi;Jin Wang;Nam Ling;Baocai Yin","doi":"10.1109/TBC.2025.3565895","DOIUrl":"https://doi.org/10.1109/TBC.2025.3565895","url":null,"abstract":"Based on the measured latitude and longitude, users can freely view different perspectives of the omnidirectional image. Typically, omnidirectional images are represented in the equirectangular projection (ERP) format. Although ERP images suffer from distortion and redundancy due to oversampling, making traditional codec inefficient, they maintain visual consistency and enhance compatibility with deep learning-based image processing tools. This has led to the emergence of end-to-end omnidirectional image compression methods based on the ERP format. In fact, transform coding, a key component in learned planar image compression, has not yet been fully explored in the domain of learned omnidirectional image compression. In this paper, we propose a transform coding method with adaptive latitude-aware and importance-activated features for omnidirectional image compression. Specifically, the adaptive latitude-aware mechanism comprises two modules. The first module, termed Adaptive Latitude-aware Module (ALAM), employs rectangular dilated convolutional kernels of multiple sizes to perceive distortion redundancy across different latitudes, followed by latitude-adaptive weighting to select optimal features for respective latitudes. The second module, named Multi-scale Convolutional Gated Feedforward Network (MCGFN), fully exploits local contextual information while suppressing feature redundancy induced by diverse dilated convolutions in the first module. Furthermore, to further reduce ERP redundancy, we design an importance-activated spatial feature transform module that regulates latent representations to allocate more bits to significant regions. 
Experimental results demonstrate that our proposed method outperforms existing VVC standards and learning-based omnidirectional image compression approaches at medium-to-high bitrates while maintaining low computational complexity.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 3","pages":"874-888"},"PeriodicalIF":4.8,"publicationDate":"2025-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144998171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
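The ERP oversampling that the abstract above refers to can be made concrete with a small illustrative sketch. It is not the authors' ALAM module; it only computes, for each image row, the cosine of its latitude (the relative sphere area covered by one pixel in that row) and the resulting horizontal oversampling factor. The function names and the row-centre convention are assumptions chosen for illustration.

```python
import math


def erp_row_latitudes(height):
    """Latitude in radians of each pixel-row centre of an ERP image.

    Row 0 is the top row (close to +pi/2); the last row is close to -pi/2.
    """
    return [(0.5 - (j + 0.5) / height) * math.pi for j in range(height)]


def erp_row_weights(height):
    """cos(latitude) per row: relative sphere area covered by one pixel."""
    return [math.cos(lat) for lat in erp_row_latitudes(height)]


def erp_oversampling_factors(height):
    """Horizontal oversampling of each row relative to the equator."""
    return [1.0 / w for w in erp_row_weights(height)]


if __name__ == "__main__":
    for row, factor in enumerate(erp_oversampling_factors(8)):
        print(f"row {row}: oversampled by a factor of {factor:.2f}")
```

Rows near the poles receive small weights and large oversampling factors, which is the redundancy a latitude-aware transform would aim to exploit.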
Pub Date: 2025-03-12 DOI: 10.1109/TBC.2025.3553307
Jian Xiong;Junqi Wu;You Zhou;Shiqing Xu
In recent years, with the advancement of autonomous aerial vehicle (AAV) technologies, small AAVs have been used for border patrol, especially for uninterrupted real-time video transmission. However, these small AAVs cannot conduct long-endurance, long-distance missions relying solely on their initial onboard resources. To address this issue, this paper introduces a novel combined AAV air resupply system based on energy cycle resupply. In this system, when the energy resources of a task AAV (AAV-T) are depleted, a ground energy resupply station dispatches a replenishing AAV (AAV-R) to dock with it along the border and transfer energy to it, ensuring a continuous energy supply. To tackle the challenge of siting the energy recharge stations, we propose a greedy siting algorithm utilizing Monte Carlo methods and an algorithm based on ant colony optimization and clustering. Simulations demonstrate that the number of energy recharge stations can be reduced to 47.6% to 52.9% relative to the AAV-T autonomous return recharge scheme. Additionally, we present a Q-learning-based energy cycle resupply algorithm for AAV-R path planning, offering practical applications in real-world border patrol scenarios.
{"title":"On Energy Replenishment Station Site Selection and Path Planning for Drone Video Streaming","authors":"Jian Xiong;Junqi Wu;You Zhou;Shiqing Xu","doi":"10.1109/TBC.2025.3553307","DOIUrl":"https://doi.org/10.1109/TBC.2025.3553307","url":null,"abstract":"In recent years, with the advancement of autonomous aerial vehicles (AAV) technologies, small AAVs have been utilized for borderline patrol, especially for real-time video transmission without interruption. However, these small AAVs face limitations in conducting long-endurance and long-distance missions solely relying on their initial onboard resources. To address this issue, this paper introduces a novel combined AAV air resupply system based on energy cycle resupply. In this system, a ground energy resupply station dispatches a replenishing AAV (AAV-R) to dock with it along the border and transmit energy to the task AAV (AAV-T), when its energy resources are depleted, ensuring continuous energy supply. To tackle the challenge of siting the energy recharge station, we propose a greedy siting algorithm utilizing Monte Carlo methods and an algorithm based on ant colony and clustering. Simulations demonstrate that the number of energy recharge stations can be reduced to 47.6% - 52.9% compared to the AAV-T autonomous return recharge scheme. 
Additionally, we present a Q Learning-based energy cycle resupply algorithm for AAV-R path planning, offering practical applications in real-world borderline patrol scenarios.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 3","pages":"862-873"},"PeriodicalIF":4.8,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144998336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
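For readers unfamiliar with Q-learning, which the abstract above names as the basis of the AAV-R path planner, the standard tabular update rule can be sketched as follows. This is the generic textbook rule, not the authors' resupply algorithm; the state and action labels, the learning rate `alpha`, and the discount factor `gamma` are hypothetical values chosen for illustration.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    q is a dict mapping (state, action) pairs to values; pairs that are
    missing are treated as 0.0.
    """
    # Value of the best action available in the next state.
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Move the old estimate towards the bootstrapped target.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


if __name__ == "__main__":
    actions = ["fly_to_task_aav", "return_to_station"]  # hypothetical labels
    q_table = {}
    value = q_update(q_table, "at_station", "fly_to_task_aav",
                     1.0, "docked", actions)
    print(value)
```

In a path-planning setting, the states would typically encode the vehicle's position and remaining energy, and the reward would penalize energy use or service gaps; those modelling choices are not specified in the abstract.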