In modern video coding standards, block-based inter prediction is widely adopted and brings high compression efficiency. However, natural videos usually contain multiple moving objects of arbitrary shapes, resulting in complex motion fields that are difficult to represent compactly. The Versatile Video Coding (VVC) standard tackles this problem with more flexible block partitioning methods, but the more flexible partitions require more overhead bits to signal and still cannot take arbitrary shapes. To address this limitation, we propose an object segmentation-assisted inter prediction method (SAIP), in which objects in the reference frames are segmented by advanced segmentation techniques. With a proper indication, the object segmentation mask is translated from the reference frame to the current frame and serves as an arbitrary-shaped partition of different regions without any extra signaling. Using the segmentation mask, motion compensation is performed separately for different regions, achieving higher prediction accuracy. The segmentation mask is further used to code the motion vectors of different regions more efficiently. Moreover, the segmentation mask is considered in the joint rate-distortion optimization for motion estimation and partition estimation to derive the motion vectors of the different regions and the partition more accurately. The proposed method is implemented in the VVC reference software, VTM version 12.0. Experimental results show that the proposed method achieves BD-rate reductions of up to 1.98%, 1.14%, and 0.79%, and of 0.82%, 0.49%, and 0.37% on average, for common test sequences under the Low-delay P, Low-delay B, and Random Access configurations, respectively.
Title: Object Segmentation-Assisted Inter Prediction for Versatile Video Coding. Authors: Zhuoyuan Li; Zikun Yuan; Li Li; Dong Liu; Xiaohu Tang; Feng Wu. Published in IEEE Transactions on Broadcasting, vol. 70, no. 4, pp. 1236-1253. Publication date: 2024-08-05. DOI: 10.1109/TBC.2024.3434520
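The region-wise motion compensation described in the SAIP abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' VTM implementation: it assumes a single object, integer motion vectors, edge-replication padding, and that the mask is carried to the current frame by the object's own motion vector. The function names `shift_frame` and `mask_guided_prediction` are hypothetical.

```python
import numpy as np

def shift_frame(frame, mv):
    """Translate a 2-D array by an integer motion vector (dy, dx).

    Output pixel (y, x) is read from input pixel (y - dy, x - dx);
    positions outside the frame are filled by edge replication.
    """
    dy, dx = mv
    h, w = frame.shape
    ys = np.clip(np.arange(h) - dy, 0, h - 1)
    xs = np.clip(np.arange(w) - dx, 0, w - 1)
    return frame[np.ix_(ys, xs)]

def mask_guided_prediction(ref, ref_mask, mv_obj, mv_bg):
    """Predict the current frame from one reference frame.

    Assumption (not from the paper): the reference mask is moved to the
    current frame with the object's motion vector, then the object region
    is compensated with mv_obj and the remaining region with mv_bg.
    """
    cur_mask = shift_frame(ref_mask, mv_obj)
    pred_obj = shift_frame(ref, mv_obj)
    pred_bg = shift_frame(ref, mv_bg)
    return np.where(cur_mask, pred_obj, pred_bg), cur_mask

# Toy example: an 8x8 frame with a 2x2 object that moves by (1, 2).
ref = np.arange(64, dtype=np.int32).reshape(8, 8)
ref_mask = np.zeros((8, 8), dtype=bool)
ref_mask[2:4, 2:4] = True
pred, cur_mask = mask_guided_prediction(ref, ref_mask, mv_obj=(1, 2), mv_bg=(0, 0))
```

Because the partition is derived from the reference-frame mask, the sketch needs no per-pixel partition information for the current frame, which mirrors the "no extra signaling" idea in the abstract. Sub-pixel interpolation, multiple objects, and rate-distortion decisions are deliberately left out.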
Pub Date: 2024-08-01, DOI: 10.1109/TBC.2024.3434676
Wen-Xuan Long;Nian Li;Yuan Liu;M. R. Bhavani Shankar;Rui Chen
This paper considers the acquisition of channel state information (CSI) for multi-user orbital angular momentum (MU-OAM) wireless backhaul between the macro base station (MBS) and small base stations (SBSs) within broadcasting networks. Unlike prior works, we assume that each SBS transmits a pilot signal of length one on each multiplexed OAM mode and subcarrier, resulting in coherent observations collected at the MBS. We then construct data sets from the coherent observations, the components of which independently contain arbitrarily assumed positional information. The amplitude-phase multiple signal classification (AP-MUSIC) algorithm, a novel variant of MUSIC, conducts a two-dimensional (2-D) search on the amplitude and phase of the data component in both the OAM mode and frequency domains to estimate positions at each iteration. These estimates, together with the observations, are used to iteratively update the data sets, ultimately refining the distances and angles of arrival (AoAs) of all SBSs. Theoretical analysis and simulation results indicate that this solution not only yields precise CSI for the MU-OAM system but also markedly reduces the training overhead compared to existing alternatives.
Title: Low-Overhead Iterative Channel Parameter Estimation for Multi-User OAM Wireless Backhaul. Published in IEEE Transactions on Broadcasting, vol. 71, no. 1, pp. 74-80. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10620284
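For readers unfamiliar with the MUSIC family that AP-MUSIC builds on, the following is a minimal sketch of the classical one-dimensional MUSIC angle-of-arrival estimator. It is not the AP-MUSIC algorithm from the paper: it assumes a uniform linear array with half-wavelength spacing rather than an OAM system, searches over angle only, and uses simulated data. The names `steering` and `music_doa` are hypothetical.

```python
import numpy as np

def steering(theta_deg, m):
    """Steering vectors of an m-element uniform linear array
    (half-wavelength spacing); returns shape (m, number of angles)."""
    theta = np.deg2rad(np.atleast_1d(theta_deg))
    n = np.arange(m)[:, None]
    return np.exp(-1j * np.pi * n * np.sin(theta)[None, :])

def music_doa(x, k, grid_deg):
    """Classical MUSIC: estimate k angles of arrival from snapshots x (m, t)."""
    m, t = x.shape
    r = x @ x.conj().T / t                  # sample covariance matrix
    _, vecs = np.linalg.eigh(r)             # eigenvalues in ascending order
    noise = vecs[:, : m - k]                # noise subspace
    a = steering(grid_deg, m)
    # Pseudo-spectrum peaks where steering vectors are orthogonal to the noise subspace.
    spectrum = 1.0 / np.sum(np.abs(noise.conj().T @ a) ** 2, axis=0)
    # Keep the k strongest local maxima of the pseudo-spectrum.
    is_peak = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] >= spectrum[2:])
    peaks = np.flatnonzero(is_peak) + 1
    top = peaks[np.argsort(spectrum[peaks])[-k:]]
    return np.sort(grid_deg[top])

# Simulated scenario (all values are illustrative assumptions).
rng = np.random.default_rng(0)
m, t, true_deg = 8, 200, np.array([-20.0, 30.0])
s = (rng.standard_normal((2, t)) + 1j * rng.standard_normal((2, t))) / np.sqrt(2)
w = 0.1 * (rng.standard_normal((m, t)) + 1j * rng.standard_normal((m, t)))
x = steering(true_deg, m) @ s + w
est_deg = music_doa(x, k=2, grid_deg=np.arange(-90.0, 90.1, 0.1))
```

The abstract describes a 2-D search over amplitude and phase in the OAM mode and frequency domains with iterative data-set updates; the sketch only shows the shared subspace idea of separating signal and noise eigenvectors and scanning a pseudo-spectrum.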
Pub Date: 2024-07-31, DOI: 10.1109/TBC.2024.3434536
Jian Wang;Yulong Hao;Zhongle Wu;Yafei Shi;Cheng Yang
Frequency modulation (FM) broadcasting is a robust and widely applied technology that offers unparalleled advantages over other broadcasting methods in challenging environments. To achieve high accuracy in constructing broadcasting maps for scenarios with uneven and sparse distribution of measurement data, we introduce the concept of FM broadcasting maps and propose a novel methodology for their construction. This paper uses a Long Short-Term Memory (LSTM) model to assimilate predictions from the ITU-R models. First, we analyze the critical environmental parameters influencing radio wave propagation and, based on this analysis, identify the foundational input features for the LSTM model. Next, predictions from the ITU-R P.1546 and ITU-R P.2001 models are assimilated as additional features and fed into the LSTM model for training, yielding the assimilation model. Finally, a broadcast map is constructed using the parameter construction method based on the proposed model. The results indicate that the relative errors with respect to the measurements are 3.14% for the proposed model, 6.48% for ITU-R P.1546, and 9.89% for ITU-R P.2001. The prediction accuracy of the proposed model surpasses that of the ITU-R models, and its stability is significantly improved compared to models based solely on LSTM. The broadcast map in this paper provides an objective reflection of measured field strength values across multiple dimensions, including frequency, distance, various terrains, and error distribution. It demonstrates notable advantages in scenarios characterized by sparse and unevenly distributed sampling points.
Title: A Broadcast Map Constructing Method Based on the LSTM and Assimilation Theory. Published in IEEE Transactions on Broadcasting, vol. 70, no. 3, pp. 924-934.
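Two steps of the broadcast-map abstract above lend themselves to a short sketch: appending the ITU-R model predictions to the environmental features, and scoring predictions by relative error. The abstract does not give the exact feature layout or error formula, so both are assumptions here: features are stacked column-wise, and relative error is taken as the mean of |predicted - measured| / |measured|, in percent. The function names and the sample numbers are hypothetical, and the LSTM itself is omitted.

```python
import numpy as np

def assimilate_features(env_features, p1546_pred, p2001_pred):
    """Append ITU-R P.1546 and P.2001 predictions as two extra feature columns.

    env_features: array of shape (samples, features) with environmental inputs.
    Returns an array of shape (samples, features + 2).
    """
    env = np.asarray(env_features, dtype=float)
    return np.column_stack([env, p1546_pred, p2001_pred])

def mean_relative_error(measured, predicted):
    """Mean relative error in percent (assumed definition, see lead-in)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - measured) / np.abs(measured)) * 100.0)

# Illustrative values only, not data from the paper.
env = np.array([[98.1, 12.0], [98.1, 25.0]])   # e.g. frequency (MHz), distance (km)
features = assimilate_features(env, [51.0, 38.0], [53.0, 36.0])
error_pct = mean_relative_error([50.0, 40.0], [51.0, 38.0])
```

In a full pipeline the stacked feature matrix would be the input to the sequence model, and the same error function could be applied to each model's predictions to produce a comparison like the one reported in the abstract.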
Pub Date: 2024-07-31, DOI: 10.1109/TBC.2024.3420748
Zean Chen;Yeyao Chen;Gangyi Jiang;Mei Yu;Haiyong Xu;Ting Luo
Light Field (LF) imaging captures the spatial and angular information of light rays in the real world and enables various applications, including digital refocusing and single-shot depth estimation. Unfortunately, due to the limited sensor size of LF cameras, the captured LF images suffer from low spatial resolution while providing a dense angular sampling. Existing single-input LF spatial super-resolution (SR) methods usually utilize the inherent sub-pixel information to recover high-frequency textures, but they struggle in large-scale SR tasks (e.g., $8\times$