Currently, screen content video applications are widely used in our daily lives. As the latest Screen Content Coding (SCC) standard, Versatile Video Coding (VVC) SCC employs a quad-tree plus nested multi-type tree (QTMT) coding structure and various screen content coding modes (CMs). This design enhances the coding efficiency of VVC SCC but also results in a highly complex coding process, which significantly hinders the broader adoption of screen content video technology. Consequently, improving the coding speed of VVC SCC is highly desirable. In this paper, we propose a fast CM and transform decision algorithm for Intra prediction in VVC SCC. Specifically, we first use Convolutional Neural Networks (CNNs) to predict the content types of all Coding Units (CUs). We then predict candidate CMs for each CU based on the CM distributions of the different content types. Next, we select the Sum of Absolute Transformed Difference (SATD) as a feature and use a naive Bayes classifier to skip unlikely Intra modes early. Finally, we terminate Block-based Differential Pulse-Code Modulation (BDPCM) early and select the best transform type during Intra mode prediction to improve coding speed. Experimental results demonstrate that the proposed algorithm improves coding speed by an average of 39.28%, with the BDBR increasing by 0.80%.
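The naive Bayes early-skip step can be sketched with a one-feature Gaussian classifier over SATD. All statistics below (priors, means, variances) are invented for illustration; in the paper such statistics would be learned offline from training sequences, and the exact feature model is not reproduced here:

```python
import math

# Hypothetical training statistics for the SATD feature of CUs whose best
# mode was Intra vs. not Intra. These numbers are made up for illustration.
STATS = {
    "intra":     {"prior": 0.4, "mean": 1200.0, "var": 250_000.0},
    "not_intra": {"prior": 0.6, "mean": 2600.0, "var": 640_000.0},
}

def gaussian_log_pdf(x, mean, var):
    """Log of the 1-D Gaussian density."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def skip_intra(satd, stats=STATS):
    """Return True if the naive Bayes log-posterior says Intra is unlikely,
    so the expensive Intra mode search can be skipped early."""
    scores = {
        cls: math.log(p["prior"]) + gaussian_log_pdf(satd, p["mean"], p["var"])
        for cls, p in stats.items()
    }
    return scores["not_intra"] > scores["intra"]
```

Because only the comparison of log-posteriors matters, no normalizing constant is needed; a large SATD pushes the decision toward skipping the Intra search.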
Title: "Fast Coding Mode Decision for Intra Prediction in VVC SCC," by Dayong Wang; Weihong Liu; Zeyu Zhou; Xin Lu; Jinhua Liu; Hui Guo; Ce Zhu (IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 506-516). Pub Date: 2025-03-29. DOI: 10.1109/TBC.2025.3541773
Pub Date: 2025-03-28. DOI: 10.1109/TBC.2025.3570871
Jian Yue;Mao Ye;Luping Ji;Hongwei Guo;Ce Zhu
With the rapid growth of digital media applications, the need for advanced video compression technology has become indispensable, as achieving high compression ratios often leads to quality degradation, making compressed video quality enhancement a crucial research focus. In recent years, deep learning-based approaches have revolutionized compressed video quality enhancement, far surpassing traditional methods and enabling unprecedented high-quality reconstruction. Leveraging data-driven techniques, deep learning has demonstrated remarkable progress in image and video quality enhancement tasks. This study offers a comprehensive review of recent advances in the enhancement of compressed video quality. It focuses on deep learning-based methods, particularly those leveraging convolutional neural networks, and explores their advantages over traditional approaches. The review is structured around key topics, including task definitions and challenges, general-purpose and domain-specific quality enhancement techniques, as well as datasets and metrics. Beyond summarizing the state of the art, this article offers an in-depth analysis of current methods, highlighting their strengths, limitations, and practical application scenarios. Finally, it identifies future research directions and discusses the critical challenges that remain, with the aim of guiding further exploration in the field of compressed video quality enhancement.
Title: "A Survey of Deep-Learning-Based Compressed Video Quality Enhancement" (IEEE Transactions on Broadcasting, vol. 71, no. 4, pp. 977-992).
Pub Date: 2025-03-26. DOI: 10.1109/TBC.2025.3549985
Bo Hu;Wenzhi Chen;Jia Zheng;Leida Li;Wen Lu;Xinbo Gao
Compared with no-reference image quality assessment (IQA), full-reference IQA often achieves higher consistency with human subjective perception because reference information is available for comparison. A natural idea is to design strategies that allow the latter to guide the former's learning to achieve better performance. However, two important issues have not been fully explored: how to construct the reference information and how to transfer the prior knowledge. To this end, a novel method called no-reference IQA via inter-level adaptive knowledge distillation (AKD-IQA) is proposed. The core of AKD-IQA lies in transferring image distribution difference information from the full-reference teacher model to the no-reference student model through inter-level AKD. First, the teacher model is constructed based on a multi-level feature discrepancy extractor and a cross-scale feature integrator. Then, it is trained on a large synthetic distortion dataset to establish a comprehensive difference prior distribution. Finally, the image re-distortion strategy and inter-level AKD are introduced into the student model for effective learning. Experimental results on six standard IQA datasets demonstrate that AKD-IQA achieves state-of-the-art performance. In addition, cross-dataset experiments confirm its superior generalization ability.
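The inter-level transfer idea can be sketched as a weighted feature-matching loss between corresponding teacher and student feature levels. This is a minimal sketch with fixed level weights; the paper's adaptive weighting mechanism and exact loss are not reproduced here:

```python
def distillation_loss(student_feats, teacher_feats, level_weights):
    """Weighted sum of per-level mean-squared errors between the student's
    and teacher's intermediate features (inter-level knowledge transfer).
    Each feature is a flat list of floats; weights are fixed here, whereas
    an adaptive scheme would learn them."""
    assert len(student_feats) == len(teacher_feats) == len(level_weights)
    total = 0.0
    for s, t, w in zip(student_feats, teacher_feats, level_weights):
        mse = sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
        total += w * mse
    return total
```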
Title: "No-Reference Image Quality Assessment via Inter-Level Adaptive Knowledge Distillation" (IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 581-592).
To address the high cost associated with using high-speed, large-acquisition-bandwidth analog-to-digital converters (ADCs) in the feedback path, a new low-sampling-rate digital predistortion (DPD) method is proposed in this paper. To model the analog bandpass filter (BPF) in the feedback path, a training method for the digital finite impulse response (FIR) filter coefficients in a practical band-limited DPD system is proposed, and a filter matrix is constructed in different forms for continuous-signal and cyclic-signal inputs. The filter matrix improves the accuracy and robustness of the band-limited power amplifier (PA) model. Then, an inverse filter signal recovery (IFSR) method is proposed to recover the full-band output signal of the PA, which can be used to train the predistorter with conventional DPD techniques. Simulation results validate the effectiveness of the IFSR method, demonstrating that the IFSR-DPD method can reduce the ADC sampling rate to 1/10 or less of that of full-rate sampling methods and decrease the ADC acquisition bandwidth to about 0.3 times the original input signal bandwidth. The linearization performance of the IFSR-DPD method is also evaluated on an instrument-based test platform. When the passband and transition-band characteristics of the BPF are unsatisfactory, the proposed low-sampling-rate DPD method improves the adjacent channel power ratio (ACPR) by 18.67 dB and the error vector magnitude (EVM) by 1.214% compared to the scenario without DPD.
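The distinction between continuous and cyclic inputs corresponds to two standard forms of the filter matrix built from the FIR coefficients: a Toeplitz matrix for linear convolution and a circulant matrix for circular convolution. A minimal pure-Python sketch (no claim to match the paper's exact construction):

```python
def circulant_matrix(h, n):
    """Filter matrix for a length-n cyclic input: y = H @ x equals the
    circular convolution of x with the FIR taps h."""
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, tap in enumerate(h):
            H[i][(i - k) % n] = tap
    return H

def toeplitz_matrix(h, n):
    """Filter matrix for a continuous (linear-convolution) input of length n;
    taps that would reach before the start of the signal are dropped."""
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, tap in enumerate(h):
            if 0 <= i - k < n:
                H[i][i - k] = tap
    return H
```

The cyclic form is the natural choice when the training signal is repeated periodically, since the circulant structure makes the filtering exactly invertible in the frequency domain.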
Title: "A Low-Sampling-Rate Digital Predistortion Method Based on Inverse Filter Signal Recovery for Wideband Power Amplifiers," by Xiaofang Wu; Jiawen Yan; Dehuang Zhang; Jianyang Zhou (IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 653-665). Pub Date: 2025-03-25. DOI: 10.1109/TBC.2025.3549995
In OFDM-based digital terrestrial broadcasting systems, impulsive noise is a significant factor affecting communication quality. A prominent method to suppress impulsive noise is to incorporate a memoryless nonlinearity at the receiver front-end of the OFDM demodulator, in which the parameter estimation of the memoryless nonlinearity directly impacts the effectiveness of impulsive noise suppression. In this paper, we propose a deep learning-based memoryless nonlinearity approach for impulsive noise suppression. The proposed method can adaptively estimate the parameters of the memoryless nonlinearity in dynamic impulsive noise environments and achieve near-optimal parameter estimation. Specifically, we design a High-Amplitude Priority Downsampling method to extract the key amplitude characteristics from the input signal, which effectively resolves the issue of extracting the amplitude features of impulsive noise. In addition, to address the performance degradation caused by insufficient training samples, we propose a novel training method that integrates progressive fine-tuning to complete training with only a few samples. Furthermore, we conduct experiments on the signal-to-noise ratio (SNR) and bit error rate (BER) of the signal after impulsive noise suppression. The results validate that the parameters estimated by the proposed method approximate the theoretical optimal values and that the proposed method effectively suppresses impulsive noise, outperforming traditional methods in terms of SNR and BER.
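The memoryless nonlinearity placed at the receiver front-end is classically a blanking or clipping device; the paper's contribution is estimating its parameters (e.g., the threshold) adaptively with a deep network. A sketch of the two classical nonlinearities, with the threshold supplied externally rather than learned:

```python
def blanking(samples, threshold):
    """Blanking: zero out any sample whose magnitude exceeds the threshold."""
    return [0.0 if abs(x) > threshold else x for x in samples]

def clipping(samples, threshold):
    """Clipping: limit the magnitude to the threshold, keeping the sign
    (for complex OFDM samples, x / abs(x) * threshold preserves the phase)."""
    return [x if abs(x) <= threshold else x / abs(x) * threshold
            for x in samples]
```

With a well-chosen threshold, both devices pass the OFDM signal largely untouched while removing or limiting the rare high-amplitude impulses.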
Title: "Parameter Estimation for Adaptive Impulsive Noise Suppression: A Deep Learning-Based Memoryless Nonlinearity Approach," by Zhu Xiao; Yiqiu Zhang; Tong Li; Jing Bai; Siwang Zhou; Yonghu Zhang (IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 641-652). Pub Date: 2025-03-24. DOI: 10.1109/TBC.2025.3550016
Pub Date: 2025-03-21. DOI: 10.1109/TBC.2025.3550020
Yang Wang;Chuang Yang;Mugen Peng
Terahertz (THz) communication is considered one of the most critical technologies for 6G broadcasting communications because of its abundant bandwidth. To compensate for the high propagation loss of THz signals, analog/digital hybrid precoding for THz massive multiple-input multiple-output (MIMO) is proposed to focus signals and extend the broadcasting communication range. Notably, considering hardware cost and power consumption, infinite- and high-resolution phase shifters (PSs) are difficult to implement in THz massive MIMO, and low-resolution PSs are typically adopted in practice. However, low-resolution PSs cause severe performance degradation, which also poses challenges for the design of analog precoders in multi-carrier systems. Moreover, broadband THz communication suffers severe frequency-selective fading, further increasing the difficulty of analog precoder design. Motivated by the above factors, in this paper we propose a new heuristic algorithm for both fully-connected (FC) and partially-connected (PC) architectures, which first partially decouples the digital and analog precoders and then optimizes them alternately. To further improve performance, we extend our partial decoupling method to dynamic subarrays, in which each RF chain is connected to a non-overlapping subset of antennas. The numerical results demonstrate that our proposed THz hybrid precoding with low-resolution PSs outperforms the comparison schemes under both the FC and PC structures.
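With B-bit phase shifters, each analog precoder entry is a unit-modulus complex number whose phase is restricted to a 2^B-point grid. A minimal sketch of this quantization step (illustrative only; the paper's heuristic decoupling algorithm is not reproduced):

```python
import cmath
import math

def quantize_phase(theta, bits):
    """Snap a phase to the nearest value realizable by a `bits`-bit PS,
    i.e., to the grid {k * 2*pi / 2**bits}."""
    step = 2 * math.pi / (2 ** bits)
    return round(theta / step) * step

def analog_precoder_entry(target, bits):
    """Constant-modulus analog precoder entry approximating a target
    complex coefficient: keep only its (quantized) phase."""
    return cmath.exp(1j * quantize_phase(cmath.phase(target), bits))
```

With 1-bit PSs the grid collapses to {0, pi}, which is why low resolution costs so much performance and motivates designs that account for the quantization explicitly.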
Title: "Terahertz Hybrid Precoding With Low-Resolution PSs Under Frequency Selective Channel: A Partial Decoupling Method" (IEEE Transactions on Broadcasting, vol. 71, no. 2, pp. 453-466).
Pub Date: 2025-03-15. DOI: 10.1109/TBC.2025.3565895
Hui Hu;Yunhui Shi;Jin Wang;Nam Ling;Baocai Yin
Based on the measured latitude and longitude, users can freely view an omnidirectional image from different perspectives. Typically, omnidirectional images are represented in the equirectangular projection (ERP) format. Although ERP images suffer from distortion and redundancy due to oversampling, making traditional codecs inefficient, they maintain visual consistency and enhance compatibility with deep learning-based image processing tools. This has led to the emergence of end-to-end omnidirectional image compression methods based on the ERP format. However, transform coding, a key component in learned planar image compression, has not yet been fully explored in the domain of learned omnidirectional image compression. In this paper, we propose a transform coding method with adaptive latitude-aware and importance-activated features for omnidirectional image compression. Specifically, the adaptive latitude-aware mechanism comprises two modules. The first, termed the Adaptive Latitude-aware Module (ALAM), employs rectangular dilated convolutional kernels of multiple sizes to perceive distortion redundancy across different latitudes, followed by latitude-adaptive weighting to select the optimal features for the respective latitudes. The second, named the Multi-scale Convolutional Gated Feedforward Network (MCGFN), fully exploits local contextual information while suppressing the feature redundancy induced by the diverse dilated convolutions in the first module. Furthermore, to further reduce ERP redundancy, we design an importance-activated spatial feature transform module that regulates latent representations to allocate more bits to significant regions. Experimental results demonstrate that our proposed method outperforms the existing VVC standard and learning-based omnidirectional image compression approaches at medium-to-high bitrates while maintaining low computational complexity.
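The latitude dependence that ALAM exploits stems from ERP oversampling: image rows near the poles cover far less sphere area than equatorial rows. A common way to express this, as in WS-PSNR-style weighting, is a cos(latitude) weight per row (illustration of the geometry only; the paper's modules operate on learned features):

```python
import math

def erp_row_weights(height):
    """Per-row sphere-area weights for an ERP image of the given height.
    Row i maps to latitude (i + 0.5 - height/2) * pi / height, and its
    sphere-area contribution scales with the cosine of that latitude."""
    return [math.cos((i + 0.5 - height / 2) * math.pi / height)
            for i in range(height)]
```

The weights are symmetric about the equator and shrink toward the poles, which is exactly where an ERP-aware transform can afford to spend fewer bits.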
Title: "Adaptive Latitude-Aware and Importance-Activated Transform Coding for Learned Omnidirectional Image Compression" (IEEE Transactions on Broadcasting, vol. 71, no. 3, pp. 874-888).
Pub Date: 2025-03-12. DOI: 10.1109/TBC.2025.3553307
Jian Xiong;Junqi Wu;You Zhou;Shiqing Xu
In recent years, with the advancement of autonomous aerial vehicle (AAV) technologies, small AAVs have been utilized for borderline patrol, especially for uninterrupted real-time video transmission. However, these small AAVs face limitations in conducting long-endurance and long-distance missions relying solely on their initial onboard resources. To address this issue, this paper introduces a novel combined AAV air resupply system based on energy cycle resupply. In this system, when the task AAV's (AAV-T) energy resources are depleted, a ground energy resupply station dispatches a replenishing AAV (AAV-R) to dock with the AAV-T along the border and transfer energy to it, ensuring a continuous energy supply. To tackle the challenge of siting the energy recharge stations, we propose a greedy siting algorithm utilizing Monte Carlo methods and an algorithm based on ant colony optimization and clustering. Simulations demonstrate that the number of energy recharge stations can be reduced to 47.6%-52.9% of that required by the AAV-T autonomous return recharge scheme. Additionally, we present a Q-learning-based energy cycle resupply algorithm for AAV-R path planning, offering practical applications in real-world borderline patrol scenarios.
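A greedy siting strategy of the kind mentioned can be sketched as greedy set cover: repeatedly choose the candidate station that covers the most still-uncovered patrol points. This sketch uses plain Euclidean coverage and omits the Monte Carlo and ant-colony components of the actual algorithms:

```python
def greedy_station_siting(candidates, demand_points, radius):
    """Greedy cover: pick, at each step, the candidate site that covers the
    most still-uncovered demand points within `radius` (Euclidean)."""
    def covers(site, pt):
        return (site[0] - pt[0]) ** 2 + (site[1] - pt[1]) ** 2 <= radius ** 2

    uncovered = set(range(len(demand_points)))
    chosen = []
    while uncovered:
        best = max(candidates,
                   key=lambda s: sum(covers(s, demand_points[i])
                                     for i in uncovered))
        gain = {i for i in uncovered if covers(best, demand_points[i])}
        if not gain:
            raise ValueError("some demand points cannot be covered")
        chosen.append(best)
        uncovered -= gain
    return chosen
```

Greedy set cover carries a well-known logarithmic approximation guarantee, which is why it is a common baseline for facility-siting problems like this one.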
Title: "On Energy Replenishment Station Site Selection and Path Planning for Drone Video Streaming" (IEEE Transactions on Broadcasting, vol. 71, no. 3, pp. 862-873).
Pub Date: 2025-03-11. DOI: 10.1109/TBC.2025.3541860
Chunguang Li;Dayoung Lee;Minseok Song
360-degree videos inherently require significant storage space because each segment consists of many tiles, each of which is further transcoded and stored in multiple versions. It is thus impractical to store all transcoded versions, which makes it essential to use limited storage space effectively. However, existing heuristic-based management schemes are inefficient because of the challenge of incorporating various factors, such as variable bandwidth requirements influenced by network conditions, the tile access distribution, and content-dependent video quality. To address this, we propose a new storage space management scheme that combines a dueling deep Q-network (DQN) algorithm based on the field-of-view (FoV) distribution with a greedy algorithm that considers overall video popularity. We first model an environment in which the agent can determine the versions for each tile to achieve the best video quality under various storage limits. The dueling DQN environment comprises 1) an action space determining version combinations for each tile within specified storage limits, 2) an observation space enabling the agent to learn variable bandwidths and tile access distributions, and 3) a reward model deriving the expected video quality for different actions. Building upon the dueling DQN model correlating storage limits with expected video quality, we present a greedy algorithm that selects versions among multiple videos within storage limits so as to maximize popularity-weighted video quality. Extensive simulations evaluated the proposed scheme under various storage limits, bandwidth changes, and FoV distributions, demonstrating an improvement in overall popularity-weighted video quality ranging from 0.49% to 37.77% (with an average improvement of 13.96%) compared to existing benchmark schemes.
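The dueling DQN named above splits the Q-function into a state value V(s) and per-action advantages A(s,a), recombined with the standard mean-subtracted aggregation. A minimal sketch of that aggregation step (the surrounding network and training loop are omitted):

```python
def dueling_q_values(value, advantages):
    """Combine the state value V(s) and advantages A(s,a) into Q-values
    with the identifiable dueling aggregation:
        Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]
```

Subtracting the mean advantage removes the ambiguity between V and A (any constant shift between them would otherwise leave Q unchanged), which stabilizes learning.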
{"title":"Using Deep Reinforcement Learning (DRL) to Optimize Quality in 360-Degree Video Tile Management","authors":"Chunguang Li;Dayoung Lee;Minseok Song","doi":"10.1109/TBC.2025.3541860","DOIUrl":"https://doi.org/10.1109/TBC.2025.3541860","url":null,"abstract":"360-degree videos inherently require significant storage space because each segment consists of many tiles, each of which is further transcoded and stored in multiple versions. It is thus impractical to store all transcoded versions, which makes it essential to make effective use of limited storage space. However, the inefficiency of existing heuristic-based management schemes arises from the challenge of incorporating various factors, such as variable bandwidth requirements influenced by network conditions, tile access distribution, and video quality dependent on content. To address this, we propose a new storage space management scheme, which combines the dueling deep Q-network (DQN) algorithm based on the field-of-view (FoV) distribution and the greedy algorithm that considers the overall video popularity. We first model an environment in which the agent can determine the versions for each tile to achieve the best video quality under various storage limit conditions. The dueling DQN environment comprises 1) an action space determining version combinations for each tile within specified storage limits, 2) an observation space enabling the agent to learn variable bandwidths and tile access distributions, and 3) a reward model deriving the expected video quality for different actions. Building upon the dueling DQN model correlating storage limits with expected video quality, we present a greedy algorithm that selects versions among multiple videos within storage limits for the purpose of maximizing popularity-weighted video quality. 
Extensive simulations evaluated the proposed scheme under various storage limits, bandwidth changes, and FoV distributions, demonstrating an improvement in overall popularity-weighted video quality ranging from 0.49% to 37.77% (with an average improvement of 13.96%) compared to existing benchmark schemes.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 2","pages":"555-569"},"PeriodicalIF":3.2,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144243592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
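The greedy stage described in the abstract above can be sketched as a marginal-gain selection: given, per video, a table of (storage cost, expected quality) for each version, where in the paper the expected quality would come from the dueling-DQN model, repeatedly upgrade the video whose next version yields the largest popularity-weighted quality gain per unit of extra storage, until the limit is hit. The toy data and the gain-per-byte heuristic below are illustrative assumptions, not the authors' exact procedure.

```python
def greedy_select(videos, limit):
    """videos: {name: {"popularity": p, "versions": [(cost, quality), ...]}}
    with versions sorted by cost ascending. Returns the chosen version index
    per video under the global storage limit."""
    choice = {v: 0 for v in videos}  # start every video at its cheapest version
    used = sum(videos[v]["versions"][0][0] for v in videos)
    while True:
        best, best_ratio = None, 0.0
        for v, info in videos.items():
            i = choice[v]
            if i + 1 < len(info["versions"]):
                c0, q0 = info["versions"][i]
                c1, q1 = info["versions"][i + 1]
                gain = info["popularity"] * (q1 - q0)  # popularity-weighted
                extra = c1 - c0
                if used + extra <= limit and gain / extra > best_ratio:
                    best, best_ratio = v, gain / extra
        if best is None:            # no affordable upgrade remains
            return choice
        c0 = videos[best]["versions"][choice[best]][0]
        choice[best] += 1
        used += videos[best]["versions"][choice[best]][0] - c0

videos = {
    "A": {"popularity": 0.7, "versions": [(10, 30.0), (20, 38.0), (40, 42.0)]},
    "B": {"popularity": 0.3, "versions": [(10, 30.0), (20, 40.0)]},
}
sel = greedy_select(videos, limit=50)
```

With the toy numbers, video A's first upgrade is taken before B's (higher weighted gain per byte), and A's second upgrade is rejected for exceeding the 50-unit limit.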
The evolution of 5G and Beyond 5G (B5G) networks has intensified the demand for efficient Multimedia Broadcast Multicast Services (MBMS), particularly in dynamic edge environments. The frequent alterations in network topology and multicast group configurations in these environments present substantial scalability challenges for traditional IP MultiCast (IPMC) mechanisms. Bit Index Explicit Replication (BIER) offers a stateless IPMC alternative that mitigates the limitations of traditional IPMC mechanisms. However, it still encounters fault tolerance issues in dynamic edge networks, where link faults occur frequently. This paper proposes a Fault-Tolerant BIER Multicast (FTBM) mechanism specifically designed for MBMS in dynamic edge networks. FTBM optimizes BIER multicast paths by employing Multi-Agent Deep Reinforcement Learning (MADRL) to minimize transmission delays while addressing constraints such as random link faults, limited queue capacity, and forwarding restrictions. Extensive simulations demonstrate that FTBM significantly enhances multicast performance under varying traffic loads and dense fault conditions, leading to improved transmission efficiency and network load balancing. This work provides a resilient and scalable solution for next-generation MBMS in dynamic network environments.
{"title":"FTBM: A Fault-Tolerant BIER Multicast for MBMS in 5G/B5G Dynamic Edge Networks","authors":"Honglin Fang;Peng Yu;Xinxiu Liu;Ying Wang;Wenjing Li;Xuesong Qiu;Zhaowei Qu","doi":"10.1109/TBC.2025.3541889","DOIUrl":"https://doi.org/10.1109/TBC.2025.3541889","url":null,"abstract":"The evolution of 5G and Beyond 5G (B5G) networks has intensified the demand for efficient Multimedia Broadcast Multicast Services (MBMS), particularly in dynamic edge environments. The frequent alterations in network topology and multicast group configurations in these environments present substantial scalability challenges for traditional IP MultiCast (IPMC) mechanisms. Bit Index Explicit Replication (BIER) offers a stateless IPMC alternative that mitigates the limitations of traditional IPMC mechanisms. However, it still encounters fault tolerance issues in dynamic edge networks, where link faults occur frequently. This paper propose a Fault-Tolerant BIER Multicast (FTBM) mechanism specifically designed for MBMS in dynamic edge networks. FTBM optimizes BIER multicast paths by employing Multi-Agent Deep Reinforcement Learning (MADRL) to minimize transmission delays while addressing constraints such as random link faults, limited queue capacity, and forwarding restrictions. Extensive simulations demonstrate that FTBM significantly enhances multicast performance under varying traffic loads and dense fault conditions, leading to improved transmission efficiency and network load balancing. 
This work provides a resilient and scalable solution for next-generation MBMS in dynamic network environments.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"71 2","pages":"411-425"},"PeriodicalIF":3.2,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144243676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
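The stateless forwarding idea that BIER contributes, as referenced in the abstract above, can be illustrated compactly: each egress router owns one bit in a bitstring carried by the packet, and a node emits one copy per next hop, masking the bitstring with that neighbor's Forwarding Bit Mask (F-BM) so every egress is reached exactly once with no per-flow state. The topology and bit assignments below are made-up toy data, not from the paper.

```python
def bier_forward(bitstring, fbm_table):
    """Return {next_hop: masked_bitstring} copies emitted by one BIER node."""
    copies = {}
    remaining = bitstring
    for next_hop, fbm in fbm_table.items():
        masked = remaining & fbm        # egresses this neighbor can reach
        if masked:
            copies[next_hop] = masked
            remaining &= ~fbm           # never send the same egress bit twice
    return copies

# Toy node with two neighbors: R1 reaches the egresses on bits 0-1,
# R2 the egresses on bits 2-3.
fbm_table = {"R1": 0b0011, "R2": 0b1100}
copies = bier_forward(0b1011, fbm_table)  # packet destined to bits 0, 1, 3
```

Here the node replicates the packet once toward R1 (carrying bits 0 and 1) and once toward R2 (carrying only bit 3), which is the replication behavior FTBM's learned paths must preserve under link faults.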