Synergistic Temporal-Spatial User-Aware Viewport Prediction for Optimal Adaptive 360-Degree Video Streaming
Yumei Wang;Junjie Li;Zhijun Li;Simou Shang;Yu Liu
Pub Date: 2024-03-21 | DOI: 10.1109/TBC.2024.3374119
360-degree videos usually require extremely high bandwidth and low latency for wireless transmission, which hinders their popularity. Researchers have proposed tile-based viewport adaptive streaming schemes, which combine accurate viewport prediction with optimal bitrate adaptation to maintain user Quality of Experience (QoE) over bandwidth-constrained networks. However, viewport prediction is error-prone over long prediction horizons, and bitrate adaptation schemes may waste bandwidth by failing to consider various aspects of QoE. In this paper, we propose a synergistic temporal-spatial user-aware viewport prediction scheme for optimal adaptive 360-degree video streaming (SPA360) to tackle these challenges. We use a user-aware viewport prediction model, which offers a white-box solution for Field of View (FoV) prediction. Specifically, we employ temporal-spatial fusion to enhance viewport prediction and minimize prediction errors. Our proposed utility prediction model jointly considers the viewport probability distribution and the metrics that directly affect QoE to enable more precise bitrate adaptation. To optimize bitrate adaptation for tile-based 360-degree video streaming, the problem is formulated as a packet knapsack problem and solved efficiently with a dynamic programming-based algorithm that maximizes utility. The SPA360 scheme demonstrates improved performance in terms of both viewport prediction accuracy and bandwidth utilization, and our approach enhances the overall quality and efficiency of adaptive 360-degree video streaming.
{"title":"Synergistic Temporal-Spatial User-Aware Viewport Prediction for Optimal Adaptive 360-Degree Video Streaming","authors":"Yumei Wang;Junjie Li;Zhijun Li;Simou Shang;Yu Liu","doi":"10.1109/TBC.2024.3374119","DOIUrl":"10.1109/TBC.2024.3374119","url":null,"abstract":"360-degree videos usually require extremely high bandwidth and low latency for wireless transmission, which hinders their popularity. A tile-based viewport adaptive streaming scheme, which involves accurate viewport prediction and optimal bitrate adaptation to maintain user Quality of Experience (QoE) under a bandwidth-constrained network, has been proposed by researchers. However, viewport prediction is error-prone in long-term prediction, and bitrate adaptation schemes may waste bandwidth resources due to failing to consider various aspects of QoE. In this paper, we propose a synergistic temporal-spatial user-aware viewport prediction scheme for optimal adaptive 360-Degree video streaming (SPA360) to tackle these challenges. We use a user-aware viewport prediction mode, which offers a white box solution for Field of View (FoV) prediction. Specially, we employ temporal-spatial fusion for enhanced viewport prediction to minimize prediction errors. Our proposed utility prediction model jointly considers viewport probability distribution and metrics that directly affecting QoE to enable more precise bitrate adaptation. To optimize bitrate adaptation for tiled-based 360-degree video streaming, the problem is formulated as a packet knapsack problem and solved efficiently with a dynamic programming-based algorithm to maximize utility. The SPA360 scheme demonstrates improved performance in terms of both viewport prediction accuracy and bandwidth utilization, and our approach enhances the overall quality and efficiency of adaptive 360-degree video streaming.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 2","pages":"453-467"},"PeriodicalIF":4.5,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140197311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
No-Reference Multi-Level Video Quality Assessment Metric for 3D-Synthesized Videos
Guangcheng Wang;Baojin Huang;Ke Gu;Yuchen Liu;Hongyan Liu;Quan Shi;Guangtao Zhai;Wenjun Zhang
Pub Date: 2024-03-21 | DOI: 10.1109/TBC.2024.3396696
The visual quality of 3D-synthesized videos is closely related to the development and broadcasting of immersive media such as free-viewpoint video and six-degrees-of-freedom navigation. Studying 3D-synthesized video quality assessment therefore helps promote the adoption of immersive media applications. Motivated by the observation that texture compression, depth compression and virtual view synthesis degrade the visual quality of 3D-synthesized videos at the pixel, structure and content levels, this paper proposes a Multi-Level 3D-Synthesized Video Quality Assessment algorithm, namely ML-SVQA, which consists of a quality feature perception module and a quality feature regression module. Specifically, the perception module first extracts motion vector fields of the 3D-synthesized video at the pixel, structure and content levels, drawing on the perception mechanisms of the human visual system. It then measures the temporal flicker distortion intensity in the no-reference setting by computing the self-similarity of adjacent motion vector fields. Finally, the regression module uses a machine learning algorithm to learn the mapping from the extracted quality features to the quality score. Experiments on the public IRCCyN/IVC and SIAT synthesized video datasets show that ML-SVQA is more effective than state-of-the-art image/video quality assessment methods at evaluating the quality of 3D-synthesized videos.
{"title":"No-Reference Multi-Level Video Quality Assessment Metric for 3D-Synthesized Videos","authors":"Guangcheng Wang;Baojin Huang;Ke Gu;Yuchen Liu;Hongyan Liu;Quan Shi;Guangtao Zhai;Wenjun Zhang","doi":"10.1109/TBC.2024.3396696","DOIUrl":"10.1109/TBC.2024.3396696","url":null,"abstract":"The visual quality of 3D-synthesized videos is closely related to the development and broadcasting of immersive media such as free-viewpoint videos and six degrees of freedom navigation. Therefore, studying the 3D-Synthesized video quality assessment is helpful to promote the popularity of immersive media applications. Inspired by the texture compression, depth compression and virtual view synthesis polluting the visual quality of 3D-synthesized videos at pixel-, structure- and content-levels, this paper proposes a Multi-Level 3D-Synthesized Video Quality Assessment algorithm, namely ML-SVQA, which consists of a quality feature perception module and a quality feature regression module. Specifically, the quality feature perception module firstly extracts motion vector fields of the 3D-synthesized video at pixel-, structure- and content-levels by combining the perception mechanism of human visual system. Then, the quality feature perception module measures the temporal flicker distortion intensity in the no-reference environment by calculating the self-similarity of adjacent motion vector fields. Finally, the quality feature regression module uses the machine learning algorithm to learn the mapping of the developed quality features to the quality score. Experiments constructed on the public IRCCyN/IVC and SIAT synthesized video datasets show that our ML-SVQA is more effective than state-of-the-art image/video quality assessment methods in evaluating the quality of 3D-Synthesized videos.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 2","pages":"584-596"},"PeriodicalIF":4.5,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141153267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Compressed Video Super-Resolution With Guidance of Coding Priors
Qiang Zhu;Feiyu Chen;Yu Liu;Shuyuan Zhu;Bing Zeng
Pub Date: 2024-03-21 | DOI: 10.1109/TBC.2024.3394291
Compressed video super-resolution (VSR) aims to generate high-resolution (HR) videos from low-resolution (LR) compressed videos. Recently, some compressed VSR methods have adopted coding priors, such as partition maps, compressed residual frames, predictive pictures and motion vectors, to generate HR videos. However, these methods do not tailor their module designs to the specific characteristics of the coding information, which limits how effectively the coding priors are exploited. In this paper, we propose a deep compressed VSR network that effectively leverages coding priors to construct high-quality HR videos. Specifically, we design a partition-guided feature extraction module that extracts features from the LR video under the guidance of the partition average image. Moreover, we separate the video features into sparse features and dense features according to the energy distribution of the compressed residual frame to achieve feature enhancement. Additionally, we construct a temporal attention-based feature fusion module that uses motion vectors and predictive pictures to eliminate motion errors between frames and temporally fuse features. Based on these modules, the coding priors are effectively employed in our model for constructing high-quality HR videos. Experimental results demonstrate that our method achieves better performance and lower complexity than the state of the art.
{"title":"Deep Compressed Video Super-Resolution With Guidance of Coding Priors","authors":"Qiang Zhu;Feiyu Chen;Yu Liu;Shuyuan Zhu;Bing Zeng","doi":"10.1109/TBC.2024.3394291","DOIUrl":"10.1109/TBC.2024.3394291","url":null,"abstract":"Compressed video super-resolution (VSR) is employed to generate high-resolution (HR) videos from low-resolution (LR) compressed videos. Recently, some compressed VSR methods have adopted coding priors, such as partition maps, compressed residual frames, predictive pictures and motion vectors, to generate HR videos. However, these methods disregard the design of modules according to the specific characteristics of coding information, which limits the application efficiency of coding priors. In this paper, we propose a deep compressed VSR network that effectively introduces coding priors to construct high-quality HR videos. Specifically, we design a partition-guided feature extraction module to extract features from the LR video with the guidance of the partition average image. Moreover, we separate the video features into sparse features and dense features according to the energy distribution of the compressed residual frame to achieve feature enhancement. Additionally, we construct a temporal attention-based feature fusion module to use motion vectors and predictive pictures to eliminate motion errors between frames and temporally fuse features. Based on these modules, the coding priors are effectively employed in our model for constructing high-quality HR videos. The experimental results demonstrate that our method achieves better performance and lower complexity than the state-of-the-arts.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 2","pages":"505-515"},"PeriodicalIF":4.5,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141153653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ACDMSR: Accelerated Conditional Diffusion Models for Single Image Super-Resolution
Axi Niu;Trung X. Pham;Kang Zhang;Jinqiu Sun;Yu Zhu;Qingsen Yan;In So Kweon;Yanning Zhang
Pub Date: 2024-03-21 | DOI: 10.1109/TBC.2024.3374122
Diffusion models have gained significant popularity for image-to-image translation tasks. Previous efforts applying diffusion models to image super-resolution have demonstrated that iteratively refining pure Gaussian noise using a U-Net architecture trained on denoising at various noise levels can yield satisfactory high-resolution images from low-resolution inputs. However, this iterative refinement process suffers from low inference speed, which strongly limits its applications. To speed up inference and further enhance performance, our research revisits diffusion models for image super-resolution and proposes a straightforward yet effective diffusion model-based super-resolution method called ACDMSR (accelerated conditional diffusion model for image super-resolution). Specifically, we adopt existing image super-resolution methods and fine-tune them to provide conditional images for given low-resolution images, which helps achieve better high-resolution results than using the low-resolution images themselves as conditions. We then adapt the diffusion model to perform super-resolution through a deterministic iterative denoising process, which substantially reduces inference time. We demonstrate that our method surpasses previous attempts both qualitatively and quantitatively through extensive experiments on benchmark datasets such as Set5, Set14, Urban100, BSD100, and Manga109. Moreover, our approach generates more visually realistic counterparts for low-resolution images, emphasizing its effectiveness in practical scenarios.
{"title":"ACDMSR: Accelerated Conditional Diffusion Models for Single Image Super-Resolution","authors":"Axi Niu;Trung X. Pham;Kang Zhang;Jinqiu Sun;Yu Zhu;Qingsen Yan;In So Kweon;Yanning Zhang","doi":"10.1109/TBC.2024.3374122","DOIUrl":"10.1109/TBC.2024.3374122","url":null,"abstract":"Diffusion models have gained significant popularity for image-to-image translation tasks. Previous efforts applying diffusion models to image super-resolution have demonstrated that iteratively refining pure Gaussian noise using a U-Net architecture trained on denoising at various noise levels can yield satisfactory high-resolution images from low-resolution inputs. However, this iterative refinement process comes with the drawback of low inference speed, which strongly limits its applications. To speed up inference and further enhance the performance, our research revisits diffusion models in image super-resolution and proposes a straightforward yet significant diffusion model-based super-resolution method called ACDMSR (accelerated conditional diffusion model for image super-resolution). Specifically, we adopt existing image super-resolution methods and finetune them to provide conditional images from given low-resolution images, which can help to achieve better high-resolution results than just taking low-resolution images as conditional images. Then we adapt the diffusion model to perform super-resolution through a deterministic iterative denoising process, which helps to strongly decline the inference time. We demonstrate that our method surpasses previous attempts in qualitative and quantitative results through extensive experiments conducted on benchmark datasets such as Set5, Set14, Urban100, BSD100, and Manga109. Moreover, our approach generates more visually realistic counterparts for low-resolution images, emphasizing its effectiveness in practical scenarios.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 2","pages":"492-504"},"PeriodicalIF":4.5,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140205610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning Accurate Network Dynamics for Enhanced Adaptive Video Streaming
Jiaoyang Yin;Hao Chen;Yiling Xu;Zhan Ma;Xiaozhong Xu
Pub Date: 2024-03-17 | DOI: 10.1109/TBC.2024.3396698
The adaptive bitrate (ABR) algorithm plays a crucial role in ensuring satisfactory quality of experience (QoE) in video streaming applications. Most existing approaches, whether rule-based or learning-driven, make ABR decisions based on limited network statistics, e.g., the mean and standard deviation of recent throughput measurements. However, they lack a good understanding of network dynamics under time-varying network conditions, which compromises performance, especially when the network condition changes significantly. In this paper, we propose a framework named ANT that aims to enhance adaptive video streaming by accurately learning network dynamics. ANT represents and detects specific network conditions by characterizing the entire spectrum of network fluctuations, and trains multiple dedicated ABR models, one per condition, using deep reinforcement learning. During inference, a dynamic switching mechanism activates the appropriate ABR model based on real-time network condition sensing, enabling ANT to automatically adjust its control policies to different network conditions. Extensive experimental results demonstrate that ANT improves user QoE by 20.8%-41.2% in the video-on-demand scenario and by 67.4%-134.5% in the live-streaming scenario compared to state-of-the-art methods, across a wide range of network conditions.
{"title":"Learning Accurate Network Dynamics for Enhanced Adaptive Video Streaming","authors":"Jiaoyang Yin;Hao Chen;Yiling Xu;Zhan Ma;Xiaozhong Xu","doi":"10.1109/TBC.2024.3396698","DOIUrl":"10.1109/TBC.2024.3396698","url":null,"abstract":"The adaptive bitrate (ABR) algorithm plays a crucial role in ensuring satisfactory quality of experience (QoE) in video streaming applications. Most existing approaches, either rule-based or learning-driven, tend to conduct ABR decisions based on limited network statistics, e.g., mean/standard deviation of recent throughput measurements. However, all of them lack a good understanding of network dynamics given the varying network conditions from time to time, leading to compromised performance, especially when the network condition changes significantly. In this paper, we propose a framework named ANT that aims to enhance adaptive video streaming by accurately learning network dynamics. ANT represents and detects specific network conditions by characterizing the entire spectrum of network fluctuations. It further trains multiple dedicated ABR models for each condition using deep reinforcement learning. During inference, a dynamic switching mechanism is devised to activate the appropriate ABR model based on real-time network condition sensing, enabling ANT to automatically adjust its control policies to different network conditions. Extensive experimental results demonstrate that our proposed ANT achieves a significant improvement in user QoE of 20.8%-41.2% in the video-on-demand scenario and 67.4%-134.5% in the live-streaming scenario compared to state-of-the-art methods, across a wide range of network conditions.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 3","pages":"808-821"},"PeriodicalIF":3.2,"publicationDate":"2024-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141060322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LDPC-Coded LDM Systems Employing Non-Uniform Injection Level for Combining Broadcast and Multicast/Unicast Services
Hao Ju;Yin Xu;Ruiqi Liu;Dazhi He;Sungjun Ahn;Namho Hur;Sung-Ik Park;Wenjun Zhang;Yiyan Wu
Pub Date: 2024-03-16 | DOI: 10.1109/TBC.2024.3394296
Layered Division Multiplexing (LDM) is a Power-based Non-Orthogonal Multiplexing (P-NOM) technique adopted in the Advanced Television Systems Committee (ATSC) 3.0 terrestrial TV physical layer to efficiently multiplex services with different robustness and data rate requirements. As communication systems evolve, the services to be delivered are becoming more diverse and versatile. To date, the LDM system adopted in terrestrial TV uses a uniform injection level for the lower-level (Layer 2) signal. This paper investigates non-uniform injection level LDM (NULDM). The proposed technique exploits the Unequal Error Protection (UEP) property of Low-Density Parity-Check (LDPC) codes and the flexible power allocation of NULDM to improve system performance and spectrum efficiency. NULDM enables the seamless integration of broadcast/multicast and unicast services in one RF channel, where the unicast signal can be assigned different resources (power, frequency, and time) based on UE distance and service requirements. Meanwhile, more power can be allocated to improve the upper-level (Layer 1) broadcast and datacast services. To make better use of the UEP property of LDPC codes in NULDM, the extended Gaussian mixture approximation (EGMA) method is used to design bit-interleaving patterns. Additionally, inspired by the channel order of polar codes, this paper proposes an LDPC sub-block interleaving order (SBIO) scheme that performs similarly to the EGMA interleaving model while better adapting to the diverse needs of the proposed mixed service delivery scenarios for the convergence of broadband wireless communications and broadcasting systems.
{"title":"LDPC-Coded LDM Systems Employing Non-Uniform Injection Level for Combining Broadcast and Multicast/Unicast Services","authors":"Hao Ju;Yin Xu;Ruiqi Liu;Dazhi He;Sungjun Ahn;Namho Hur;Sung-Ik Park;Wenjun Zhang;Yiyan Wu","doi":"10.1109/TBC.2024.3394296","DOIUrl":"10.1109/TBC.2024.3394296","url":null,"abstract":"Layered Division Multiplexing (LDM) is a Power-based Non-Orthogonal Multiplexing (P-NOM) technique that has been implemented in the Advanced Television System Committee (ATSC) 3.0 terrestrial TV physical layer to effectively multiplex services with different robustness and data rate requirements. As communication systems quickly evolve, the services to be delivered are becoming more diverse and versatile. Up to now, the LDM system adopted in the terrestrial TV system uses a uniform injection level for the lower-level (or Layer 2) signal injection. This paper investigates the non-uniform injection level LDM (NULDM). The proposed technique can explore the Unequal Error Protection (UEP) property of Low-Density Parity-Check (LDPC) codes and the flexible power allocation nature of the NULDM to improve the system performance and spectrum efficiency. NULDM enables the seamless integration of broadcast/multicast and unicast services in one RF channel, where the unicast signal can assign different resources (power, frequency, and time) based on the UE distance and service requirements. Meanwhile, more power could be allocated to improve the upper layer (or Layer 1) broadcast and datacast services. To make better use of the UEP property of LDPC codes in NULDM, the extended Gaussian mixture approximation (EGMA) method is used to design bit interleaving patterns. Additionally, inspired by the channel order of polar codes, this paper proposes an LDPC sub-block interleaving order (SBIO) scheme that performs similarly to the EGMA interleaving model, while better adapting to the diverse needs of proposed mixed service delivery scenarios for convergence of broadband wireless communications and broadcasting systems.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 3","pages":"1032-1043"},"PeriodicalIF":3.2,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141060278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Innovative Adaptive Web-Based Solution for Improved Remote Co-Creation and Delivery of Artistic Performances
Mohammed Amine Togou;Anderson Augusto Simiscuka;Rohit Verma;Noel E. O’Connor;Iñigo Tamayo;Stefano Masneri;Mikel Zorrilla;Gabriel-Miro Muntean
Pub Date: 2024-03-13 | DOI: 10.1109/TBC.2024.3363455
Due to the COVID-19 pandemic, most arts and cultural activities have moved online. This has contributed to a surge in the development of artistic tools that enable professional artists to produce engaging and immersive shows remotely. This article introduces the TRACTION Co-Creation Stage (TCS), a novel Web-based solution, designed and developed in the context of the EU Horizon 2020 TRACTION project, which allows for the remote creation and delivery of artistic shows. TCS supports multiple artists performing simultaneously, either live or pre-recorded, on multiple stages at different geographical locations. It employs a client-server approach. The client has two major components: Control and Display. The former is used by production teams to create shows by specifying the layouts, scenes, and media sources to be included; the latter is used by viewers to watch the shows. To ensure good viewer quality of experience (QoE), TCS employs adaptive streaming via PADA, a novel Prioritised Adaptation solution for pre-recorded content delivery built on the DASH standard and introduced in this paper. User tests and experiments evaluate the performance of the TCS Control and Display applications and of the PADA algorithm when creating and distributing opera shows.
{"title":"An Innovative Adaptive Web-Based Solution for Improved Remote Co-Creation and Delivery of Artistic Performances","authors":"Mohammed Amine Togou;Anderson Augusto Simiscuka;Rohit Verma;Noel E. O’Connor;Iñigo Tamayo;Stefano Masneri;Mikel Zorrilla;Gabriel-Miro Muntean","doi":"10.1109/TBC.2024.3363455","DOIUrl":"10.1109/TBC.2024.3363455","url":null,"abstract":"Due to the COVID-19 pandemic, most arts and cultural activities have moved online. This has contributed to the surge in development of artistic tools that enable professional artists to produce engaging and immersive shows remotely. This article introduces TRACTION Co-Creation Stage (TCS), a novel Web-based solution, designed and developed in the context of the EU Horizon 2020 TRACTION project, which allows for remote creation and delivery of artistic shows. TCS supports multiple artists performing simultaneously, either live or pre-recorded, on multiple stages at different geographical locations. It employs a client-server approach. The client has two major components: Control and Display. The former is used by the production teams to create shows by specifying layouts, scenes, and media sources to be included. The latter is used by viewers to watch the various shows. To ensure viewers’ good quality of experience (QoE) levels, TCS employs adaptive streaming based on a novel Prioritised Adaptation solution based on the DASH standard for pre-recorded content delivery (PADA), which is introduced in this paper. User tests and experiments are carried out to evaluate the performance of TCS’ Control and Display applications and that of PADA algorithm when creating and distributing opera shows.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 2","pages":"719-730"},"PeriodicalIF":4.5,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10472407","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140125226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep-Learning-Based Classifier With Custom Feature-Extraction Layers for Digitally Modulated Signals
John A. Snoap;Dimitrie C. Popescu;Chad M. Spooner
Pub Date: 2024-03-10 | DOI: 10.1109/TBC.2024.3391056
The paper presents a novel deep-learning (DL) based classifier for digitally modulated signals that uses a capsule network (CAP) with custom-designed feature-extraction layers. The classifier takes the in-phase/quadrature (I/Q) components of the digitally modulated signal as input. The feature-extraction layers are inspired by cyclostationary signal processing (CSP) techniques, which extract the cyclic cumulant (CC) features employed by conventional CSP-based approaches to blind modulation classification and signal identification. Specifically, the feature-extraction layers implement a proxy for the mathematical functions used in calculating the CC features and include a squaring layer, a raise-to-the-power-of-three layer, and a fast-Fourier-transform (FFT) layer, along with additional normalization and warping layers that retain the relative signal powers and prevent the trainable neural network (NN) layers from diverging during training. The classification performance and generalization abilities of the proposed CAP are tested on two distinct, independently generated datasets containing similar classes of digitally modulated signals. Numerical results reveal that the proposed CAP with novel feature-extraction layers achieves high classification accuracy and outperforms alternative DL-based approaches for signal classification in terms of both classification accuracy and generalization ability.
{"title":"Deep-Learning-Based Classifier With Custom Feature-Extraction Layers for Digitally Modulated Signals","authors":"John A. Snoap;Dimitrie C. Popescu;Chad M. Spooner","doi":"10.1109/TBC.2024.3391056","DOIUrl":"10.1109/TBC.2024.3391056","url":null,"abstract":"The paper presents a novel deep-learning (DL) based classifier for digitally modulated signals that uses a capsule network (CAP) with custom-designed feature extraction layers. The classifier takes the in-phase/quadrature (I/Q) components of the digitally modulated signal as input, and the feature extraction layers are inspired by cyclostationary signal processing (CSP) techniques, which extract the cyclic cumulant (CC) features that are employed by conventional CSP-based approaches to blind modulation classification and signal identification. Specifically, the feature extraction layers implement a proxy of the mathematical functions used in the calculation of the CC features and include a squaring layer, a raise-to-the-power-of-three layer, and a fast-Fourier-transform (FFT) layer, along with additional normalization and warping layers to ensure that the relative signal powers are retained and to prevent the trainable neural network (NN) layers from diverging in the training process. The classification performance and the generalization abilities of the proposed CAP are tested using two distinct datasets that contain similar classes of digitally modulated signals but that have been generated independently, and numerical results obtained reveal that the proposed CAP with novel feature extraction layers achieves high classification accuracy while also outperforming alternative DL-based approaches for signal classification in terms of both classification accuracy and generalization abilities.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 3","pages":"763-773"},"PeriodicalIF":3.2,"publicationDate":"2024-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140934724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Removing Banding Artifacts in HDR Videos Generated From Inverse Tone Mapping
Fei Zhou;Zikang Zheng;Guoping Qiu
Pub Date: 2024-03-10 | DOI: 10.1109/TBC.2024.3394297
Displaying standard dynamic range (SDR) videos on high dynamic range (HDR) devices requires inverse tone mapping (ITM). However, such mapping can introduce banding artifacts. This paper presents a banding removal method for inversely tone-mapped HDR videos based on deep convolutional neural networks (DCNNs) and adaptive filtering. Three banding-relevant feature maps are first extracted and then fed to two DCNNs, a ShapeNet and a PositionNet. The PositionNet learns a soft mask indicating the locations where banding is likely to have occurred and filtering is required, while the ShapeNet predicts the filter shapes appropriate for different locations. An advantage of the method is that the adaptive filters can be jointly optimized with a learning-based ITM algorithm for creating high-quality HDR videos. Experimental results show that our method outperforms state-of-the-art algorithms both qualitatively and quantitatively.
{"title":"Removing Banding Artifacts in HDR Videos Generated From Inverse Tone Mapping","authors":"Fei Zhou;Zikang Zheng;Guoping Qiu","doi":"10.1109/TBC.2024.3394297","DOIUrl":"10.1109/TBC.2024.3394297","url":null,"abstract":"Displaying standard dynamic range (SDR) videos on high dynamic range (HDR) devices requires inverse tone mapping (ITM). However, such mapping can introduce banding artifacts. This paper presents a banding removal method for inversely tone mapped HDR videos based on deep convolutional neural networks (DCNNs) and adaptive filtering. Three banding relevant feature maps are first extracted and then fed to two DCNNs, a ShapeNet and a PositionNet. The PositionNet learns a soft mask indicating the locations where banding is likely to have occurred and filtering is required while the ShapeNet predicts the filter shapes appropriate for different locations. An advantage of the method is that the adaptive filters can be jointly optimized with a learning-based ITM algorithm for creating high-quality HDR videos. Experimental results show that our method outperforms state-of-the-art algorithms qualitatively and quantitatively.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 2","pages":"753-762"},"PeriodicalIF":4.5,"publicationDate":"2024-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140934722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimal OFDM-IM Signals With Constant PAPR
Jiabo Hu;Yajun Wang;Zhuxian Lian;Yinjie Su;Zhibin Xie
Pub Date: 2024-03-10 | DOI: 10.1109/TBC.2024.3394292
Orthogonal frequency division multiplexing with index modulation (OFDM-IM), an emerging multi-carrier modulation technique, offers significant advantages over traditional OFDM. The OFDM-IM scheme exhibits superior bit error rate (BER) performance at low and medium data rates, while also enhancing resilience to inter-carrier interference in dynamically changing channels. However, the challenge of a high peak-to-average power ratio (PAPR) persists in OFDM-IM. In this study, we propose a novel approach to mitigate PAPR by introducing a small dither signal on the idle subcarriers, leveraging the inherent structure of OFDM-IM. We then address the nonconvex and non-smooth optimization problem of minimizing the maximum amplitude of the dither signals subject to a constant-PAPR constraint. To tackle this challenging optimization task, we adopt the linearized alternating direction method of multipliers (LADMM), referred to as the LADMM-direct algorithm, which yields a simple closed-form solution for each subproblem encountered during the optimization. To improve the convergence rate of LADMM-direct, we also propose a LADMM-relax algorithm for the PAPR problem. Simulation results demonstrate that the proposed LADMM-direct and LADMM-relax algorithms significantly reduce computational complexity and achieve superior performance in terms of both PAPR and BER compared to state-of-the-art algorithms.
{"title":"Optimal OFDM-IM Signals With Constant PAPR","authors":"Jiabo Hu;Yajun Wang;Zhuxian Lian;Yinjie Su;Zhibin Xie","doi":"10.1109/TBC.2024.3394292","DOIUrl":"10.1109/TBC.2024.3394292","url":null,"abstract":"Orthogonal frequency division multiplexing indexed modulation (OFDM-IM), an emerging multi-carrier modulation technique, offers significant advantages over traditional OFDM. The OFDM-IM scheme exhibits superior performance in terms of bit error rate (BER) at low and medium data rates, while also enhancing resilience to inter-carrier interference in dynamically changing channels. However, the challenge of a high peak-to-average ratio (PAPR) also persists in OFDM-IM. In this study, we propose a novel approach to mitigate PAPR by introducing a small dither signal to the idle subcarrier, leveraging the inherent characteristics of OFDM-IM. Subsequently, we address the nonconvex and non-smooth optimization problem of minimizing the maximum amplitude of dither signals while maintaining a constant PAPR constraint. To effectively tackle this challenging optimization task, we adopt the linearized alternating direction multiplier method (LADMM), referred to as the LADMM-direct algorithm, which provides a simple closed-form solution for each subproblem encountered during the optimization process. To improve the convergence rate of the LADMM-direct algorithm, a LADMM-relax algorithm is also proposed to address the PAPR problem. Simulation results demonstrate that our proposed LADMM-direct and LADMM-relax algorithms significantly reduce computational complexity and achieve superior performance in terms of both PAPR and bit error rate (BER) compared to state-of-the-art algorithms.","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 3","pages":"945-954"},"PeriodicalIF":3.2,"publicationDate":"2024-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140934688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}