Evaluation of Different Task Distributions for Edge Cloud-based Collaborative Visual SLAM
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287125
Sebastian Eger, R. Pries, E. Steinbach
In recent years, a variety of visual SLAM (Simultaneous Localization and Mapping) systems have been proposed. These systems allow camera-equipped agents to create a map of the environment and determine their position within this map, even without an available GNSS signal. Visual SLAM algorithms differ mainly in the way the image information is processed and whether the resulting map is represented as a dense point cloud or with sparse feature points. However, most systems have in common that a high computational effort is necessary to create an accurate, correct and up-to-date pose and map. This is a challenge for smaller mobile agents with limited power and computing resources. In this paper, we investigate how the processing steps of a state-of-the-art feature-based visual SLAM system can be distributed between a mobile agent and an edge-cloud server. Depending on the specification of the agent, it can run the complete system locally, offload only the tracking and optimization part, or run nearly all processing steps on the server. For this purpose, the individual processing steps and their resulting data formats are examined, and methods are presented for transmitting the data to the server efficiently. Our experimental evaluation shows that the CPU load is reduced for all task distributions that offload part of the pipeline to the server. For agents with low computing power, the processing time for pose estimation can even be reduced. In addition, the higher computing power of the server allows the frame rate and the accuracy of pose estimation to be increased.
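A minimal sketch of one possible task split of this kind, not the authors' implementation: the agent only extracts ORB features and ships compact keypoint/descriptor packets to an edge-cloud server that would run tracking, mapping and optimization. The packing format and the function names are illustrative assumptions.

```python
import struct
import cv2
import numpy as np

def extract_and_pack_features(frame_bgr, max_features=1000):
    """Agent side: detect ORB features and pack them into a compact byte payload."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=max_features)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    if descriptors is None:
        return b""
    # Pack keypoint coordinates as float32 and the 32-byte binary descriptors as-is.
    coords = np.array([kp.pt for kp in keypoints], dtype=np.float32)
    header = struct.pack("<I", len(keypoints))
    return header + coords.tobytes() + descriptors.astype(np.uint8).tobytes()

def unpack_features(payload):
    """Server side: recover keypoint coordinates and descriptors for tracking/mapping."""
    (n,) = struct.unpack_from("<I", payload, 0)
    coords = np.frombuffer(payload, dtype=np.float32, count=2 * n, offset=4).reshape(n, 2)
    desc_offset = 4 + coords.nbytes
    descriptors = np.frombuffer(payload, dtype=np.uint8, offset=desc_offset).reshape(n, 32)
    return coords, descriptors
```

Sending such packets instead of raw frames is one way the per-frame upload can stay small while the server carries the heavy optimization load.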
{"title":"Evaluation of Different Task Distributions for Edge Cloud-based Collaborative Visual SLAM","authors":"Sebastian Eger, R. Pries, E. Steinbach","doi":"10.1109/MMSP48831.2020.9287125","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287125","url":null,"abstract":"In recent years, a variety of visual SLAM (Simultaneous Localization and Mapping) systems have been proposed. These systems allow camera-equipped agents to create a map of the environment and determine their position within this map, even without an available GNSS signal. Visual SLAM algorithms differ mainly in the way the image information is processed and whether the resulting map is represented as a dense point cloud or with sparse feature points. However, most systems have in common that a high computational effort is necessary to create an accurate, correct and up-to-date pose and map. This is a challenge for smaller mobile agents with limited power and computing resources.In this paper, we investigate how the processing steps of a state-of-the-art feature-based visual SLAM system can be distributed among a mobile agent and an edge-cloud server. Depending on the specification of the agent, it can run the complete system locally, offload only the tracking and optimization part, or run nearly all processing steps on the server. For this purpose, the individual processing steps and their resulting data formats are examined and methods are presented how the data can be efficiently transmitted to the server. Our experimental evaluation shows that the CPU load can be reduced for all task distributions which offload part of the pipeline to the server. For agents with low computing power, the processing time for the pose estimation can even be reduced. In addition, the higher computing power of the server allows to increase the frame rate and accuracy for pose estimation.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131785567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic Gain Control for Enhanced HDR Performance on Audio
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287160
D. Garcia, J. Hernandez, Steve Mann
We introduce a method to enhance the performance of the high dynamic range (HDR) technique on audio signals by automatically controlling the gains of the individual signal channels. Automatic gain control (AGC) compensates for the receiver's limited dynamic range by keeping the incoming signal within the desired range, while HDR uses these multi-channel gains to extend the dynamic range of the composited signal. The results confirm that the benefits of the two methods are compounded when they are used together; in effect, we produce a dynamic high dynamic range (DHDR) composite signal. The HDR AGC method is simulated to show performance gains under various conditions and then implemented on a custom PCB with a microcontroller to demonstrate feasibility in real-world, real-time applications.
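An illustrative sketch of the general idea only, with assumed simple forms rather than the authors' exact algorithm: a per-channel AGC update keeps each channel inside its usable range, and an HDR-style composite then fuses the differently amplified channels, weighting samples that are far from clipping more heavily.

```python
import numpy as np

def agc_step(gain, peak, target=0.5, rate=0.1, gain_limits=(0.1, 100.0)):
    """Update one channel's gain so its recent peak level moves toward the target level."""
    gain *= (target / max(peak, 1e-6)) ** rate
    return float(np.clip(gain, *gain_limits))

def hdr_composite(channels, gains, clip=1.0):
    """Fuse channels captured with different gains into one wide-dynamic-range signal."""
    channels = np.asarray(channels, dtype=np.float64)   # shape: (n_channels, n_samples)
    gains = np.asarray(gains, dtype=np.float64)[:, None]
    # Confidence weight: high for mid-range samples, low near clipping.
    w = np.clip(1.0 - np.abs(channels) / clip, 1e-3, None)
    # Refer each channel back to the input domain by dividing out its gain, then blend.
    return np.sum(w * channels / gains, axis=0) / np.sum(w, axis=0)
```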
{"title":"Automatic Gain Control for Enhanced HDR Performance on Audio","authors":"D. Garcia, J. Hernandez, Steve Mann","doi":"10.1109/MMSP48831.2020.9287160","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287160","url":null,"abstract":"We introduce a method to enhance the performance of the high dynamic range (HDR) technique on audio signals by automatically controlling the gains of the individual signal channels. Automatic gain control (AGC) compensates the receiver’s dynamic range by ensuring that the incoming signal is contained within the desired range while the HDR utilizes these multi-channel gains to extend the dynamic range of the composited signal. The results validate that the benefits given by each method are compounded when they are used together. In effect, we produce a dynamic high dynamic range (DHDR) composite signal. The HDR AGC method is simulated to show performance gains under various conditions. The method is then implemented using a custom PCB and a microcontroller to show feasibility in real-world and real-time applications.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133837621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convolution Autoencoder-Based Sparse Representation Wavelet for Image Classification
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287107
Tan-Sy Nguyen, Long H. Ngo, M. Luong, M. Kaaniche, Azeddine Beghdadi
In this paper, we propose an effective Convolutional Autoencoder (AE) model for Sparse Representation (SR) in the Wavelet Domain for Classification (SRWC). The proposed approach involves an autoencoder with a sparse latent layer for learning sparse codes of wavelet features. The estimated sparse codes are used for assigning classes to test samples using a residual-based probabilistic criterion. Intensive experiments carried out on various datasets revealed that the proposed method yields better classification accuracy while exhibiting a significant reduction in the number of network parameters, compared to several recent deep learning-based methods.
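A minimal sketch of the central ingredient, with an assumed architecture rather than the authors' exact network: a small convolutional autoencoder whose latent activations are pushed toward sparsity with an L1 penalty, trained to reconstruct single-channel wavelet-subband inputs. The residual-based probabilistic classification step is not reproduced here.

```python
import torch
import torch.nn as nn

class SparseConvAE(nn.Module):
    def __init__(self, latent_channels=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent_channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)              # latent code, driven toward sparsity by the L1 term
        return self.decoder(z), z

model = SparseConvAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 1, 32, 32)            # stand-in batch of wavelet-subband patches
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x) + 1e-3 * code.abs().mean()   # reconstruction + sparsity
loss.backward()
optimizer.step()
```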
{"title":"Convolution Autoencoder-Based Sparse Representation Wavelet for Image Classification","authors":"Tan-Sy Nguyen, Long H. Ngo, M. Luong, M. Kaaniche, Azeddine Beghdadi","doi":"10.1109/MMSP48831.2020.9287107","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287107","url":null,"abstract":"In this paper, we propose an effective Convolutional Autoencoder (AE) model for Sparse Representation (SR) in the Wavelet Domain for Classification (SRWC). The proposed approach involves an autoencoder with a sparse latent layer for learning sparse codes of wavelet features. The estimated sparse codes are used for assigning classes to test samples using a residual-based probabilistic criterion. Intensive experiments carried out on various datasets revealed that the proposed method yields better classification accuracy while exhibiting a significant reduction in the number of network parameters, compared to several recent deep learning-based methods.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"24 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117007707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Key Point Agnostic Frequency-Selective Mesh-to-Grid Image Resampling using Spectral Weighting
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287096
Viktoria Heimann, Nils Genser, A. Kaup
Many applications in image processing require re-sampling of arbitrarily located samples onto regular grid positions. This is important in frame-rate up-conversion, super-resolution, and image warping, among others. A state-of-the-art high-quality model-based resampling technique is frequency-selective mesh-to-grid resampling, which requires pre-estimation of key points. In this paper, we propose a new key point agnostic frequency-selective mesh-to-grid resampling that does not depend on pre-estimated key points. Hence, the number of data points that are included is reduced drastically and the run time decreases significantly. To compensate for the omission of key points, a spectral weighting function is introduced that models the optical transfer function in order to favor low frequencies over high ones. Thereby, resampling artefacts like ringing are suppressed reliably and the resampling quality increases. The new AFSMR (key point agnostic frequency-selective mesh-to-grid resampling) is conceptually simpler, gains up to 1.2 dB in terms of PSNR compared to the original mesh-to-grid resampling, and is approximately 14.5 times faster on average.
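A sketch of the general idea of such a spectral weighting function, assuming a simple isotropic exponential-decay form rather than the exact OTF model used in the paper: low spatial frequencies receive weights close to 1, higher frequencies are attenuated, so basis selection during resampling favors low-frequency content and ringing is damped.

```python
import numpy as np

def spectral_weights(block_size=32, decay=0.85):
    """Return one weight per 2D frequency index of a block (assumed decay model)."""
    k = np.arange(block_size)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    radius = np.sqrt(kx**2 + ky**2)     # distance from the DC coefficient
    return decay ** radius              # monotonically decreasing with frequency

w = spectral_weights()
print(w[0, 0], w[0, 8], w[16, 16])      # DC kept at 1.0, higher frequencies damped
```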
{"title":"Key Point Agnostic Frequency-Selective Mesh-to-Grid Image Resampling using Spectral Weighting","authors":"Viktoria Heimann, Nils Genser, A. Kaup","doi":"10.1109/MMSP48831.2020.9287096","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287096","url":null,"abstract":"Many applications in image processing require re-sampling of arbitrarily located samples onto regular grid positions. This is important in frame-rate up-conversion, super-resolution, and image warping among others. A state-of-the-art high quality model-based resampling technique is frequency-selective mesh-to-grid resampling which requires pre-estimation of key points. In this paper, we propose a new key point agnostic frequency-selective mesh-to-grid resampling that does not depend on pre-estimated key points. Hence, the number of data points that are included is reduced drastically and the run time decreases significantly. To compensate for the key points, a spectral weighting function is introduced that models the optical transfer function in order to favor low frequencies more than high ones. Thereby, resampling artefacts like ringing are supressed reliably and the resampling quality increases. On average, the new AFSMR is conceptually simpler and gains up to 1.2 dB in terms of PSNR compared to the original mesh-to-grid resampling while being approximately 14.5 times faster.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117267510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Successive Refinement of Bounding Volumes for Point Cloud Coding
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287106
I. Tabus, E. C. Kaya, S. Schwarz
The paper proposes a new lossy way of encoding the geometry of point clouds. The proposed scheme reconstructs the geometry from only the two depth maps associated with a single projection direction and then refines the result through a progressive reconstruction process using suitably defined anchor points. The reconstruction from the two depth images relies on several analysis and encoding primitives, several of which are optional. The resulting bitstream is embedded and can be truncated at various levels of reconstruction of the bounding volume. The encoding tools for the required entities are very simple and can be combined flexibly. The scheme can also be combined with G-PCC coding to reconstruct sparse point clouds losslessly. The experiments show an improvement in rate-distortion performance when the proposed method is combined with the G-PCC codec, compared to the G-PCC codec alone.
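A simplified illustration of the starting point only, not the full codec: a voxelized point cloud is projected along one axis into a "front" (minimum depth) and a "back" (maximum depth) map, and a coarse bounding volume is reconstructed by filling everything between the two. The progressive refinement with anchor points is not reproduced here.

```python
import numpy as np

def depth_maps(points, grid=64):
    """points: (N, 3) integer voxel coordinates in [0, grid). Project along z."""
    front = np.full((grid, grid), -1, dtype=np.int32)   # nearest occupied depth per pixel
    back = np.full((grid, grid), -1, dtype=np.int32)    # farthest occupied depth per pixel
    for x, y, z in points:
        if front[x, y] < 0 or z < front[x, y]:
            front[x, y] = z
        if z > back[x, y]:
            back[x, y] = z
    return front, back

def reconstruct_bounding_volume(front, back):
    """Coarse lossy reconstruction: fill every voxel between the two depth maps."""
    pts = []
    for x, y in zip(*np.nonzero(front >= 0)):
        pts.extend((x, y, z) for z in range(front[x, y], back[x, y] + 1))
    return np.array(pts, dtype=np.int32)

cloud = np.random.randint(0, 64, size=(500, 3))
f, b = depth_maps(cloud)
volume = reconstruct_bounding_volume(f, b)
```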
{"title":"Successive Refinement of Bounding Volumes for Point Cloud Coding","authors":"I. Tabus, E. C. Kaya, S. Schwarz","doi":"10.1109/MMSP48831.2020.9287106","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287106","url":null,"abstract":"The paper proposes a new lossy way of encoding the geometry of point clouds. The proposed scheme reconstructs the geometry from only the two depth maps associated to a single projection direction and then proposes a progressive reconstruction process using suitably defined anchor points. The reconstruction from the two depth images follows several primitives for analyzing and encoding, several of which are only optional. The resulting bitstream is embedded and can be truncated at various levels of reconstruction of the bounding volume. The encoding tools for encoding the needed entities are extremely simple and can be combined flexibly. The scheme can also be combined with the G-PCC coding, for reconstructing in a lossless way the sparse point clouds. The experiments show improvement of the rate-distortion performance of the proposed method when combined with the G-PCC codec as compared to G-PCC codec alone.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131537550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Surface Lightfield Support in Video-based Point Cloud Coding
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287115
Deepa Naik, S. Schwarz, V. Vadakital, Kimmo Roimela
Surface light-field (SLF) is a mapping of a set of color vectors to a set of ray vectors that originate at a point on a surface. It enables rendering photo-realistic viewpoints in extended reality applications. However, the amount of data required to represent an SLF is significantly larger, so storing and distributing SLFs requires an efficient compressed representation. The Moving Picture Experts Group (MPEG) has an ongoing standardization activity for the compression of point clouds. Until recently, this activity targeted the compression of a single texture, but it is now investigating view-dependent textures. In this paper, we propose methods to optimize the coding of view-dependent color without compromising visual quality. Our results show that the optimizations provided in this paper reduce the coded HEVC bit rate by 64% for the all-intra configuration and 52% for the random-access configuration, compared to coding all textures independently.
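A conceptual sketch of the surface light-field idea itself (view-dependent color lookup at one surface point), not of the video-based point cloud coding tools discussed in the paper: each point stores one color per capture-view direction, and at render time the colors whose view directions best match the query ray are blended.

```python
import numpy as np

view_dirs = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1.0]])   # assumed unit capture directions
view_dirs /= np.linalg.norm(view_dirs, axis=1, keepdims=True)
point_colors = np.array([[200, 30, 30], [30, 200, 30], [30, 30, 200.0]])  # one RGB per view

def shade(query_ray):
    """Blend the per-view colors of a point according to view-direction similarity."""
    d = np.asarray(query_ray, dtype=float)
    d /= np.linalg.norm(d)
    sim = np.clip(view_dirs @ d, 0.0, None)     # cosine similarity to each capture view
    if sim.sum() == 0:
        return point_colors.mean(axis=0)        # fallback: view-independent average color
    return (sim[:, None] * point_colors).sum(axis=0) / sim.sum()

print(shade([1, 1, 0]))
```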
{"title":"Surface Lightfield Support in Video-based Point Cloud Coding","authors":"Deepa Naik, S. Schwarz, V. Vadakital, Kimmo Roimela","doi":"10.1109/MMSP48831.2020.9287115","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287115","url":null,"abstract":"Surface light-field (SLF) is a mapping of a set of color vectors to a set of ray vectors that originate at a point on a surface. It enables rendering photo-realistic view points in extended reality applications. However, the amount of data required to represent SLF is significantly more. Therefore, storing and distributing SLFs requires an efficient compressed representation. The Motion Pictures Experts Group (MPEG) has an on-going standard activity for the compression of point clouds. Until recently, this activity was targeting compression of single texture information, but is now investigating view dependent textures. In this paper, we propose methods to optimize coding of view dependent color without compromising on the visual quality. Our results show the optimizations provided in this paper reduce coded HEVC bit rate by 64% for the all-intra configuration and 52% for the random-access configuration, when compared to coding all texture independently.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133931625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Smart caching for live 360° video streaming in mobile networks
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287059
P. Maniotis, N. Thomos
Despite the advances of 5G systems, the delivery of 360° video content in mobile networks remains challenging because of the size of 360° video files. Recently, edge caching has been shown to bring large performance gains to 360° Video on Demand (VoD) delivery systems; however, existing systems cannot be applied straightforwardly to live 360° video streaming. To address this issue, we investigate edge cache-assisted live 360° video streaming. As the popularity of videos and tiles varies with time, our framework employs a Long Short-Term Memory (LSTM) network to determine the optimal cache placement/eviction strategies that optimize the quality of the videos rendered by the users. To further enhance the delivered video quality, users located in the overlap of the coverage areas of multiple small base stations (SBSs) are allowed to receive their data from any of these SBSs. We evaluate and compare the performance of our method with that of state-of-the-art systems. The results show the superiority of the proposed method over its counterparts and make clear the benefits, in terms of delivered quality, of accurate tile popularity prediction by the LSTM network and of user association with multiple SBSs.
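A minimal sketch with an assumed model and caching rule, not the authors' full framework: an LSTM predicts the next-interval request count for every tile from a short history of observed counts, and the cache keeps the tiles with the highest predicted popularity.

```python
import torch
import torch.nn as nn

class TilePopularityLSTM(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, history):                        # history: (n_tiles, window, 1)
        out, _ = self.lstm(history)
        return self.head(out[:, -1, :]).squeeze(-1)    # predicted popularity per tile

n_tiles, window, cache_slots = 96, 8, 20
model = TilePopularityLSTM()
history = torch.rand(n_tiles, window, 1)               # stand-in normalized request counts
with torch.no_grad():
    popularity = model(history)
cached_tiles = torch.topk(popularity, k=cache_slots).indices   # cache placement decision
```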
{"title":"Smart caching for live 360° video streaming in mobile networks","authors":"P. Maniotis, N. Thomos","doi":"10.1109/MMSP48831.2020.9287059","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287059","url":null,"abstract":"Despite the advances of 5G systems, the delivery of 360° video content in mobile networks remains challenging because of the size of 360° video files. Recently, edge caching has been shown to bring large performance gains to 360° Video on Demand (VoD) delivery systems, however existing systems cannot be straightforwardly applied to live 360° video streaming. To address this issue, we investigate edge cache-assisted live 360° video streaming. As videos’ and tiles’ popularities vary with time, our framework employs a Long Short-Term Memory (LSTM) network to determine the optimal cache placement/evictions strategies that optimize the quality of the videos rendered by the users. To further enhance the delivered video quality, users located in the overlap of the coverage areas of multiple SBSs are allowed to receive their data from any of these SBSs. We evaluate and compare the performance of our method with that of state-of-the-art systems. The results show the superiority of the proposed method against its counterparts, and make clear the benefits of accurate tiles’ popularity prediction by the LSTM networks and users association with multiple SBSs in terms of the delivered quality.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"165 3-4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134289122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Subjective Test Dataset and Meta-data-based Models for 360° Streaming Video Quality
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287065
S. Fremerey, Steve Göring, Rakesh Rao Ramachandra Rao, Rachel Huang, A. Raake
In recent years, the number of 360° videos available for streaming has rapidly increased, leading to the need for 360° streaming video quality assessment. In this paper, we report and publish results of three subjective 360° video quality tests, with conditions chosen to reflect real-world bitrates and resolutions including 4K, 6K and 8K, resulting in 64 stimuli each for the first two tests and 63 for the third. As playout device, we used the HTC Vive for the first test and the HTC Vive Pro for the remaining two. Video-quality ratings were collected using the 5-point Absolute Category Rating scale. The 360° dataset provided with the paper contains the links to the used source videos, the raw subjective scores, video-related meta-data, head rotation data and Simulator Sickness Questionnaire results per stimulus and per subject to enable reproducibility of the provided results. Moreover, we use our dataset to compare the performance of state-of-the-art full-reference quality metrics such as VMAF, PSNR, SSIM, ADM2, WS-PSNR and WS-SSIM. Of all metrics, VMAF was found to show the highest correlation with the subjective scores. Further, we evaluated a center-cropped version of VMAF ("VMAF-cc"), which was found to provide similar performance to the full VMAF. In addition to the dataset and the objective metric evaluation, we propose two new video-quality prediction models: a bitstream meta-data-based model and a hybrid no-reference model using bitrate, resolution and pixel information of the video as input. The new lightweight models provide performance similar to the full-reference models while enabling fast calculations.
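A sketch in the spirit of a bitstream meta-data-based quality model, with an assumed functional form and stand-in numbers rather than the model or data from the paper: a MOS-like score is predicted from bitrate and resolution alone, fitted by least squares.

```python
import numpy as np
from scipy.optimize import curve_fit

def quality_model(meta, a, b, c):
    """Assumed form: score from meta-data only, no decoded pixels needed."""
    bitrate_kbps, height = meta
    return a + b * np.log10(bitrate_kbps) + c * np.log10(height)

# Stand-in training points (bitrate in kbps, frame height) with illustrative 1-5 scores.
bitrate = np.array([2000, 5000, 10000, 20000, 40000], dtype=float)
height = np.array([1080, 1440, 2160, 2880, 3840], dtype=float)
mos = np.array([2.1, 2.9, 3.6, 4.1, 4.4])

params, _ = curve_fit(quality_model, (bitrate, height), mos)
print(quality_model((np.array([15000.0]), np.array([2160.0])), *params))  # predict a new condition
```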
{"title":"Subjective Test Dataset and Meta-data-based Models for 360° Streaming Video Quality","authors":"S. Fremerey, Steve Göring, Rakesh Rao Ramachandra Rao, Rachel Huang, A. Raake","doi":"10.1109/MMSP48831.2020.9287065","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287065","url":null,"abstract":"During the last years, the number of 360° videos available for streaming has rapidly increased, leading to the need for 360° streaming video quality assessment. In this paper, we report and publish results of three subjective 360° video quality tests, with conditions used to reflect real-world bitrates and resolutions including 4K, 6K and 8K, resulting in 64 stimuli each for the first two tests and 63 for the third. As playout device we used the HTC Vive for the first and HTC Vive Pro for the remaining two tests. Video-quality ratings were collected using the 5-point Absolute Category Rating scale. The 360° dataset provided with the paper contains the links of the used source videos, the raw subjective scores, video-related meta-data, head rotation data and Simulator Sickness Questionnaire results per stimulus and per subject to enable reproducibility of the provided results. Moreover, we use our dataset to compare the performance of state-of-the-art full-reference quality metrics such as VMAF, PSNR, SSIM, ADM2, WS-PSNR and WS-SSIM. Out of all metrics, VMAF was found to show the highest correlation with the subjective scores. Further, we evaluated a center-cropped version of VMAF (\"VMAF-cc\") that showed to provide a similar performance as the full VMAF. In addition to the dataset and the objective metric evaluation, we propose two new video-quality prediction models, a bitstream meta-data-based model and a hybrid no-reference model using bitrate, resolution and pixel information of the video as input. The new lightweight models provide similar performance as the full-reference models while enabling fast calculations.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114294984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Low Bit-Rate Intra-Frame Coding using Common Information for 360-degree Video
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287050
Fariha Afsana, M. Paul, M. Murshed, D. Taubman
With the growth of video technologies, super-resolution videos, including 360-degree immersive video, have become a reality, driven by applications such as augmented/virtual/mixed reality that offer better interaction and a wide-angle viewing experience of a scene compared to traditional video with its narrow viewing angle. These new-generation video contents are bandwidth-intensive due to their high resolution and demand high bit rates as well as low-latency delivery, which poses challenges for transmission and storage. Traditional video coding schemes offer limited room for improving intra-frame coding efficiency because of their fixed processing block size. This paper presents a new approach for improving intra-frame coding, especially for low bit rate transmission of 360-degree video in the lossy mode of HEVC. Prior to applying traditional HEVC intra-prediction, the approach exploits the global redundancy of the entire frame by extracting the common important information using a multi-level discrete wavelet transform. This paper demonstrates that encoding only the low-frequency information of a frame can outperform the HEVC standard at low bit rates. The experimental results indicate that the proposed intra-frame coding strategy achieves an average of 54.07% BD-rate reduction and 2.84 dB BD-PSNR gain in the low bit rate scenario compared to HEVC. It also achieves a significant reduction in encoding time of about 66.84% on average. Moreover, this finding demonstrates that the existing HEVC block partitioning can be applied in the transform domain to better exploit the concentration of information, since HEVC is applied to the wavelet frequency domain.
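A simplified sketch of the pre-processing idea only (the HEVC encoding of the retained subband is not shown): a multi-level 2D DWT isolates the low-frequency "common information" of a frame, only the approximation subband would then be passed to the intra encoder, and the detail subbands are discarded here.

```python
import numpy as np
import pywt

frame = np.random.rand(512, 512)                  # stand-in luma plane of a 360-degree frame
coeffs = pywt.wavedec2(frame, wavelet="db2", level=2)

ll_band = coeffs[0]                               # low-frequency subband to be intra-coded
print("subband to encode:", ll_band.shape)        # much smaller than the original frame

# Receiver-side approximation: reconstruct using only the low-frequency subband.
zeroed = [coeffs[0]] + [tuple(np.zeros_like(d) for d in detail) for detail in coeffs[1:]]
approx = pywt.waverec2(zeroed, wavelet="db2")
```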
{"title":"Efficient Low Bit-Rate Intra-Frame Coding using Common Information for 360-degree Video","authors":"Fariha Afsana, M. Paul, M. Murshed, D. Taubman","doi":"10.1109/MMSP48831.2020.9287050","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287050","url":null,"abstract":"With the growth of video technologies, super-resolution videos, including 360-degree immersive video has become a reality due to exciting applications such as augmented/virtual/mixed reality for better interaction and a wide-angle user-view experience of a scene compared to traditional video with narrow-focused viewing angle. The new generation video contents are bandwidth-intensive in nature due to high resolution and demand high bit rate as well as low latency delivery requirements that pose challenges in solving the bottleneck of transmission and storage burdens. There is limited optimisation space in traditional video coding schemes for improving video coding efficiency in intra-frame due to the fixed size of processing block. This paper presents a new approach for improving intra-frame coding especially at low bit rate video transmission for 360-degree video for lossy mode of HEVC. Prior to using traditional HEVC intra-prediction, this approach exploits the global redundancy of entire frame by extracting common important information using multi-level discrete wavelet transformation. This paper demonstrates that the proposed method considering only low frequency information of a frame and encoding this can outperform the HEVC standard at low bit rates. The experimental results indicate that the proposed intra-frame coding strategy achieves an average of 54.07% BD-rate reduction and 2.84 dB BD-PSNR gain for low bit rate scenario compared to the HEVC. It also achieves a significant improvement in encoding time reduction of about 66.84% on an average. Moreover, this finding also demonstrates that the existing HEVC block partitioning can be applied in the transform domain for better exploitation of information concentration as we applied HEVC on wavelet frequency domain.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132179551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MMSP 2020 Cover Page
Pub Date: 2020-09-21 | DOI: 10.1109/mmsp48831.2020.9287079
{"title":"MMSP 2020 Cover Page","authors":"","doi":"10.1109/mmsp48831.2020.9287079","DOIUrl":"https://doi.org/10.1109/mmsp48831.2020.9287079","url":null,"abstract":"","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133101945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}