Evaluation of Different Task Distributions for Edge Cloud-based Collaborative Visual SLAM
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287125
Sebastian Eger, R. Pries, E. Steinbach
In recent years, a variety of visual SLAM (Simultaneous Localization and Mapping) systems have been proposed. These systems allow camera-equipped agents to create a map of the environment and determine their position within this map, even without an available GNSS signal. Visual SLAM algorithms differ mainly in the way the image information is processed and whether the resulting map is represented as a dense point cloud or with sparse feature points. However, most systems have in common that a high computational effort is necessary to create an accurate, correct and up-to-date pose and map. This is a challenge for smaller mobile agents with limited power and computing resources. In this paper, we investigate how the processing steps of a state-of-the-art feature-based visual SLAM system can be distributed between a mobile agent and an edge-cloud server. Depending on the specification of the agent, it can run the complete system locally, offload only the tracking and optimization part, or run nearly all processing steps on the server. For this purpose, the individual processing steps and their resulting data formats are examined, and methods are presented for transmitting the data to the server efficiently. Our experimental evaluation shows that the CPU load is reduced for all task distributions that offload part of the pipeline to the server. For agents with low computing power, the processing time for pose estimation can even be reduced. In addition, the higher computing power of the server allows the frame rate and the accuracy of pose estimation to be increased.
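A minimal sketch of one possible task split of this kind, not the authors' implementation: the agent only extracts ORB features and ships compact keypoint/descriptor packets to an edge-cloud server that would run tracking, mapping and optimization. The packing format and the function names are illustrative assumptions.

```python
import struct
import cv2
import numpy as np

def extract_and_pack_features(frame_bgr, max_features=1000):
    """Agent side: detect ORB features and pack them into a compact byte payload."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=max_features)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    if descriptors is None:
        return b""
    # Pack keypoint coordinates as float32 and the 32-byte binary descriptors as-is.
    coords = np.array([kp.pt for kp in keypoints], dtype=np.float32)
    header = struct.pack("<I", len(keypoints))
    return header + coords.tobytes() + descriptors.astype(np.uint8).tobytes()

def unpack_features(payload):
    """Server side: recover keypoint coordinates and descriptors for tracking/mapping."""
    (n,) = struct.unpack_from("<I", payload, 0)
    coords = np.frombuffer(payload, dtype=np.float32, count=2 * n, offset=4).reshape(n, 2)
    desc_offset = 4 + coords.nbytes
    descriptors = np.frombuffer(payload, dtype=np.uint8, offset=desc_offset).reshape(n, 32)
    return coords, descriptors
```

Sending such packets instead of raw frames is one way the per-frame upload can stay small while the server carries the heavy optimization load.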
{"title":"Evaluation of Different Task Distributions for Edge Cloud-based Collaborative Visual SLAM","authors":"Sebastian Eger, R. Pries, E. Steinbach","doi":"10.1109/MMSP48831.2020.9287125","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287125","url":null,"abstract":"In recent years, a variety of visual SLAM (Simultaneous Localization and Mapping) systems have been proposed. These systems allow camera-equipped agents to create a map of the environment and determine their position within this map, even without an available GNSS signal. Visual SLAM algorithms differ mainly in the way the image information is processed and whether the resulting map is represented as a dense point cloud or with sparse feature points. However, most systems have in common that a high computational effort is necessary to create an accurate, correct and up-to-date pose and map. This is a challenge for smaller mobile agents with limited power and computing resources.In this paper, we investigate how the processing steps of a state-of-the-art feature-based visual SLAM system can be distributed among a mobile agent and an edge-cloud server. Depending on the specification of the agent, it can run the complete system locally, offload only the tracking and optimization part, or run nearly all processing steps on the server. For this purpose, the individual processing steps and their resulting data formats are examined and methods are presented how the data can be efficiently transmitted to the server. Our experimental evaluation shows that the CPU load can be reduced for all task distributions which offload part of the pipeline to the server. For agents with low computing power, the processing time for the pose estimation can even be reduced. In addition, the higher computing power of the server allows to increase the frame rate and accuracy for pose estimation.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131785567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic Gain Control for Enhanced HDR Performance on Audio
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287160
D. Garcia, J. Hernandez, Steve Mann
We introduce a method to enhance the performance of the high dynamic range (HDR) technique on audio signals by automatically controlling the gains of the individual signal channels. Automatic gain control (AGC) compensates for the receiver's limited dynamic range by keeping the incoming signal within the desired range, while HDR uses these multi-channel gains to extend the dynamic range of the composited signal. The results confirm that the benefits of the two methods are compounded when they are used together; in effect, we produce a dynamic high dynamic range (DHDR) composite signal. The HDR AGC method is simulated to show performance gains under various conditions and then implemented on a custom PCB with a microcontroller to demonstrate feasibility in real-world, real-time applications.
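An illustrative sketch of the general idea only, with assumed simple forms rather than the authors' exact algorithm: a per-channel AGC update keeps each channel inside its usable range, and an HDR-style composite then fuses the differently amplified channels, weighting samples that are far from clipping more heavily.

```python
import numpy as np

def agc_step(gain, peak, target=0.5, rate=0.1, gain_limits=(0.1, 100.0)):
    """Update one channel's gain so its recent peak level moves toward the target level."""
    gain *= (target / max(peak, 1e-6)) ** rate
    return float(np.clip(gain, *gain_limits))

def hdr_composite(channels, gains, clip=1.0):
    """Fuse channels captured with different gains into one wide-dynamic-range signal."""
    channels = np.asarray(channels, dtype=np.float64)   # shape: (n_channels, n_samples)
    gains = np.asarray(gains, dtype=np.float64)[:, None]
    # Confidence weight: high for mid-range samples, low near clipping.
    w = np.clip(1.0 - np.abs(channels) / clip, 1e-3, None)
    # Refer each channel back to the input domain by dividing out its gain, then blend.
    return np.sum(w * channels / gains, axis=0) / np.sum(w, axis=0)
```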
{"title":"Automatic Gain Control for Enhanced HDR Performance on Audio","authors":"D. Garcia, J. Hernandez, Steve Mann","doi":"10.1109/MMSP48831.2020.9287160","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287160","url":null,"abstract":"We introduce a method to enhance the performance of the high dynamic range (HDR) technique on audio signals by automatically controlling the gains of the individual signal channels. Automatic gain control (AGC) compensates the receiver’s dynamic range by ensuring that the incoming signal is contained within the desired range while the HDR utilizes these multi-channel gains to extend the dynamic range of the composited signal. The results validate that the benefits given by each method are compounded when they are used together. In effect, we produce a dynamic high dynamic range (DHDR) composite signal. The HDR AGC method is simulated to show performance gains under various conditions. The method is then implemented using a custom PCB and a microcontroller to show feasibility in real-world and real-time applications.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133837621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convolution Autoencoder-Based Sparse Representation Wavelet for Image Classification
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287107
Tan-Sy Nguyen, Long H. Ngo, M. Luong, M. Kaaniche, Azeddine Beghdadi
In this paper, we propose an effective Convolutional Autoencoder (AE) model for Sparse Representation (SR) in the Wavelet Domain for Classification (SRWC). The proposed approach involves an autoencoder with a sparse latent layer for learning sparse codes of wavelet features. The estimated sparse codes are used for assigning classes to test samples using a residual-based probabilistic criterion. Intensive experiments carried out on various datasets revealed that the proposed method yields better classification accuracy while exhibiting a significant reduction in the number of network parameters, compared to several recent deep learning-based methods.
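A minimal sketch of the central ingredient, with an assumed architecture rather than the authors' exact network: a small convolutional autoencoder whose latent activations are pushed toward sparsity with an L1 penalty, trained to reconstruct single-channel wavelet-subband inputs. The residual-based probabilistic classification step is not reproduced here.

```python
import torch
import torch.nn as nn

class SparseConvAE(nn.Module):
    def __init__(self, latent_channels=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent_channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)              # latent code, driven toward sparsity by the L1 term
        return self.decoder(z), z

model = SparseConvAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(8, 1, 32, 32)            # stand-in batch of wavelet-subband patches
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x) + 1e-3 * code.abs().mean()   # reconstruction + sparsity
loss.backward()
optimizer.step()
```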
{"title":"Convolution Autoencoder-Based Sparse Representation Wavelet for Image Classification","authors":"Tan-Sy Nguyen, Long H. Ngo, M. Luong, M. Kaaniche, Azeddine Beghdadi","doi":"10.1109/MMSP48831.2020.9287107","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287107","url":null,"abstract":"In this paper, we propose an effective Convolutional Autoencoder (AE) model for Sparse Representation (SR) in the Wavelet Domain for Classification (SRWC). The proposed approach involves an autoencoder with a sparse latent layer for learning sparse codes of wavelet features. The estimated sparse codes are used for assigning classes to test samples using a residual-based probabilistic criterion. Intensive experiments carried out on various datasets revealed that the proposed method yields better classification accuracy while exhibiting a significant reduction in the number of network parameters, compared to several recent deep learning-based methods.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"24 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117007707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Key Point Agnostic Frequency-Selective Mesh-to-Grid Image Resampling using Spectral Weighting
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287096
Viktoria Heimann, Nils Genser, A. Kaup
Many applications in image processing require re-sampling of arbitrarily located samples onto regular grid positions. This is important in frame-rate up-conversion, super-resolution, and image warping, among others. A state-of-the-art high-quality model-based resampling technique is frequency-selective mesh-to-grid resampling, which requires pre-estimation of key points. In this paper, we propose a new key point agnostic frequency-selective mesh-to-grid resampling that does not depend on pre-estimated key points. Hence, the number of data points that are included is reduced drastically and the run time decreases significantly. To compensate for the omission of key points, a spectral weighting function is introduced that models the optical transfer function in order to favor low frequencies over high ones. Thereby, resampling artefacts like ringing are suppressed reliably and the resampling quality increases. The new AFSMR (key point agnostic frequency-selective mesh-to-grid resampling) is conceptually simpler, gains up to 1.2 dB in terms of PSNR compared to the original mesh-to-grid resampling, and is approximately 14.5 times faster on average.
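A sketch of the general idea of such a spectral weighting function, assuming a simple isotropic exponential-decay form rather than the exact OTF model used in the paper: low spatial frequencies receive weights close to 1, higher frequencies are attenuated, so basis selection during resampling favors low-frequency content and ringing is damped.

```python
import numpy as np

def spectral_weights(block_size=32, decay=0.85):
    """Return one weight per 2D frequency index of a block (assumed decay model)."""
    k = np.arange(block_size)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    radius = np.sqrt(kx**2 + ky**2)     # distance from the DC coefficient
    return decay ** radius              # monotonically decreasing with frequency

w = spectral_weights()
print(w[0, 0], w[0, 8], w[16, 16])      # DC kept at 1.0, higher frequencies damped
```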
{"title":"Key Point Agnostic Frequency-Selective Mesh-to-Grid Image Resampling using Spectral Weighting","authors":"Viktoria Heimann, Nils Genser, A. Kaup","doi":"10.1109/MMSP48831.2020.9287096","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287096","url":null,"abstract":"Many applications in image processing require re-sampling of arbitrarily located samples onto regular grid positions. This is important in frame-rate up-conversion, super-resolution, and image warping among others. A state-of-the-art high quality model-based resampling technique is frequency-selective mesh-to-grid resampling which requires pre-estimation of key points. In this paper, we propose a new key point agnostic frequency-selective mesh-to-grid resampling that does not depend on pre-estimated key points. Hence, the number of data points that are included is reduced drastically and the run time decreases significantly. To compensate for the key points, a spectral weighting function is introduced that models the optical transfer function in order to favor low frequencies more than high ones. Thereby, resampling artefacts like ringing are supressed reliably and the resampling quality increases. On average, the new AFSMR is conceptually simpler and gains up to 1.2 dB in terms of PSNR compared to the original mesh-to-grid resampling while being approximately 14.5 times faster.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117267510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Successive Refinement of Bounding Volumes for Point Cloud Coding
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287106
I. Tabus, E. C. Kaya, S. Schwarz
The paper proposes a new lossy way of encoding the geometry of point clouds. The proposed scheme reconstructs the geometry from only the two depth maps associated with a single projection direction and then refines the result through a progressive reconstruction process using suitably defined anchor points. The reconstruction from the two depth images relies on several analysis and encoding primitives, several of which are optional. The resulting bitstream is embedded and can be truncated at various levels of reconstruction of the bounding volume. The encoding tools for the required entities are very simple and can be combined flexibly. The scheme can also be combined with G-PCC coding to reconstruct sparse point clouds losslessly. The experiments show an improvement in rate-distortion performance when the proposed method is combined with the G-PCC codec, compared to the G-PCC codec alone.
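A simplified illustration of the starting point only, not the full codec: a voxelized point cloud is projected along one axis into a "front" (minimum depth) and a "back" (maximum depth) map, and a coarse bounding volume is reconstructed by filling everything between the two. The progressive refinement with anchor points is not reproduced here.

```python
import numpy as np

def depth_maps(points, grid=64):
    """points: (N, 3) integer voxel coordinates in [0, grid). Project along z."""
    front = np.full((grid, grid), -1, dtype=np.int32)   # nearest occupied depth per pixel
    back = np.full((grid, grid), -1, dtype=np.int32)    # farthest occupied depth per pixel
    for x, y, z in points:
        if front[x, y] < 0 or z < front[x, y]:
            front[x, y] = z
        if z > back[x, y]:
            back[x, y] = z
    return front, back

def reconstruct_bounding_volume(front, back):
    """Coarse lossy reconstruction: fill every voxel between the two depth maps."""
    pts = []
    for x, y in zip(*np.nonzero(front >= 0)):
        pts.extend((x, y, z) for z in range(front[x, y], back[x, y] + 1))
    return np.array(pts, dtype=np.int32)

cloud = np.random.randint(0, 64, size=(500, 3))
f, b = depth_maps(cloud)
volume = reconstruct_bounding_volume(f, b)
```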
{"title":"Successive Refinement of Bounding Volumes for Point Cloud Coding","authors":"I. Tabus, E. C. Kaya, S. Schwarz","doi":"10.1109/MMSP48831.2020.9287106","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287106","url":null,"abstract":"The paper proposes a new lossy way of encoding the geometry of point clouds. The proposed scheme reconstructs the geometry from only the two depth maps associated to a single projection direction and then proposes a progressive reconstruction process using suitably defined anchor points. The reconstruction from the two depth images follows several primitives for analyzing and encoding, several of which are only optional. The resulting bitstream is embedded and can be truncated at various levels of reconstruction of the bounding volume. The encoding tools for encoding the needed entities are extremely simple and can be combined flexibly. The scheme can also be combined with the G-PCC coding, for reconstructing in a lossless way the sparse point clouds. The experiments show improvement of the rate-distortion performance of the proposed method when combined with the G-PCC codec as compared to G-PCC codec alone.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131537550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Surface Lightfield Support in Video-based Point Cloud Coding
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287115
Deepa Naik, S. Schwarz, V. Vadakital, Kimmo Roimela
Surface light-field (SLF) is a mapping of a set of color vectors to a set of ray vectors that originate at a point on a surface. It enables rendering photo-realistic viewpoints in extended reality applications. However, the amount of data required to represent an SLF is significantly larger, so storing and distributing SLFs requires an efficient compressed representation. The Moving Picture Experts Group (MPEG) has an ongoing standardization activity for the compression of point clouds. Until recently, this activity targeted the compression of a single texture, but it is now investigating view-dependent textures. In this paper, we propose methods to optimize the coding of view-dependent color without compromising visual quality. Our results show that the optimizations provided in this paper reduce the coded HEVC bit rate by 64% for the all-intra configuration and 52% for the random-access configuration, compared to coding all textures independently.
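A conceptual sketch of the surface light-field idea itself (view-dependent color lookup at one surface point), not of the video-based point cloud coding tools discussed in the paper: each point stores one color per capture-view direction, and at render time the colors whose view directions best match the query ray are blended.

```python
import numpy as np

view_dirs = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1.0]])   # assumed unit capture directions
view_dirs /= np.linalg.norm(view_dirs, axis=1, keepdims=True)
point_colors = np.array([[200, 30, 30], [30, 200, 30], [30, 30, 200.0]])  # one RGB per view

def shade(query_ray):
    """Blend the per-view colors of a point according to view-direction similarity."""
    d = np.asarray(query_ray, dtype=float)
    d /= np.linalg.norm(d)
    sim = np.clip(view_dirs @ d, 0.0, None)     # cosine similarity to each capture view
    if sim.sum() == 0:
        return point_colors.mean(axis=0)        # fallback: view-independent average color
    return (sim[:, None] * point_colors).sum(axis=0) / sim.sum()

print(shade([1, 1, 0]))
```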
{"title":"Surface Lightfield Support in Video-based Point Cloud Coding","authors":"Deepa Naik, S. Schwarz, V. Vadakital, Kimmo Roimela","doi":"10.1109/MMSP48831.2020.9287115","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287115","url":null,"abstract":"Surface light-field (SLF) is a mapping of a set of color vectors to a set of ray vectors that originate at a point on a surface. It enables rendering photo-realistic view points in extended reality applications. However, the amount of data required to represent SLF is significantly more. Therefore, storing and distributing SLFs requires an efficient compressed representation. The Motion Pictures Experts Group (MPEG) has an on-going standard activity for the compression of point clouds. Until recently, this activity was targeting compression of single texture information, but is now investigating view dependent textures. In this paper, we propose methods to optimize coding of view dependent color without compromising on the visual quality. Our results show the optimizations provided in this paper reduce coded HEVC bit rate by 64% for the all-intra configuration and 52% for the random-access configuration, when compared to coding all texture independently.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133931625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Smart caching for live 360° video streaming in mobile networks
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287059
P. Maniotis, N. Thomos
Despite the advances of 5G systems, the delivery of 360° video content in mobile networks remains challenging because of the size of 360° video files. Recently, edge caching has been shown to bring large performance gains to 360° Video on Demand (VoD) delivery systems; however, existing systems cannot be applied straightforwardly to live 360° video streaming. To address this issue, we investigate edge cache-assisted live 360° video streaming. As the popularity of videos and tiles varies with time, our framework employs a Long Short-Term Memory (LSTM) network to determine the optimal cache placement/eviction strategies that optimize the quality of the videos rendered by the users. To further enhance the delivered video quality, users located in the overlap of the coverage areas of multiple small base stations (SBSs) are allowed to receive their data from any of these SBSs. We evaluate and compare the performance of our method with that of state-of-the-art systems. The results show the superiority of the proposed method over its counterparts and make clear the benefits, in terms of delivered quality, of accurate tile popularity prediction by the LSTM network and of user association with multiple SBSs.
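A minimal sketch with an assumed model and caching rule, not the authors' full framework: an LSTM predicts the next-interval request count for every tile from a short history of observed counts, and the cache keeps the tiles with the highest predicted popularity.

```python
import torch
import torch.nn as nn

class TilePopularityLSTM(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, history):                        # history: (n_tiles, window, 1)
        out, _ = self.lstm(history)
        return self.head(out[:, -1, :]).squeeze(-1)    # predicted popularity per tile

n_tiles, window, cache_slots = 96, 8, 20
model = TilePopularityLSTM()
history = torch.rand(n_tiles, window, 1)               # stand-in normalized request counts
with torch.no_grad():
    popularity = model(history)
cached_tiles = torch.topk(popularity, k=cache_slots).indices   # cache placement decision
```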
{"title":"Smart caching for live 360° video streaming in mobile networks","authors":"P. Maniotis, N. Thomos","doi":"10.1109/MMSP48831.2020.9287059","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287059","url":null,"abstract":"Despite the advances of 5G systems, the delivery of 360° video content in mobile networks remains challenging because of the size of 360° video files. Recently, edge caching has been shown to bring large performance gains to 360° Video on Demand (VoD) delivery systems, however existing systems cannot be straightforwardly applied to live 360° video streaming. To address this issue, we investigate edge cache-assisted live 360° video streaming. As videos’ and tiles’ popularities vary with time, our framework employs a Long Short-Term Memory (LSTM) network to determine the optimal cache placement/evictions strategies that optimize the quality of the videos rendered by the users. To further enhance the delivered video quality, users located in the overlap of the coverage areas of multiple SBSs are allowed to receive their data from any of these SBSs. We evaluate and compare the performance of our method with that of state-of-the-art systems. The results show the superiority of the proposed method against its counterparts, and make clear the benefits of accurate tiles’ popularity prediction by the LSTM networks and users association with multiple SBSs in terms of the delivered quality.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"165 3-4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134289122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Subjective Test Dataset and Meta-data-based Models for 360° Streaming Video Quality
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287065
S. Fremerey, Steve Göring, Rakesh Rao Ramachandra Rao, Rachel Huang, A. Raake
In recent years, the number of 360° videos available for streaming has rapidly increased, leading to the need for 360° streaming video quality assessment. In this paper, we report and publish results of three subjective 360° video quality tests, with conditions chosen to reflect real-world bitrates and resolutions including 4K, 6K and 8K, resulting in 64 stimuli each for the first two tests and 63 for the third. As playout device, we used the HTC Vive for the first test and the HTC Vive Pro for the remaining two. Video-quality ratings were collected using the 5-point Absolute Category Rating scale. The 360° dataset provided with the paper contains the links to the used source videos, the raw subjective scores, video-related meta-data, head rotation data and Simulator Sickness Questionnaire results per stimulus and per subject to enable reproducibility of the provided results. Moreover, we use our dataset to compare the performance of state-of-the-art full-reference quality metrics such as VMAF, PSNR, SSIM, ADM2, WS-PSNR and WS-SSIM. Of all metrics, VMAF was found to show the highest correlation with the subjective scores. Further, we evaluated a center-cropped version of VMAF ("VMAF-cc"), which was found to provide similar performance to the full VMAF. In addition to the dataset and the objective metric evaluation, we propose two new video-quality prediction models: a bitstream meta-data-based model and a hybrid no-reference model using bitrate, resolution and pixel information of the video as input. The new lightweight models provide performance similar to the full-reference models while enabling fast calculations.
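A sketch in the spirit of a bitstream meta-data-based quality model, with an assumed functional form and stand-in numbers rather than the model or data from the paper: a MOS-like score is predicted from bitrate and resolution alone, fitted by least squares.

```python
import numpy as np
from scipy.optimize import curve_fit

def quality_model(meta, a, b, c):
    """Assumed form: score from meta-data only, no decoded pixels needed."""
    bitrate_kbps, height = meta
    return a + b * np.log10(bitrate_kbps) + c * np.log10(height)

# Stand-in training points (bitrate in kbps, frame height) with illustrative 1-5 scores.
bitrate = np.array([2000, 5000, 10000, 20000, 40000], dtype=float)
height = np.array([1080, 1440, 2160, 2880, 3840], dtype=float)
mos = np.array([2.1, 2.9, 3.6, 4.1, 4.4])

params, _ = curve_fit(quality_model, (bitrate, height), mos)
print(quality_model((np.array([15000.0]), np.array([2160.0])), *params))  # predict a new condition
```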
{"title":"Subjective Test Dataset and Meta-data-based Models for 360° Streaming Video Quality","authors":"S. Fremerey, Steve Göring, Rakesh Rao Ramachandra Rao, Rachel Huang, A. Raake","doi":"10.1109/MMSP48831.2020.9287065","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287065","url":null,"abstract":"During the last years, the number of 360° videos available for streaming has rapidly increased, leading to the need for 360° streaming video quality assessment. In this paper, we report and publish results of three subjective 360° video quality tests, with conditions used to reflect real-world bitrates and resolutions including 4K, 6K and 8K, resulting in 64 stimuli each for the first two tests and 63 for the third. As playout device we used the HTC Vive for the first and HTC Vive Pro for the remaining two tests. Video-quality ratings were collected using the 5-point Absolute Category Rating scale. The 360° dataset provided with the paper contains the links of the used source videos, the raw subjective scores, video-related meta-data, head rotation data and Simulator Sickness Questionnaire results per stimulus and per subject to enable reproducibility of the provided results. Moreover, we use our dataset to compare the performance of state-of-the-art full-reference quality metrics such as VMAF, PSNR, SSIM, ADM2, WS-PSNR and WS-SSIM. Out of all metrics, VMAF was found to show the highest correlation with the subjective scores. Further, we evaluated a center-cropped version of VMAF (\"VMAF-cc\") that showed to provide a similar performance as the full VMAF. In addition to the dataset and the objective metric evaluation, we propose two new video-quality prediction models, a bitstream meta-data-based model and a hybrid no-reference model using bitrate, resolution and pixel information of the video as input. The new lightweight models provide similar performance as the full-reference models while enabling fast calculations.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114294984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Low Bit-Rate Intra-Frame Coding using Common Information for 360-degree Video
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287050
Fariha Afsana, M. Paul, M. Murshed, D. Taubman
With the growth of video technologies, super-resolution videos, including 360-degree immersive video, have become a reality, driven by applications such as augmented/virtual/mixed reality that offer better interaction and a wide-angle viewing experience of a scene compared to traditional video with its narrow viewing angle. These new-generation video contents are bandwidth-intensive due to their high resolution and demand high bit rates as well as low-latency delivery, which poses challenges for transmission and storage. Traditional video coding schemes offer limited room for improving intra-frame coding efficiency because of their fixed processing block size. This paper presents a new approach for improving intra-frame coding, especially for low bit rate transmission of 360-degree video in the lossy mode of HEVC. Prior to applying traditional HEVC intra-prediction, the approach exploits the global redundancy of the entire frame by extracting the common important information using a multi-level discrete wavelet transform. This paper demonstrates that encoding only the low-frequency information of a frame can outperform the HEVC standard at low bit rates. The experimental results indicate that the proposed intra-frame coding strategy achieves an average of 54.07% BD-rate reduction and 2.84 dB BD-PSNR gain in the low bit rate scenario compared to HEVC. It also achieves a significant reduction in encoding time of about 66.84% on average. Moreover, this finding demonstrates that the existing HEVC block partitioning can be applied in the transform domain to better exploit the concentration of information, since HEVC is applied to the wavelet frequency domain.
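A simplified sketch of the pre-processing idea only (the HEVC encoding of the retained subband is not shown): a multi-level 2D DWT isolates the low-frequency "common information" of a frame, only the approximation subband would then be passed to the intra encoder, and the detail subbands are discarded here.

```python
import numpy as np
import pywt

frame = np.random.rand(512, 512)                  # stand-in luma plane of a 360-degree frame
coeffs = pywt.wavedec2(frame, wavelet="db2", level=2)

ll_band = coeffs[0]                               # low-frequency subband to be intra-coded
print("subband to encode:", ll_band.shape)        # much smaller than the original frame

# Receiver-side approximation: reconstruct using only the low-frequency subband.
zeroed = [coeffs[0]] + [tuple(np.zeros_like(d) for d in detail) for detail in coeffs[1:]]
approx = pywt.waverec2(zeroed, wavelet="db2")
```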
{"title":"Efficient Low Bit-Rate Intra-Frame Coding using Common Information for 360-degree Video","authors":"Fariha Afsana, M. Paul, M. Murshed, D. Taubman","doi":"10.1109/MMSP48831.2020.9287050","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287050","url":null,"abstract":"With the growth of video technologies, super-resolution videos, including 360-degree immersive video has become a reality due to exciting applications such as augmented/virtual/mixed reality for better interaction and a wide-angle user-view experience of a scene compared to traditional video with narrow-focused viewing angle. The new generation video contents are bandwidth-intensive in nature due to high resolution and demand high bit rate as well as low latency delivery requirements that pose challenges in solving the bottleneck of transmission and storage burdens. There is limited optimisation space in traditional video coding schemes for improving video coding efficiency in intra-frame due to the fixed size of processing block. This paper presents a new approach for improving intra-frame coding especially at low bit rate video transmission for 360-degree video for lossy mode of HEVC. Prior to using traditional HEVC intra-prediction, this approach exploits the global redundancy of entire frame by extracting common important information using multi-level discrete wavelet transformation. This paper demonstrates that the proposed method considering only low frequency information of a frame and encoding this can outperform the HEVC standard at low bit rates. The experimental results indicate that the proposed intra-frame coding strategy achieves an average of 54.07% BD-rate reduction and 2.84 dB BD-PSNR gain for low bit rate scenario compared to the HEVC. It also achieves a significant improvement in encoding time reduction of about 66.84% on an average. Moreover, this finding also demonstrates that the existing HEVC block partitioning can be applied in the transform domain for better exploitation of information concentration as we applied HEVC on wavelet frequency domain.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132179551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MMSP 2020 Cover Page
Pub Date: 2020-09-21 | DOI: 10.1109/mmsp48831.2020.9287079
{"title":"MMSP 2020 Cover Page","authors":"","doi":"10.1109/mmsp48831.2020.9287079","DOIUrl":"https://doi.org/10.1109/mmsp48831.2020.9287079","url":null,"abstract":"","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133101945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}