Time Difference of Arrival Estimation with Deep Learning – From Acoustic Simulations to Recorded Data
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287131
Pasi Pertilä, Mikko Parviainen, V. Myllylä, A. Huttunen, P. Jarske
Spatial information about a sound source is carried by acoustic waves to a microphone array and can be observed by estimating the phase and amplitude differences between microphones. The time difference of arrival (TDoA) captures the propagation delay of the wavefront between microphones and can be used to steer a beamformer or to localize the source. However, reverberation and interference can deteriorate the TDoA estimate. Through supervised learning, deep neural networks (DNNs) can extract speech-related TDoAs in more adverse conditions than traditional correlation-based methods. Acoustic simulations provide large amounts of annotated data, while real recordings require manual annotation or the use of reference sensors with proper calibration procedures. The distributions of these two data sources can differ. When a DNN model trained on simulated data is presented with real data from a different distribution, its performance decreases if the mismatch is not properly addressed. To reduce the error of DNN-based TDoA estimation, this work investigates the role of different input normalization techniques, mixing of simulated and real data for training, and applying an adversarial domain adaptation technique. Results quantify the reduction in TDoA error on real data for the different approaches. It is evident that the use of normalization methods, domain adaptation, and real data during training can reduce the TDoA error.
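As a point of reference for the correlation-based baseline mentioned in the abstract, the sketch below estimates a two-microphone TDoA with GCC-PHAT; the sampling rate, signal names and the synthetic 12-sample delay are illustrative assumptions, not the authors' setup.

```python
# Minimal GCC-PHAT TDoA sketch between two microphone signals (assumed example,
# not the paper's DNN-based estimator).
import numpy as np

def gcc_phat_tdoa(x, y, fs, max_tau=None):
    """Estimate how much signal x is delayed with respect to signal y, in seconds."""
    n = 2 * max(len(x), len(y))                       # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12                            # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # lags -max_shift .. +max_shift
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs

if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    src = rng.standard_normal(fs)                     # 1 s of noise as a stand-in for speech
    delay = 12                                        # samples
    mic1 = src
    mic2 = np.concatenate((np.zeros(delay), src[:-delay]))
    print(gcc_phat_tdoa(mic2, mic1, fs))              # ~ +12/fs = 0.00075 s
```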
{"title":"Time Difference of Arrival Estimation with Deep Learning – From Acoustic Simulations to Recorded Data","authors":"Pasi Pertilä, Mikko Parviainen, V. Myllylä, A. Huttunen, P. Jarske","doi":"10.1109/MMSP48831.2020.9287131","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287131","url":null,"abstract":"The spatial information about a sound source is carried by acoustic waves to a microphone array and can be observed through estimation of phase and amplitude differences between microphones. Time difference of arrival (TDoA) captures the propagation delay of the wavefront between microphones and can be used to steer a beamformer or to localize the source. However, reverberation and interference can deteriorate the TDoA estimate. Deep neural networks (DNNs) through supervised learning can extract speech related TDoAs in more adverse conditions than traditional correlation -based methods.Acoustic simulations provide large amounts of data with annotations, while real recordings require manual annotations or the use of reference sensors with proper calibration procedures. The distributions of these two data sources can differ. When a DNN model that is trained using simulated data is presented with real data from a different distribution, its performance decreases if not properly addressed.For the reduction of DNN –based TDoA estimation error, this work investigates the role of different input normalization techniques, mixing of simulated and real data for training, and applying an adversarial domain adaptation technique. Results quantify the reduction in TDoA error for real data using the different approaches. It is evident that the use of normalization methods, domain-adaptation, and real data during training can reduce the TDoA error.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131418773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multispectral Image Compression Based on HEVC Using Pel-Recursive Inter-Band Prediction
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287132
Anna Meyer, Nils Genser, A. Kaup
Recent developments in optical sensors enable a wide range of applications for multispectral imaging, e.g., in surveillance, optical sorting, and life-science instrumentation. Increasing spatial and spectral resolution allows creating higher-quality products; however, it poses challenges in handling such large amounts of data. Consequently, specialized compression techniques for multispectral images are required. High Efficiency Video Coding (HEVC) is the state of the art in coding efficiency for both video and still images. In this paper, we propose a cross-spectral compression scheme for efficiently coding multispectral data based on HEVC. By extending intra-picture prediction with a novel inter-band predictor, spectral as well as spatial redundancies can be effectively exploited. Dependencies among the current band and further spectral references are modeled jointly by adaptive linear regression. The proposed backward prediction scheme does not require additional side information for decoding. We show that our approach outperforms state-of-the-art lossy compression techniques in terms of rate-distortion performance. On different data sets, average Bjøntegaard delta rate savings of 82% and 55% are achieved compared to HEVC and a reference method from the literature, respectively.
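The backward-adaptive inter-band prediction idea can be illustrated roughly as follows: regression weights are fitted only on already-reconstructed (causal) samples, so the decoder can recompute them without side information. Function names, the bias term, the block size and the band count are assumptions for illustration, not the paper's exact predictor.

```python
# Backward-adaptive inter-band prediction sketch: predict a current-band block
# from co-located reference bands with least-squares weights fitted on causal context.
import numpy as np

def predict_block(ref_bands, cur_context, ref_context):
    """
    ref_bands:   (K, H, W) co-located reconstructed samples of K reference bands
    cur_context: (N,)      already-decoded causal samples of the current band
    ref_context: (K, N)    the corresponding samples in the reference bands
    Returns the (H, W) prediction of the current-band block.
    """
    A = np.vstack([ref_context, np.ones(ref_context.shape[1])]).T   # design matrix with bias
    w, *_ = np.linalg.lstsq(A, cur_context, rcond=None)             # adaptive regression weights
    K, H, W = ref_bands.shape
    B = np.vstack([ref_bands.reshape(K, -1), np.ones(H * W)]).T
    return (B @ w).reshape(H, W)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    ref = rng.random((2, 8, 8))                        # two reference bands, one 8x8 block
    cur = 0.6 * ref[0] + 0.3 * ref[1] + 0.05           # synthetic, exactly linear current band
    pred = predict_block(ref, cur[0], ref[:, 0, :])    # fit on the (already decoded) top row
    print(np.max(np.abs(pred - cur)))                  # tiny residual on this synthetic example
```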
{"title":"Multispectral Image Compression Based on HEVC Using Pel-Recursive Inter-Band Prediction","authors":"Anna Meyer, Nils Genser, A. Kaup","doi":"10.1109/MMSP48831.2020.9287132","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287132","url":null,"abstract":"Recent developments in optical sensors enable a wide range of applications for multispectral imaging, e.g., in surveillance, optical sorting, and life-science instrumentation. Increasing spatial and spectral resolution allows creating higher quality products, however, it poses challenges in handling such large amounts of data. Consequently, specialized compression techniques for multispectral images are required. High Efficiency Video Coding (HEVC) is known to be the state of the art in efficiency for both video coding and still image coding. In this paper, we propose a cross-spectral compression scheme for efficiently coding multispectral data based on HEVC. Extending intra picture prediction by a novel inter-band predictor, spectral as well as spatial redundancies can be effectively exploited. Dependencies among the current band and further spectral references are considered jointly by adaptive linear regression modeling. The proposed backward prediction scheme does not require additional side information for decoding. We show that our novel approach is able to outperform state-of-the-art lossy compression techniques in terms of rate-distortion performance. On different data sets, average Bjøntegaard delta rate savings of 82 % and 55 % compared to HEVC and a reference method from literature are achieved, respectively.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134052280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convolution Autoencoder-Based Sparse Representation Wavelet for Image Classification
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287107
Tan-Sy Nguyen, Long H. Ngo, M. Luong, M. Kaaniche, Azeddine Beghdadi
In this paper, we propose an effective Convolutional Autoencoder (AE) model for Sparse Representation (SR) in the Wavelet Domain for Classification (SRWC). The proposed approach involves an autoencoder with a sparse latent layer for learning sparse codes of wavelet features. The estimated sparse codes are used for assigning classes to test samples using a residual-based probabilistic criterion. Intensive experiments carried out on various datasets revealed that the proposed method yields better classification accuracy while exhibiting a significant reduction in the number of network parameters, compared to several recent deep learning-based methods.
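A generic version of a residual-based probabilistic decision rule is sketched below, assuming hypothetical per-class reconstructions of the wavelet features from the sparse code; the exact criterion used in the paper may differ.

```python
# Residual-based classification sketch: pick the class whose reconstruction of the
# wavelet features has the smallest residual, with a softmax over negative residuals.
import numpy as np

def classify_by_residual(features, class_reconstructions):
    """
    features:              (D,)   wavelet feature vector of the test sample
    class_reconstructions: (C, D) hypothetical per-class reconstructions from the sparse code
    Returns (predicted class index, class probabilities).
    """
    residuals = np.linalg.norm(class_reconstructions - features, axis=1)
    logits = -residuals                                # smaller residual -> larger probability
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.argmax(probs)), probs
```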
{"title":"Convolution Autoencoder-Based Sparse Representation Wavelet for Image Classification","authors":"Tan-Sy Nguyen, Long H. Ngo, M. Luong, M. Kaaniche, Azeddine Beghdadi","doi":"10.1109/MMSP48831.2020.9287107","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287107","url":null,"abstract":"In this paper, we propose an effective Convolutional Autoencoder (AE) model for Sparse Representation (SR) in the Wavelet Domain for Classification (SRWC). The proposed approach involves an autoencoder with a sparse latent layer for learning sparse codes of wavelet features. The estimated sparse codes are used for assigning classes to test samples using a residual-based probabilistic criterion. Intensive experiments carried out on various datasets revealed that the proposed method yields better classification accuracy while exhibiting a significant reduction in the number of network parameters, compared to several recent deep learning-based methods.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"24 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117007707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Key Point Agnostic Frequency-Selective Mesh-to-Grid Image Resampling using Spectral Weighting
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287096
Viktoria Heimann, Nils Genser, A. Kaup
Many applications in image processing require resampling of arbitrarily located samples onto regular grid positions. This is important in frame-rate up-conversion, super-resolution, and image warping, among others. A state-of-the-art, high-quality model-based resampling technique is frequency-selective mesh-to-grid resampling, which requires pre-estimation of key points. In this paper, we propose a new key point agnostic frequency-selective mesh-to-grid resampling (AFSMR) that does not depend on pre-estimated key points. Hence, the number of data points that are included is reduced drastically and the run time decreases significantly. To compensate for the missing key points, a spectral weighting function is introduced that models the optical transfer function in order to favor low frequencies over high ones. Thereby, resampling artefacts like ringing are suppressed reliably and the resampling quality increases. The new AFSMR is conceptually simpler, gains up to 1.2 dB in terms of PSNR compared to the original mesh-to-grid resampling, and is on average approximately 14.5 times faster.
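The idea of a spectral weighting that favors low frequencies can be illustrated with a toy isotropic decay over radial spatial frequency; the functional form and the decay factor rho below are assumptions for illustration, not the OTF model actually used by AFSMR.

```python
# Toy low-pass spectral weighting: weight 1 at DC, decaying towards the highest
# spatial frequency (illustrative assumption, not the paper's weighting).
import numpy as np

def lowpass_spectral_weights(height, width, rho=0.7):
    """Return per-frequency weights that decay with radial spatial frequency."""
    fy = np.fft.fftfreq(height)[:, None]               # vertical frequencies in cycles/sample
    fx = np.fft.fftfreq(width)[None, :]                 # horizontal frequencies
    radial = np.sqrt(fx**2 + fy**2) / np.sqrt(0.5)      # normalize so the corner frequency maps to 1
    return rho ** radial                                # 1.0 at DC, rho at the highest frequency

weights = lowpass_spectral_weights(8, 8)
print(weights[0, 0], weights[4, 4])                     # DC weight vs. highest-frequency weight
```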
{"title":"Key Point Agnostic Frequency-Selective Mesh-to-Grid Image Resampling using Spectral Weighting","authors":"Viktoria Heimann, Nils Genser, A. Kaup","doi":"10.1109/MMSP48831.2020.9287096","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287096","url":null,"abstract":"Many applications in image processing require re-sampling of arbitrarily located samples onto regular grid positions. This is important in frame-rate up-conversion, super-resolution, and image warping among others. A state-of-the-art high quality model-based resampling technique is frequency-selective mesh-to-grid resampling which requires pre-estimation of key points. In this paper, we propose a new key point agnostic frequency-selective mesh-to-grid resampling that does not depend on pre-estimated key points. Hence, the number of data points that are included is reduced drastically and the run time decreases significantly. To compensate for the key points, a spectral weighting function is introduced that models the optical transfer function in order to favor low frequencies more than high ones. Thereby, resampling artefacts like ringing are supressed reliably and the resampling quality increases. On average, the new AFSMR is conceptually simpler and gains up to 1.2 dB in terms of PSNR compared to the original mesh-to-grid resampling while being approximately 14.5 times faster.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117267510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Successive Refinement of Bounding Volumes for Point Cloud Coding
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287106
I. Tabus, E. C. Kaya, S. Schwarz
This paper proposes a new lossy method for encoding point cloud geometry. The proposed scheme first reconstructs the geometry from only the two depth maps associated with a single projection direction and then refines it progressively using suitably defined anchor points. The reconstruction from the two depth images follows several analysis and encoding primitives, some of which are optional. The resulting bitstream is embedded and can be truncated at various levels of reconstruction of the bounding volume. The tools for encoding the required entities are extremely simple and can be combined flexibly. The scheme can also be combined with G-PCC coding for lossless reconstruction of sparse point clouds. Experiments show that the proposed method, when combined with the G-PCC codec, improves rate-distortion performance compared to the G-PCC codec alone.
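A minimal sketch of the first reconstruction step described above, turning the two depth maps of one projection direction back into 3-D points, might look as follows; the axis convention and the no-hit sentinel are assumptions, and the progressive refinement with anchor points is not shown.

```python
# Recover a coarse point set from a front/back depth-map pair along the z axis
# (assumed conventions; only the initial reconstruction step is sketched).
import numpy as np

def points_from_depth_pair(front, back, no_hit=-1):
    """front, back: (H, W) integer depth maps; returns an (N, 3) array of [u, v, z] points."""
    v, u = np.nonzero(front != no_hit)                  # pixels where the projection hit the object
    pts_front = np.stack([u, v, front[v, u]], axis=1)   # nearest hit along the projection axis
    pts_back = np.stack([u, v, back[v, u]], axis=1)     # farthest hit along the projection axis
    pts = np.vstack([pts_front, pts_back])
    return np.unique(pts, axis=0)                       # merge duplicates where front == back
```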
{"title":"Successive Refinement of Bounding Volumes for Point Cloud Coding","authors":"I. Tabus, E. C. Kaya, S. Schwarz","doi":"10.1109/MMSP48831.2020.9287106","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287106","url":null,"abstract":"The paper proposes a new lossy way of encoding the geometry of point clouds. The proposed scheme reconstructs the geometry from only the two depth maps associated to a single projection direction and then proposes a progressive reconstruction process using suitably defined anchor points. The reconstruction from the two depth images follows several primitives for analyzing and encoding, several of which are only optional. The resulting bitstream is embedded and can be truncated at various levels of reconstruction of the bounding volume. The encoding tools for encoding the needed entities are extremely simple and can be combined flexibly. The scheme can also be combined with the G-PCC coding, for reconstructing in a lossless way the sparse point clouds. The experiments show improvement of the rate-distortion performance of the proposed method when combined with the G-PCC codec as compared to G-PCC codec alone.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131537550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Surface Lightfield Support in Video-based Point Cloud Coding
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287115
Deepa Naik, S. Schwarz, V. Vadakital, Kimmo Roimela
A surface light-field (SLF) is a mapping of a set of color vectors to a set of ray vectors that originate at a point on a surface. It enables rendering photo-realistic viewpoints in extended reality applications. However, representing an SLF requires a significantly larger amount of data. Therefore, storing and distributing SLFs requires an efficient compressed representation. The Moving Picture Experts Group (MPEG) has an ongoing standardization activity for the compression of point clouds. Until recently, this activity targeted the compression of single-texture information, but it is now investigating view-dependent textures. In this paper, we propose methods to optimize the coding of view-dependent color without compromising visual quality. Our results show that the optimizations provided in this paper reduce the coded HEVC bit rate by 64% for the all-intra configuration and by 52% for the random-access configuration, compared to coding all textures independently.
{"title":"Surface Lightfield Support in Video-based Point Cloud Coding","authors":"Deepa Naik, S. Schwarz, V. Vadakital, Kimmo Roimela","doi":"10.1109/MMSP48831.2020.9287115","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287115","url":null,"abstract":"Surface light-field (SLF) is a mapping of a set of color vectors to a set of ray vectors that originate at a point on a surface. It enables rendering photo-realistic view points in extended reality applications. However, the amount of data required to represent SLF is significantly more. Therefore, storing and distributing SLFs requires an efficient compressed representation. The Motion Pictures Experts Group (MPEG) has an on-going standard activity for the compression of point clouds. Until recently, this activity was targeting compression of single texture information, but is now investigating view dependent textures. In this paper, we propose methods to optimize coding of view dependent color without compromising on the visual quality. Our results show the optimizations provided in this paper reduce coded HEVC bit rate by 64% for the all-intra configuration and 52% for the random-access configuration, when compared to coding all texture independently.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133931625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Smart caching for live 360° video streaming in mobile networks
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287059
P. Maniotis, N. Thomos
Despite the advances of 5G systems, the delivery of 360° video content in mobile networks remains challenging because of the size of 360° video files. Recently, edge caching has been shown to bring large performance gains to 360° Video on Demand (VoD) delivery systems; however, existing systems cannot be straightforwardly applied to live 360° video streaming. To address this issue, we investigate edge-cache-assisted live 360° video streaming. As video and tile popularities vary over time, our framework employs a Long Short-Term Memory (LSTM) network to determine the optimal cache placement/eviction strategies that optimize the quality of the videos rendered by the users. To further enhance the delivered video quality, users located in the overlap of the coverage areas of multiple SBSs are allowed to receive their data from any of these SBSs. We evaluate and compare the performance of our method with that of state-of-the-art systems. The results show the superiority of the proposed method over its counterparts and make clear the benefits, in terms of delivered quality, of accurate tile popularity prediction by the LSTM networks and of associating users with multiple SBSs.
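The cache-placement step can be sketched independently of the popularity predictor: given tile popularity scores (produced by the LSTM in the paper, here simply an input array), a greedy placement fills the edge cache by popularity per byte. The utility model, sizes and capacity below are illustrative assumptions rather than the paper's optimization.

```python
# Greedy edge-cache placement sketch driven by predicted tile popularity
# (assumed simplification of the cache placement/eviction problem).
import numpy as np

def greedy_cache_placement(popularity, sizes, capacity):
    """
    popularity: (T,) predicted request rate of each tile representation
    sizes:      (T,) size in bytes of each representation
    capacity:   cache budget in bytes
    Returns the indices of the cached representations.
    """
    order = np.argsort(-popularity / sizes)             # best expected hits per byte first
    cached, used = [], 0
    for i in order:
        if used + sizes[i] <= capacity:
            cached.append(int(i))
            used += sizes[i]
    return cached

popularity = np.array([50.0, 5.0, 20.0, 1.0])           # hypothetical LSTM predictions
sizes = np.array([4e6, 1e6, 2e6, 1e6])                   # hypothetical representation sizes
print(greedy_cache_placement(popularity, sizes, capacity=5e6))
```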
{"title":"Smart caching for live 360° video streaming in mobile networks","authors":"P. Maniotis, N. Thomos","doi":"10.1109/MMSP48831.2020.9287059","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287059","url":null,"abstract":"Despite the advances of 5G systems, the delivery of 360° video content in mobile networks remains challenging because of the size of 360° video files. Recently, edge caching has been shown to bring large performance gains to 360° Video on Demand (VoD) delivery systems, however existing systems cannot be straightforwardly applied to live 360° video streaming. To address this issue, we investigate edge cache-assisted live 360° video streaming. As videos’ and tiles’ popularities vary with time, our framework employs a Long Short-Term Memory (LSTM) network to determine the optimal cache placement/evictions strategies that optimize the quality of the videos rendered by the users. To further enhance the delivered video quality, users located in the overlap of the coverage areas of multiple SBSs are allowed to receive their data from any of these SBSs. We evaluate and compare the performance of our method with that of state-of-the-art systems. The results show the superiority of the proposed method against its counterparts, and make clear the benefits of accurate tiles’ popularity prediction by the LSTM networks and users association with multiple SBSs in terms of the delivered quality.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"165 3-4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134289122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Subjective Test Dataset and Meta-data-based Models for 360° Streaming Video Quality
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287065
S. Fremerey, Steve Göring, Rakesh Rao Ramachandra Rao, Rachel Huang, A. Raake
In recent years, the number of 360° videos available for streaming has rapidly increased, leading to the need for 360° streaming video quality assessment. In this paper, we report and publish the results of three subjective 360° video quality tests, with conditions chosen to reflect real-world bitrates and resolutions including 4K, 6K and 8K, resulting in 64 stimuli each for the first two tests and 63 for the third. As playout device we used the HTC Vive for the first test and the HTC Vive Pro for the remaining two. Video-quality ratings were collected using the 5-point Absolute Category Rating scale. The 360° dataset provided with the paper contains links to the source videos, the raw subjective scores, video-related meta-data, head rotation data, and Simulator Sickness Questionnaire results per stimulus and per subject, to enable reproducibility of the results. Moreover, we use our dataset to compare the performance of state-of-the-art full-reference quality metrics such as VMAF, PSNR, SSIM, ADM2, WS-PSNR and WS-SSIM. Of all metrics, VMAF showed the highest correlation with the subjective scores. Further, we evaluated a center-cropped version of VMAF ("VMAF-cc") that provided performance similar to the full VMAF. In addition to the dataset and the objective metric evaluation, we propose two new video-quality prediction models: a bitstream meta-data-based model and a hybrid no-reference model using the bitrate, resolution and pixel information of the video as input. The new lightweight models provide performance similar to the full-reference models while enabling fast calculation.
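Comparing an objective metric against subjective ratings, as done in the paper, typically reduces to computing Pearson and Spearman correlations between per-stimulus metric values and mean opinion scores; the sketch below uses made-up placeholder numbers, not values from the published dataset.

```python
# Metric-vs-MOS correlation sketch (placeholder data, not the paper's results).
import numpy as np
from scipy import stats

mos = np.array([4.2, 3.8, 2.1, 1.5, 3.3, 4.6])          # hypothetical ACR mean opinion scores
vmaf = np.array([92.0, 85.5, 55.2, 40.1, 78.3, 95.4])   # hypothetical per-stimulus VMAF values

plcc, _ = stats.pearsonr(vmaf, mos)                      # linear correlation
srcc, _ = stats.spearmanr(vmaf, mos)                     # rank-order correlation
print(f"PLCC={plcc:.3f}, SRCC={srcc:.3f}")
```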
{"title":"Subjective Test Dataset and Meta-data-based Models for 360° Streaming Video Quality","authors":"S. Fremerey, Steve Göring, Rakesh Rao Ramachandra Rao, Rachel Huang, A. Raake","doi":"10.1109/MMSP48831.2020.9287065","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287065","url":null,"abstract":"During the last years, the number of 360° videos available for streaming has rapidly increased, leading to the need for 360° streaming video quality assessment. In this paper, we report and publish results of three subjective 360° video quality tests, with conditions used to reflect real-world bitrates and resolutions including 4K, 6K and 8K, resulting in 64 stimuli each for the first two tests and 63 for the third. As playout device we used the HTC Vive for the first and HTC Vive Pro for the remaining two tests. Video-quality ratings were collected using the 5-point Absolute Category Rating scale. The 360° dataset provided with the paper contains the links of the used source videos, the raw subjective scores, video-related meta-data, head rotation data and Simulator Sickness Questionnaire results per stimulus and per subject to enable reproducibility of the provided results. Moreover, we use our dataset to compare the performance of state-of-the-art full-reference quality metrics such as VMAF, PSNR, SSIM, ADM2, WS-PSNR and WS-SSIM. Out of all metrics, VMAF was found to show the highest correlation with the subjective scores. Further, we evaluated a center-cropped version of VMAF (\"VMAF-cc\") that showed to provide a similar performance as the full VMAF. In addition to the dataset and the objective metric evaluation, we propose two new video-quality prediction models, a bitstream meta-data-based model and a hybrid no-reference model using bitrate, resolution and pixel information of the video as input. The new lightweight models provide similar performance as the full-reference models while enabling fast calculations.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114294984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Low Bit-Rate Intra-Frame Coding using Common Information for 360-degree Video
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287050
Fariha Afsana, M. Paul, M. Murshed, D. Taubman
With the growth of video technologies, super-resolution video, including 360-degree immersive video, has become a reality, driven by applications such as augmented/virtual/mixed reality that offer better interaction and a wide-angle viewing experience compared to traditional video with a narrow viewing angle. These new-generation video contents are bandwidth-intensive due to their high resolution and demand high bit rates as well as low-latency delivery, which poses challenges for transmission and storage. Traditional video coding schemes offer limited room for improving intra-frame coding efficiency because of the fixed size of the processing block. This paper presents a new approach for improving intra-frame coding, especially for low bit-rate transmission of 360-degree video in the lossy mode of HEVC. Prior to applying traditional HEVC intra-prediction, the approach exploits the global redundancy of the entire frame by extracting common important information using a multi-level discrete wavelet transform. This paper demonstrates that encoding only the low-frequency information of a frame can outperform the HEVC standard at low bit rates. The experimental results indicate that the proposed intra-frame coding strategy achieves an average BD-rate reduction of 54.07% and a BD-PSNR gain of 2.84 dB in the low bit-rate scenario compared to HEVC. It also achieves a significant reduction in encoding time of about 66.84% on average. Moreover, these findings demonstrate that the existing HEVC block partitioning can be applied in the transform domain for better exploitation of information concentration, since HEVC is applied in the wavelet frequency domain.
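The pre-processing idea, extracting the common low-frequency information of a frame with a multi-level discrete wavelet transform before handing it to the HEVC intra encoder, can be sketched as follows; the use of PyWavelets, the db2 wavelet and the two-level decomposition are assumptions for illustration, not the paper's exact transform settings.

```python
# Multi-level 2-D DWT sketch: keep only the low-frequency approximation band of a frame
# (the HEVC intra encoding of this band is not shown).
import numpy as np
import pywt

def lowband_of_frame(frame, levels=2, wavelet="db2"):
    """Return the coarsest approximation (low-frequency) band of a grayscale frame."""
    coeffs = pywt.wavedec2(frame, wavelet=wavelet, level=levels)
    return coeffs[0]                                    # cA: the common low-frequency information

frame = np.random.default_rng(0).random((64, 64))       # stand-in for a luma frame
low = lowband_of_frame(frame)
print(frame.shape, "->", low.shape)                      # roughly halved per dimension at each level
```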
{"title":"Efficient Low Bit-Rate Intra-Frame Coding using Common Information for 360-degree Video","authors":"Fariha Afsana, M. Paul, M. Murshed, D. Taubman","doi":"10.1109/MMSP48831.2020.9287050","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287050","url":null,"abstract":"With the growth of video technologies, super-resolution videos, including 360-degree immersive video has become a reality due to exciting applications such as augmented/virtual/mixed reality for better interaction and a wide-angle user-view experience of a scene compared to traditional video with narrow-focused viewing angle. The new generation video contents are bandwidth-intensive in nature due to high resolution and demand high bit rate as well as low latency delivery requirements that pose challenges in solving the bottleneck of transmission and storage burdens. There is limited optimisation space in traditional video coding schemes for improving video coding efficiency in intra-frame due to the fixed size of processing block. This paper presents a new approach for improving intra-frame coding especially at low bit rate video transmission for 360-degree video for lossy mode of HEVC. Prior to using traditional HEVC intra-prediction, this approach exploits the global redundancy of entire frame by extracting common important information using multi-level discrete wavelet transformation. This paper demonstrates that the proposed method considering only low frequency information of a frame and encoding this can outperform the HEVC standard at low bit rates. The experimental results indicate that the proposed intra-frame coding strategy achieves an average of 54.07% BD-rate reduction and 2.84 dB BD-PSNR gain for low bit rate scenario compared to the HEVC. It also achieves a significant improvement in encoding time reduction of about 66.84% on an average. Moreover, this finding also demonstrates that the existing HEVC block partitioning can be applied in the transform domain for better exploitation of information concentration as we applied HEVC on wavelet frequency domain.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132179551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MMSP 2020 Cover Page
Pub Date: 2020-09-21 | DOI: 10.1109/mmsp48831.2020.9287079
{"title":"MMSP 2020 Cover Page","authors":"","doi":"10.1109/mmsp48831.2020.9287079","DOIUrl":"https://doi.org/10.1109/mmsp48831.2020.9287079","url":null,"abstract":"","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133101945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}