A Large-scale Evaluation of the bitstream-based video-quality model ITU-T P.1204.3 on Gaming Content
Rakesh Rao Ramachandra Rao, Steve Göring, Robert Steger, Saman Zadtootaghaj, Nabajeet Barman, S. Fremerey, S. Möller, A. Raake
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287055
The streaming of gaming content, both passive and interactive, has increased manifold in recent years. Gaming content brings with it peculiarities that are normally not seen in traditional 2D video, such as the artificial, synthetic nature of the content or the repetition of objects within a game. In addition, users perceive gaming content differently from traditional 2D video, both because of these peculiarities and because they may not watch such content very often. Hence, it becomes imperative to evaluate whether existing video quality models, usually designed for traditional 2D video, are applicable to gaming content. In this paper, we evaluate the applicability of the recently standardized bitstream-based video-quality model ITU-T P.1204.3 to gaming content. To analyze the performance of this model, we used four different gaming datasets (three publicly available and one internal) not previously used for model training, and compared it with existing state-of-the-art models. We found that the ITU-T P.1204.3 model performs well out of the box on these unseen datasets, with an RMSE between 0.38 and 0.45 on the 5-point absolute category rating scale and a Pearson correlation between 0.85 and 0.93 across all four databases. We further propose a full-HD variant of the P.1204.3 model, since the original model was trained and validated targeting a resolution of 4K/UHD-1. A 50:50 split across all databases is used to train and validate this variant, so as to ensure that the proposed model is applicable to a wide range of conditions.
Automatic Gain Control for Enhanced HDR Performance on Audio
D. Garcia, J. Hernandez, Steve Mann
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287160
We introduce a method to enhance the performance of the high dynamic range (HDR) technique on audio signals by automatically controlling the gains of the individual signal channels. Automatic gain control (AGC) compensates for the receiver’s limited dynamic range by ensuring that the incoming signal stays within the desired range, while HDR uses these multi-channel gains to extend the dynamic range of the composited signal. The results validate that the benefits of the two methods compound when they are used together. In effect, we produce a dynamic high dynamic range (DHDR) composite signal. The HDR AGC method is simulated to show performance gains under various conditions. The method is then implemented with a custom PCB and a microcontroller to demonstrate feasibility in real-world, real-time applications.
High Frame-Rate Virtual View Synthesis Based on Low Frame-Rate Input
K. Wegner, J. Stankowski, O. Stankiewicz, Hubert Żabiński, K. Klimaszewski, T. Grajek
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287076
In this paper, we investigate methods for obtaining high-resolution, high frame-rate virtual views from low frame-rate cameras for use in high-performance multiview systems. We demonstrate how to set up synchronization of the multiview acquisition system to record the required data, and then how to process that data to create virtual views at a higher frame rate while preserving the high resolution of the views. We analyze various ways of combining temporal frame interpolation with an alternative side-view synthesis technique, which allows us to create the required high frame-rate video of a virtual viewpoint. The results show that the proposed methods are capable of delivering the expected high-quality, high-resolution, high frame-rate virtual views.
A Multi-Criteria Contrast Enhancement Evaluation Measure using Wavelet Decomposition
Zohaib Amjad Khan, Azeddine Beghdadi, F. A. Cheikh, M. Kaaniche, Muhammad Ali Qureshi
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287051
An effective contrast enhancement method should not only improve the perceptual quality of an image but should also avoid adding artifacts or affecting the naturalness of the image. This makes Contrast Enhancement Evaluation (CEE) a challenging task, in the sense that both the improvement in image quality and the unwanted side-effects need to be checked for. Currently, there is no single CEE metric that works well for all kinds of enhancement criteria. In this paper, we propose a new Multi-Criteria CEE (MCCEE) measure which combines different metrics effectively to give a single quality score. In order to fully exploit the potential of these metrics, we further propose applying them to the image decomposed using the wavelet transform. This new metric has been tested on two natural-image contrast enhancement databases as well as on medical Computed Tomography (CT) images. The results show a substantial improvement compared to the existing evaluation metrics. The code for the metric is available at: https://github.com/zakopz/MCCEE-Contrast-Enhancement-Metric
Localization and Categorization of Early Reflections for Estimating Acoustic Reflection Coefficients
Robert Hupke, Sebastian Lauster, Nils Poschadel, Marcel Nophut, Stephan Preihs, J. Peissig
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287099
Knowledge of room acoustic parameters such as frequency- and direction-dependent reflection coefficients, room volume, or geometric characteristics is important for the modeling of acoustic environments, e.g. to improve the plausibility of immersive audio in mixed reality applications or to transfer a physical acoustic environment into a completely virtual one. This paper presents a method for detecting first-order reflections in three dimensions from spatial room impulse responses recorded with a spherical microphone array. Using geometric relations, the estimated direction of arrival (DOA), and the time difference of arrival (TDOA), the order of the respective mirror sound source is determined and assigned to the individual walls of the room. The detected DOA and TDOA of the first-order mirror sound sources are then used to estimate the frequency-dependent reflection coefficients of the respective walls with a null-steering beamformer directed towards the estimated DOA. Analysis in terms of DOA and TDOA indicates accurate estimation for both simulated and measured data. The estimation of the reflection coefficients shows a relative error of 3.5% between 500 Hz and 4 kHz for simulated data. Furthermore, experimental challenges are discussed, such as the evaluation of the reflection coefficient estimation in real acoustic environments.
Local Luminance Patterns for Point Cloud Quality Assessment
Rafael Diniz, P. Freitas, Mylène C. Q. Farias
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287154
In recent years, there has been an increase in the popularity of Point Clouds (PC) as the preferred data structure for representing 3D visual content. Examples of PC applications range from 3D representations of small objects up to large maps. The adoption of PCs has triggered the development of new coding, transmission, and presentation methodologies and, along with these, novel methods for evaluating the visual quality of PC content. This paper presents a new objective full-reference visual quality metric for PC content, which uses a proposed descriptor called Local Luminance Patterns (LLP). It extracts luminance statistics from the reference and test PCs and compares them to assess the perceived quality of the test PC. The proposed PC quality assessment method can be applied to both large- and small-scale PCs. Using publicly available PC quality datasets, we compared the proposed method with current state-of-the-art PC quality metrics, obtaining competitive results.
Multichannel Singing Voice Separation by Deep Neural Network Informed DOA Constrained CMNMF
A. Muñoz-Montoro, A. Politis, K. Drossos, J. Carabias-Orti
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287068
This work addresses the problem of multichannel source separation by combining two powerful approaches: multichannel spectral factorization and recent monophonic deep learning (DL) based spectrum inference. Individual source spectra at the different channels are estimated with a Masker-Denoiser twin network able to model long-term temporal patterns of a musical piece. The monophonic source spectrograms are then used within a spatial covariance mixing model based on complex-valued multichannel non-negative matrix factorization (CMNMF) that predicts the spatial characteristics of each source. The proposed framework is evaluated on the task of singing voice separation with a large multichannel dataset. Experimental results show that our joint DL+CMNMF method outperforms both the individual monophonic DL-based separation and the multichannel CMNMF baseline methods.
Graph-based skeleton data compression
Pratyusha Das, Antonio Ortega
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287103
With the advancement of reliable, fast, and portable acquisition systems, human motion capture data is becoming widely used in many industrial, medical, and surveillance applications. These systems can track multiple people simultaneously, providing full-body skeletal keypoints as well as more detailed landmarks on the face, hands, and feet. This leads to a huge amount of skeleton data to be transmitted or stored. In this paper, we introduce Graph-based Skeleton Compression (GSC), an efficient graph-based method for nearly lossless compression. We use a separable spatio-temporal graph transform along with non-uniform quantization, coefficient scanning, and entropy coding with run-length codes to achieve nearly lossless compression. We evaluate the compression performance of the proposed method on the large NTU-RGB activity dataset. Our method outperforms a 1D discrete cosine transform method applied along the temporal direction. In near-lossless mode, the proposed compression does not affect action recognition performance.
Single depth map super-resolution via joint non-local and local modeling
Yingying Zhang, Chao Ren, Honggang Chen, Ce Zhu
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287120
Depth maps are widely used in 3D imaging techniques thanks to the emergence of consumer depth cameras. However, the practical application of depth maps is limited by their poor image quality. In this paper, we propose a novel framework for single depth map super-resolution that jointly exploits local and non-local constraints in the depth map. For the non-local constraint, we use a group-based sparse representation to exploit the non-local self-similarity of the depth map. For the local constraint, we first estimate gradient images in different directions of the desired high-resolution (HR) depth map, and then build a multi-directional gradient-guided regularizer from these estimated gradient images to describe depth gradients with different orientations. Finally, the two complementary regularizers are cast into a unified optimization framework to obtain the desired HR image. Experimental results show that the proposed method achieves better depth super-resolution performance than state-of-the-art methods.
Viewport Margins for 360-Degree Immersive Video
I. Curcio, Saba Ahsan
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287078
Immersive 360-degree video delivery is becoming more and more widespread. New use cases are constantly emerging, making it a promising video technology for Extended Reality applications. Viewport Dependent Delivery (VDD) is an established technique for saving network bit rate when transmitting omnidirectional video. One of the hardest challenges in VDD of 360-degree video is ensuring that the video quality in the user’s viewport is always the highest possible, independent of the user’s head motion speed and span of motion. This paper introduces the concept of viewport margins, which can be understood as an extra high-quality spatial safety area around the user’s viewport. Viewport margins provide a better user experience by reducing the Motion to High Quality Delay and the percentage of low-quality viewport seen by the user. We provide simulation results that show the advantage of using viewport margins for real-time, low-delay VDD of 360-degree video. In particular, for a head motion of 90 degrees, using 10-30% margins can reduce the percentage of viewport at low quality by 5-10%, and using 30% margins reduces the Motion to High Quality Delay to zero for head speeds up to 360 degrees per second when the viewport feedback is sent every 33 ms.