Local Luminance Patterns for Point Cloud Quality Assessment
Pub Date: 2020-09-21 · DOI: 10.1109/MMSP48831.2020.9287154
Rafael Diniz, P. Freitas, Mylène C. Q. Farias
In recent years, Point Clouds (PCs) have become increasingly popular as the preferred data structure for representing 3D visual content. PC applications range from 3D representations of small objects up to large maps. The adoption of PCs has triggered the development of new coding, transmission, and presentation methodologies, along with novel methods for evaluating the visual quality of PC content. This paper presents a new objective full-reference visual quality metric for PC content, based on a proposed descriptor entitled Local Luminance Patterns (LLP). The metric extracts luminance statistics from the reference and test PCs and compares them to assess the perceived quality of the test PC. The proposed quality assessment method can be applied to both small- and large-scale PCs. Using publicly available PC quality datasets, we compared the proposed method with current state-of-the-art PC quality metrics, obtaining competitive results.
{"title":"Local Luminance Patterns for Point Cloud Quality Assessment","authors":"Rafael Diniz, P. Freitas, Mylène C. Q. Farias","doi":"10.1109/MMSP48831.2020.9287154","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287154","url":null,"abstract":"In recent years, there has been an increase in the popularity of Point Clouds (PC) as the preferred data structure for representing 3D visual contents. Examples of PC applications range from 3D representations of small objects up to large maps. The advent of PC adoption triggered the development of new coding, transmission, and presentation methodologies. And, along with these, novel methods for evaluating the visual quality of PC contents. This paper presents a new objective full-reference visual quality metric for PC contents, which uses a proposed descriptor entitled Local Luminance Patterns (LLP). It extracts the statistics of the luminance information of reference and test PCs and compares their statistics to assess the perceived quality of the test PC. The proposed PC quality assessment method can be applied to both large and small scale PCs. Using publicly available PC quality datasets, we compared the proposed method with current state-of-the-art PC quality metrics, obtaining competing results.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116343394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Graph-based skeleton data compression
Pub Date: 2020-09-21 · DOI: 10.1109/MMSP48831.2020.9287103
Pratyusha Das, Antonio Ortega
With the advancement of reliable, fast, and portable acquisition systems, human motion capture data is becoming widely used in many industrial, medical, and surveillance applications. These systems can track multiple people simultaneously, providing full-body skeletal keypoints as well as more detailed landmarks on the face, hands, and feet. This produces huge amounts of skeleton data to be transmitted or stored. In this paper, we introduce Graph-based Skeleton Compression (GSC), an efficient graph-based method for nearly lossless compression. We use a separable spatio-temporal graph transform along with non-uniform quantization, coefficient scanning, and entropy coding with run-length codes. We evaluate the compression performance of the proposed method on the large NTU-RGB activity dataset, where it outperforms a 1D discrete cosine transform applied along the temporal direction. In near-lossless mode, the proposed compression does not affect action recognition performance.
{"title":"Graph-based skeleton data compression","authors":"Pratyusha Das, Antonio Ortega","doi":"10.1109/MMSP48831.2020.9287103","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287103","url":null,"abstract":"With the advancement of reliable, fast, portable acquisition systems, human motion capture data is becoming widely used in many industrial, medical, and surveillance applications. These systems can track multiple people simultaneously, providing full-body skeletal keypoints as well as more detailed landmarks in face, hands and feet. This leads to a huge amount of skeleton data to be transmitted or stored. In this paper, we introduce Graph-based Skeleton Compression (GSC), an efficient graph-based method for nearly lossless compression. We use a separable spatio-temporal graph transform along with non-uniform quantization, coefficient scanning and entropy coding with run-length codes for nearly lossless compression. We evaluate the compression performance of the proposed method on the large NTU-RGB activity dataset. Our method outperforms a 1D discrete cosine transform method applied along temporal direction. In near-lossless mode our proposed compression does not affect action recognition performance.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125472767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High Frame-Rate Virtual View Synthesis Based on Low Frame-Rate Input
Pub Date: 2020-09-21 · DOI: 10.1109/MMSP48831.2020.9287076
K. Wegner, J. Stankowski, O. Stankiewicz, Hubert Żabiński, K. Klimaszewski, T. Grajek
In this paper, we investigate methods of obtaining high-resolution, high frame-rate virtual views from low frame-rate cameras for use in high-performance multiview systems. We demonstrate how to set up synchronization for multiview acquisition systems to record the required data, and then how to process that data to create virtual views at a higher frame rate while preserving the high resolution of the views. We analyze various ways of combining temporal frame interpolation with an alternative side-view synthesis technique to create the required high frame-rate video of a virtual viewpoint. The results show that the proposed methods are capable of delivering the expected high-quality, high-resolution, high frame-rate virtual views.
{"title":"High Frame-Rate Virtual View Synthesis Based on Low Frame-Rate Input","authors":"K. Wegner, J. Stankowski, O. Stankiewicz, Hubert Żabiński, K. Klimaszewski, T. Grajek","doi":"10.1109/MMSP48831.2020.9287076","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287076","url":null,"abstract":"In the paper we investigated the methods of obtaining high-resolution, high frame-rate virtual views based on low frame-rate cameras for applications in high-performance multiview systems. We demonstrated how to set up synchronization for multiview acquisition systems to record required data and then how to process the data to create virtual views at a higher frame rate, while preserving high resolution of the views. We analyzed various ways to combine time frame interpolation with an alternative side-view synthesis technique which allows us to create a required high frame-rate video of a virtual viewpoint. The results prove that the proposed methods are capable of delivering the expected high-quality, high-resolution and high frame-rate virtual views.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"160 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133573742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Multi-Criteria Contrast Enhancement Evaluation Measure using Wavelet Decomposition
Pub Date: 2020-09-21 · DOI: 10.1109/MMSP48831.2020.9287051
Zohaib Amjad Khan, Azeddine Beghdadi, F. A. Cheikh, M. Kaaniche, Muhammad Ali Qureshi
An effective contrast enhancement method should not only improve the perceptual quality of an image but should also avoid introducing artifacts or affecting its naturalness. This makes Contrast Enhancement Evaluation (CEE) a challenging task, since both the improvement in image quality and any unwanted side effects must be checked for. Currently, there is no single CEE metric that works well for all kinds of enhancement criteria. In this paper, we propose a new Multi-Criteria CEE (MCCEE) measure that combines different metrics effectively to give a single quality score. To fully exploit the potential of these metrics, we further propose applying them to images decomposed with the wavelet transform. The new metric has been tested on two natural-image contrast enhancement databases as well as on medical Computed Tomography (CT) images. The results show a substantial improvement over existing evaluation metrics. The code for the metric is available at: https://github.com/zakopz/MCCEE-Contrast-Enhancement-Metric
{"title":"A Multi-Criteria Contrast Enhancement Evaluation Measure using Wavelet Decomposition","authors":"Zohaib Amjad Khan, Azeddine Beghdadi, F. A. Cheikh, M. Kaaniche, Muhammad Ali Qureshi","doi":"10.1109/MMSP48831.2020.9287051","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287051","url":null,"abstract":"An effective contrast enhancement method should not only improve the perceptual quality of an image but should also avoid adding any artifacts or affecting naturalness of images. This makes Contrast Enhancement Evaluation (CEE) a challenging task in the sense that both the improvement in image quality and unwanted side-effects need to be checked for. Currently, there is no single CEE metric that works well for all kinds of enhancement criteria. In this paper, we propose a new Multi-Criteria CEE (MCCEE) measure which combines different metrics effectively to give a single quality score. In order to fully exploit the potential of these metrics, we have further proposed to apply them on the decomposed image using wavelet transform. This new metric has been tested on two natural image contrast enhancement databases as well as on medical Computed Tomography (CT) images. The results show a substantial improvement as compared to the existing evaluation metrics. The code for the metric is available at: https://github.com/zakopz/MCCEE-Contrast-Enhancement-Metric","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116554642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Time Difference of Arrival Estimation with Deep Learning – From Acoustic Simulations to Recorded Data
Pub Date: 2020-09-21 · DOI: 10.1109/MMSP48831.2020.9287131
Pasi Pertilä, Mikko Parviainen, V. Myllylä, A. Huttunen, P. Jarske
The spatial information about a sound source is carried by acoustic waves to a microphone array, where it can be observed by estimating phase and amplitude differences between microphones. The time difference of arrival (TDoA) captures the propagation delay of the wavefront between microphones and can be used to steer a beamformer or to localize the source. However, reverberation and interference can deteriorate the TDoA estimate. Through supervised learning, deep neural networks (DNNs) can extract speech-related TDoAs in more adverse conditions than traditional correlation-based methods. Acoustic simulations provide large amounts of annotated data, while real recordings require manual annotation or the use of reference sensors with proper calibration procedures. The distributions of these two data sources can differ, and when a DNN trained on simulated data is presented with real data from a different distribution, its performance decreases unless the mismatch is properly addressed. To reduce the error of DNN-based TDoA estimation, this work investigates the role of different input normalization techniques, the mixing of simulated and real data during training, and an adversarial domain adaptation technique. The results quantify the reduction in TDoA error on real data for the different approaches, showing that normalization methods, domain adaptation, and the use of real data during training can all reduce the TDoA error.
{"title":"Time Difference of Arrival Estimation with Deep Learning – From Acoustic Simulations to Recorded Data","authors":"Pasi Pertilä, Mikko Parviainen, V. Myllylä, A. Huttunen, P. Jarske","doi":"10.1109/MMSP48831.2020.9287131","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287131","url":null,"abstract":"The spatial information about a sound source is carried by acoustic waves to a microphone array and can be observed through estimation of phase and amplitude differences between microphones. Time difference of arrival (TDoA) captures the propagation delay of the wavefront between microphones and can be used to steer a beamformer or to localize the source. However, reverberation and interference can deteriorate the TDoA estimate. Deep neural networks (DNNs) through supervised learning can extract speech related TDoAs in more adverse conditions than traditional correlation -based methods.Acoustic simulations provide large amounts of data with annotations, while real recordings require manual annotations or the use of reference sensors with proper calibration procedures. The distributions of these two data sources can differ. When a DNN model that is trained using simulated data is presented with real data from a different distribution, its performance decreases if not properly addressed.For the reduction of DNN –based TDoA estimation error, this work investigates the role of different input normalization techniques, mixing of simulated and real data for training, and applying an adversarial domain adaptation technique. Results quantify the reduction in TDoA error for real data using the different approaches. It is evident that the use of normalization methods, domain-adaptation, and real data during training can reduce the TDoA error.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131418773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Learning Off-the-shelf Holistic Feature Descriptors for Visual Place Recognition in Challenging Conditions
Pub Date: 2020-09-21 · DOI: 10.1109/MMSP48831.2020.9287063
Farid Aliajni, Esa Rahtu
In this paper, we present a comprehensive study on the utility of deep learning feature extraction methods for the visual place recognition task under three challenging conditions: appearance variation, viewpoint variation, and the combination of both. We extensively compare the performance of convolutional neural network architectures with batch normalization layers in terms of the fraction of correct matches. These architectures, primarily trained for image classification and object detection problems, are used as holistic feature descriptors for visual place recognition. To verify the effectiveness of our results, we utilize four real-world place recognition datasets. Our investigation demonstrates that convolutional neural network architectures with batch normalization, trained for other computer vision tasks, outperform architectures that are specifically designed for place recognition.
{"title":"Deep Learning Off-the-shelf Holistic Feature Descriptors for Visual Place Recognition in Challenging Conditions","authors":"Farid Aliajni, Esa Rahtu","doi":"10.1109/MMSP48831.2020.9287063","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287063","url":null,"abstract":"In this paper, we present a comprehensive study on the utility of deep learning feature extraction methods for visual place recognition task in three challenging conditions, appearance variation, viewpoint variation and combination of both appearance and viewpoint variation. We extensively compared the performance of convolutional neural network architectures with batch normalization layers in terms of fraction of the correct matches. These architectures are primarily trained for image classification and object detection problems and used as holistic feature descriptors for visual place recognition task. To verify effectiveness of our results, we utilized four real world datasets in place recognition. Our investigation demonstrates that convolutional neural network architectures coupled with batch normalization and trained for other tasks in computer vision outperform architectures which are specifically designed for place recognition tasks.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132467147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards Criminal Sketching with Generative Adversarial Network
Pub Date: 2020-09-21 · DOI: 10.1109/MMSP48831.2020.9287084
Hanzhou Wu, Yuwei Yao, Xinpeng Zhang, Jiangfeng Wang
Criminal sketching aims to draw an approximate portrait of a criminal suspect from the details an observer can remember. However, even a professional artist needs considerable time to complete a sketch and draw a good portrait. This motivates us to study forensic sketching with a generative adversarial network (GAN) based architecture, which allows us to synthesize a realistic portrait of a criminal suspect described by an eyewitness. The proposed method consists of two steps: sketch generation and portrait generation. In the former, a facial outline is sketched based on the descriptive details; in the latter, the facial details are completed to generate a portrait. To make the portrait more realistic, we use a portrait discriminator that not only learns the discriminative features between faces synthesized by the generator and real faces, but also recognizes face attributes. Experiments show that the proposed method achieves promising performance for criminal sketching.
{"title":"Towards Criminal Sketching with Generative Adversarial Network","authors":"Hanzhou Wu, Yuwei Yao, Xinpeng Zhang, Jiangfeng Wang","doi":"10.1109/MMSP48831.2020.9287084","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287084","url":null,"abstract":"Criminal sketching aims to draw an approximation portrait of the criminal suspect by details of the criminal suspect that the observer can remember. However, even for a professional artist, it would need much time to complete sketching and draw a good portrait. It therefore motivates us to study forensic sketching with a generative adversarial network based architecture, which allows us to synthesize a real-like portrait of the criminal suspect described by an eyewitness. The proposed work contains two steps: sketch generation and portrait generation. For the former, a facial outline is sketched based on the descriptive details. For the latter, the facial details are completed to generate a portrait. To make the portrait more realistic, we use a portrait discriminator, which can not only learn the discriminative features between the faces synthesized by the generator and the real faces, but also recognize the face attributes. Experiments have shown that this work achieves promising performance for criminal sketching.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134300242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Adaptive Inference Leveraging Bag-of-Features-based Early Exits
Pub Date: 2020-09-21 · DOI: 10.1109/MMSP48831.2020.9287150
N. Passalis, Jenni Raitoharju, M. Gabbouj, A. Tefas
Early exits provide an effective way of implementing adaptive computational graphs over deep learning models. In this way, models can be adapted on-the-fly to the available computational resources, or even to the difficulty of each input sample, reducing the energy and computational power requirements of many embedded and mobile applications. However, this kind of adaptive inference also comes with several challenges, since the difficulty of each sample must be estimated and the most appropriate early exit must be selected. Existing approaches often lead to highly unbalanced distributions over the selected early exits, reducing the efficiency of the adaptive inference process. At the same time, only a few resources can be devoted to the exit-selection process itself if an adequate speedup is to be obtained. The main contribution of this work is an easy-to-use and easy-to-tune adaptive inference approach for early exits that overcomes some of these limitations. The proposed method allows for a) obtaining a more balanced inference distribution among the early exits, b) relying on a single, interpretable hyperparameter to tune its behavior (ranging from faster inference to higher accuracy), and c) improving the performance of the networks (increasing accuracy while reducing inference time). The effectiveness of the proposed method over existing approaches is demonstrated on four different image datasets.
{"title":"Efficient Adaptive Inference Leveraging Bag-of-Features-based Early Exits","authors":"N. Passalis, Jenni Raitoharju, M. Gabbouj, A. Tefas","doi":"10.1109/MMSP48831.2020.9287150","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287150","url":null,"abstract":"Early exits provide an effective way of implementing adaptive computational graphs over deep learning models. In this way it is possible to adapt them on-the-fly to the available computational resources or even to the difficulty of each input sample, reducing the energy and computational power requirements in many embedded and mobile applications. However, performing this kind of adaptive inference also comes with several challenges, since the difficulty of each sample must be estimated and the most appropriate early exit must be selected. It is worth noting that existing approaches often lead to highly unbalanced distributions over the selected early exits, reducing the efficiency of the adaptive inference process. At the same time, only a few resources can be devoted to the aforementioned process, in order to ensure that an adequate speedup will be obtained. The main contribution of this work is to provide an easy to use and tune adaptive inference approach for early exits that can overcome some of these limitations. In this way, the proposed method allows for a) obtaining a more balanced inference distribution among the early exits, b) relying on a single and interpretable hyperparameter for tuning its behavior (ranging from faster inference to higher accuracy), and c) improving the performance of the networks (increasing the accuracy and reducing the time needed for inference). Indeed, the effectiveness of the proposed method over existing approaches is demonstrated using four different image datasets.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134302294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multispectral Image Compression Based on HEVC Using Pel-Recursive Inter-Band Prediction
Pub Date: 2020-09-21 · DOI: 10.1109/MMSP48831.2020.9287132
Anna Meyer, Nils Genser, A. Kaup
Recent developments in optical sensors enable a wide range of applications for multispectral imaging, e.g., in surveillance, optical sorting, and life-science instrumentation. Increasing spatial and spectral resolution allows creating higher-quality products; however, it poses challenges in handling such large amounts of data. Consequently, specialized compression techniques for multispectral images are required. High Efficiency Video Coding (HEVC) is the state of the art in efficiency for both video coding and still image coding. In this paper, we propose a cross-spectral compression scheme for efficiently coding multispectral data based on HEVC. By extending intra-picture prediction with a novel inter-band predictor, spectral as well as spatial redundancies can be effectively exploited. Dependencies between the current band and further spectral references are modeled jointly by adaptive linear regression. The proposed backward prediction scheme does not require additional side information for decoding. We show that our approach outperforms state-of-the-art lossy compression techniques in terms of rate-distortion performance, achieving average Bjøntegaard delta rate savings of 82% and 55% on different data sets compared to HEVC and a reference method from the literature, respectively.
{"title":"Multispectral Image Compression Based on HEVC Using Pel-Recursive Inter-Band Prediction","authors":"Anna Meyer, Nils Genser, A. Kaup","doi":"10.1109/MMSP48831.2020.9287132","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287132","url":null,"abstract":"Recent developments in optical sensors enable a wide range of applications for multispectral imaging, e.g., in surveillance, optical sorting, and life-science instrumentation. Increasing spatial and spectral resolution allows creating higher quality products, however, it poses challenges in handling such large amounts of data. Consequently, specialized compression techniques for multispectral images are required. High Efficiency Video Coding (HEVC) is known to be the state of the art in efficiency for both video coding and still image coding. In this paper, we propose a cross-spectral compression scheme for efficiently coding multispectral data based on HEVC. Extending intra picture prediction by a novel inter-band predictor, spectral as well as spatial redundancies can be effectively exploited. Dependencies among the current band and further spectral references are considered jointly by adaptive linear regression modeling. The proposed backward prediction scheme does not require additional side information for decoding. We show that our novel approach is able to outperform state-of-the-art lossy compression techniques in terms of rate-distortion performance. On different data sets, average Bjøntegaard delta rate savings of 82 % and 55 % compared to HEVC and a reference method from literature are achieved, respectively.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134052280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Large-scale Evaluation of the bitstream-based video-quality model ITU-T P.1204.3 on Gaming Content
Pub Date: 2020-09-21 · DOI: 10.1109/MMSP48831.2020.9287055
Rakesh Rao Ramachandra Rao, Steve Göring, Robert Steger, Saman Zadtootaghaj, Nabajeet Barman, S. Fremerey, S. Möller, A. Raake
The streaming of gaming content, both passive and interactive, has increased manifold in recent years. Gaming content has some peculiarities that are normally not seen in traditional 2D videos, such as the artificial and synthetic nature of the content or the repetition of objects in a game. In addition, users perceive gaming content differently from traditional 2D videos, due both to these peculiarities and to the fact that users may not often watch such content. Hence, it becomes imperative to evaluate whether existing video quality models, usually designed for traditional 2D videos, are applicable to gaming content. In this paper, we evaluate the applicability of the recently standardized bitstream-based video-quality model ITU-T P.1204.3 to gaming content. To analyze the performance of this model, we used 4 different gaming datasets (3 publicly available + 1 internal) not previously used for model training, and compared it with existing state-of-the-art models. We found that the ITU-T P.1204.3 model performs well out of the box on these unseen datasets, with an RMSE between 0.38 and 0.45 on the 5-point absolute category rating scale and a Pearson correlation between 0.85 and 0.93 across all 4 databases. We further propose a full-HD variant of the P.1204.3 model, since the original model was trained and validated targeting a resolution of 4K/UHD-1. A 50:50 split across all databases is used to train and validate this variant, to make sure that the proposed model is applicable to various conditions.
{"title":"A Large-scale Evaluation of the bitstream-based video-quality model ITU-T P.1204.3 on Gaming Content","authors":"Rakesh Rao Ramachandra Rao, Steve Göring, Robert Steger, Saman Zadtootaghaj, Nabajeet Barman, S. Fremerey, S. Möller, A. Raake","doi":"10.1109/MMSP48831.2020.9287055","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287055","url":null,"abstract":"The streaming of gaming content, both passive and interactive, has increased manifolds in recent years. Gaming contents bring with them some peculiarities which are normally not seen in traditional 2D videos, such as the artificial and synthetic nature of contents or repetition of objects in a game. In addition, the perception of gaming content by the user is different from that of traditional 2D videos due to its pecularities and also the fact that users may not often watch such content. Hence, it becomes imperative to evaluate whether the existing video quality models usually designed for traditional 2D videos are applicable to gaming content. In this paper, we evaluate the applicability of the recently standardized bitstream-based video-quality model ITU-T P.1204.3 on gaming content. To analyze the performance of this model, we used 4 different gaming datasets (3 publicly available + 1 internal) not previously used for model training, and compared it with the existing state-of-the-art models. We found that the ITU P.1204.3 model out of the box performs well on these unseen datasets, with an RMSE ranging between 0.38 − 0.45 on the 5-point absolute category rating and Pearson Correlation between 0.85 − 0.93 across all the 4 databases. We further propose a full-HD variant of the P.1204.3 model, since the original model is trained and validated which targets a resolution of 4K/UHD-1. A 50:50 split across all databases is used to train and validate this variant so as to make sure that the proposed model is applicable to various conditions.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"237 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133234512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}