Study on viewing completion ratio of video streaming
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287091
Pierre R. Lebreton, Kazuhisa Yamagishi
In this paper, a model is investigated for optimizing the encoding of adaptive bitrate video streaming. To this end, the relationship between quality, content duration, and acceptability, measured using the viewing completion ratio, is studied. This work is based on intensive subjective testing performed in a laboratory environment and shows the importance of stimulus duration in acceptance studies. A model to predict the completion ratio of videos is provided and shows good accuracy. By using this model, quality requirements can be derived on the basis of the target abandonment rate and content duration. This work will help video streaming providers define coding conditions that maintain user engagement when preparing content for broadcast on their platforms.
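As a rough illustration of how such a model could be applied (not the authors' published model), the sketch below assumes a hypothetical logistic relationship between quality, duration, and completion ratio, and inverts it to obtain the quality needed for a target abandonment rate; all coefficients are placeholders.

```python
# Minimal sketch, NOT the paper's model: a hypothetical logistic relationship
# between quality (MOS), content duration, and completion ratio, inverted to
# find the quality needed for a target abandonment rate. Coefficients a, b, c
# are illustrative placeholders.
import numpy as np

def completion_ratio(quality_mos, duration_min, a=1.8, b=0.05, c=-2.0):
    """Hypothetical model: longer content and lower quality reduce the
    probability that a viewer watches to the end."""
    z = c + a * quality_mos - b * duration_min
    return 1.0 / (1.0 + np.exp(-z))

def required_quality(target_abandonment, duration_min, a=1.8, b=0.05, c=-2.0):
    """Invert the sketch model: quality (MOS) needed so that the completion
    ratio is at least 1 - target_abandonment for the given duration."""
    target_completion = 1.0 - target_abandonment
    z = np.log(target_completion / (1.0 - target_completion))
    return (z - c + b * duration_min) / a

if __name__ == "__main__":
    for dur in (5, 20, 60):  # content duration in minutes
        print(dur, "min ->", round(required_quality(0.2, dur), 2), "MOS")
```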
{"title":"Study on viewing completion ratio of video streaming","authors":"Pierre R. Lebreton, Kazuhisa Yamagishi","doi":"10.1109/MMSP48831.2020.9287091","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287091","url":null,"abstract":"In this paper, a model is investigated for optimizing the encoding of adaptive bitrate video streaming. To this end, the relationship between quality, content duration, and acceptability measured by using the completion ratio is studied. This work is based on intensive subjective testing performed in a laboratory environment and shows the importance of stimulus duration in acceptance studies. A model to predict the completion ratio of videos is provided and shows good accuracy. By using this model, quality requirements can be derived on the basis of the target abandonment rate and content duration. This work will help video streaming providers to define suitable coding conditions when preparing content to be broadcast on their platform that will maintain user engagement.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115099430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RoSTAR: ROS-based Telerobotic Control via Augmented Reality
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287100
Chung Xue Er Shamaine, Yuansong Qiao, John Henry, Ken McNevin, Niall Murray
Real-world to virtual-world communication and interaction will be a cornerstone of future intelligent manufacturing ecosystems. Human-robot interaction (HRI) is considered a basic element of the factories of the future. Despite advances in technologies such as wearables and Augmented Reality (AR), HRI is still extremely challenging. Whilst progress has been made in the development of different mechanisms to support HRI, there are issues with cost, naturalistic and intuitive interaction, and communication across heterogeneous systems. To mitigate these limitations, RoSTAR is proposed: a novel open-source HRI system based on the Robot Operating System (ROS) and Augmented Reality. An AR head-mounted display (HMD) is deployed, enabling the user to interact and communicate with a ROS-powered robotic arm. A model of the robot arm is imported directly into the Unity game engine, and any interactions with this virtual robotic arm are communicated to the ROS robotic arm. This system has the potential to be used for different process tasks, such as robotic gluing, dispensing, and arc welding, as part of an interoperable, low-cost, portable, and naturalistically interactive experience.
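A minimal sketch of the kind of bridge such a system needs (not the authors' implementation): joint angles produced by interaction with the virtual arm in Unity are forwarded to a ROS-controlled arm as standard sensor_msgs/JointState messages. The topic name, joint names, and the source of the virtual angles are assumptions.

```python
#!/usr/bin/env python
# Sketch only: forward joint angles from the Unity/AR side to a ROS arm.
import rospy
from sensor_msgs.msg import JointState

def virtual_joint_angles():
    # Placeholder for values received from the Unity/AR side
    # (e.g., over rosbridge); here a fixed pose is returned.
    return [0.0, -0.5, 1.0, 0.0, 0.5, 0.0]

def main():
    rospy.init_node("rostar_bridge_sketch")
    pub = rospy.Publisher("/joint_states", JointState, queue_size=10)
    rate = rospy.Rate(30)  # publish at 30 Hz
    while not rospy.is_shutdown():
        msg = JointState()
        msg.header.stamp = rospy.Time.now()
        msg.name = ["joint_%d" % i for i in range(1, 7)]  # assumed joint names
        msg.position = virtual_joint_angles()
        pub.publish(msg)
        rate.sleep()

if __name__ == "__main__":
    main()
```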
{"title":"RoSTAR: ROS-based Telerobotic Control via Augmented Reality","authors":"Chung Xue Er Shamaine, Yuansong Qiao, John Henry, Ken McNevin, Niall Murray","doi":"10.1109/MMSP48831.2020.9287100","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287100","url":null,"abstract":"Real world virtual world communication and interaction will be a cornerstone of future intelligent manufacturing ecosystems. Human robotic interaction is considered to be the basic element of factories of the future. Despite the advancement of different technologies such as wearables and Augmented Reality (AR), human-robot interaction (HRI) is still extremely challenging. Whilst progress has been made in the development of different mechanisms to support HRI, there are issues with cost, naturalistic and intuitive interaction, and communication across heterogeneous systems. To mitigate these limitations, RoSTAR is proposed. RoSTAR is a novel open-source HRI system based on the Robot Operating System (ROS) and Augmented Reality. An AR Head Mounted Display (HMD) is deployed. It enables the user to interact and communicate through a ROS powered robotic arm. A model of the robot arm is imported directly into the Unity Game engine, and any interactions with this virtual robotic arm are communicated to the ROS robotic arm. This system has the potential to be used for different process tasks, such as robotic gluing, dispensing and arc welding as part of an interoperable, low cost, portable and naturalistically interactive experience.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121114524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Object-Oriented Motion Estimation using Edge-Based Image Registration
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287129
Md. Asikuzzaman, Deepak Rajamohan, M. Pickering
Video data storage and transmission costs can be reduced by minimizing the temporally redundant information among frames using an appropriate motion-compensated prediction technique. In the current video coding standard, neighbouring frames are exploited to predict the motion of the current frame using global motion estimation-based approaches. However, the global motion estimation of a frame may not capture the actual motion of individual objects in the frame, as each object in a frame usually has its own motion. In this paper, an edge-based motion estimation technique is presented that finds the motion of each object in the frame rather than the global motion of that frame. In the proposed method, an edge position difference (EPD) similarity measure-based image registration between two frames is applied to register each object in the frame. A superpixel search is then applied to segment the registered object. Finally, the proposed edge-based image registration technique and the Demons algorithm are applied to predict the objects in the current frame. Our experimental analysis demonstrates that the proposed algorithm can estimate the motion of individual objects in the current frame more accurately than existing global motion estimation-based approaches.
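The following sketch only illustrates the flavour of such a pipeline on synthetic motion, using Canny edge maps and SimpleITK's classic Demons filter; it does not re-implement the paper's EPD similarity measure or its object-wise registration.

```python
# Sketch, assuming scikit-image and SimpleITK are available.
import numpy as np
import SimpleITK as sitk
from skimage import data
from skimage.color import rgb2gray
from skimage.feature import canny

def demons_predict(reference, current, iterations=50):
    """Estimate a dense displacement field and warp `reference` toward
    `current` with the classic Demons algorithm."""
    fixed = sitk.GetImageFromArray(current.astype(np.float32))
    moving = sitk.GetImageFromArray(reference.astype(np.float32))
    demons = sitk.DemonsRegistrationFilter()
    demons.SetNumberOfIterations(iterations)
    demons.SetStandardDeviations(1.5)  # smoothing of the displacement field
    displacement = demons.Execute(fixed, moving)
    transform = sitk.DisplacementFieldTransform(displacement)
    warped = sitk.Resample(moving, fixed, transform, sitk.sitkLinear, 0.0)
    return sitk.GetArrayFromImage(warped)

if __name__ == "__main__":
    frame_t = rgb2gray(data.astronaut())
    frame_t1 = np.roll(frame_t, shift=(4, 2), axis=(0, 1))  # synthetic motion
    predicted = demons_predict(frame_t, frame_t1)
    print("mean abs prediction error:", np.abs(predicted - frame_t1).mean())
    # Crude edge-agreement check (a stand-in for an EPD-style comparison)
    edge_overlap = (canny(predicted, sigma=2) & canny(frame_t1, sigma=2)).mean()
    print("fraction of coinciding edge pixels:", edge_overlap)
```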
{"title":"Object-Oriented Motion Estimation using Edge-Based Image Registration","authors":"Md. Asikuzzaman, Deepak Rajamohan, M. Pickering","doi":"10.1109/MMSP48831.2020.9287129","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287129","url":null,"abstract":"Video data storage and transmission cost can be reduced by minimizing the temporally redundant information among frames using an appropriate motion-compensated prediction technique. In the current video coding standard, the neighbouring frames are exploited to predict the motion of the current frame using global motion estimation-based approaches. However, the global motion estimation of a frame may not produce the actual motion of individual objects in the frame as each of the objects in a frame usually has its own motion. In this paper, an edge-based motion estimation technique is presented that finds the motion of each object in the frame rather than finding the global motion of that frame. In the proposed method, edge position difference (EPD) similarity measure-based image registration between the two frames is applied to register each object in the frame. A superpixel search is then applied to segment the registered object. Finally, the proposed edge-based image registration technique and Demons algorithm are applied to predict the objects in the current frame. Our experimental analysis demonstrates that the proposed algorithm can estimate the motions of individual objects in the current frame accurately compared to the existing global motion estimation-based approaches.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123177343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Variable-Rate Multi-Frequency Image Compression using Modulated Generalized Octave Convolution
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287082
Jianping Lin, Mohammad Akbari, H. Fu, Qian Zhang, Shang Wang, Jie Liang, Dong Liu, F. Liang, Guohe Zhang, Chengjie Tu
In this proposal, we design a learned multi-frequency image compression approach that uses generalized octave convolutions to factorize the latent representations into high-frequency (HF) and low-frequency (LF) components, where the LF components have a lower resolution than the HF components. This can improve the rate-distortion performance, similar to a wavelet transform. Moreover, compared to the original octave convolution, the proposed generalized octave convolution (GoConv) and octave transposed-convolution (GoTConv) with internal activation layers preserve more spatial structure of the information and enable more effective filtering between the HF and LF components, which further improves the performance. In addition, we develop a variable-rate scheme that uses the Lagrangian parameter to modulate all the internal feature maps in the autoencoder, which allows the scheme to cover the large bitrate range of JPEG AI with only three models. Experiments show that the proposed scheme achieves much better Y MS-SSIM than VVC. In terms of YUV PSNR, our scheme is very similar to HEVC.
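A minimal PyTorch sketch of an octave-style convolution with high- and low-frequency paths and internal activations, in the spirit of the GoConv idea described above; the exact GoConv/GoTConv structure in the paper may differ, and the channel split ratio and activation choice here are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveConvSketch(nn.Module):
    def __init__(self, in_ch, out_ch, alpha=0.5, kernel_size=3, padding=1):
        super().__init__()
        in_lf, out_lf = int(alpha * in_ch), int(alpha * out_ch)
        in_hf, out_hf = in_ch - in_lf, out_ch - out_lf
        # Four paths: HF->HF, HF->LF, LF->HF, LF->LF
        self.hf_to_hf = nn.Conv2d(in_hf, out_hf, kernel_size, padding=padding)
        self.hf_to_lf = nn.Conv2d(in_hf, out_lf, kernel_size, padding=padding)
        self.lf_to_hf = nn.Conv2d(in_lf, out_hf, kernel_size, padding=padding)
        self.lf_to_lf = nn.Conv2d(in_lf, out_lf, kernel_size, padding=padding)
        self.act = nn.LeakyReLU(inplace=True)  # internal activation layers

    def forward(self, x_hf, x_lf):
        # x_hf: full resolution, x_lf: half resolution
        hf = self.act(self.hf_to_hf(x_hf)) + F.interpolate(
            self.act(self.lf_to_hf(x_lf)), scale_factor=2, mode="nearest")
        lf = self.act(self.lf_to_lf(x_lf)) + self.act(
            self.hf_to_lf(F.avg_pool2d(x_hf, 2)))
        return hf, lf

if __name__ == "__main__":
    conv = OctaveConvSketch(in_ch=64, out_ch=64, alpha=0.5)
    x_hf = torch.randn(1, 32, 64, 64)   # high-frequency latents
    x_lf = torch.randn(1, 32, 32, 32)   # low-frequency latents (half size)
    y_hf, y_lf = conv(x_hf, x_lf)
    print(y_hf.shape, y_lf.shape)
```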
{"title":"Variable-Rate Multi-Frequency Image Compression using Modulated Generalized Octave Convolution","authors":"Jianping Lin, Mohammad Akbari, H. Fu, Qian Zhang, Shang Wang, Jie Liang, Dong Liu, F. Liang, Guohe Zhang, Chengjie Tu","doi":"10.1109/MMSP48831.2020.9287082","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287082","url":null,"abstract":"In this proposal, we design a learned multi-frequency image compression approach that uses generalized octave convolutions to factorize the latent representations into high-frequency (HF) and low-frequency (LF) components, and the LF components have lower resolution than HF components, which can improve the rate-distortion performance, similar to wavelet transform. Moreover, compared to the original octave convolution, the proposed generalized octave convolution (GoConv) and octave transposed-convolution (GoTConv) with internal activation layers preserve more spatial structure of the information, and enable more effective filtering between the HF and LF components, which further improve the performance. In addition, we develop a variable-rate scheme using the Lagrangian parameter to modulate all the internal feature maps in the autoencoder, which allows the scheme to achieve the large bitrate range of the JPEG AI with only three models. Experiments show that the proposed scheme achieves much better Y MS-SSIM than VVC. In terms of YUV PSNR, our scheme is very similar to HEVC.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125230118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haze-robust image understanding via context-aware deep feature refinement
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287089
Hui Li, Q. Wu, Haoran Wei, K. Ngan, Hongliang Li, Fanman Meng, Linfeng Xu
Image understanding in foggy scenes is highly challenging due to inhomogeneous visibility deterioration. Although various image dehazing methods have been proposed, they usually aim to improve image visibility (e.g., PSNR/SSIM) in the pixel space rather than the feature space, which is what matters for computer vision perception. Due to this mismatch, existing dehazing methods are of limited benefit, or even detrimental, for foggy scene understanding. In this paper, we propose a generalized deep feature refinement module to minimize the difference between clear images and hazy images in the feature space. It is consistent with computer perception and can be embedded into existing detection or segmentation backbones for joint optimization. Our feature refinement module is built upon the graph convolutional network, which is favorable for capturing contextual information and beneficial for distinguishing different semantic objects. We validate our method on detection and segmentation tasks under foggy scenes. Extensive experimental results show that our method outperforms state-of-the-art dehazing-based pretreatments as well as fine-tuning on hazy images.
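A minimal sketch of the general idea of graph-convolutional feature refinement, assuming each spatial position of a backbone feature map is a graph node and affinities come from feature similarity; the paper's actual module design may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNRefineSketch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.weight = nn.Linear(channels, channels, bias=False)

    def forward(self, feat):                      # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        x = feat.flatten(2).transpose(1, 2)       # (B, N, C), N = H*W nodes
        # Affinity between nodes from normalized feature similarity
        sim = torch.softmax(x @ x.transpose(1, 2) / c ** 0.5, dim=-1)
        refined = self.weight(sim @ x)            # graph convolution: A X W
        refined = refined.transpose(1, 2).reshape(b, c, h, w)
        return feat + F.relu(refined)             # residual refinement

if __name__ == "__main__":
    module = GCNRefineSketch(channels=256)
    features = torch.randn(2, 256, 16, 16)        # backbone features
    print(module(features).shape)                 # torch.Size([2, 256, 16, 16])
```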
{"title":"Haze-robust image understanding via context-aware deep feature refinement","authors":"Hui Li, Q. Wu, Haoran Wei, K. Ngan, Hongliang Li, Fanman Meng, Linfeng Xu","doi":"10.1109/MMSP48831.2020.9287089","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287089","url":null,"abstract":"Image understanding under the foggy scene is greatly challenging due to inhomogeneous visibility deterioration. Although various image dehazing methods have been proposed, they usually aim to improve image visibility (such as, PSNR/SSIM) in the pixel space rather than the feature space, which is critical for the perception of computer vision. Due to this mismatch, existing dehazing methods are limited or even adverse in facilitating the foggy scene understanding. In this paper, we propose a generalized deep feature refinement module to minimize the difference between clear images and hazy images in the feature space. It is consistent with the computer perception and can be embedded into existing detection or segmentation backbones for joint optimization. Our feature refinement module is built upon the graph convolutional network, which is favorable in capturing the contextual information and beneficial for distinguishing different semantic objects. We validate our method on the detection and segmentation tasks under foggy scenes. Extensive experimental results show that our method outperforms the state-of-the-art dehazing based pretreatments and the fine-tuning results on hazy images.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129250093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mobile-Edge Cooperative Multi-User 360° Video Computing and Streaming
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287148
Jacob Chakareski, Nicholas Mastronarde
We investigate a novel communications system that integrates scalable multi-layer 360° video tiling, viewport-adaptive rate-distortion optimal resource allocation, and VR-centric edge computing and caching to enable future high-quality untethered VR streaming. Our system comprises a collection of 5G small cells that can pool their communication, computing, and storage resources to collectively deliver scalable 360° video content to mobile VR clients at much higher quality. Our major contributions are the rigorous design of multi-layer 360° tiling and related models of statistical user navigation, and the analysis and optimization of edge-based multi-user VR streaming that integrates viewport adaptation and server cooperation. We also explore the possibility of network-coded data operation and its implications for the analysis, optimization, and system performance we pursue here. We demonstrate considerable gains in delivered immersion fidelity, featuring much higher 360° viewport peak signal-to-noise ratio (PSNR) and VR video frame rates and spatial resolutions.
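As a toy illustration of viewport-adaptive rate allocation (not the paper's rate-distortion optimization), the sketch below gives every tile a base-layer rate and splits the remaining budget in proportion to an assumed per-tile viewport probability; the tile grid, budget, and probabilities are illustrative.

```python
import numpy as np

def allocate_rates(view_prob, total_kbps, base_kbps=300.0):
    """view_prob: per-tile viewport probabilities (need not sum to 1)."""
    view_prob = np.asarray(view_prob, dtype=float)
    n_tiles = view_prob.size
    enhancement_budget = max(total_kbps - n_tiles * base_kbps, 0.0)
    if view_prob.sum() > 0:
        weights = view_prob / view_prob.sum()
    else:
        weights = np.full(n_tiles, 1.0 / n_tiles)
    return base_kbps + enhancement_budget * weights

if __name__ == "__main__":
    # 4x2 tile grid: front tiles are most likely to be viewed
    probs = [0.30, 0.25, 0.05, 0.05, 0.20, 0.10, 0.03, 0.02]
    rates = allocate_rates(probs, total_kbps=20000)
    print(np.round(rates, 1), "sum =", round(rates.sum(), 1))
```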
{"title":"Mobile-Edge Cooperative Multi-User 360° Video Computing and Streaming","authors":"Jacob Chakareski, Nicholas Mastronarde","doi":"10.1109/MMSP48831.2020.9287148","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287148","url":null,"abstract":"We investigate a novel communications system that integrates scalable multi-layer 360° video tiling, viewport-adaptive rate-distortion optimal resource allocation, and VR-centric edge computing and caching, to enable future high-quality untethered VR streaming. Our system comprises a collection of 5G small cells that can pool their communication, computing, and storage resources to collectively deliver scalable 360° video content to mobile VR clients at much higher quality. Our major contributions are rigorous design of multi-layer 360° tiling and related models of statistical user navigation, and analysis and optimization of edge-based multi-user VR streaming that integrates viewport adaptation and server cooperation. We also explore the possibility of network coded data operation and its implications for the analysis, optimization, and system performance we pursue here. We demonstrate considerable gains in delivered immersion fidelity, featuring much higher 360° viewport peak signal to noise ratio (PSNR) and VR video frame rates and spatial resolutions.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132578755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Compressing Head-Related Transfer Function databases by Eigen decomposition
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287134
Camilo Arévalo, J. Villegas
A method to reduce the memory footprint of Head-Related Transfer Functions (HRTFs) is introduced. Based on an Eigen decomposition of HRTFs, the proposed method is capable of reducing a database comprising 6,344 measurements from 36.30 MB to 2.41 MB (about a 15:1 compression ratio). Synthetic HRTFs in the compressed database were set to have less than 1 dB spectral distortion between 0.1 and 16 kHz. The differences between the compressed measurements and those in the original database do not seem to translate into degradation of perceptual localization accuracy. The high degree of compression obtained with this method allows the inclusion of interpolated HRTFs in databases, easing real-time audio spatialization in Virtual Reality (VR).
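A minimal sketch of eigendecomposition-based compression on synthetic data: HRTF responses are stacked into a matrix, the leading principal components are retained, and only the mean, the basis, and per-HRTF weights are stored. The matrix size, number of retained components, and distortion check below are illustrative, not the paper's figures.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hrtfs, n_bins = 6344, 256                   # measurements x frequency bins
latent = rng.standard_normal((n_hrtfs, 40))   # synthetic low-rank structure
H = latent @ rng.standard_normal((40, n_bins)) \
    + 0.01 * rng.standard_normal((n_hrtfs, n_bins))

mean = H.mean(axis=0)
U, s, Vt = np.linalg.svd(H - mean, full_matrices=False)

k = 32                                        # retained eigenvectors (assumption)
weights = U[:, :k] * s[:k]                    # per-HRTF coefficients
basis = Vt[:k]                                # shared eigen-basis
H_rec = weights @ basis + mean                # reconstructed responses

max_err = np.abs(H_rec - H).max()             # would be in dB if H held dB magnitudes
stored = weights.size + basis.size + mean.size
print(f"max reconstruction error: {max_err:.3f} (units of H)")
print(f"compression ratio: {H.size / stored:.1f}:1")
```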
{"title":"Compressing Head-Related Transfer Function databases by Eigen decomposition","authors":"Camilo Arévalo, J. Villegas","doi":"10.1109/MMSP48831.2020.9287134","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287134","url":null,"abstract":"A method to reduce the memory footprint of Head- Related Transfer Functions (HRTFs) is introduced. Based on an Eigen decomposition of HRTFs, the proposed method is capable of reducing a database comprising 6,344 measurements from 36.30 MB to 2.41MB (about a 15:1 compression ratio). Synthetic HRTFs in the compressed database were set to have less than 1dB spectral distortion between 0.1 and 16 kHz. The differences between the compressed measurements with those in the original database do not seem to translate into degradation of perceptual location accuracy. The high degree of compression obtained with this method allows the inclusion of interpolated HRTFs in databases for easing the real-time audio spatialization in Virtual Reality (VR).","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132370010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Translation of Perceived Video Quality Across Displays
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287143
Jessie Lin, N. Birkbeck, Balu Adsumilli
Display devices can significantly affect the perceived quality of a video. In this paper, we focus on the scenario where the video resolution does not exceed the screen resolution and investigate the relationship between perceived video quality on mobile, laptop, and TV displays. A novel transformation of Mean Opinion Scores (MOS) among different devices is proposed and is shown to be effective at normalizing ratings across user devices for in-lab and crowdsourced subjective studies. The model allows us to perform more focused in-lab subjective studies, since the number of test devices can be reduced, and helps reduce noise in crowdsourced subjective video quality tests. It is also more effective than using existing device-dependent objective metrics for translating MOS ratings across devices.
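As a rough illustration of the normalization idea (the paper's actual transformation is not reproduced here), the sketch below fits a first-order least-squares mapping from mobile MOS to TV MOS on made-up ratings and uses it to translate new scores.

```python
import numpy as np

# Made-up example ratings of the same processed sequences on two devices
mos_mobile = np.array([4.6, 4.2, 3.8, 3.1, 2.4, 1.8])
mos_tv     = np.array([4.3, 3.8, 3.3, 2.6, 2.0, 1.5])

# Least-squares fit of mos_tv ~ a * mos_mobile + b
A = np.vstack([mos_mobile, np.ones_like(mos_mobile)]).T
(a, b), *_ = np.linalg.lstsq(A, mos_tv, rcond=None)

def mobile_to_tv(mos):
    return np.clip(a * np.asarray(mos) + b, 1.0, 5.0)  # keep on the 1-5 scale

print(f"fit: mos_tv = {a:.2f} * mos_mobile + {b:.2f}")
print("translated:", mobile_to_tv([4.0, 3.0, 2.0]))
```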
{"title":"Translation of Perceived Video Quality Across Displays","authors":"Jessie Lin, N. Birkbeck, Balu Adsumilli","doi":"10.1109/MMSP48831.2020.9287143","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287143","url":null,"abstract":"Display devices can affect the perceived quality of a video significantly. In this paper, we focus on the scenario where video resolution does not exceed screen resolution, and investigate the relationship of perceived video quality on mobile, laptop and TV. A novel transformation of Mean Opinion Scores (MOS) among different devices is proposed and is shown to be effective at normalizing ratings across user devices for in lab and crowd sourced subjective studies. The model allows us to perform more focused in lab subjective studies as we can reduce the number of test devices and helps us reduce noise during crowd-sourcing subjective video quality tests. It is also more effective than utilizing existing device dependent objective metrics for translating MOS ratings across devices.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127693552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Non-Line-of-Sight Time-Difference-of-Arrival Localization with Explicit Inclusion of Geometry Information in a Simple Diffraction Scenario
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287166
Sönke Südbeck, Thomas C. Krause, J. Ostermann
Time-difference-of-arrival (TDOA) localization is a technique for finding the position of a wave-emitting object, e.g., a car horn. Many algorithms have been proposed for TDOA localization under line-of-sight (LOS) conditions. In the non-line-of-sight (NLOS) case, the performance of these algorithms usually deteriorates. There are techniques to reduce the error introduced by the NLOS condition, which, however, do not directly take into account information about the geometry of the surroundings. In this paper, an NLOS TDOA localization approach for a simple diffraction scenario is described that incorporates information about the surroundings into the system of equations. An experiment with three different loudspeaker positions was conducted to validate the proposed method. The localization error was less than 6.2% of the distance from the source to the closest microphone position. Simulations show that the proposed method attains the Cramér-Rao lower bound for sufficiently low TDOA noise levels.
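A minimal sketch of TDOA localization by nonlinear least squares under LOS conditions; the paper's contribution, replacing straight-line path lengths with diffraction path lengths around a known obstacle in these residuals, is not reproduced. The geometry and noise level are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

C = 343.0                                    # speed of sound in m/s
mics = np.array([[0, 0], [4, 0], [4, 3], [0, 3]], dtype=float)
source_true = np.array([1.2, 2.1])

ranges = np.linalg.norm(mics - source_true, axis=1)
tdoa = (ranges[1:] - ranges[0]) / C          # TDOAs relative to microphone 0
tdoa += 1e-5 * np.random.default_rng(0).standard_normal(tdoa.shape)

def residuals(pos):
    r = np.linalg.norm(mics - pos, axis=1)   # NLOS case: use diffraction path lengths here
    return (r[1:] - r[0]) / C - tdoa

estimate = least_squares(residuals, x0=np.array([2.0, 1.5])).x
print("estimated source position:", np.round(estimate, 3))
```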
{"title":"Non-Line-of-Sight Time-Difference-of-Arrival Localization with Explicit Inclusion of Geometry Information in a Simple Diffraction Scenario","authors":"Sönke Südbeck, Thomas C. Krause, J. Ostermann","doi":"10.1109/MMSP48831.2020.9287166","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287166","url":null,"abstract":"Time-difference-of-arrival (TDOA) localization is a technique for finding the position of a wave emitting object, e.g., a car horn. Many algorithms have been proposed for TDOA localization under line-of-sight (LOS) conditions. In the non-line-of-sight (NLOS) case the performance of these algorithms usually deteriorates. There are techniques to reduce the error introduced by the NLOS condition, which, however, do not directly take into account information on the geometry of the surroundings. In this paper a NLOS TDOA localization approach for a simple diffraction scenario is described, which includes information on the surroundings into the equation system. An experiment with three different loudspeaker positions was conducted to validate the proposed method. The localization error was less than 6.2 % of the distance from the source to the closest microphone position. Simulations show that the proposed method attains the Cramer-Rao-Lower-Bound for low enough TDOA noise levels.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129147058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Learning-based Point Cloud Geometry Coding with Resolution Scalability
Pub Date: 2020-09-21 | DOI: 10.1109/MMSP48831.2020.9287060
André F. R. Guarda, Nuno M. M. Rodrigues, F. Pereira
Point clouds are a 3D visual representation format that has recently become fundamentally important for immersive and interactive multimedia applications. Considering the high number of points in practically relevant point clouds, and their increasing market demand, efficient point cloud coding has become a vital research topic. In addition, scalability is an important feature for point cloud coding, especially for real-time applications, where fast and rate-efficient access to a decoded point cloud is important; however, this issue is still rather unexplored in the literature. In this context, this paper proposes a novel deep learning-based point cloud geometry coding solution with resolution scalability via interlaced sub-sampling. As additional layers are decoded, the number of points in the reconstructed point cloud increases, as does the overall quality. Experimental results show that the proposed scalable point cloud geometry coding solution outperforms the recent MPEG Geometry-based Point Cloud Compression standard, which is much less scalable.
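A minimal sketch of the interlaced sub-sampling idea alone (no learning and no entropy coding): voxel coordinates are partitioned into layers by coordinate parity, and the reconstruction densifies as each additional layer is decoded. The specific layering rule here is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
points = np.unique(rng.integers(0, 64, size=(5000, 3)), axis=0)  # voxelized geometry

# Layer index from the parity of the coordinates: 8 interlaced phases
layer_id = (points[:, 0] % 2) * 4 + (points[:, 1] % 2) * 2 + (points[:, 2] % 2)
layers = [points[layer_id == k] for k in range(8)]

decoded = np.empty((0, 3), dtype=points.dtype)
for k, layer in enumerate(layers):                    # progressive decoding
    decoded = np.vstack([decoded, layer])
    print(f"after layer {k}: {len(decoded)} of {len(points)} points")
```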
{"title":"Deep Learning-based Point Cloud Geometry Coding with Resolution Scalability","authors":"André F. R. Guarda, Nuno M. M. Rodrigues, F. Pereira","doi":"10.1109/MMSP48831.2020.9287060","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287060","url":null,"abstract":"Point clouds are a 3D visual representation format that has recently become fundamentally important for immersive and interactive multimedia applications. Considering the high number of points of practically relevant point clouds, and their increasing market demand, efficient point cloud coding has become a vital research topic. In addition, scalability is an important feature for point cloud coding, especially for real-time applications, where the fast and rate efficient access to a decoded point cloud is important; however, this issue is still rather unexplored in the literature. In this context, this paper proposes a novel deep learning-based point cloud geometry coding solution with resolution scalability via interlaced sub-sampling. As additional layers are decoded, the number of points in the reconstructed point cloud increases as well as the overall quality. Experimental results show that the proposed scalable point cloud geometry coding solution outperforms the recent MPEG Geometry-based Point Cloud Compression standard which is much less scalable.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125395526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}