RoSTAR: ROS-based Telerobotic Control via Augmented Reality
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287100
Chung Xue Er Shamaine, Yuansong Qiao, John Henry, Ken McNevin, Niall Murray
Communication and interaction between the real world and virtual worlds will be a cornerstone of future intelligent manufacturing ecosystems. Human-robot interaction (HRI) is considered a basic element of the factories of the future. Despite advances in technologies such as wearables and Augmented Reality (AR), HRI remains extremely challenging. Whilst progress has been made in developing mechanisms to support HRI, issues remain with cost, naturalistic and intuitive interaction, and communication across heterogeneous systems. To mitigate these limitations, RoSTAR is proposed: a novel open-source HRI system based on the Robot Operating System (ROS) and AR. An AR Head Mounted Display (HMD) is deployed, enabling the user to interact and communicate with a ROS-powered robotic arm. A model of the robot arm is imported directly into the Unity game engine, and any interactions with this virtual robotic arm are communicated to the physical ROS robotic arm. This system has the potential to be used for different process tasks, such as robotic gluing, dispensing and arc welding, as part of an interoperable, low-cost, portable and naturalistically interactive experience.
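The abstract does not give the communication details, but the ROS side of such a Unity-to-ROS relay can be sketched in a few lines. The topic names below are hypothetical placeholders, not the paper's actual interface; in practice the HMD would reach ROS through a bridge such as rosbridge/ROS#.

```python
# Minimal sketch of the ROS-side relay (hypothetical topic names).
import rospy
from sensor_msgs.msg import JointState

def relay():
    rospy.init_node('rostar_relay')
    # Republish joint targets received from the AR headset to the
    # controller of the physical arm.
    pub = rospy.Publisher('/arm_controller/command', JointState, queue_size=10)

    def on_virtual_pose(msg):
        # Forward the virtual arm's joint configuration unchanged; a real
        # system would clamp joint limits and rate-limit here.
        pub.publish(msg)

    rospy.Subscriber('/virtual_arm/joint_states', JointState, on_virtual_pose)
    rospy.spin()

if __name__ == '__main__':
    relay()
```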
{"title":"RoSTAR: ROS-based Telerobotic Control via Augmented Reality","authors":"Chung Xue Er Shamaine, Yuansong Qiao, John Henry, Ken McNevin, Niall Murray","doi":"10.1109/MMSP48831.2020.9287100","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287100","url":null,"abstract":"Real world virtual world communication and interaction will be a cornerstone of future intelligent manufacturing ecosystems. Human robotic interaction is considered to be the basic element of factories of the future. Despite the advancement of different technologies such as wearables and Augmented Reality (AR), human-robot interaction (HRI) is still extremely challenging. Whilst progress has been made in the development of different mechanisms to support HRI, there are issues with cost, naturalistic and intuitive interaction, and communication across heterogeneous systems. To mitigate these limitations, RoSTAR is proposed. RoSTAR is a novel open-source HRI system based on the Robot Operating System (ROS) and Augmented Reality. An AR Head Mounted Display (HMD) is deployed. It enables the user to interact and communicate through a ROS powered robotic arm. A model of the robot arm is imported directly into the Unity Game engine, and any interactions with this virtual robotic arm are communicated to the ROS robotic arm. This system has the potential to be used for different process tasks, such as robotic gluing, dispensing and arc welding as part of an interoperable, low cost, portable and naturalistically interactive experience.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121114524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Variable-Rate Multi-Frequency Image Compression using Modulated Generalized Octave Convolution
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287082
Jianping Lin, Mohammad Akbari, H. Fu, Qian Zhang, Shang Wang, Jie Liang, Dong Liu, F. Liang, Guohe Zhang, Chengjie Tu
In this proposal, we design a learned multi-frequency image compression approach that uses generalized octave convolutions to factorize the latent representations into high-frequency (HF) and low-frequency (LF) components, where the LF components have lower resolution than the HF components. This improves rate-distortion performance, similar to a wavelet transform. Moreover, compared to the original octave convolution, the proposed generalized octave convolution (GoConv) and octave transposed-convolution (GoTConv) with internal activation layers preserve more of the spatial structure of the information and enable more effective filtering between the HF and LF components, which further improves performance. In addition, we develop a variable-rate scheme that uses the Lagrangian parameter to modulate all the internal feature maps in the autoencoder, allowing the scheme to cover the large bitrate range of JPEG AI with only three models. Experiments show that the proposed scheme achieves much better Y MS-SSIM than VVC; in terms of YUV PSNR, it performs very similarly to HEVC.
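As a rough illustration of the layer the abstract describes, the PyTorch sketch below implements a generalized-octave-style convolution: HF and LF paths at different resolutions, strided (transposed) convolutions between frequencies, and internal activations. The channel split, kernel sizes and activation choice are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GoConv(nn.Module):
    def __init__(self, in_ch, out_ch, alpha=0.5):
        super().__init__()
        lf_in, lf_out = int(alpha * in_ch), int(alpha * out_ch)
        hf_in, hf_out = in_ch - lf_in, out_ch - lf_out
        self.hh = nn.Conv2d(hf_in, hf_out, 3, padding=1)                    # HF -> HF
        self.hl = nn.Conv2d(hf_in, lf_out, 3, stride=2, padding=1)          # HF -> LF (downsample)
        self.ll = nn.Conv2d(lf_in, lf_out, 3, padding=1)                    # LF -> LF
        self.lh = nn.ConvTranspose2d(lf_in, hf_out, 4, stride=2, padding=1) # LF -> HF (upsample)
        self.act = nn.LeakyReLU()

    def forward(self, x_h, x_l):
        # Each output mixes an intra-frequency and an inter-frequency path,
        # with activations applied inside the layer.
        y_h = self.act(self.hh(x_h)) + self.act(self.lh(x_l))
        y_l = self.act(self.ll(x_l)) + self.act(self.hl(x_h))
        return y_h, y_l

# Example: 64 channels split evenly, HF at 32x32 and LF at 16x16.
x_h, x_l = torch.randn(1, 32, 32, 32), torch.randn(1, 32, 16, 16)
y_h, y_l = GoConv(64, 64)(x_h, x_l)
```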
{"title":"Variable-Rate Multi-Frequency Image Compression using Modulated Generalized Octave Convolution","authors":"Jianping Lin, Mohammad Akbari, H. Fu, Qian Zhang, Shang Wang, Jie Liang, Dong Liu, F. Liang, Guohe Zhang, Chengjie Tu","doi":"10.1109/MMSP48831.2020.9287082","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287082","url":null,"abstract":"In this proposal, we design a learned multi-frequency image compression approach that uses generalized octave convolutions to factorize the latent representations into high-frequency (HF) and low-frequency (LF) components, and the LF components have lower resolution than HF components, which can improve the rate-distortion performance, similar to wavelet transform. Moreover, compared to the original octave convolution, the proposed generalized octave convolution (GoConv) and octave transposed-convolution (GoTConv) with internal activation layers preserve more spatial structure of the information, and enable more effective filtering between the HF and LF components, which further improve the performance. In addition, we develop a variable-rate scheme using the Lagrangian parameter to modulate all the internal feature maps in the autoencoder, which allows the scheme to achieve the large bitrate range of the JPEG AI with only three models. Experiments show that the proposed scheme achieves much better Y MS-SSIM than VVC. In terms of YUV PSNR, our scheme is very similar to HEVC.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125230118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learned BRIEF – transferring the knowledge from hand-crafted to learning-based descriptors
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287159
Nina Žižakić, A. Pižurica
In this paper, we present a novel approach for designing local image descriptors that learn from data and from hand-crafted descriptors. In particular, we construct a learning model that first mimics the behaviour of a hand-crafted descriptor and then learns to improve upon it in an unsupervised manner. We demonstrate the use of this knowledge-transfer framework by constructing the learned BRIEF descriptor based on the well-known hand-crafted descriptor BRIEF. We implement our learned BRIEF with a convolutional autoencoder architecture. Evaluation on the HPatches benchmark for local image descriptors shows the effectiveness of the proposed approach in the tasks of patch retrieval, patch verification, and image matching.
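A minimal sketch of this two-stage transfer might look as follows, assuming 32x32 grayscale patches and a 256-bit BRIEF target; the architecture and losses are illustrative guesses, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

# Convolutional encoder producing a 256-dim soft-binary descriptor.
encoder = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
    nn.Flatten(), nn.Linear(32 * 8 * 8, 256), nn.Sigmoid())
decoder = nn.Sequential(
    nn.Linear(256, 32 * 32), nn.Unflatten(1, (1, 32, 32)))

def stage1_loss(patch, brief_bits):
    # Stage 1: mimic the hand-crafted descriptor by regressing its bits.
    return nn.functional.binary_cross_entropy(encoder(patch), brief_bits)

def stage2_loss(patch):
    # Stage 2: unsupervised refinement via autoencoder reconstruction, so the
    # descriptor keeps (and improves on) patch-discriminative content.
    return nn.functional.mse_loss(decoder(encoder(patch)), patch)
```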
{"title":"Learned BRIEF – transferring the knowledge from hand-crafted to learning-based descriptors","authors":"Nina Žižakić, A. Pižurica","doi":"10.1109/MMSP48831.2020.9287159","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287159","url":null,"abstract":"In this paper, we present a novel approach for designing local image descriptors that learn from data and from hand-crafted descriptors. In particular, we construct a learning model that first mimics the behaviour of a hand-crafted descriptor and then learns to improve upon it in an unsupervised manner. We demonstrate the use of this knowledge-transfer framework by constructing the learned BRIEF descriptor based on the well-known hand-crafted descriptor BRIEF. We implement our learned BRIEF with a convolutional autoencoder architecture. Evaluation on the HPatches benchmark for local image descriptors shows the effectiveness of the proposed approach in the tasks of patch retrieval, patch verification, and image matching.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114903195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bi-directional intra prediction based measurement coding for compressive sensing images
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287074
Thuy Thi Thu Tran, Jirayu Peetakul, Chi Do-Kim Pham, Jinjia Zhou
This work proposes a bi-directional intra prediction-based measurement coding algorithm for compressive sensing images. Compressive sensing is capable of reducing the size of sparse signals, in which high-dimensional signals are represented by under-determined linear measurements. To exploit the spatial redundancy in the measurements, the corresponding pixel-domain information is extracted using the structure of the measurement matrix. Firstly, the mono-directional prediction modes (i.e. horizontal mode and vertical mode), which refer to the nearest information of neighboring pixel blocks, are obtained from the structure of the measurement matrix. Secondly, we design bi-directional intra prediction modes (i.e. Diagonal + Horizontal, Diagonal + Vertical) based on the already obtained mono-directional prediction modes. Experimental results show that this work achieves a 0.01-0.02 dB PSNR improvement and bitrate reductions of 19% on average, and up to 36%, compared to the state-of-the-art.
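The measurement-domain prediction idea can be illustrated with a toy numpy sketch: pixel-domain predictors built from neighboring blocks are projected through the shared measurement matrix, and the mode with the smallest measurement residual is selected. The predictor shapes, including the simplistic diagonal mode, are assumptions for illustration only.

```python
import numpy as np

B = 8                                      # block size
Phi = np.random.randn(B * B // 2, B * B)   # shared under-determined measurement matrix

def diag(left, top):
    # Toy diagonal predictor: average of the two corner neighbours.
    return np.full((B, B), 0.5 * (left[-1, -1] + top[-1, -1]))

def predict_block(left, top, mode):
    if mode == 'H':    return np.tile(left[:, -1:], (1, B))   # extend left column
    if mode == 'V':    return np.tile(top[-1:, :], (B, 1))    # extend top row
    if mode == 'D+H':  return 0.5 * (predict_block(left, top, 'H') + diag(left, top))
    return 0.5 * (predict_block(left, top, 'V') + diag(left, top))  # 'D+V'

def best_mode(y, left, top):
    # Choose the mode whose projected prediction minimizes the measurement residual.
    costs = {m: np.sum(np.abs(y - Phi @ predict_block(left, top, m).ravel()))
             for m in ('H', 'V', 'D+H', 'D+V')}
    return min(costs, key=costs.get)

left, top = np.random.randn(B, B), np.random.randn(B, B)
y = Phi @ np.random.randn(B, B).ravel()    # measurements of the current block
print(best_mode(y, left, top))
```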
{"title":"Bi-directional intra prediction based measurement coding for compressive sensing images","authors":"Thuy Thi Thu Tran, Jirayu Peetakul, Chi Do-Kim Pham, Jinjia Zhou","doi":"10.1109/MMSP48831.2020.9287074","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287074","url":null,"abstract":"This work proposes a bi-directional intra prediction-based measurement coding algorithm for compressive sensing images. Compressive sensing is capable of reducing the size of the sparse signals, in which the high-dimensional signals are represented by the under-determined linear measurements. In order to explore the spatial redundancy in measurements, the corresponding pixel domain information extracted using the structure of measurement matrix. Firstly, the mono-directional prediction modes (i.e. horizontal mode and vertical mode), which refer to the nearest information of neighboring pixel blocks, are obtained by the structure of the measurement matrix. Secondly, we design bi-directional intra prediction modes (i.e. Diagonal + Horizontal, Diagonal + Vertical) base on the already obtained mono-directional prediction modes. Experimental results show that this work improves 0.01 - 0.02 dB PSNR improvement and the birate reductions of on average 19%, up to 36% compared to the state-of-the-art.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122180891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haze-robust image understanding via context-aware deep feature refinement
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287089
Hui Li, Q. Wu, Haoran Wei, K. Ngan, Hongliang Li, Fanman Meng, Linfeng Xu
Image understanding in foggy scenes is highly challenging due to inhomogeneous visibility deterioration. Although various image dehazing methods have been proposed, they usually aim to improve image visibility (e.g. PSNR/SSIM) in the pixel space rather than the feature space, which is what matters for computer vision perception. Due to this mismatch, existing dehazing methods are of limited benefit, or even detrimental, to foggy scene understanding. In this paper, we propose a generalized deep feature refinement module that minimizes the difference between clear images and hazy images in the feature space. It is consistent with computer perception and can be embedded into existing detection or segmentation backbones for joint optimization. Our feature refinement module is built upon the graph convolutional network, which is favorable for capturing contextual information and beneficial for distinguishing different semantic objects. We validate our method on detection and segmentation tasks in foggy scenes. Extensive experimental results show that our method outperforms state-of-the-art dehazing-based pre-processing as well as fine-tuning on hazy images.
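A PyTorch sketch of a graph-based refinement block in the spirit of the abstract: nodes are spatial positions of a backbone feature map, the adjacency is built from feature similarity, and training minimizes the feature-space gap to the clear image. All specifics (adjacency construction, one-layer GCN, residual connection) are assumptions.

```python
import torch
import torch.nn as nn

class GCNRefine(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.w = nn.Linear(ch, ch)

    def forward(self, feat):                  # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        x = feat.flatten(2).transpose(1, 2)   # (B, H*W, C) node features
        # Similarity-based adjacency captures long-range context.
        adj = torch.softmax(x @ x.transpose(1, 2) / c ** 0.5, dim=-1)
        x = torch.relu(adj @ self.w(x))       # one graph-convolution step
        return feat + x.transpose(1, 2).reshape(b, c, h, w)  # residual refinement

def refinement_loss(refined_hazy_feat, clear_feat):
    # Pull refined hazy features toward their clear-image counterparts.
    return nn.functional.mse_loss(refined_hazy_feat, clear_feat)
```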
{"title":"Haze-robust image understanding via context-aware deep feature refinement","authors":"Hui Li, Q. Wu, Haoran Wei, K. Ngan, Hongliang Li, Fanman Meng, Linfeng Xu","doi":"10.1109/MMSP48831.2020.9287089","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287089","url":null,"abstract":"Image understanding under the foggy scene is greatly challenging due to inhomogeneous visibility deterioration. Although various image dehazing methods have been proposed, they usually aim to improve image visibility (such as, PSNR/SSIM) in the pixel space rather than the feature space, which is critical for the perception of computer vision. Due to this mismatch, existing dehazing methods are limited or even adverse in facilitating the foggy scene understanding. In this paper, we propose a generalized deep feature refinement module to minimize the difference between clear images and hazy images in the feature space. It is consistent with the computer perception and can be embedded into existing detection or segmentation backbones for joint optimization. Our feature refinement module is built upon the graph convolutional network, which is favorable in capturing the contextual information and beneficial for distinguishing different semantic objects. We validate our method on the detection and segmentation tasks under foggy scenes. Extensive experimental results show that our method outperforms the state-of-the-art dehazing based pretreatments and the fine-tuning results on hazy images.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129250093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Translation of Perceived Video Quality Across Displays
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287143
Jessie Lin, N. Birkbeck, Balu Adsumilli
Display devices can significantly affect the perceived quality of a video. In this paper, we focus on the scenario where the video resolution does not exceed the screen resolution, and investigate the relationship between perceived video quality on mobile, laptop and TV. A novel transformation of Mean Opinion Scores (MOS) among different devices is proposed and shown to be effective at normalizing ratings across user devices for in-lab and crowd-sourced subjective studies. The model allows us to perform more focused in-lab subjective studies, since we can reduce the number of test devices, and helps us reduce noise in crowd-sourced subjective video quality tests. It is also more effective than using existing device-dependent objective metrics to translate MOS ratings across devices.
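The abstract does not state the form of the MOS transformation; as a toy illustration of the general idea, the sketch below fits a least-squares linear map between ratings of the same clips collected on two devices. All numbers are hypothetical placeholders.

```python
import numpy as np

# Hypothetical MOS for the same clips rated on two devices.
mos_mobile = np.array([4.2, 3.1, 2.5, 3.8, 1.9])
mos_tv     = np.array([3.9, 2.7, 2.0, 3.5, 1.6])

# Fit mos_tv ~= a * mos_mobile + b by least squares, then translate.
a, b = np.polyfit(mos_mobile, mos_tv, 1)
predicted_tv = np.clip(a * mos_mobile + b, 1.0, 5.0)  # stay on the 1-5 ACR scale
```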
{"title":"Translation of Perceived Video Quality Across Displays","authors":"Jessie Lin, N. Birkbeck, Balu Adsumilli","doi":"10.1109/MMSP48831.2020.9287143","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287143","url":null,"abstract":"Display devices can affect the perceived quality of a video significantly. In this paper, we focus on the scenario where video resolution does not exceed screen resolution, and investigate the relationship of perceived video quality on mobile, laptop and TV. A novel transformation of Mean Opinion Scores (MOS) among different devices is proposed and is shown to be effective at normalizing ratings across user devices for in lab and crowd sourced subjective studies. The model allows us to perform more focused in lab subjective studies as we can reduce the number of test devices and helps us reduce noise during crowd-sourcing subjective video quality tests. It is also more effective than utilizing existing device dependent objective metrics for translating MOS ratings across devices.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127693552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Compressing Head-Related Transfer Function databases by Eigen decomposition
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287134
Camilo Arévalo, J. Villegas
A method to reduce the memory footprint of Head-Related Transfer Functions (HRTFs) is introduced. Based on an Eigen decomposition of HRTFs, the proposed method is capable of reducing a database comprising 6,344 measurements from 36.30 MB to 2.41 MB (about a 15:1 compression ratio). Synthetic HRTFs in the compressed database were constrained to less than 1 dB of spectral distortion between 0.1 and 16 kHz. The differences between the compressed measurements and those in the original database do not appear to translate into degraded perceptual localization accuracy. The high degree of compression obtained with this method allows interpolated HRTFs to be included in databases, easing real-time audio spatialization in Virtual Reality (VR).
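A compression scheme of this flavor can be sketched with a truncated eigendecomposition (PCA via SVD) of the stacked HRTF responses: store one shared basis plus a few weights per measurement. The matrix layout, the number of retained components, and the random placeholder data are assumptions, though keeping 1/16 of the components is roughly consistent with the reported 15:1 ratio.

```python
import numpy as np

H = np.random.randn(6344, 512)          # measurements x frequency bins (placeholder data)
mean = H.mean(axis=0)
U, s, Vt = np.linalg.svd(H - mean, full_matrices=False)

k = 32                                  # retained eigenvectors (assumed)
basis = Vt[:k]                          # (k, 512) shared basis, stored once
weights = (H - mean) @ basis.T          # (6344, k) per-HRTF coefficients

# Reconstruction; on real HRTF data one would verify the spectral
# distortion stays under the 1 dB target in the 0.1-16 kHz band.
H_approx = weights @ basis + mean
```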
{"title":"Compressing Head-Related Transfer Function databases by Eigen decomposition","authors":"Camilo Arévalo, J. Villegas","doi":"10.1109/MMSP48831.2020.9287134","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287134","url":null,"abstract":"A method to reduce the memory footprint of Head- Related Transfer Functions (HRTFs) is introduced. Based on an Eigen decomposition of HRTFs, the proposed method is capable of reducing a database comprising 6,344 measurements from 36.30 MB to 2.41MB (about a 15:1 compression ratio). Synthetic HRTFs in the compressed database were set to have less than 1dB spectral distortion between 0.1 and 16 kHz. The differences between the compressed measurements with those in the original database do not seem to translate into degradation of perceptual location accuracy. The high degree of compression obtained with this method allows the inclusion of interpolated HRTFs in databases for easing the real-time audio spatialization in Virtual Reality (VR).","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132370010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mobile-Edge Cooperative Multi-User 360° Video Computing and Streaming
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287148
Jacob Chakareski, Nicholas Mastronarde
We investigate a novel communications system that integrates scalable multi-layer 360° video tiling, viewport-adaptive rate-distortion optimal resource allocation, and VR-centric edge computing and caching, to enable future high-quality untethered VR streaming. Our system comprises a collection of 5G small cells that can pool their communication, computing, and storage resources to collectively deliver scalable 360° video content to mobile VR clients at much higher quality. Our major contributions are the rigorous design of multi-layer 360° tiling and related models of statistical user navigation, and the analysis and optimization of edge-based multi-user VR streaming that integrates viewport adaptation and server cooperation. We also explore the possibility of network-coded data operation and its implications for the analysis, optimization, and system performance pursued here. We demonstrate considerable gains in delivered immersion fidelity, featuring much higher 360° viewport peak signal-to-noise ratio (PSNR) and VR video frame rates and spatial resolutions.
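One simple way to illustrate viewport-adaptive layer allocation is a greedy utility-per-bit scheme over tile/layer upgrades: spend the bit budget on the upgrades with the best expected quality gain per bit, weighted by the probability the tile is viewed. This is a generic sketch, not the paper's rate-distortion optimization, and all numbers are placeholders.

```python
import heapq

def allocate(tiles, budget):
    # tiles: list of dicts with per-layer (bits, quality_gain) and view prob 'p'.
    heap = []
    for t_id, t in enumerate(tiles):
        bits, gain = t['layers'][0]
        heapq.heappush(heap, (-t['p'] * gain / bits, t_id, 0))
    chosen = []
    while heap and budget > 0:
        neg_u, t_id, layer = heapq.heappop(heap)
        bits, gain = tiles[t_id]['layers'][layer]
        if bits <= budget:
            budget -= bits
            chosen.append((t_id, layer))
            if layer + 1 < len(tiles[t_id]['layers']):   # unlock next enhancement layer
                nbits, ngain = tiles[t_id]['layers'][layer + 1]
                heapq.heappush(heap, (-tiles[t_id]['p'] * ngain / nbits, t_id, layer + 1))
    return chosen

tiles = [{'p': 0.6, 'layers': [(100, 5.0), (150, 2.0)]},
         {'p': 0.1, 'layers': [(100, 5.0), (150, 2.0)]}]
print(allocate(tiles, 300))  # the likely-viewed tile is upgraded first
```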
{"title":"Mobile-Edge Cooperative Multi-User 360° Video Computing and Streaming","authors":"Jacob Chakareski, Nicholas Mastronarde","doi":"10.1109/MMSP48831.2020.9287148","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287148","url":null,"abstract":"We investigate a novel communications system that integrates scalable multi-layer 360° video tiling, viewport-adaptive rate-distortion optimal resource allocation, and VR-centric edge computing and caching, to enable future high-quality untethered VR streaming. Our system comprises a collection of 5G small cells that can pool their communication, computing, and storage resources to collectively deliver scalable 360° video content to mobile VR clients at much higher quality. Our major contributions are rigorous design of multi-layer 360° tiling and related models of statistical user navigation, and analysis and optimization of edge-based multi-user VR streaming that integrates viewport adaptation and server cooperation. We also explore the possibility of network coded data operation and its implications for the analysis, optimization, and system performance we pursue here. We demonstrate considerable gains in delivered immersion fidelity, featuring much higher 360° viewport peak signal to noise ratio (PSNR) and VR video frame rates and spatial resolutions.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132578755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Learning for Individual Listening Zone
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287161
Giovanni Pepe, L. Gabrielli, S. Squartini, L. Cattani, Carlo Tripodi
A recent trend in car audio systems is the generation of Individual Listening Zones (ILZ), which improve phone call privacy and reduce disturbance to other passengers without requiring headphones or earpieces. This is generally achieved using loudspeaker arrays. In this paper, we describe an approach that achieves ILZ by exploiting general-purpose car loudspeakers and processing the signal through carefully designed Finite Impulse Response (FIR) filters. We propose a deep neural network approach for designing the filter coefficients so as to obtain a so-called bright zone, where the signal is clearly heard, and a dark zone, where the signal is attenuated. Additionally, the frequency response in the bright zone is constrained to be as flat as possible. Numerical experiments were performed on impulse responses measured with either one binaural pair or three binaural pairs per passenger. The results in terms of attenuation and flatness prove the viability of the approach.
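The objective can be sketched as a direct gradient-descent simplification of the idea (the paper uses a deep neural network to produce the coefficients): treat the FIR taps as learnable parameters, simulate zone pressure by filtering through measured impulse responses, and descend on a loss combining bright-zone flatness and dark-zone attenuation. The IR data, lengths, and loss weighting below are placeholders.

```python
import torch

n_spk, fir_len, ir_len = 4, 256, 512
firs = torch.zeros(n_spk, fir_len, requires_grad=True)
with torch.no_grad():
    firs[:, 0] = 1.0                            # start from pass-through filters

ir_bright = torch.randn(n_spk, ir_len)           # speaker -> bright-zone mic IRs (placeholder)
ir_dark = torch.randn(n_spk, ir_len)             # speaker -> dark-zone mic IRs (placeholder)

opt = torch.optim.Adam([firs], lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    # Zone responses: sum over speakers of fir * ir (product in frequency domain).
    F = torch.fft.rfft(firs, n=fir_len + ir_len)
    Hb = (F * torch.fft.rfft(ir_bright, n=fir_len + ir_len)).sum(0)
    Hd = (F * torch.fft.rfft(ir_dark, n=fir_len + ir_len)).sum(0)
    flatness = ((Hb.abs() - 1.0) ** 2).mean()    # bright zone close to 0 dB
    leakage = (Hd.abs() ** 2).mean()             # dark zone as quiet as possible
    loss = flatness + 10.0 * leakage             # assumed weighting
    loss.backward()
    opt.step()
```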
{"title":"Deep Learning for Individual Listening Zone","authors":"Giovanni Pepe, L. Gabrielli, S. Squartini, L. Cattani, Carlo Tripodi","doi":"10.1109/MMSP48831.2020.9287161","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287161","url":null,"abstract":"A recent trend in car audio systems is the generation of Individual Listening Zones (ILZ), allowing to improve phone call privacy and reduce disturbance to other passengers, without wearing headphones or earpieces. This is generally achieved by using loudspeaker arrays. In this paper, we describe an approach to achieve ILZ exploiting general purpose car loudspeakers and processing the signal through carefully designed Finite Impulse Response (FIR) filters. We propose a deep neural network approach for the design of filters coefficients in order to obtain a so-called bright zone, where the signal is clearly heard, and a dark zone, where the signal is attenuated. Additionally, the frequency response in the bright zone is constrained to be as flat as possible. Numerical experiments were performed taking the impulse responses measured with either one binaural pair or three binaural pairs for each passenger. The results in terms of attenuation and flatness prove the viability of the approach.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127745710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
V-PCC Component Synchronization for Point Cloud Reconstruction
Pub Date : 2020-09-21 DOI: 10.1109/MMSP48831.2020.9287092
D. Graziosi, A. Tabatabai, Vladyslav Zakharchenko, A. Zaghetto
For a V-PCC (Video-based Point Cloud Compression) system to reconstruct a single instance of the point cloud, one V-PCC unit must be transferred to the 3D point cloud reconstruction module. It is, however, required that all the V-PCC components, i.e. occupancy map, geometry, atlas and attribute, be temporally aligned. This could, in principle, pose a challenge, since the temporal structures of the decoded sub-bitstreams are not coherent across V-PCC sub-bitstreams. In this paper we propose an output delay adjustment mechanism for the decoded V-PCC sub-bitstreams that provides synchronized V-PCC component input to the point cloud reconstruction module.
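The buffering idea can be sketched as follows: hold each decoded component and release a reconstruction instant only once the occupancy, geometry, atlas and attribute frames for the same composition time are all present. The queue layout is an illustrative assumption, not the V-PCC specification's mechanism.

```python
from collections import defaultdict

REQUIRED = {'occupancy', 'geometry', 'atlas', 'attribute'}
pending = defaultdict(dict)            # composition_time -> {component: frame}

def on_decoded(component, composition_time, frame):
    # Called by each sub-bitstream decoder as frames come out, in any order.
    pending[composition_time][component] = frame
    if REQUIRED <= pending[composition_time].keys():
        reconstruct(pending.pop(composition_time))   # all components aligned

def reconstruct(components):
    print('reconstructing point cloud from', sorted(components))
```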
{"title":"V-PCC Component Synchronization for Point Cloud Reconstruction","authors":"D. Graziosi, A. Tabatabai, Vladyslav Zakharchenko, A. Zaghetto","doi":"10.1109/MMSP48831.2020.9287092","DOIUrl":"https://doi.org/10.1109/MMSP48831.2020.9287092","url":null,"abstract":"For a V-PCC1 system to be able to reconstruct a single instance of the point cloud one V-PCC unit must be transferred to the 3D point cloud reconstruction module. It is however required that all the V-PCC components i.e. occupancy map, geometry, atlas and attribute to be temporally aligned. This, in principle, could pose a challenge since the temporal structures of the decoded sub-bitstreams are not coherent across V-PCC sub-bitstreams. In this paper we propose an output delay adjustment mechanism for the decoded V-PCC sub-bitstreams to provide synchronized V-PCC components input to the point cloud reconstruction module.","PeriodicalId":188283,"journal":{"name":"2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126168165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}