Fully Automatic Blendshape Generation for Stylized Characters
Pub Date: 2023-03-01 | DOI: 10.1109/VR55154.2023.00050
Jingying Wang, Yilin Qiu, Keyu Chen, Yu Ding, Ye Pan
Avatars are one of the most important elements in virtual environments. Real-time facial retargeting is of vital importance in AR/VR interaction, filmmaking, and the entertainment industry, and blendshapes for avatars are one of its key assets. Previous works either focused on characters with the same topology, which cannot be generalized to arbitrary avatars, or used optimization methods that place high demands on the dataset. In this paper, we leverage deep learning and feature transfer to realize deformation transfer, thereby generating blendshapes for target avatars based on the given sources. We propose a Variational Autoencoder (VAE) to extract the latent space of the avatars and then use a Multilayer Perceptron (MLP) to translate between the latent spaces of the source and target avatars. By decoding the latent codes of different blendshapes, we obtain blendshapes for the target avatars with the same semantics as those of the source. We compared our method qualitatively and quantitatively with both classical and learning-based methods. The results reveal that the blendshapes generated by our method achieve higher similarity to the ground-truth blendshapes than state-of-the-art methods. We also demonstrate that our method can be applied to expression transfer for stylized characters with different topologies.
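A minimal sketch of the pipeline described above, assuming PyTorch and flattened per-vertex offsets as the blendshape representation; the layer sizes, latent dimension, and the translate_blendshape helper are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MeshVAE(nn.Module):
    """Encodes a flattened blendshape (per-vertex offsets) into a latent code."""
    def __init__(self, n_verts, latent_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_verts * 3, 512), nn.ReLU(),
                                 nn.Linear(512, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 512), nn.ReLU(),
                                 nn.Linear(512, n_verts * 3))

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar

# MLP translating source-avatar latent codes into target-avatar latent codes.
translator = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))

def translate_blendshape(src_vae, tgt_vae, translator, src_blendshape):
    """Source blendshape -> source latent -> target latent -> target blendshape."""
    with torch.no_grad():
        mu, _ = src_vae.encode(src_blendshape)
        return tgt_vae.dec(translator(mu))
```

In this sketch the two VAEs would be trained per avatar on its blendshape set, and the translator on semantically paired latent codes; decoding the translated code yields the target-avatar blendshape with matching semantics.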
{"title":"Fully Automatic Blendshape Generation for Stylized Characters","authors":"Jingying Wang, Yilin Qiu, Keyu Chen, Yu Ding, Ye Pan","doi":"10.1109/VR55154.2023.00050","DOIUrl":"https://doi.org/10.1109/VR55154.2023.00050","url":null,"abstract":"Avatars are one of the most important elements in virtual environments. Real-time facial retargeting technology is of vital importance in AR/VR interactions, the filmmaking, and the entertainment industry, and blendshapes for avatars are one of its important materials. Previous works either focused on the characters with the same topology, which cannot be generalized to universal avatars, or used optimization methods that have high demand on the dataset. In this paper, we adopt the essence of deep learning and feature transfer to realize deformation transfer, thereby generating blendshapes for target avatars based on the given sources. We proposed a Variational Autoencoder (VAE) to extract the latent space of the avatars and then use a Multilayer Perceptron (MLP) model to realize the translation between the latent spaces of the source avatar and target avatars. By decoding the latent code of different blendshapes, we can obtain the blendshapes for the target avatars with the same semantics as that of the source. We qualitatively and quantitatively compared our method with both classical and learning-based methods. The results revealed that the blendshapes generated by our method achieves higher similarity to the groundtruth blendshapes than the state-of-art methods. We also demonstrated that our method can be applied to expression transfer for stylized characters with different topologies.","PeriodicalId":346767,"journal":{"name":"2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124137195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing the Reading Experience on AR HMDs by Using Smartphones as Assistive Displays
Pub Date: 2023-03-01 | DOI: 10.1109/VR55154.2023.00053
Sunyoung Bang, Woontack Woo
The reading experience on current augmented reality (AR) head-mounted displays (HMDs) is often impeded by the devices' low perceived resolution, translucency, and small field of view, especially in situations involving lengthy text. Although many researchers have proposed methods to resolve this issue, these inherent characteristics prevent such displays from delivering readability on par with that of more traditional displays. As a solution, we explore the use of smartphones as assistive displays for AR HMDs. To validate the feasibility of our approach, we conducted a user study comparing a smartphone-assisted hybrid interface against using the HMD alone for two different text lengths. The results demonstrate that the hybrid interface yields a lower task load regardless of text length, although it does not improve task performance. Furthermore, the hybrid interface provides a better experience in terms of user comfort, visual fatigue, and perceived readability. Based on these results, we claim that joining the spatial output capabilities of the HMD with the high-resolution display of the smartphone is a viable solution for improving the reading experience in AR.
{"title":"Enhancing the Reading Experience on AR HMDs by Using Smartphones as Assistive Displays","authors":"Sunyoung Bang, Woontack Woo","doi":"10.1109/VR55154.2023.00053","DOIUrl":"https://doi.org/10.1109/VR55154.2023.00053","url":null,"abstract":"The reading experience on current augmented reality (AR) head mounted displays (HMDs) is often impeded by the devices' low perceived resolution, translucency, and small field of view, especially in situations involving lengthy text. Although many researchers have proposed methods to resolve this issue, the inherent characteristics prevent these displays from delivering a readability on par with that of more traditional displays. As a solution, we explore the use of smartphones as assistive displays to AR HMDs. To validate the feasibility of our approach, we conducted a user study in which we compared a smartphone-assisted hybrid interface against using the HMD only for two different text lengths. The results demonstrate that the hybrid interface yields a lower task load regardless of the text length, although it does not improve task performance. Furthermore, the hybrid interface provides a better experience regarding user comfort, visual fatigue, and perceived readability. Based on these results, we claim that joining the spatial output capabilities of the HMD with the high-resolution display of the smartphone is a viable solution for improving the reading experience in AR.","PeriodicalId":346767,"journal":{"name":"2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125627616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring Enhancements towards Gaze Oriented Parallel Views in Immersive Tasks
Pub Date: 2023-03-01 | DOI: 10.1109/VR55154.2023.00077
Theophilus Teo, Kuniharu Sakurada, M. Sugimoto
Parallel view is a technique that allows a VR user to see multiple locations at a time. It enables the user to control several remote or virtual body parts while seeing parallel views to solve synchronous tasks. However, previous techniques only explored the benefits and drawbacks of a user performing different tasks. In this paper, we explore enhancements to a single or asynchronous task by utilizing information obtained in parallel views. We developed three prototypes in which the parallel views are fixed, move in symmetric order, or follow the user's eye gaze. We conducted a user study comparing each prototype against traditional VR (without parallel views) on three types of tasks: object search and interaction in 1) a simple environment and 2) a complex environment, and 3) object distance estimation. We found that parallel views improved multi-embodiment, while each technique helped different tasks. The baseline without parallel views provided a cleaner interface, thus improving spatial presence, mental effort, and user performance. However, participants' feedback highlighted the potential usefulness and lower physical effort of using parallel views for solving complicated tasks.
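As an illustration of the gaze-following variant, a small sketch (an assumption for illustration, not taken from the paper) that re-aims a secondary parallel-view camera along the user's gaze ray each frame, using a standard look-at construction in NumPy; the placement rule and offset are hypothetical.

```python
import numpy as np

def look_at(eye, target, up=np.array([0.0, 1.0, 0.0])):
    """Build a right-handed view matrix for a secondary (parallel-view) camera."""
    f = target - eye
    f = f / np.linalg.norm(f)                        # forward
    r = np.cross(f, up); r = r / np.linalg.norm(r)   # right
    u = np.cross(r, f)                               # corrected up
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = r, u, -f
    view[:3, 3] = -view[:3, :3] @ eye                # translate world into view space
    return view

def gaze_parallel_view(head_pos, gaze_dir, offset=2.0):
    """Place the parallel-view camera a fixed distance along the (unit-length)
    gaze ray, looking further along the same direction (hypothetical rule)."""
    eye = head_pos + gaze_dir * offset
    return look_at(eye, eye + gaze_dir)
```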
{"title":"Exploring Enhancements towards Gaze Oriented Parallel Views in Immersive Tasks","authors":"Theophilus Teo, Kuniharu Sakurada, M. Sugimoto","doi":"10.1109/VR55154.2023.00077","DOIUrl":"https://doi.org/10.1109/VR55154.2023.00077","url":null,"abstract":"Parallel view is a technique that allows a VR user to see multiple locations at a time. It enables the user to control several remote or virtual body parts while seeing parallel views to solve synchronous tasks. However, these techniques only explored the benefits and drawbacks of a user performing different tasks. In this paper, we explored enhancements on a singular or asynchronous task by utilizing information obtained in parallel views. We developed three prototypes where parallel views are fixed, moving in symmetric order, or following the user's eye gaze. We conducted a user study to compare each prototype against traditional VR (without parallel views) in three types of tasks: object search and interaction tasks in a 1) simple environment and 2) complex environment, and 3) object distances estimation task. We found parallel views improved multi-embodiment while each technique helped different tasks. No parallel view provided a clean interface, thus improving spatial presence, mental effort, and user performance. However, participants' feedback highlighted potential usefulness and a lower physical effort by using parallel views to solve complicated tasks.","PeriodicalId":346767,"journal":{"name":"2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR)","volume":"238 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132164128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Haptic Stimulation-Based Training Method to Improve the Quality of Motor Imagery EEG Signal in VR
Pub Date: 2023-03-01 | DOI: 10.1109/VR55154.2023.00074
Shiwei Cheng, Jieming Tian
With the emergence of brain-computer interface (BCI) technology and virtual reality (VR), how to improve the quality of motor imagery (MI) electroencephalogram (EEG) signals has become a key issue for MI BCI applications in VR. In this paper, we propose to enhance the quality of MI EEG signals through haptic stimulation training. We designed a first-person-perspective and a third-person-perspective scene in VR, and the experimental results showed that participants' left- and right-hand MI EEG quality improved significantly compared with before training, with the mean differentiation of the left- and right-hand MI tasks improving by 21.8% and 15.7%, respectively. We implemented a BCI application system in VR and developed a game that uses MI EEG to control ball movement, in which the average classification accuracy of participants after training in the first-person perspective reached 93.5%, a significant improvement over existing studies.
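The abstract does not specify the decoding pipeline. A common baseline for distinguishing left- versus right-hand MI is band-power features in the mu and beta bands fed to a linear classifier; the sketch below uses NumPy, SciPy, and scikit-learn, and the sampling rate, band edges, and epoch shapes are assumptions, not the authors' setup.

```python
import numpy as np
from scipy.signal import welch
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

FS = 250                        # sampling rate in Hz (assumed)
MU_BAND, BETA_BAND = (8, 12), (13, 30)

def band_power(epoch, band):
    """Average power of each channel within a frequency band.
    epoch: (n_channels, n_samples)."""
    freqs, psd = welch(epoch, fs=FS, nperseg=FS)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return psd[:, mask].mean(axis=1)

def features(epochs):
    """epochs: (n_trials, n_channels, n_samples) -> (n_trials, 2 * n_channels)."""
    return np.array([np.concatenate([band_power(e, MU_BAND),
                                     band_power(e, BETA_BAND)]) for e in epochs])

# Hypothetical usage with epoched trials X_train/X_test and labels 0 (left) / 1 (right):
# clf = LinearDiscriminantAnalysis().fit(features(X_train), y_train)
# accuracy = clf.score(features(X_test), y_test)
```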
{"title":"A Haptic Stimulation-Based Training Method to Improve the Quality of Motor Imagery EEG Signal in VR","authors":"Shiwei Cheng, Jieming Tian","doi":"10.1109/VR55154.2023.00074","DOIUrl":"https://doi.org/10.1109/VR55154.2023.00074","url":null,"abstract":"With the emergence of brain-computer interface (BCI) technology and virtual reality (VR), how to improve the quality of motor imagery (MI) electroencephalogram (EEG) signal has become a key issue for MI BCI applications under VR. In this paper, we proposed to enhance the quality of MI EEG signal by using haptic stimulation training. We designed a first-person perspective and a third-person perspective scene under VR, and the experimental results showed that the left- and right-hand MI EEG quality of the participants improved significantly compared with that before training, and the mean differentiation of the left- and right-hand MI tasks was improved by 21.8% and 15.7%, respectively. We implemented a BCI application system in VR and developed a game based on MI EEG for control of ball movement, in which the average classification accuracy by the participants after training in the first-person perspective reached 93.5%, which was a significant improvement over existing study.","PeriodicalId":346767,"journal":{"name":"2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131597637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring 3D Interaction with Gaze Guidance in Augmented Reality
Pub Date: 2023-03-01 | DOI: 10.1109/VR55154.2023.00018
Yiwei Bao, Jiaxi Wang, Zhimin Wang, Feng Lu
Recent research on hand-eye coordination has shown that gaze can improve the object selection and translation experience in certain AR scenarios. However, several limitations remain. Specifically, we investigate whether gaze can help select objects under heavy 3D occlusion and help translate 3D objects in the depth dimension. In addition, we investigate the possibility of reducing the gaze calibration burden before use. To this end, we develop new methods with proper gaze guidance for 3D interaction in AR, as well as an implicit online calibration method. We conducted two user studies to evaluate the different interaction methods, and the results show that our methods not only improve the effectiveness of occluded object selection but also significantly alleviate arm fatigue in the depth translation task. We also evaluated the proposed implicit online calibration method and found its accuracy comparable to standard nine-point explicit calibration, a step toward practical use in the real world.
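The implicit online calibration is not detailed in the abstract. One plausible sketch, offered purely as an assumption, treats each confirmed selection as a weak label and folds the discrepancy between the reported gaze angles and the confirmed target into a slowly adapting offset.

```python
import numpy as np

class OnlineGazeCalibrator:
    """Maintains an additive correction to raw gaze angles (yaw, pitch in degrees),
    updated whenever the user confirms a selection on a known target
    (a hypothetical scheme, not the paper's method)."""
    def __init__(self, alpha=0.1):
        self.alpha = alpha              # exponential moving average factor
        self.offset = np.zeros(2)       # current (yaw, pitch) correction

    def correct(self, raw_gaze):
        """Apply the learned correction to the tracker's raw gaze angles."""
        return raw_gaze + self.offset

    def update(self, raw_gaze, target_angles):
        """Blend the error between the confirmed target and the raw gaze
        into the running offset."""
        error = target_angles - raw_gaze
        self.offset = (1 - self.alpha) * self.offset + self.alpha * error
```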
{"title":"Exploring 3D Interaction with Gaze Guidance in Augmented Reality","authors":"Yiwei Bao, Jiaxi Wang, Zhimin Wang, Feng Lu","doi":"10.1109/VR55154.2023.00018","DOIUrl":"https://doi.org/10.1109/VR55154.2023.00018","url":null,"abstract":"Recent research based on hand-eye coordination has shown that gaze could improve object selection and translation experience under certain scenarios in AR. However, several limitations still exist. Specifically, we investigate whether gaze could help object selection with heavy 3D occlusions and help 3D object translation in the depth dimension. In addition, we also investigate the possibility of reducing the gaze calibration burden before use. Therefore, we develop new methods with proper gaze guidance for 3D interaction in AR, and also an implicit online calibration method. We conduct two user studies to evaluate different interaction methods and the results show that our methods not only improve the effectiveness of occluded objects selection but also alleviate the arm fatigue problem significantly in the depth translation task. We also evaluate the proposed implicit online calibration method and find its accuracy comparable to standard 9 points explicit calibration, which makes a step towards practical use in the real world.","PeriodicalId":346767,"journal":{"name":"2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133410941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VGTC Virtual Reality Academy Award
Pub Date: 2023-03-01 | DOI: 10.1109/vr55154.2023.00094
{"title":"VGTC Virtual Reality Academy Award","authors":"","doi":"10.1109/vr55154.2023.00094","DOIUrl":"https://doi.org/10.1109/vr55154.2023.00094","url":null,"abstract":"","PeriodicalId":346767,"journal":{"name":"2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127094468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward Intuitive Acquisition of Occluded VR Objects Through an Interactive Disocclusion Mini-map
Pub Date: 2023-03-01 | DOI: 10.1109/VR55154.2023.00061
Mykola Maslych, Yahya Hmaiti, Ryan Ghamandi, Paige Leber, Ravi Kiran Kattoju, Jacob Belga, J. Laviola
Standard selection techniques such as ray casting fail when virtual objects are partially or fully occluded. In this paper, we present two novel approaches that combine cone-casting, world-in-miniature, and grasping metaphors to disocclude objects in a representation local to the user. Through a within-subjects study in which we compared four selection techniques across three levels of object occlusion, we found that our techniques outperformed an alternative technique that also focuses on maintaining the spatial relationships between objects. We discuss application scenarios and future research directions for these types of selection techniques.
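A minimal sketch of the cone-casting step that gathers candidate objects (including fully occluded ones) before presenting them in a local mini-map; the half-angle and the ranking by angular alignment are illustrative assumptions, not the authors' parameters.

```python
import numpy as np

def cone_cast(origin, direction, object_centers, half_angle_deg=15.0):
    """Return indices of objects whose centers fall inside a selection cone,
    sorted by angular distance from the cone axis (best-aligned first).
    origin, direction: (3,) arrays; object_centers: (n, 3) array."""
    d = direction / np.linalg.norm(direction)
    to_obj = object_centers - origin
    dist = np.linalg.norm(to_obj, axis=1)
    cos_angle = (to_obj @ d) / np.maximum(dist, 1e-9)
    inside = cos_angle >= np.cos(np.radians(half_angle_deg))
    idx = np.flatnonzero(inside)
    return idx[np.argsort(-cos_angle[idx])]

# The resulting candidates could then be laid out in a world-in-miniature view
# that preserves their relative spatial arrangement for disoccluded grasping.
```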
{"title":"Toward Intuitive Acquisition of Occluded VR Objects Through an Interactive Disocclusion Mini-map","authors":"Mykola Maslych, Yahya Hmaiti, Ryan Ghamandi, Paige Leber, Ravi Kiran Kattoju, Jacob Belga, J. Laviola","doi":"10.1109/VR55154.2023.00061","DOIUrl":"https://doi.org/10.1109/VR55154.2023.00061","url":null,"abstract":"Standard selection techniques such as ray casting fail when virtual objects are partially or fully occluded. In this paper, we present two novel approaches that combine cone-casting, world-in-miniature, and grasping metaphors to disocclude objects in the representation local to the user. Through a within-subject study where we compared 4 selection techniques across 3 levels of object occlusion, we found that our techniques outperformed an alternative one that also focuses on maintaining the spatial relationships between objects. We discuss application scenarios and future research directions for these types of selection techniques.","PeriodicalId":346767,"journal":{"name":"2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR)","volume":"261 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132631619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Power, Performance, and Image Quality Tradeoffs in Foveated Rendering
Pub Date: 2023-03-01 | DOI: 10.1109/VR55154.2023.00036
Rahul Singh, Muhammad Huzaifa, Jeffrey Liu, Anjul Patney, Hashim Sharif, Yifan Zhao, S. Adve
Extended reality (XR) devices, including augmented, virtual, and mixed reality, provide a deeply immersive experience. However, practical limitations such as weight, heat, and comfort put extreme constraints on the performance, power consumption, and image quality of such systems. In this paper, we study how these constraints shape the tradeoff between Fixed Foveated Rendering (FFR), Gaze-Tracked Foveated Rendering (TFR), and conventional, non-foveated rendering. While existing papers have often studied these methods, we provide the first comprehensive study of their relative feasibility in practical systems with limited battery life and computational budget. We show that TFR, with the added cost of the gaze tracker, can often be more expensive than FFR. Thus, we co-design a gaze-tracked foveated renderer considering its benefits in computation and power efficiency and its tradeoffs in image quality. We describe principled approximations for eye tracking that provide up to a 9x speedup in runtime performance and approximately a 20x improvement in energy efficiency when run on a mobile GPU. In isolation, these approximations appear to significantly degrade gaze quality, but appropriate compensation in the visual pipeline can mitigate the loss. Overall, we show that with a highly optimized gaze tracker, TFR is feasible compared to FFR, resulting in up to 1.25x faster frame times while also reducing total energy consumption by over 40%.
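A back-of-the-envelope sketch of the tradeoff the paper studies: gaze-tracked foveation only pays off when the per-frame tracking cost stays below the shading work it saves relative to fixed foveation. The numbers below are placeholders, not measurements from the paper.

```python
def tfr_worthwhile(frame_ms_ffr, frame_ms_tfr_render, tracker_ms):
    """True if gaze-tracked foveated rendering (render + tracker time)
    beats fixed foveated rendering on total frame time."""
    return frame_ms_tfr_render + tracker_ms < frame_ms_ffr

# Placeholder numbers: FFR renders a frame in 10 ms; TFR shades a smaller
# high-detail region (8 ms) but must also run the eye tracker each frame.
print(tfr_worthwhile(10.0, 8.0, tracker_ms=5.0))   # False: tracker too expensive
print(tfr_worthwhile(10.0, 8.0, tracker_ms=0.5))   # True: optimized tracker wins
```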
{"title":"Power, Performance, and Image Quality Tradeoffs in Foveated Rendering","authors":"Rahul Singh, Muhammad Huzaifa, Jeffrey Liu, Anjul Patney, Hashim Sharif, Yifan Zhao, S. Adve","doi":"10.1109/VR55154.2023.00036","DOIUrl":"https://doi.org/10.1109/VR55154.2023.00036","url":null,"abstract":"Extended reality (XR) devices, including augmented, virtual, and mixed reality, provide a deeply immersive experience. However, practical limitations like weight, heat, and comfort put extreme constraints on the performance, power consumption, and image quality of such systems. In this paper, we study how these constraints form the tradeoff between Fixed Foveated Rendering (FFR), Gaze-Tracked Foveated Rendering (TFR), and conventional, non-foveated rendering. While existing papers have often studied these methods, we provide the first comprehensive study of their relative feasibility in practical systems with limited battery life and computational budget. We show that TFR with the added cost of the gaze-tracker can often be more expensive than FFR. Thus, we co-design a gaze-tracked foveated renderer considering its benefits in computation, power efficiency, and tradeoffs in image quality. We describe principled approximations for eye tracking which provide up to a 9x speedup in runtime performance with approximately a 20x improvement in energy efficiency when run on a mobile GPU. In isolation, these approximations appear to significantly degrade the gaze quality, but appropriate compensation in the visual pipeline can mitigate the loss. Overall, we show that with a highly optimized gaze-tracker, TFR is feasible compared to FFR, resulting in up to 1.25x faster frame times while also reducing total energy consumption by over 40%.","PeriodicalId":346767,"journal":{"name":"2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115392122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Locomotion-aware Foveated Rendering
Pub Date: 2023-03-01 | DOI: 10.1109/VR55154.2023.00062
Xuehuai Shi, Lili Wang, Jian Wu, Wei Ke, C. Lam
Optimizing rendering performance improves the user's immersion in virtual scene exploration. Foveated rendering uses the features of the human visual system (HVS) to improve rendering performance without sacrificing perceptual visual quality. We collect and analyze the viewing motion of different locomotion methods, describe the effects of these viewing motions on the HVS's sensitivity, and identify the advantages these effects may bring to foveated rendering. We then propose the locomotion-aware foveated rendering method (LaFR) to further accelerate foveated rendering by leveraging these advantages. We first introduce the framework of LaFR. Second, we propose an eccentricity-based shading rate controller that controls the shading rate of a given region in foveated rendering. Third, we propose a locomotion-aware log-polar mapping method, which controls the foveal average shading rate, the peripheral shading rate falloff speed, and the overall shading quantity with locomotion-aware coefficients built on the eccentricity-based shading rate controller. LaFR achieves perceptual visual quality similar to conventional foveated rendering while achieving up to a 1.6× speedup. Compared with full-resolution rendering, LaFR achieves up to a 3.8× speedup.
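A compact sketch of the two ingredients named above: an eccentricity-driven shading-rate falloff and a log-polar mapping centered on the gaze point, with coefficients exposed so they could be modulated by locomotion. The falloff shape and parameter names are assumptions, not the paper's exact formulation.

```python
import numpy as np

def shading_rate(eccentricity_deg, foveal_rate=1.0, falloff=0.05, min_rate=0.125):
    """Shading rate (1.0 = full resolution) decaying with eccentricity.
    A locomotion-aware scheme could increase `falloff` during fast movement,
    when peripheral sensitivity drops (assumed modulation)."""
    return np.maximum(min_rate,
                      foveal_rate / (1.0 + falloff * eccentricity_deg ** 2))

def to_log_polar(x, y, gaze_x, gaze_y, scale=1.0):
    """Map screen coordinates to log-polar space centered on the gaze point,
    which allocates proportionally more samples near the fovea."""
    dx, dy = x - gaze_x, y - gaze_y
    rho = np.log1p(np.hypot(dx, dy)) * scale
    theta = np.arctan2(dy, dx)
    return rho, theta
```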
{"title":"Locomotion-aware Foveated Rendering","authors":"Xuehuai Shi, Lili Wang, Jian Wu, Wei Ke, C. Lam","doi":"10.1109/VR55154.2023.00062","DOIUrl":"https://doi.org/10.1109/VR55154.2023.00062","url":null,"abstract":"Optimizing rendering performance improves the user's immersion in virtual scene exploration. Foveated rendering uses the features of the human visual system (HVS) to improve rendering performance without sacrificing perceptual visual quality. We collect and analyze the viewing motion of different locomotion methods, and describe the effects of these viewing motions on HVS's sensitivity, as well as the advantages of these effects that may bring to foveated rendering. Then we propose the locomotion-aware foveated rendering method (LaFR) to further accelerate foveated rendering by leveraging the advantages. In LaFR, we first introduce the framework of LaFR. Secondly, we propose an eccentricity-based shading rate controller that provides the shading rate control of the given region in foveated rendering. Thirdly, we propose a locomotion-aware log-polar mapping method, which controls the foveal average shading rate, the peripheral shading rate decrease speed, and the overall shading quantity with the locomotion-aware coefficients based on the eccentricity-based shading rate controller. LaFR achieves similar perceptual visual quality as the conventional foveated rendering while achieving up to 1.6× speedup. Compared with the full resolution rendering, LaFR achieves up to 3.8× speedup.","PeriodicalId":346767,"journal":{"name":"2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114201108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CaV3: Cache-assisted Viewport Adaptive Volumetric Video Streaming
Pub Date: 2023-03-01 | DOI: 10.1109/VR55154.2023.00033
Junhua Liu, Boxiang Zhu, Fang Wang, Yili Jin, Wenyi Zhang, Zihan Xu, Shuguang Cui
Volumetric video (VV) has recently emerged as a new form of video application that provides a photorealistic, immersive 3D viewing experience with six degrees of freedom (6DoF), empowering applications such as VR, AR, and the Metaverse. A key problem is how to stream the enormous VV data over networks with limited bandwidth. Existing works mostly focus on viewport prediction for tiling-based adaptive VV streaming, which has only a limited effect on resource saving. We argue that the content repeatability in the viewport can be further leveraged and, for the first time, propose a client-side cache-assisted strategy that buffers VV tiles that will appear repeatedly in the near future so as to reduce redundant VV content transmission. The key challenges lie in three aspects: (1) feature extraction and mining in the 6DoF VV context, (2) accurate long-term viewing pattern estimation, and (3) optimal caching scheduling with limited capacity. In this paper, we propose CaV3, an integrated cache-assisted viewport adaptive VV streaming framework that addresses these challenges. CaV3 employs a Long-Short Term Sequential Prediction model (LSTSP) that achieves accurate short-term, mid-term, and long-term viewing pattern prediction with a multi-modal fusion model by capturing the viewer's behavioral inertia, current attention, and subjective intention. In addition, CaV3 contains a contextual MAB-based caching adaptation algorithm (CCA) to fully utilize the viewing pattern and solve the optimal caching problem with a proven regret upper bound. Unlike existing VV datasets that contain only single or co-located objects, we collect, for the first time, a comprehensive dataset with sufficient practical unbounded 360° scenes. Extensive evaluation on the dataset confirms the superiority of CaV3, which outperforms the SOTA algorithm by 15.6%-43% in viewport prediction and 13%-40% in system utility.
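The abstract names a contextual multi-armed-bandit caching adaptation algorithm without detailing it; a standard contextual bandit such as LinUCB could drive per-tile cache/skip decisions. The sketch below is a generic LinUCB agent under that assumption; the context features and reward definition are hypothetical, not CaV3's.

```python
import numpy as np

class LinUCBCacheAgent:
    """LinUCB-style contextual bandit deciding, per tile, whether to cache it.
    Arms: 0 = skip, 1 = cache. The context vector could encode predicted
    viewport probability, tile size, and remaining cache capacity (assumed)."""
    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(2)]     # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(2)]   # per-arm reward vectors

    def choose(self, context):
        """Pick the arm with the highest upper confidence bound."""
        scores = []
        for a in range(2):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]
            ucb = theta @ context + self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, context, arm, reward):
        """Reward could be bandwidth saved by a cache hit minus eviction cost."""
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```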
{"title":"CaV3: Cache-assisted Viewport Adaptive Volumetric Video Streaming","authors":"Junhua Liu, Boxiang Zhu, Fang Wang, Yili Jin, Wenyi Zhang, Zihan Xu, Shuguang Cui","doi":"10.1109/VR55154.2023.00033","DOIUrl":"https://doi.org/10.1109/VR55154.2023.00033","url":null,"abstract":"Volumetric video (VV) recently emerges as a new form of video application providing a photorealistic immersive 3D viewing experience with 6 degree-of-freedom (DoF), which empowers many applications such as VR, AR, and Metaverse. A key problem therein is how to stream the enormous size VV through the network with limited bandwidth. Existing works mostly focused on predicting the viewport for a tiling-based adaptive VV streaming, which however only has quite a limited effect on resource saving. We argue that the content repeatability in the viewport can be further leveraged, and for the first time, propose a client-side cache-assisted strategy that aims to buffer the repeatedly appearing VV tiles in the near future so as to reduce the redundant VV content transmission. The key challenges exist in three aspects, including (1) feature extraction and mining in 6 DoF VV context, (2) accurate long-term viewing pattern estimation and (3) optimal caching scheduling with limited capacity. In this paper, we propose CaV3, an integrated cache-assisted viewport adaptive VV streaming framework to address the challenges. CaV3 employs a Long-short term Sequential prediction model (LSTSP) that achieves accurate short-term, mid-term and long-term viewing pattern prediction with a multi-modal fusion model by capturing the viewer's behavior inertia, current attention, and subjective intention. Besides, CaV3 also contains a contextual MAB-based caching adaptation algorithm (CCA) to fully utilize the viewing pattern and solve the optimal caching problem with a proved upper bound regret. Compared to existing VV datasets only containing single or co-located objects, we for the first time collect a comprehensive dataset with sufficient practical unbounded 360° scenes. The extensive evaluation of the dataset confirms the superiority of CaV3, which outperforms the SOTA algorithm by 15.6%-43% in viewport prediction and 13%-40% in system utility.","PeriodicalId":346767,"journal":{"name":"2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114730817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}