A Comprehensive Evaluation of Arbitrary Image Style Transfer Methods
Pub Date: 2024-09-25 | DOI: 10.1109/TVCG.2024.3466964
Zijun Zhou, Fan Tang, Yuxin Zhang, Oliver Deussen, Juan Cao, Weiming Dong, Xiangtao Li, Tong-Yee Lee
Despite the remarkable progress in the field of arbitrary image style transfer (AST), inconsistent evaluation continues to plague style transfer research. Existing methods often suffer from limited objective evaluation and inconsistent subjective feedback, hindering reliable comparisons among AST variants. In this study, we propose a multi-granularity assessment system that combines standardized objective and subjective evaluations. We collect a fine-grained dataset covering a range of image contexts, such as different scenes, object complexities, and rich parsing information, from multiple sources. Objective and subjective studies are conducted using the collected dataset. Specifically, we innovate on traditional subjective studies by developing an online evaluation system that combines point-wise, pair-wise, and group-wise questionnaires. We then bridge the gap between objective and subjective evaluations by examining the consistency between the results of the two studies. We experimentally evaluate CNN-based, flow-based, transformer-based, and diffusion-based AST methods with the proposed multi-granularity assessment system, laying the foundation for reliable and robust evaluation. Providing standardized measures, objective data, and detailed subjective feedback empowers researchers to make informed comparisons and drive innovation in this rapidly evolving field. The collected dataset and our online evaluation system are available at http://ivc.ia.ac.cn.
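The consistency check between the objective and subjective studies can be illustrated with a rank-correlation test. The sketch below is not the authors' code; the method names and scores are hypothetical placeholders, and Spearman correlation stands in for whatever consistency measure the paper uses.

# Minimal sketch: rank AST method families by an objective metric and by
# mean subjective rating, then measure rank agreement. All scores are
# hypothetical placeholders, not results from the paper.
from scipy.stats import spearmanr

methods = ["cnn_based", "flow_based", "transformer_based", "diffusion_based"]
objective_scores = [0.62, 0.58, 0.71, 0.69]   # e.g., a style-similarity metric
subjective_scores = [3.4, 3.1, 4.0, 4.2]      # e.g., mean questionnaire ratings

rho, p_value = spearmanr(objective_scores, subjective_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
for m, o, s in sorted(zip(methods, objective_scores, subjective_scores),
                      key=lambda t: -t[2]):
    print(f"{m:>18}: objective {o:.2f}, subjective {s:.1f}")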
{"title":"A Comprehensive Evaluation of Arbitrary Image Style Transfer Methods.","authors":"Zijun Zhou, Fan Tang, Yuxin Zhang, Oliver Deussen, Juan Cao, Weiming Dong, Xiangtao Li, Tong-Yee Lee","doi":"10.1109/TVCG.2024.3466964","DOIUrl":"https://doi.org/10.1109/TVCG.2024.3466964","url":null,"abstract":"<p><p>Despite the remarkable process in the field of arbitrary image style transfer (AST), inconsistent evaluation continues to plague style transfer research. Existing methods often suffer from limited objective evaluation and inconsistent subjective feedback, hindering reliable comparisons among AST variants. In this study, we propose a multi-granularity assessment system that combines standardized objective and subjective evaluations. We collect a fine-grained dataset considering a range of image contexts such as different scenes, object complexities, and rich parsing information from multiple sources. Objective and subjective studies are conducted using the collected dataset. Specifically, we innovate on traditional subjective studies by developing an online evaluation system utilizing a combination of point-wise, pair-wise, and group-wise questionnaires. Finally, we bridge the gap between objective and subjective evaluations by examining the consistency between the results from the two studies. We experimentally evaluate CNN-based, flow-based, transformer-based, and diffusion-based AST methods by the proposed multi-granularity assessment system, which lays the foundation for a reliable and robust evaluation. Providing standardized measures, objective data, and detailed subjective feedback empowers researchers to make informed comparisons and drive innovation in this rapidly evolving field. Finally, for the collected dataset and our online evaluation system, please see http://ivc.ia.ac.cn.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142335182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets
Pub Date: 2024-09-24 | DOI: 10.1109/TVCG.2024.3456215
Jaeyoung Kim, Sihyeon Lee, Hyeon Jeon, Keon-Joo Lee, Hee-Joon Bae, Bohyoung Kim, Jinwook Seo
Acute stroke demands prompt diagnosis and treatment to achieve optimal patient outcomes. However, the intricate and irregular nature of clinical data associated with acute stroke, particularly blood pressure (BP) measurements, presents substantial obstacles to effective visual analytics and decision-making. Through a year-long collaboration with experienced neurologists, we developed PhenoFlow, a visual analytics system that leverages collaboration between humans and Large Language Models (LLMs) to analyze the extensive and complex data of acute ischemic stroke patients. PhenoFlow pioneers an innovative workflow in which the LLM serves as a data wrangler while neurologists explore and supervise the output using visualizations and natural language interactions. This approach enables neurologists to focus more on decision-making with reduced cognitive load. To protect sensitive patient information, PhenoFlow uses only metadata to make inferences and synthesize executable code, without accessing raw patient data. This ensures that the results are both reproducible and interpretable while maintaining patient privacy. The system incorporates a slice-and-wrap design that employs temporal folding to create an overlaid circular visualization. Combined with a linear bar graph, this design aids in exploring meaningful patterns within irregularly measured BP data. Through case studies, PhenoFlow has demonstrated its capability to support iterative analysis of extensive clinical datasets, reducing cognitive load and enabling neurologists to make well-informed decisions. Grounded in long-term collaboration with domain experts, our research demonstrates the potential of utilizing LLMs to tackle current challenges in data-driven clinical decision-making for acute ischemic stroke patients.
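The slice-and-wrap idea of temporal folding can be sketched as a simple coordinate transform: wrap each reading's time of day onto a circle so measurements from different days overlay. This is our illustration under stated assumptions, not PhenoFlow's implementation, and the BP readings are hypothetical.

# Minimal sketch: fold irregularly timestamped BP readings onto a 24-hour
# circle (angle = time of day, radius = reading), the basic idea behind an
# overlaid circular view. Data below are hypothetical.
import math
from datetime import datetime

readings = [  # (timestamp, systolic BP in mmHg) -- hypothetical values
    (datetime(2024, 3, 1, 8, 30), 142),
    (datetime(2024, 3, 1, 20, 15), 155),
    (datetime(2024, 3, 2, 9, 5), 138),
]

def to_polar(ts: datetime, value: float) -> tuple[float, float]:
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    angle = 2 * math.pi * seconds / 86400  # wrap one day onto the circle
    return angle, value                    # radius encodes the BP value

for ts, bp in readings:
    theta, r = to_polar(ts, bp)
    print(f"{ts:%Y-%m-%d %H:%M} -> theta={theta:.2f} rad, r={r}")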
{"title":"PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets.","authors":"Jaeyoung Kim, Sihyeon Lee, Hyeon Jeon, Keon-Joo Lee, Hee-Joon Bae, Bohyoung Kim, Jinwook Seo","doi":"10.1109/TVCG.2024.3456215","DOIUrl":"https://doi.org/10.1109/TVCG.2024.3456215","url":null,"abstract":"<p><p>Acute stroke demands prompt diagnosis and treatment to achieve optimal patient outcomes. However, the intricate and irregular nature of clinical data associated with acute stroke, particularly blood pressure (BP) measurements, presents substantial obstacles to effective visual analytics and decision-making. Through a year-long collaboration with experienced neurologists, we developed PhenoFlow, a visual analytics system that leverages the collaboration between human and Large Language Models (LLMs) to analyze the extensive and complex data of acute ischemic stroke patients. PhenoFlow pioneers an innovative workflow, where the LLM serves as a data wrangler while neurologists explore and supervise the output using visualizations and natural language interactions. This approach enables neurologists to focus more on decision-making with reduced cognitive load. To protect sensitive patient information, PhenoFlow only utilizes metadata to make inferences and synthesize executable codes, without accessing raw patient data. This ensures that the results are both reproducible and interpretable while maintaining patient privacy. The system incorporates a slice-and-wrap design that employs temporal folding to create an overlaid circular visualization. Combined with a linear bar graph, this design aids in exploring meaningful patterns within irregularly measured BP data. Through case studies, PhenoFlow has demonstrated its capability to support iterative analysis of extensive clinical datasets, reducing cognitive load and enabling neurologists to make well-informed decisions. Grounded in long-term collaboration with domain experts, our research demonstrates the potential of utilizing LLMs to tackle current challenges in data-driven clinical decision-making for acute ischemic stroke patients.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142335188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SLInterpreter: An Exploratory and Iterative Human-AI Collaborative System for GNN-based Synthetic Lethal Prediction
Pub Date: 2024-09-24 | DOI: 10.1109/TVCG.2024.3456325
Haoran Jiang, Shaohan Shi, Shuhao Zhang, Jie Zheng, Quan Li
Synthetic Lethal (SL) relationships, though rare among the vast array of gene combinations, hold substantial promise for targeted cancer therapy. Despite advancements in AI model accuracy, there is still a significant need among domain experts for interpretive paths and mechanism explorations that align better with domain-specific knowledge, particularly due to the high costs of experimentation. To address this gap, we propose an iterative Human-AI collaborative framework with two key components: 1) Human-Engaged Knowledge Graph Refinement based on Metapath Strategies, which leverages insights from interpretive paths and domain expertise to refine the knowledge graph through metapath strategies with appropriate granularity; 2) Cross-Granularity SL Interpretation Enhancement and Mechanism Analysis, which aids experts in organizing and comparing predictions and interpretive paths across different granularities, uncovering new SL relationships, enhancing result interpretation, and elucidating potential mechanisms inferred by Graph Neural Network (GNN) models. These components cyclically optimize model predictions and mechanism explorations, enhancing expert involvement and intervention to build trust. Facilitated by SLInterpreter, this framework ensures that newly generated interpretive paths increasingly align with domain knowledge and adhere more closely to real-world biological principles through iterative Human-AI collaboration. We evaluate the framework's efficacy through a case study and expert interviews.
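Metapath-based refinement can be grounded with a small example: enumerating instances of a typed path (e.g., gene-pathway-gene) in a knowledge graph. A minimal sketch, assuming a networkx graph with a 'type' attribute per node; the graph content and the chosen metapath are hypothetical, not the paper's.

# Minimal sketch of metapath-guided traversal on a typed knowledge graph:
# enumerate gene-pathway-gene instances, a common metapath for SL analysis.
import networkx as nx

G = nx.Graph()
G.add_nodes_from(["BRCA1", "PARP1", "TP53"], type="gene")
G.add_nodes_from(["DNA_repair"], type="pathway")
G.add_edges_from([("BRCA1", "DNA_repair"), ("PARP1", "DNA_repair")])

def metapath_instances(graph, metapath):
    """Yield node tuples whose types follow `metapath`, e.g. gene-pathway-gene."""
    starts = [n for n, d in graph.nodes(data=True) if d["type"] == metapath[0]]
    def extend(path):
        if len(path) == len(metapath):
            yield tuple(path)
            return
        for nbr in graph.neighbors(path[-1]):
            if graph.nodes[nbr]["type"] == metapath[len(path)] and nbr not in path:
                yield from extend(path + [nbr])
    for s in starts:
        yield from extend([s])

for inst in metapath_instances(G, ["gene", "pathway", "gene"]):
    print(inst)  # e.g. ('BRCA1', 'DNA_repair', 'PARP1')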
{"title":"SLInterpreter: An Exploratory and Iterative Human-AI Collaborative System for GNN-based Synthetic Lethal Prediction.","authors":"Haoran Jiang, Shaohan Shi, Shuhao Zhang, Jie Zheng, Quan Li","doi":"10.1109/TVCG.2024.3456325","DOIUrl":"https://doi.org/10.1109/TVCG.2024.3456325","url":null,"abstract":"<p><p>Synthetic Lethal (SL) relationships, though rare among the vast array of gene combinations, hold substantial promise for targeted cancer therapy. Despite advancements in AI model accuracy, there is still a significant need among domain experts for interpretive paths and mechanism explorations that align better with domain-specific knowledge, particularly due to the high costs of experimentation. To address this gap, we propose an iterative Human-AI collaborative framework with two key components: 1) HumanEngaged Knowledge Graph Refinement based on Metapath Strategies, which leverages insights from interpretive paths and domain expertise to refine the knowledge graph through metapath strategies with appropriate granularity. 2) Cross-Granularity SL Interpretation Enhancement and Mechanism Analysis, which aids experts in organizing and comparing predictions and interpretive paths across different granularities, uncovering new SL relationships, enhancing result interpretation, and elucidating potential mechanisms inferred by Graph Neural Network (GNN) models. These components cyclically optimize model predictions and mechanism explorations, enhancing expert involvement and intervention to build trust. Facilitated by SLInterpreter, this framework ensures that newly generated interpretive paths increasingly align with domain knowledge and adhere more closely to real-world biological principles through iterative Human-AI collaboration. We evaluate the framework's efficacy through a case study and expert interviews.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142335189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Loops: Leveraging Provenance and Visualization to Support Exploratory Data Analysis in Notebooks
Pub Date: 2024-09-23 | DOI: 10.1109/TVCG.2024.3456186
Klaus Eckelt, Kiran Gadhave, Alexander Lex, Marc Streit
Exploratory data science is an iterative process of obtaining, cleaning, profiling, analyzing, and interpreting data. This cyclical way of working creates challenges within the linear structure of computational notebooks, leading to issues with code quality, recall, and reproducibility. To remedy this, we present Loops, a set of visual support techniques for iterative and exploratory data analysis in computational notebooks. Loops leverages provenance information to visualize the impact of changes made within a notebook. In visualizations of the notebook provenance, we trace the evolution of the notebook over time and highlight differences between versions. Loops visualizes the provenance of code, markdown, tables, visualizations, and images, and their respective differences. Analysts can explore these differences in detail in a separate view. Loops not only makes the analysis process transparent but also supports analysts in their data science work by showing the effects of changes and facilitating comparison of multiple versions. We demonstrate our approach's utility and potential impact through two use cases and through feedback from notebook users of various backgrounds. This paper and all supplemental materials are available at https://osf.io/79eyn.
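A version-to-version diff of notebook cells, the raw material for the kind of provenance comparison described above, can be sketched with the standard library. The file names are hypothetical, and this only diffs cells present in both versions.

# Minimal sketch: cell-level diff between two versions of a Jupyter notebook
# (.ipynb files are JSON with a top-level "cells" list).
import difflib
import json

def cell_sources(path):
    with open(path) as f:
        nb = json.load(f)
    return ["".join(c["source"]) for c in nb["cells"]]

old_cells = cell_sources("analysis_v1.ipynb")  # hypothetical file names
new_cells = cell_sources("analysis_v2.ipynb")

for i, (old, new) in enumerate(zip(old_cells, new_cells)):
    if old != new:
        print(f"--- cell {i} changed ---")
        for line in difflib.unified_diff(old.splitlines(), new.splitlines(),
                                         lineterm=""):
            print(line)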
{"title":"Loops: Leveraging Provenance and Visualization to Support Exploratory Data Analysis in Notebooks.","authors":"Klaus Eckelt, Kiran Gadhave, Alexander Lex, Marc Streit","doi":"10.1109/TVCG.2024.3456186","DOIUrl":"https://doi.org/10.1109/TVCG.2024.3456186","url":null,"abstract":"<p><p>Exploratory data science is an iterative process of obtaining, cleaning, profiling, analyzing, and interpreting data. This cyclical way of working creates challenges within the linear structure of computational notebooks, leading to issues with code quality, recall, and reproducibility. To remedy this, we present Loops, a set of visual support techniques for iterative and exploratory data analysis in computational notebooks. Loops leverages provenance information to visualize the impact of changes made within a notebook. In visualizations of the notebook provenance, we trace the evolution of the notebook over time and highlight differences between versions. Loops visualizes the provenance of code, markdown, tables, visualizations, and images and their respective differences. Analysts can explore these differences in detail in a separate view. Loops not only makes the analysis process transparent but also supports analysts in their data science work by showing the effects of changes and facilitating comparison of multiple versions. We demonstrate our approach's utility and potential impact in two use cases and feedback from notebook users from various backgrounds. This paper and all supplemental materials are available at https://osf.io/79eyn.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142309489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
StuGPTViz: A Visual Analytics Approach to Understand Student-ChatGPT Interactions
Pub Date: 2024-09-23 | DOI: 10.1109/TVCG.2024.3456363
Zixin Chen, Jiachen Wang, Meng Xia, Kento Shigyo, Dingdong Liu, Rong Zhang, Huamin Qu
The integration of Large Language Models (LLMs), especially ChatGPT, into education is poised to revolutionize students' learning experiences by introducing innovative conversational learning methodologies. To empower students to fully leverage the capabilities of ChatGPT in educational scenarios, understanding students' interaction patterns with ChatGPT is crucial for instructors. However, this endeavor is challenging due to the absence of datasets focused on student-ChatGPT conversations and the complexities of identifying and analyzing the evolving interaction patterns within conversations. To address these challenges, we collected conversational data from 48 students interacting with ChatGPT in a master's-level data visualization course over one semester. We then developed a coding scheme, grounded in the literature on cognitive levels and thematic analysis, to categorize students' interaction patterns with ChatGPT. Furthermore, we present a visual analytics system, StuGPTViz, that tracks and compares temporal patterns in student prompts and the quality of ChatGPT's responses at multiple scales, revealing significant pedagogical insights for instructors. We validated the system's effectiveness through expert interviews with six data visualization instructors and three case studies. The results confirmed StuGPTViz's capacity to enhance educators' insights into the pedagogical value of ChatGPT. We also discuss potential research opportunities for applying visual analytics in education and developing AI-driven personalized learning solutions.
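One building block of such multi-scale temporal analysis is aggregating coded interactions over time. A minimal sketch, assuming pandas; the timestamps and category labels are hypothetical and do not reflect the paper's actual coding scheme.

# Minimal sketch: tally coded student-ChatGPT interactions per week, the kind
# of temporal aggregate a multi-scale view could build on.
import pandas as pd

log = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-02-05", "2024-02-06", "2024-02-14"]),
    "category": ["clarify_concept", "debug_code", "clarify_concept"],
})

weekly = (log.groupby([pd.Grouper(key="timestamp", freq="W"), "category"])
             .size()
             .unstack(fill_value=0))
print(weekly)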
{"title":"StuGPTViz: A Visual Analytics Approach to Understand Student-ChatGPT Interactions.","authors":"Zixin Chen, Jiachen Wang, Meng Xia, Kento Shigyo, Dingdong Liu, Rong Zhang, Huamin Qu","doi":"10.1109/TVCG.2024.3456363","DOIUrl":"10.1109/TVCG.2024.3456363","url":null,"abstract":"<p><p>The integration of Large Language Models (LLMs), especially ChatGPT, into education is poised to revolutionize students' learning experiences by introducing innovative conversational learning methodologies. To empower students to fully leverage the capabilities of ChatGPT in educational scenarios, understanding students' interaction patterns with ChatGPT is crucial for instructors. However, this endeavor is challenging due to the absence of datasets focused on student-ChatGPT conversations and the complexities in identifying and analyzing the evolutional interaction patterns within conversations. To address these challenges, we collected conversational data from 48 students interacting with ChatGPT in a master's level data visualization course over one semester. We then developed a coding scheme, grounded in the literature on cognitive levels and thematic analysis, to categorize students' interaction patterns with ChatGPT. Furthermore, we present a visual analytics system, StuGPTViz, that tracks and compares temporal patterns in student prompts and the quality of ChatGPT's responses at multiple scales, revealing significant pedagogical insights for instructors. We validated the system's effectiveness through expert interviews with six data visualization instructors and three case studies. The results confirmed StuGPTViz's capacity to enhance educators' insights into the pedagogical value of ChatGPT. We also discussed the potential research opportunities of applying visual analytics in education and developing AI-driven personalized learning solutions.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142309492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BEMTrace: Visualization-driven approach for deriving Building Energy Models from BIM
Pub Date: 2024-09-23 | DOI: 10.1109/TVCG.2024.3456315
Andreas Walch, Attila Szabo, Harald Steinlechner, Thomas Ortner, Eduard Groller, Johanna Schmidt
Building Information Modeling (BIM) describes a central data pool covering the entire life cycle of a construction project. Similarly, Building Energy Modeling (BEM) describes the process of using a 3D representation of a building as a basis for thermal simulations to assess the building's energy performance. This paper explores the intersection of BIM and BEM, focusing on the challenges and methodologies in converting BIM data into BEM representations for energy performance analysis. BEMTrace integrates 3D data wrangling techniques with visualization methodologies to enhance the accuracy and traceability of the BIM-to-BEM conversion process. Through parsing, error detection, and algorithmic correction of BIM data, our methods generate valid BEM models suitable for energy simulation. Visualization techniques provide transparent insights into the conversion process, aiding error identification, validation, and user comprehension. We introduce context-adaptive selections to facilitate user interaction and show that the BEMTrace workflow helps users understand complex 3D data wrangling processes.
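One representative BIM-to-BEM error check is testing whether a boundary surface's vertices are near-planar, since thermal simulation tools typically require planar surfaces. A minimal sketch, assuming a plane fit via SVD; the sample polygon and tolerance are hypothetical.

# Minimal sketch: flag a boundary surface whose vertex loop deviates from its
# best-fit plane (the singular vector with the smallest singular value of the
# centered vertices is the plane normal).
import numpy as np

def planarity_error(vertices: np.ndarray) -> float:
    """Max distance of the vertices from their best-fit plane."""
    centered = vertices - vertices.mean(axis=0)
    _, _, vt = np.linalg.svd(centered)
    normal = vt[-1]
    return float(np.abs(centered @ normal).max())

# Hypothetical wall polygon with one vertex pushed slightly out of plane.
wall = np.array([[0, 0, 0], [4, 0, 0], [4, 0, 3], [0, 0.01, 3]], dtype=float)
err = planarity_error(wall)
print(f"planarity error = {err:.4f} m -> "
      f"{'ok' if err < 0.005 else 'needs correction'}")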
DimBridge: Interactive Explanation of Visual Patterns in Dimensionality Reductions with Predicate Logic
Pub Date: 2024-09-23 | DOI: 10.1109/TVCG.2024.3456391
Brian Montambault, Gabriel Appleby, Jen Rogers, Camelia D Brumar, Mingwei Li, Remco Chang
Dimensionality reduction techniques are widely used for visualizing high-dimensional data. However, support for interpreting patterns of dimension reduction results in the context of the original data space is often insufficient. Consequently, users may struggle to extract insights from the projections. In this paper, we introduce DimBridge, a visual analytics tool that allows users to interact with visual patterns in a projection and retrieve corresponding data patterns. DimBridge supports several interactions, allowing users to perform various analyses, from contrasting multiple clusters to explaining complex latent structures. Leveraging first-order predicate logic, DimBridge identifies subspaces in the original dimensions relevant to a queried pattern and provides an interface for users to visualize and interact with them. We demonstrate how DimBridge can help users overcome the challenges associated with interpreting visual patterns in projections.
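The predicate idea can be grounded with a toy example: given points brushed in a projection, derive an interval predicate per original dimension and score how well each matches the selection. This is a simplified stand-in for DimBridge's predicate search, with synthetic data.

# Minimal sketch: for a brushed subset of projected points, build a predicate
# "dim in [lo, hi]" per original dimension and rank dimensions by how well
# the predicate reproduces the selection (Jaccard score). Toy data only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # original high-dimensional data
selected = X[:, 2] > 1.0             # pretend this cluster was brushed in 2D

for dim in range(X.shape[1]):
    lo, hi = X[selected, dim].min(), X[selected, dim].max()
    inside = (X[:, dim] >= lo) & (X[:, dim] <= hi)
    score = (inside & selected).sum() / (inside | selected).sum()
    print(f"x{dim} in [{lo:+.2f}, {hi:+.2f}]  score={score:.2f}")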
{"title":"DimBridge: Interactive Explanation of Visual Patterns in Dimensionality Reductions with Predicate Logic.","authors":"Brian Montambault, Gabriel Appleby, Jen Rogers, Camelia D Brumar, Mingwei Li, Remco Chang","doi":"10.1109/TVCG.2024.3456391","DOIUrl":"https://doi.org/10.1109/TVCG.2024.3456391","url":null,"abstract":"<p><p>Dimensionality reduction techniques are widely used for visualizing high-dimensional data. However, support for interpreting patterns of dimension reduction results in the context of the original data space is often insufficient. Consequently, users may struggle to extract insights from the projections. In this paper, we introduce DimBridge, a visual analytics tool that allows users to interact with visual patterns in a projection and retrieve corresponding data patterns. DimBridge supports several interactions, allowing users to perform various analyses, from contrasting multiple clusters to explaining complex latent structures. Leveraging first-order predicate logic, DimBridge identifies subspaces in the original dimensions relevant to a queried pattern and provides an interface for users to visualize and interact with them. We demonstrate how DimBridge can help users overcome the challenges associated with interpreting visual patterns in projections.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142309487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GraspDiff: Grasping Generation for Hand-Object Interaction With Multimodal Guided Diffusion
Pub Date: 2024-09-23 | DOI: 10.1109/TVCG.2024.3466190
Binghui Zuo, Zimeng Zhao, Wenqian Sun, Xiaohan Yuan, Zhipeng Yu, Yangang Wang
Grasping generation holds significant importance in both robotics and AI-generated content. While pure network paradigms based on VAEs or GANs ensure diversity in outcomes, they often fall short of achieving plausibility. Two-step paradigms that first predict contact and then optimize distance yield plausible results but are known to be time-consuming. This paper introduces a novel paradigm powered by DDPM, accommodating diverse modalities with varying interaction granularities as its generating conditions, including 3D object, contact affordance, and image content. Our key idea is that the iterative steps inherent to diffusion models can supplant the iterative optimization routines in existing optimization methods, thereby endowing the generated results with both diversity and plausibility. Using the same training data, our paradigm achieves superior generation performance and competitive generation speed compared to optimization-based paradigms. Extensive experiments on both in-domain and out-of-domain objects demonstrate that our method significantly improves over the SOTA method. We will release the code for research purposes.
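The DDPM reverse process that replaces the optimization loop can be sketched generically. This is a textbook sampling loop, not GraspDiff's model; the noise predictor is a placeholder that a real system would condition on the object, contact affordance, or image content, and the pose dimensionality is hypothetical.

# Minimal sketch of DDPM reverse-process sampling: start from Gaussian noise
# and iteratively denoise, with sigma_t^2 = beta_t.
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x, t):
    # Placeholder for the learned, condition-aware noise predictor.
    return np.zeros_like(x)

rng = np.random.default_rng(0)
x = rng.normal(size=61)  # e.g., a flattened hand-pose vector (dim hypothetical)
for t in reversed(range(T)):
    eps = predict_noise(x, t)
    mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    x = mean + (np.sqrt(betas[t]) * rng.normal(size=x.shape) if t > 0 else 0.0)
print(x[:5])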
Real-and-Present: Investigating the Use of Life-Size 2D Video Avatars in HMD-Based AR Teleconferencing
Pub Date: 2024-09-23 | DOI: 10.1109/TVCG.2024.3466554
Xuanyu Wang, Weizhan Zhang, Christian Sandor, Hongbo Fu
Augmented Reality (AR) teleconferencing allows spatially distributed users to interact with each other in 3D through agents in their own physical environments. Existing methods leveraging volumetric capturing and reconstruction can provide a high-fidelity experience but are often too complex and expensive for everyday use. Other solutions target mobile, effortless-to-set-up teleconferencing on AR Head-Mounted Displays (HMDs); they directly transplant conventional video conferencing onto an AR-HMD platform or use avatars to represent remote participants. However, they can support either high fidelity or a high level of co-presence, but not both. Moreover, the limited Field of View (FoV) of HMDs can further degrade users' immersive experience. To achieve a balance between fidelity and co-presence, we explore using life-size 2D video-based avatars (video avatars for short) in AR teleconferencing. Specifically, given the potential effect of FoV on users' perception of proximity, we first conduct a pilot study to explore the local-user-centered optimal placement of video avatars in small-group AR conversations. With the placement results, we then implement a proof-of-concept prototype of video-avatar-based teleconferencing and conduct user evaluations to verify its effectiveness in balancing fidelity and co-presence. Following the indications of the pilot study, we further quantitatively explore the effect of FoV size on the video avatar's optimal placement through a user study involving more FoV conditions in a VR-simulated environment. We regress placement models to serve as references for computationally determining video avatar placements in such teleconferencing applications on various existing AR HMDs and future ones with bigger FoVs.
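The final regression step can be illustrated with a simple model fit: predicted avatar distance as a function of FoV. The (FoV, distance) pairs below are hypothetical, purely to show the fitting step, and a quadratic fit is our arbitrary choice, not the paper's model form.

# Minimal sketch: regress a placement model mapping headset FoV to a
# preferred avatar distance. Data points are hypothetical.
import numpy as np

fov_deg = np.array([40.0, 52.0, 66.0, 90.0, 110.0])
preferred_dist_m = np.array([2.4, 2.0, 1.7, 1.4, 1.2])

coeffs = np.polyfit(fov_deg, preferred_dist_m, deg=2)
model = np.poly1d(coeffs)
print(f"predicted distance at 70 deg FoV: {model(70.0):.2f} m")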
Reducing Search Regions for Fast Detection of Exact Point-to-Point Geodesic Paths on Meshes
Pub Date: 2024-09-23 | DOI: 10.1109/TVCG.2024.3466242
Shuai Ma, Wencheng Wang, Fei Hou
Fast detection of exact point-to-point geodesic paths on meshes remains challenging for existing methods. We present a method that reduces the region of the mesh to be investigated, improving efficiency. Our observation is that a mesh and its simplified version are very alike, so the geodesic path between two points on the mesh and the geodesic path between their corresponding points on the simplified mesh lie close to each other in 3D Euclidean space. Thus, from the geodesic path on the simplified mesh, we can generate a region on the original mesh that contains the mesh's geodesic path, called the search region, within which existing methods can restrict their search when detecting geodesic paths and so gain acceleration. We demonstrate the rationale behind our proposed method. Experimental results show that it accelerates existing methods well; e.g., the global exact method VTP (vertex-oriented triangle propagation) can be sped up by over 200 times when handling large meshes. Our search region can also speed up path initialization using the Dijkstra algorithm to accelerate local methods, achieving at least a two-times speedup in our tests.
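The path-initialization step mentioned above can be sketched as a plain Dijkstra search over a mesh's edge graph; in the paper's pipeline, the resulting path on the simplified mesh would seed the search region on the original mesh. The tiny graph below is a hypothetical stand-in for a mesh's vertex adjacency.

# Minimal sketch: Dijkstra shortest path over a vertex adjacency graph,
# the standard initialization that a search region can restrict.
import heapq

def dijkstra_path(adj, src, dst):
    """adj: {vertex: [(neighbor, edge_length), ...]}"""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

adj = {0: [(1, 1.0), (2, 2.5)], 1: [(0, 1.0), (2, 1.0)], 2: [(0, 2.5), (1, 1.0)]}
print(dijkstra_path(adj, 0, 2))  # [0, 1, 2]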
{"title":"Reducing Search Regions for Fast Detection of Exact Point-to-Point Geodesic Paths on Meshes.","authors":"Shuai Ma, Wencheng Wang, Fei Hou","doi":"10.1109/TVCG.2024.3466242","DOIUrl":"10.1109/TVCG.2024.3466242","url":null,"abstract":"<p><p>Fast detection of exact point-to-point geodesic paths on meshes is still challenging with existing methods. For this, we present a method to reduce the region to be investigated on the mesh for efficiency. It is by our observation that a mesh and its simplified one are very alike so that the geodesic path between two defined points on the mesh and the geodesic path between their corresponding two points on the simplified mesh are very near to each other in the 3D Euclidean space. Thus, with the geodesic path on the simplified mesh, we can generate a region on the original mesh that contains the geodesic path on the mesh, called the search region, by which existing methods can reduce the search scope in detecting geodesic paths, and so obtaining acceleration. We demonstrate the rationale behind our proposed method. Experimental results show that we can promote existing methods well, e.g., the global exact method VTP (vertex-oriented triangle propagation) can be sped up by even over 200 times when handling large meshes. Our search region can also speed up path initialization using the Dijkstra algorithm to promote local methods, e.g., obtaining an acceleration of at least two times in our tests.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142309491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}