
Computer Animation and Virtual Worlds: Latest Publications

Text-Driven High-Quality 3D Human Generation via Variational Gradient Estimation and Latent Reward Models
IF 1.7 · CAS Zone 4, Computer Science · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2026-01-08 · DOI: 10.1002/cav.70089
Pengfei Zhou, Xukun Shen, Yong Hu

Recent advances in Score Distillation Sampling (SDS) have enabled text-driven 3D human generation, yet the standard classifier-free guidance (CFG) framework struggles with semantic misalignment and texture oversaturation due to limited model capacity. We propose a novel framework that decouples conditional and unconditional guidance via a dual-model strategy: A pretrained diffusion model ensures geometric stability, while a preference-tuned latent reward model enhances semantic fidelity. To further refine noise estimation, we introduce a lightweight U-shaped Swin Transformer (U-Swin) that regularizes predicted noise against the reward model, reducing gradient bias and local artifacts. Additionally, we design a time-varying noise weighting mechanism to dynamically balance the two guidance signals during denoising, improving stability and texture realism. Extensive experiments show that our method significantly improves alignment with textual descriptions, enhances texture details, and outperforms state-of-the-art baselines in both visual quality and semantic consistency.
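The time-varying noise weighting described above can be illustrated with a minimal sketch: two noise predictions (one from the pretrained diffusion model, one from the reward-tuned model) are blended with a weight that shifts across denoising timesteps. The linear schedule, weight bounds, and function names below are assumptions for illustration; the abstract does not specify the paper's actual mechanism.

```python
import numpy as np

def blend_noise(eps_base: np.ndarray, eps_reward: np.ndarray,
                t: int, t_max: int, w_min: float = 0.2, w_max: float = 0.8) -> np.ndarray:
    """Blend two predicted-noise tensors with a timestep-dependent weight.

    Early (noisy) steps lean on the pretrained model for stable geometry;
    late steps lean on the reward-tuned model for semantic and texture detail.
    """
    # Assumed linear schedule from w_min (t = t_max, start of denoising) to w_max (t = 0, end).
    w_reward = w_min + (w_max - w_min) * (1.0 - t / t_max)
    return (1.0 - w_reward) * eps_base + w_reward * eps_reward

# Toy usage with random stand-ins for the two noise predictions.
eps_a = np.random.randn(4, 64, 64)
eps_b = np.random.randn(4, 64, 64)
blended = blend_noise(eps_a, eps_b, t=800, t_max=1000)
print(blended.shape)  # (4, 64, 64)
```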

Citations: 0
Environmental Design Elements in Library Spaces: A Virtual Reality Study of Psychophysiological Responses to Color, Material, and Lighting in Built Environments
IF 1.7 · CAS Zone 4, Computer Science · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2026-01-07 · DOI: 10.1002/cav.70092
Mengyan Lin, Ning Li

This study explores how environmental design elements in library spaces influence human psychophysiological responses using virtual reality (VR). Thirty participants experienced VR simulations of library reading areas, with variations in wall color, flooring material, and lighting intensity, whereas electroencephalography (EEG) and galvanic skin response (GSR) measured physiological reactions alongside subjective ratings. Moderate lighting (20,000–30,000 cd) minimized arousal and supported attention, while white walls enhanced relaxation via increased alpha brain activity. Green plant walls slightly boosted attention-related beta activity, and wood flooring was rated highest for comfort and naturalness. VR enabled precise control of design variables, advancing environmental psychology research. These findings offer evidence-based guidelines for designing public spaces like libraries to enhance user well-being and cognitive performance, with implications for educational and public buildings.

Citations: 0
Innovating 3D Object Generation for the Metaverse Through Speech Input
IF 1.7 · CAS Zone 4, Computer Science · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-12-18 · DOI: 10.1002/cav.70088
An Chao Tsai, Pierre Dave Victor Katuhe

This research introduces a novel approach to generate three-dimensional objects using human voice as input for a machine learning model. The spoken input is first converted into text using a Google API, which then guides the creation of the desired three-dimensional object. This approach is particularly beneficial for individuals without design expertise, paving the way for broader participation in the evolving Metaverse—a virtual reality that transcends our physical realm. The three-dimensional object generation model comprises three primary components: Neural Radiance Fields (NeRF), Low-Rank Adaptation (LoRA), and Stable Diffusion. When combined, these components facilitate the creation of a diverse range of three-dimensional objects. Our method presents an innovative approach to harnessing speech recognition for generating three-dimensional objects within the Metaverse, while demonstrating competitive quality and practical efficiency.
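The speech-to-prompt front end of such a pipeline can be sketched as below. The abstract only says "a Google API"; this sketch assumes the open-source SpeechRecognition package's Google Web Speech endpoint, and the text-to-3D stage is a hypothetical placeholder standing in for the NeRF, LoRA, and Stable Diffusion components.

```python
# pip install SpeechRecognition
import speech_recognition as sr

def speech_to_prompt(wav_path: str) -> str:
    """Transcribe a WAV file to text via the Google Web Speech endpoint."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_google(audio)  # raises on network/recognition failure

def generate_3d_object(prompt: str) -> str:
    """Hypothetical stand-in for the NeRF + LoRA + Stable Diffusion stage."""
    # A real pipeline would optimize a 3D representation against diffusion guidance here.
    return f"mesh_for_{prompt.replace(' ', '_')}.obj"

if __name__ == "__main__":
    prompt = speech_to_prompt("command.wav")   # e.g., "a red wooden chair"
    print(generate_3d_object(prompt))
```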

Citations: 0
Dynamic Translational Gains Manipulation for Tiny Object Interaction
IF 1.7 · CAS Zone 4, Computer Science · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-12-16 · DOI: 10.1002/cav.70086
Jiahui Dong, Tansi Zhang, Shengyang Luo, Christos Mousas, Yingjie Chen

Interacting with small objects in virtual reality (VR) can be challenging due to the physical limitations of controllers and headsets, which often lead to unintended collisions and tracking loss when devices come too close, thereby disrupting the user's immersive experience. While researchers have developed techniques like translational gain, hand remapping, and specialized interaction to address these challenges, these approaches are often task-specific or insufficient for precise, detailed interactions or observations. To overcome these limitations, we introduce a novel interaction technique called dynamic translational gains manipulation (DTGM), which adjusts scaling in real time based on the user's proximity to objects. We conducted a user study to evaluate how effectively DTGM improves precision during object manipulation and to assess its subjective mental workload. Our results revealed that the DTGM technique improved interaction efficiency, making it suitable for various VR applications where precision and space optimization are crucial.
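A minimal sketch of a proximity-dependent translational gain is shown below; the smoothstep curve, distance thresholds, and gain bounds are illustrative assumptions, not values from the paper.

```python
import numpy as np

def translational_gain(distance: float,
                       near: float = 0.05, far: float = 0.50,
                       g_near: float = 0.25, g_far: float = 1.0) -> float:
    """Map hand-to-object distance (meters) to a translation scale factor.

    Far from the target, hand motion maps roughly 1:1; close to it, motion is
    scaled down so small real movements become even finer virtual ones.
    """
    t = np.clip((distance - near) / (far - near), 0.0, 1.0)
    # Smoothstep keeps the gain change continuous, avoiding visible jumps.
    t = t * t * (3.0 - 2.0 * t)
    return float(g_near + (g_far - g_near) * t)

def apply_gain(hand_delta: np.ndarray, distance: float) -> np.ndarray:
    """Scale a per-frame real-world hand displacement before applying it in VR."""
    return translational_gain(distance) * hand_delta

print(round(translational_gain(0.40), 3))  # close to 1.0: coarse motion far away
print(round(translational_gain(0.06), 3))  # close to g_near: fine motion up close
```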

Citations: 0
SD-T2LM: Long-Sequence Human Motion Generation Based on Dynamic SLERP Interpolation and Dynamic Mask Mechanism
IF 1.7 · CAS Zone 4, Computer Science · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-12-02 · DOI: 10.1002/cav.70078
Fengsen Jin, Guang Li, Jianjun Li, Chongyang Ding

Text-driven human motion generation aims to synthesize coherent, natural human actions from natural-language descriptions. While current techniques are effective for short sequences, they often suffer from abrupt transitions, detail loss, and temporal misalignment when extended to long sequences. We present SD-T2LM, a unified framework that couples transition optimization, detail enhancement, and temporal alignment to produce natural, detail-rich long motion sequences. First, a dynamic SLERP interpolation selects a transition length conditioned on start/end pose similarity to suppress discontinuities while preserving physical plausibility. Second, a dynamic masking mechanism progressively reduces the mask ratio from 60% to 30% during training, enabling staged learning: global structure early, fine-grained details later. Third, a Temporal Flexible Alignment loss based on Soft-DTW aligns generated and ground-truth rhythms without enforcing point-wise synchrony. On the HumanML3D and KIT-ML datasets, SD-T2LM showed significant improvements in FID and diversity compared to the baseline model, while reducing transition distance and jerk.
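For reference, a minimal NumPy sketch of quaternion SLERP with a similarity-dependent transition length is given below; the similarity measure, length bounds, and linear mapping are assumptions for illustration, not SD-T2LM's actual settings.

```python
import numpy as np

def slerp(q0: np.ndarray, q1: np.ndarray, t: float) -> np.ndarray:
    """Spherical linear interpolation between two unit quaternions."""
    q0, q1 = q0 / np.linalg.norm(q0), q1 / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:            # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:         # nearly parallel: fall back to normalized lerp
        out = q0 + t * (q1 - q0)
        return out / np.linalg.norm(out)
    theta = np.arccos(dot)
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def transition_length(similarity: float, min_len: int = 4, max_len: int = 16) -> int:
    """Use more in-between frames when the boundary poses are less similar.

    `similarity` is assumed to lie in [0, 1], with 1 meaning identical poses.
    """
    return int(round(max_len - (max_len - min_len) * similarity))

# Blend two joint rotations over a transition whose length depends on pose similarity.
q_start = np.array([1.0, 0.0, 0.0, 0.0])
q_end = np.array([0.0, 1.0, 0.0, 0.0])
n = transition_length(similarity=0.3)
frames = [slerp(q_start, q_end, t) for t in np.linspace(0.0, 1.0, n)]
print(n, frames[0], frames[-1])
```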

Citations: 0
Estimating Torso Direction Based on a Head-Mounted Display and Its Controllers for Steering During Virtual Locomotion
IF 1.7 · CAS Zone 4, Computer Science · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-11-30 · DOI: 10.1002/cav.70087
Jingbo Zhao, Mengyang Ding, Zihao Guo, Mingjun Shao

Reliable estimation of torso direction is essential for virtual locomotion, as it controls the locomotion direction in virtual environments. Existing approaches rely on external sensors to track torso direction or use the poses and/or movements of the head, hands, or feet to control the direction of travel. Deep learning methods based on a head-mounted display (HMD) and its controllers have been proposed for torso direction estimation, but they either cannot produce continuous estimates or have not had their estimation errors systematically evaluated. On the other hand, 3D full-body human motion generation methods based on an HMD and its controllers can estimate torso direction, but this capability has largely been ignored for virtual steering. We propose a new method, TCNPoser, that performs torso direction estimation and generates 3D full-body human motion. Through an offline evaluation and a user study, we show that the proposed method achieves state-of-the-art (SOTA) performance for real-time torso direction estimation.
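As context for the problem, a naive baseline (not TCNPoser) simply fuses the yaw of the HMD and the two controllers, for example via a circular mean; the sketch below illustrates that baseline under the assumption that each device exposes a ground-plane forward vector.

```python
import numpy as np

def yaw_from_forward(forward_xz) -> float:
    """Yaw angle (radians) of a device forward vector projected onto the ground plane."""
    x, z = forward_xz
    return float(np.arctan2(x, z))

def torso_yaw_baseline(head_fwd, left_fwd, right_fwd) -> float:
    """Circular mean of head and controller yaws as a crude torso-direction proxy."""
    yaws = [yaw_from_forward(f) for f in (head_fwd, left_fwd, right_fwd)]
    return float(np.arctan2(np.mean(np.sin(yaws)), np.mean(np.cos(yaws))))

# Head turned slightly left, both controllers pointing straight ahead.
print(round(np.degrees(torso_yaw_baseline([-0.26, 0.97], [0.0, 1.0], [0.0, 1.0])), 1))
```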

Citations: 0
Bridging Real and Virtual: Augmented Reality as a Catalyst for Creative Learning in Beverage Preparation Courses
IF 1.7 · CAS Zone 4, Computer Science · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-11-30 · DOI: 10.1002/cav.70083
Yen-Cheng Chen

The COVID-19 pandemic has profoundly disrupted traditional face-to-face education, accelerating the shift towards digital learning environments and prompting the adoption of innovative pedagogical approaches. Among these, Augmented Reality (AR) has emerged as a transformative educational tool, offering immersive and interactive platforms that enhance students' understanding of complex concepts. This study investigates the integration of AR technology into university-level beverage preparation courses, with the aim of fostering creativity and strengthening professional competencies. The findings indicate that AR has considerable potential to enhance creative performance, reshape pedagogical practices, and promote self-directed learning. By addressing prevalent learning challenges and providing actionable insights into students' developmental progress, AR supports the design of tailored teaching strategies while cultivating industry-relevant skills. This research highlights the role of AR in advancing inclusive, equitable, and high-quality education, in alignment with Sustainable Development Goal 4 (SDG 4), and underscores its broader significance in redefining educational paradigms in the post-pandemic era.

Citations: 0
The Convergence of Deep Learning and the Metaverse: A Multidisciplinary Survey of Current Research and Future Directions
IF 1.7 · CAS Zone 4, Computer Science · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-11-25 · DOI: 10.1002/cav.70082
Jothi Prakash Venugopal, Arul Antran Vijay Subramanian, Gopikrishnan Sundaram, Mahendhiran Ponnambalam Devadoss

The convergence of deep learning and the Metaverse represents a pivotal frontier in the evolution of intelligent digital ecosystems. This paper presents a comprehensive survey of how deep learning techniques spanning convolutional, generative, transformer-based, and reinforcement architectures collectively enable perception, creation, cognition, and governance within immersive virtual worlds. Building upon this synthesis, we propose the Deep Learning-Empowered Metaverse Intelligence (DL-MI) framework, which unifies sensory intelligence, generative world-building, adaptive reasoning, and ethical-social governance into a cohesive architecture. The study illustrates how deep learning facilitates realistic avatar synthesis, dynamic environmental rendering, emotion-aware interaction, and predictive personalization, thereby transforming the Metaverse from reactive systems to anticipatory, self-evolving spaces. Key challenges such as data privacy, algorithmic bias, and computational sustainability are critically examined alongside emerging paradigms, including quantum-augmented AI and federated collaboration. By integrating technical, ethical, and societal dimensions, this survey provides a structured foundation for developing scalable, transparent, and human-centered Metaverse intelligence.

Citations: 0
MotionBlend GAN: An Approach for Realistic Video Content Creation With Embedded Approach for Human Subjects
IF 1.7 · CAS Zone 4, Computer Science · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-11-25 · DOI: 10.1002/cav.70085
Lalit Kumar, Dushyant Kumar Singh

The proposed MotionBlend GAN model marks a significant step forward in video synthesis by blending the motion from a source video with the appearance of a target person's image. As training progresses, the model improves video creation by enhancing the smoothness and natural flow of motion, resulting in more coherent and lifelike videos. Using advanced techniques like MoBConv blocks of EfficientNet-B7, OpenPose for precise pose detection, ResNet blocks for feature integration, and a 3D CNN discriminator, the model produces high-quality videos that maintain both spatial and temporal consistency. After 200 epochs, the model achieved an adversarial loss of 0.2265, with metrics like PSNR at 20.246, SSIM at 0.867, and LPIPS at 0.178. The high PSNR and SSIM values, along with the low LPIPS, show that the generated frames are well aligned and preserve important details. These results highlight the model's strong performance over time, consistently generating visually convincing videos of human activities using a reference image and source video. The model effectively transfers motion from video to image, creating realistic videos of human activity in comparison to existing models.
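Of the three reported metrics, PSNR is the simplest to reproduce; a minimal reference implementation is sketched below (SSIM and LPIPS are typically computed with standard libraries and are omitted here). The 255 peak value and the toy frames are assumptions for illustration.

```python
import numpy as np

def psnr(reference: np.ndarray, generated: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between two same-shaped image arrays."""
    mse = np.mean((reference.astype(np.float64) - generated.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy check on a frame pair: small noise gives a high PSNR, larger noise a lower one.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64, 3)).astype(np.float64)
print(round(psnr(frame, frame + rng.normal(0, 2, frame.shape)), 1))   # high (closer frames)
print(round(psnr(frame, frame + rng.normal(0, 40, frame.shape)), 1))  # lower (more distortion)
```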

Citations: 0
Virtual Reenactment of the Itaewon Crowd Crush Using Kinodynamic Simulation
IF 1.7 · CAS Zone 4, Computer Science · Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2025-11-25 · DOI: 10.1002/cav.70081
Juyi Hwang, Young J. Kim

We propose new crowd simulation methods to virtually reenact the Itaewon disaster that occurred in 2022 due to the extreme crowd density. Conventional techniques make it challenging to simulate diverse, extremely dense crowd behaviors such as crowd surge, fluidization, and falls observed at the Itaewon disaster. This paper proposes a kinodynamic agent simulation combining kinematic agents for low-density crowds, hydrodynamic and hydrostatic agents for high-density crowds, and articulated passive agents for high-density crowds in dense contact. In order to perform co-simulation among heterogeneous agent types, we use a message-passing mechanism to share relevant kinematic and dynamic information among agents and make agent-type transitions based on crowd density and contact forces. Experiments show that the proposed hybrid simulation approach can accurately reenact crowd phenomena observed at the Itaewon compared to the CCTV footage. Moreover, our ablation study supports the use of kinodynamic agents to faithfully reconstruct the Itaewon crowd behavior. Furthermore, we run three what-if scenarios to explore the possibilities of using our techniques to help prevent incidents in the future. Finally, to demonstrate the applicability of our proposed methods to other types of extreme crowd behaviors besides the Itaewon disaster, we simulate two other real-world crowd incidents using our techniques.
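The density- and force-driven agent-type switching can be sketched as below; the thresholds, mode names, and the collapse of hydrodynamic and hydrostatic agents into a single "hydrodynamic" mode are simplifying assumptions, not the paper's actual rules.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    agent_id: int
    mode: str  # "kinematic", "hydrodynamic", or "passive"

def choose_mode(local_density: float, contact_force: float,
                density_thresh: float = 6.0, force_thresh: float = 400.0) -> str:
    """Pick an agent model from local crowd density (people/m^2) and contact force (N).

    Threshold values are illustrative placeholders, not values from the paper.
    """
    if local_density < density_thresh:
        return "kinematic"        # sparse crowd: steering and collision avoidance
    if contact_force < force_thresh:
        return "hydrodynamic"     # dense crowd: fluid-like surge and pressure
    return "passive"              # crushing contact: articulated ragdoll dynamics

def update_modes(agents, densities, forces):
    """Message-passing step: each agent reads shared density/force and may switch mode."""
    for a in agents:
        a.mode = choose_mode(densities[a.agent_id], forces[a.agent_id])
    return agents

agents = [Agent(0, "kinematic"), Agent(1, "kinematic")]
print([a.mode for a in update_modes(agents, {0: 2.0, 1: 8.5}, {0: 0.0, 1: 650.0})])
# ['kinematic', 'passive']
```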

Citations: 0