We present a cloth simulation parameter estimation method that integrates the flexibility of global optimization with the speed of neural networks. While global optimization allows varied objective-function designs and explicit ranges for the optimization variables, it requires thousands of objective function evaluations. Each evaluation involves a cloth simulation, which is computationally demanding and makes the overall optimization impractically slow. Neural network learning methods, on the other hand, offer fast estimates but face challenges such as the need for data collection, re-training whenever the input data format changes, and difficulty in constraining variable ranges. Our proposed method addresses these issues by replacing the simulation step that global optimization normally needs for each objective function evaluation with neural network inference. We demonstrate that, once an estimation model is trained, optimization for various objective functions becomes straightforward. Moreover, we show that optimization results reflecting the intentions of expert users can be achieved through visualization of a wide optimization space and the use of range constraints.
{"title":"Fast constrained optimization for cloth simulation parameters from static drapes","authors":"Eunjung Ju, Eungjune Shim, Kwang-yun Kim, Sungjin Yoon, Myung Geol Choi","doi":"10.1002/cav.2265","DOIUrl":"https://doi.org/10.1002/cav.2265","url":null,"abstract":"<p>We present a cloth simulation parameter estimation method that integrates the flexibility of global optimization with the speed of neural networks. While global optimization allows for varied designs in objective functions and specifying the range of optimization variables, it requires thousands of objective function evaluations. Each evaluation, which involves a cloth simulation, is computationally demanding and impractical time-wise. On the other hand, neural network learning methods offer quick estimation results but face challenges such as the need for data collection, re-training when input data formats change, and difficulties in setting constraints on variable ranges. Our proposed method addresses these issues by replacing the simulation process, typically necessary for objective function evaluations in global optimization, with a neural network for inference. We demonstrate that, once an estimation model is trained, optimization for various objective functions becomes straightforward. Moreover, we illustrate that it is possible to achieve optimization results that reflect the intentions of expert users through visualization of a wide optimization space and the use of range constraints.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cav.2265","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141326755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chiroptera behavior is complex and often unseen, as bats are nocturnal, small, and elusive animals. Chiropterology has yielded significant insights into the behavior and environmental interactions of bats. Biology, ecology, and even digital media often benefit from mathematical models of animals, including humans. However, Chiroptera modeling has typically been limited to specific behaviors, species, or biological functions and relies heavily on classical modeling methodologies that may not represent individuals or colonies well. This work proposes a continuous, parametric, multiagent Chiroptera behavior model that captures the latest research on the echolocation, hunting, and energetics of bats. This includes echolocation-based perception (or lack thereof), hunting patterns, roosting behavior, and energy consumption rates. We propose the integration of these mathematical models in a framework that affords the individual simulation of bats within large-scale colonies. Practitioners can adjust the model to account for different perceptual affordances or patterns among species of bats, or even individuals (such as sickness or injury). We show that our model closely matches results from the literature, affords an animated graphical simulation, and has utility in simulation-based studies.
{"title":"Toward comprehensive Chiroptera modeling: A parametric multiagent model for bat behavior","authors":"Brendan Marney, Brandon Haworth","doi":"10.1002/cav.2251","DOIUrl":"https://doi.org/10.1002/cav.2251","url":null,"abstract":"<p>Chiroptera behavior is complex and often unseen as bats are nocturnal, small, and elusive animals. Chiroptology has led to significant insights into the behavior and environmental interactions of bats. Biology, ecology, and even digital media often benefit from mathematical models of animals including humans. However, the history of Chiroptera modeling is often limited to specific behaviors, species, or biological functions and relies heavily on classical modeling methodologies that may not fully represent individuals or colonies well. This work proposes a continuous, parametric, multiagent, Chiroptera behavior model that captures the latest research in echolocation, hunting, and energetics of bats. This includes echolocation-based perception (or lack thereof), hunting patterns, roosting behavior, and energy consumption rates. We proposed the integration of these mathematical models in a framework that affords the individual simulation of bats within large-scale colonies. Practitioners can adjust the model to account for different perceptual affordances or patterns among species of bats, or even individuals (such as sickness or injury). We show that our model closely matches results from the literature, affords an animated graphical simulation, and has utility in simulation-based studies.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cav.2251","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141326753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Extracting road maps from satellite images is a popular research topic. However, it remains very challenging for existing methods to achieve high-quality results, because the regions covered by satellite images are very large while roads are slender, complex, and occupy only a small part of the image, making them difficult to distinguish from the background. In this article, we address this challenge by presenting two modules that learn road features more effectively and thereby improve road extraction. The first module exploits the differences between patches containing roads and patches containing no road to exclude as many background regions as possible, so that the small portion containing roads can be investigated more specifically. The second module enhances feature alignment in decoding feature maps by combining strip convolution with an attention mechanism. These two modules can be easily integrated into the networks of existing learning methods. Experimental results show that our modules help existing methods achieve high-quality results, superior to the state-of-the-art methods.
{"title":"Extracting roads from satellite images via enhancing road feature investigation in learning","authors":"Shiming Feng, Fei Hou, Jialu Chen, Wencheng Wang","doi":"10.1002/cav.2275","DOIUrl":"https://doi.org/10.1002/cav.2275","url":null,"abstract":"<p>It is a hot topic to extract road maps from satellite images. However, it is still very challenging with existing methods to achieve high-quality results, because the regions covered by satellite images are very large and the roads are slender, complex and only take up a small part of a satellite image, making it difficult to distinguish roads from the background in satellite images. In this article, we address this challenge by presenting two modules to more effectively learn road features, and so improving road extraction. The first module exploits the differences between the patches containing roads and the patches containing no road to exclude the background regions as many as possible, by which the small part containing roads can be more specifically investigated for improvement. The second module enhances feature alignment in decoding feature maps by using strip convolution in combination with the attention mechanism. These two modules can be easily integrated into the networks of existing learning methods for improvement. Experimental results show that our modules can help existing methods to achieve high-quality results, superior to the state-of-the-art methods.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141315447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The rapid development of the online apparel shopping industry demands innovative solutions for high-quality digital apparel sample displays with virtual avatars. However, developing such displays is prohibitively expensive and prone to the well-known “uncanny valley” effect, where a nearly human-looking artifact arouses eeriness and repulsiveness, thus affecting the user experience. To effectively mitigate the “uncanny valley” effect and improve the overall authenticity of digital apparel sample displays, we present a novel photo-realistic portrait generation framework. Our key idea is to employ transfer learning to learn an identity-consistent mapping from the latent space of rendered portraits to that of real portraits. During the inference stage, the input portrait of an avatar can be directly transferred to a realistic portrait by changing its appearance style while maintaining the facial identity. To this end, we collect a new dataset, Daz-Rendered-Faces-HQ (DRFHQ), specifically designed for rendering-style portraits. We leverage this dataset to fine-tune the StyleGAN2-FFHQ generator, using our carefully crafted framework, which helps to preserve the geometric and color features relevant to facial identity. We evaluate our framework using portraits with diverse gender, age, and race variations. Qualitative and quantitative evaluations, along with ablation studies, highlight our method's advantages over state-of-the-art approaches.
{"title":"Identity-consistent transfer learning of portraits for digital apparel sample display","authors":"Luyuan Wang, Yiqian Wu, Yong-Liang Yang, Chen Liu, Xiaogang Jin","doi":"10.1002/cav.2278","DOIUrl":"https://doi.org/10.1002/cav.2278","url":null,"abstract":"<p>The rapid development of the online apparel shopping industry demands innovative solutions for high-quality digital apparel sample displays with virtual avatars. However, developing such displays is prohibitively expensive and prone to the well-known “uncanny valley” effect, where a nearly human-looking artifact arouses eeriness and repulsiveness, thus affecting the user experience. To effectively mitigate the “uncanny valley” effect and improve the overall authenticity of digital apparel sample displays, we present a novel photo-realistic portrait generation framework. Our key idea is to employ transfer learning to learn an identity-consistent mapping from the latent space of rendered portraits to that of real portraits. During the inference stage, the input portrait of an avatar can be directly transferred to a realistic portrait by changing its appearance style while maintaining the facial identity. To this end, we collect a new dataset, <b>D</b>az-<b>R</b>endered-<b>F</b>aces-<b>HQ</b> (<i>DRFHQ</i>), specifically designed for rendering-style portraits. We leverage this dataset to fine-tune the StyleGAN2-<i>FFHQ</i> generator, using our carefully crafted framework, which helps to preserve the geometric and color features relevant to facial identity. We evaluate our framework using portraits with diverse gender, age, and race variations. Qualitative and quantitative evaluations, along with ablation studies, highlight our method's advantages over state-of-the-art approaches.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141298633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Personality recognition is of great significance in deepening the understanding of social relations. While personality recognition methods have made significant strides in recent years, the challenge of heterogeneity between modalities during feature fusion remains unsolved. This paper introduces an adaptive multi-modal information fusion network (AMIF-Net) capable of concurrently processing video, audio, and text data. First, using the AMIF-Net encoder, we process the extracted audio and video features separately, effectively capturing long-term relationships in the data. Then, adaptive elements added to the fusion network alleviate the heterogeneity between modalities. Lastly, we feed the concatenated audio-visual and text features into a regression network to obtain Big Five personality trait scores. Furthermore, we introduce a novel loss function to address training inaccuracies, taking advantage of its property of exhibiting a peak at the critical mean. Tests on the ChaLearn First Impressions V2 multi-modal dataset show performance that partially surpasses state-of-the-art networks.
{"title":"Adaptive information fusion network for multi-modal personality recognition","authors":"Yongtang Bao, Xiang Liu, Yue Qi, Ruijun Liu, Haojie Li","doi":"10.1002/cav.2268","DOIUrl":"https://doi.org/10.1002/cav.2268","url":null,"abstract":"<p>Personality recognition is of great significance in deepening the understanding of social relations. While personality recognition methods have made significant strides in recent years, the challenge of heterogeneity between modalities during feature fusion still needs to be solved. This paper introduces an adaptive multi-modal information fusion network (AMIF-Net) capable of concurrently processing video, audio, and text data. First, utilizing the AMIF-Net encoder, we process the extracted audio and video features separately, effectively capturing long-term data relationships. Then, adding adaptive elements in the fusion network can alleviate the problem of heterogeneity between modes. Lastly, we concatenate audio-video and text features into a regression network to obtain Big Five personality trait scores. Furthermore, we introduce a novel loss function to address the problem of training inaccuracies, taking advantage of its unique property of exhibiting a peak at the critical mean. Our tests on the ChaLearn First Impressions V2 multi-modal dataset show partial performance surpassing state-of-the-art networks.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141298535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Paying close attention to facial expressions, gestures, and communication techniques is essential when creating animated physician characters that are realistic and engaging while describing surgical procedures. This paper emphasizes the integration of appropriate emotions and co-speech gestures when medical experts explain a medical procedure, together with the design of the animated characters. Depicting these components faithfully fosters healthy doctor-patient relationships and improves patients' understanding. We suggest two key approaches to developing virtual medical experts that incorporate these elements. First, doctors can generate the content of the surgical explanation with a virtual doctor. Second, patients can listen to the surgical procedure described by the virtual doctor and ask questions if they have any. Our system supports patients by considering their psychology and incorporating medical professionals' opinions. These improvements ensure the animated virtual agent is comforting, reassuring, and emotionally supportive. Through a user study, we evaluated our hypothesis and gained insight into further improvements.
{"title":"Enhancing doctor-patient communication in surgical explanations: Designing effective facial expressions and gestures for animated physician characters","authors":"Hwang Youn Kim, Ghazanfar Ali, Jae-In Hwang","doi":"10.1002/cav.2236","DOIUrl":"https://doi.org/10.1002/cav.2236","url":null,"abstract":"<p>Paying close attention to facial expressions, gestures, and communication techniques is essential when creating animated physician characters that are realistic and captivating when describing surgical procedures. This paper emphasizes the integration of appropriate emotions, co-speech gestures when medical experts explain the medical procedure, and designing animated characters. We can achieve healthy doctor-patient relationships and improvement of patients' understanding by depicting these components truthfully. We suggest two critical approaches to developing virtual medical experts by incorporating these elements. First, doctors can generate the contents of the surgical procedure with a virtual doctor. Second, patients can listen to the surgical procedure described by the virtual doctor and ask if they have any questions. Our system helps patients by considering their psychology and adding medical professionals' opinions. These improvements ensure the animated virtual agent is comforting, reassuring, and emotionally supportive. Through a user study, we evaluated our hypothesis and gained insight into improvements.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cav.2236","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141264573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The development of systems capable of synthesizing natural, life-like motions for virtual characters has long been a central focus in computer animation. Such systems need to generate high-quality motions for characters and provide users with a convenient, flexible interface for guiding character motions. In this work, we propose a language-directed virtual human motion generation approach based on musculoskeletal models to achieve interactive, higher-fidelity virtual human motion, laying the foundation for language-directed controllers in physics-based character animation. First, we construct a simplified musculoskeletal dynamics model for the virtual character. Subsequently, we propose a hierarchical control framework consisting of a trajectory tracking layer and a muscle control layer, and obtain through training the optimal control policy for imitating the reference motions. We design a multi-policy aggregation controller based on large language models, which selects from the action-caption data pool the motion policy with the highest similarity to the user's text command, facilitating natural-language control of virtual character motions. Experimental results demonstrate that the proposed approach not only generates high-quality motions closely resembling the reference motions but also enables users to effectively guide virtual characters to perform various motions via natural language instructions.
{"title":"A language-directed virtual human motion generation approach based on musculoskeletal models","authors":"Libo Sun, Yongxiang Wang, Wenhu Qin","doi":"10.1002/cav.2257","DOIUrl":"https://doi.org/10.1002/cav.2257","url":null,"abstract":"<p>The development of the systems capable of synthesizing natural and life-like motions for virtual characters has long been a central focus in computer animation. It needs to generate high-quality motions for characters and provide users with a convenient and flexible interface for guiding character motions. In this work, we propose a language-directed virtual human motion generation approach based on musculoskeletal models to achieve interactive and higher-fidelity virtual human motion, which lays the foundation for the development of language-directed controllers in physics-based character animation. First, we construct a simplified model of musculoskeletal dynamics for the virtual character. Subsequently, we propose a hierarchical control framework consisting of a trajectory tracking layer and a muscle control layer, obtaining the optimal control policy for imitating the reference motions through the training. We design a multi-policy aggregation controller based on large language models, which selects the motion policy with the highest similarity to user text commands from the action-caption data pool, facilitating natural language-based control of virtual character motions. Experimental results demonstrate that the proposed approach not only generates high-quality motions highly resembling reference motions but also enables users to effectively guide virtual characters to perform various motions via natural language instructions.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141251352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parametric human modeling methods are limited to either single-view frameworks or simple multi-view frameworks, failing to fully leverage the advantages of easily trainable single-view networks and the occlusion-resistant capabilities of multi-view images. The prevalent object occlusion and self-occlusion in real-world scenarios lead to problems of robustness and accuracy in predicting human body parameters. Additionally, many methods overlook the spatial connectivity of human joints in the global estimation of model pose parameters, resulting in cumulative errors in consecutive joint parameters. To address these challenges, we propose a flexible and efficient iterative decoding strategy. By extending from single-view images to multi-view video inputs, we achieve local-to-global optimization. We utilize attention mechanisms to capture the rotational dependencies between any node in the human body and all its ancestor nodes, thereby enhancing pose decoding capability. We employ parameter-level iterative fusion of multi-view image data to achieve flexible integration of global pose information, rapidly obtaining appropriate projection features from different viewpoints and ultimately yielding precise parameter estimation. Through experiments, we validate the effectiveness of the HIDE method on the Human3.6M and 3DPW datasets, demonstrating significantly improved visualization results compared to previous methods.
{"title":"HIDE: Hierarchical iterative decoding enhancement for multi-view 3D human parameter regression","authors":"Weitao Lin, Jiguang Zhang, Weiliang Meng, Xianglong Liu, Xiaopeng Zhang","doi":"10.1002/cav.2266","DOIUrl":"https://doi.org/10.1002/cav.2266","url":null,"abstract":"<p>Parametric human modeling are limited to either single-view frameworks or simple multi-view frameworks, failing to fully leverage the advantages of easily trainable single-view networks and the occlusion-resistant capabilities of multi-view images. The prevalent presence of object occlusion and self-occlusion in real-world scenarios leads to issues of robustness and accuracy in predicting human body parameters. Additionally, many methods overlook the spatial connectivity of human joints in the global estimation of model pose parameters, resulting in cumulative errors in continuous joint parameters.To address these challenges, we propose a flexible and efficient iterative decoding strategy. By extending from single-view images to multi-view video inputs, we achieve local-to-global optimization. We utilize attention mechanisms to capture the rotational dependencies between any node in the human body and all its ancestor nodes, thereby enhancing pose decoding capability. We employ a parameter-level iterative fusion of multi-view image data to achieve flexible integration of global pose information, rapidly obtaining appropriate projection features from different viewpoints, ultimately resulting in precise parameter estimation. Through experiments, we validate the effectiveness of the HIDE method on the Human3.6M and 3DPW datasets, demonstrating significantly improved visualization results compared to previous methods.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141251353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Virtual reality (VR)-enabled multi-user collaboration has gradually been applied in academic research and industrial applications, but key problems remain. First, it is often difficult for users to select or manipulate objects in complex three-dimensional spaces, which greatly affects their operational efficiency. Second, supporting natural communication cues is crucial for cooperation in VR; in collaborative tasks especially, ambiguous verbal communication cannot effectively assign partners the task of selecting or manipulating objects. To address these issues, we propose a new interaction method, Eye-Gesture Combination Interaction in VR, which enhances the execution of collaborative tasks by sharing visualizations of eye movement and gesture data among partners. We conducted user experiments and showed that using dots to represent eye gaze and virtual hands to represent gestures helps users complete tasks faster than other visualization methods. Finally, we developed a VR multi-user collaborative assembly system. The results of the user study show that sharing gaze points and gestures among users can significantly improve the productivity of collaborating users. Our work can effectively improve the efficiency of multi-user collaborative systems in VR and provides new design guidelines for collaborative systems in VR.
{"title":"Augmenting collaborative interaction with shared visualization of eye movement and gesture in VR","authors":"Yang Liu, Song Zhao, Shiwei Cheng","doi":"10.1002/cav.2264","DOIUrl":"https://doi.org/10.1002/cav.2264","url":null,"abstract":"<p>Virtual Reality (VR)-enabled multi-user collaboration has been gradually applied in academic research and industrial applications, but it still has key problems. First, it is often difficult for users to select or manipulate objects in complex three-dimesnional spaces, which greatly affects their operational efficiency. Second, supporting natural communication cues is crucial for cooperation in VR, especially in collaborative tasks, where ambiguous verbal communication cannot effectively assign partners the task of selecting or manipulating objects. To address the above issues, in this paper, we propose a new interaction method, Eye-Gesture Combination Interaction in VR, to enhance the execution of collaborative tasks by sharing the visualization of eye movement and gesture data among partners. We conducted user experiments and showed that using dots to represent eye gaze and virtual hands to represent gestures can help users complete tasks faster than other visualization methods. Finally, we developed a VR multi-user collaborative assembly system. The results of the user study show that sharing gaze points and gestures among users can significantly improve the productivity of collaborating users. Our work can effectively improve the efficiency of multi-user collaborative systems in VR and provide new design guidelines for collaborative systems in VR.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141245708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Existing crowd evacuation simulation methods commonly suffer from low efficiency in path planning and insufficient realism in pedestrian movement during evacuation. In this study, we propose a novel crowd evacuation path planning approach based on the learning curve–deep deterministic policy gradient (LC-DDPG) algorithm. The algorithm incorporates a dynamic experience pool and a priority experience sampling strategy, improving convergence speed and achieving higher average rewards, thus enabling efficient global path planning. Building on this foundation, we introduce a double-layer method for crowd evacuation using deep reinforcement learning. Specifically, within each group, individuals are categorized into leaders and followers. At the top layer, we employ the LC-DDPG algorithm to perform global path planning for the leaders. Simultaneously, at the bottom layer, an enhanced social force model guides the followers to avoid obstacles and follow the leaders during evacuation. We implemented a crowd evacuation simulation platform. Experimental results show that our proposed method achieves high path-planning efficiency and generates more realistic pedestrian trajectories across different scenarios and crowd sizes.
{"title":"A double-layer crowd evacuation simulation method based on deep reinforcement learning","authors":"Yong Zhang, Bo Yang, Jianlin Zhu","doi":"10.1002/cav.2280","DOIUrl":"https://doi.org/10.1002/cav.2280","url":null,"abstract":"<p>Existing crowd evacuation simulation methods commonly face challenges of low efficiency in path planning and insufficient realism in pedestrian movement during the evacuation process. In this study, we propose a novel crowd evacuation path planning approach based on the learning curve–deep deterministic policy gradient (LC-DDPG) algorithm. The algorithm incorporates dynamic experience pool and a priority experience sampling strategy, enhancing convergence speed and achieving higher average rewards, thus efficiently enabling global path planning. Building upon this foundation, we introduce a double-layer method for crowd evacuation using deep reinforcement learning. Specifically, within each group, individuals are categorized into leaders and followers. At the top layer, we employ the LC-DDPG algorithm to perform global path planning for the leaders. Simultaneously, at the bottom layer, an enhanced social force model guides the followers to avoid obstacles and follow the leaders during evacuation. We implemented a crowd evacuation simulation platform. Experimental results show that our proposed method has high path planning efficiency and can generate more realistic pedestrian trajectories in different scenarios and crowd sizes.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 3","pages":""},"PeriodicalIF":1.1,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141187614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}