2DGH: 2D Gaussian-Hermite Splatting for High-quality Rendering and Better Geometry Reconstruction
Ruihan Yu, Tianyu Huang, Jingwang Ling, Feng Xu
arXiv:2408.16982 (2024-08-30)
2D Gaussian Splatting has recently emerged as a significant method in 3D reconstruction, enabling novel view synthesis and geometry reconstruction simultaneously. While the well-known Gaussian kernel is broadly used, its lack of anisotropy and deformation ability leads to dim and vague edges at object silhouettes, limiting the reconstruction quality of current Gaussian splatting methods. To enhance the representation power, we draw inspiration from quantum physics and propose to use the Gaussian-Hermite kernel as the new primitive in Gaussian splatting. The new kernel takes a unified mathematical form and extends the Gaussian function, which serves as the zero-rank term in the updated formulation. Our experiments demonstrate the strong performance of the Gaussian-Hermite kernel in both geometry reconstruction and novel view synthesis tasks. The proposed kernel outperforms traditional Gaussian splatting kernels, showcasing its potential for high-quality 3D reconstruction and rendering.
Adaptive Multi-Resolution Encoding for Interactive Large-Scale Volume Visualization through Functional Approximation
Jianxin Sun, David Lenz, Hongfeng Yu, Tom Peterka
arXiv:2409.00184 (2024-08-30)
Functional approximation, as a high-order continuous representation, provides more accurate value and gradient queries than the traditional discrete volume representation. Volume visualization rendered directly from functional approximation produces high-quality results without the high-order artifacts caused by trilinear interpolation. However, querying an encoded functional approximation is computationally expensive, especially when the input dataset is large, making functional approximation impractical for interactive visualization. In this paper, we propose Adaptive-FAM, a novel multi-resolution functional approximation representation that is lightweight and fast to query. We also design a GPU-accelerated out-of-core multi-resolution volume visualization framework that directly uses the Adaptive-FAM representation to generate high-quality renderings with interactive responsiveness. Our method not only dramatically decreases caching time, one of the main contributors to input latency, but also effectively improves the cache hit rate through prefetching. Our approach significantly outperforms the traditional functional approximation method in terms of input latency while maintaining comparable rendering quality.
RenDetNet: Weakly-supervised Shadow Detection with Shadow Caster Verification
Nikolina Kubiak, Elliot Wortman, Armin Mustafa, Graeme Phillipson, Stephen Jolly, Simon Hadfield
arXiv:2408.17143 (2024-08-30)
Existing shadow detection models struggle to differentiate dark image areas from shadows. In this paper, we tackle this issue by verifying that all detected shadows are real, i.e., that they have paired shadow casters. We perform this step in a physically accurate manner by differentiably re-rendering the scene and observing the changes that stem from carving out the estimated shadow casters. Thanks to this approach, the RenDetNet proposed in this paper is the first learning-based shadow detection model whose supervisory signals can be computed in a self-supervised manner. The developed system compares favourably against recent models trained on our data. As part of this publication, we release our code on GitHub.
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model
Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, Yueqi Duan
arXiv:2408.16767 (2024-08-29)
Advancements in 3D scene reconstruction have transformed 2D images from the real world into 3D models, producing realistic 3D results from hundreds of input photos. Despite great success in dense-view reconstruction scenarios, rendering a detailed scene from an insufficient number of captured views remains an ill-posed optimization problem, often resulting in artifacts and distortions in unseen areas. In this paper, we propose ReconX, a novel 3D scene reconstruction paradigm that reframes the ambiguous reconstruction challenge as a temporal generation task. The key insight is to unleash the strong generative prior of large pre-trained video diffusion models for sparse-view reconstruction. However, 3D view consistency is difficult to preserve in video frames generated directly from pre-trained models. To address this, given limited input views, ReconX first constructs a global point cloud and encodes it into a contextual space as the 3D structure condition. Guided by this condition, the video diffusion model then synthesizes video frames that are both detail-preserving and highly 3D-consistent, ensuring the coherence of the scene from various perspectives. Finally, we recover the 3D scene from the generated video through a confidence-aware 3D Gaussian Splatting optimization scheme. Extensive experiments on various real-world datasets show the superiority of ReconX over state-of-the-art methods in terms of quality and generalizability.
Advancing Architectural Floorplan Design with Geometry-enhanced Graph Diffusion
Sizhe Hu, Wenming Wu, Yuntao Wang, Benzhu Xu, Liping Zheng
arXiv:2408.16258 (2024-08-29)
Automating architectural floorplan design is vital for housing and interior design, offering a faster, cost-effective alternative to manual sketches by architects. However, existing methods, both rule-based and learning-based, struggle with design complexity and constrained generation, require extensive post-processing, and tend to produce obvious geometric inconsistencies such as misalignment, overlap, and gaps. In this work, we propose GSDiff, a novel generative framework for vector floorplan design via structural graph generation, focusing on wall junction generation and wall segment prediction to capture both the geometric and semantic aspects of structural graphs. To improve the geometric rationality of the generated structural graphs, we propose two geometry enhancement methods. For wall junction generation, we propose a novel alignment loss function to improve geometric consistency. For wall segment prediction, we propose a random self-supervision method to strengthen the model's perception of the overall geometric structure, thereby promoting the generation of reasonable geometric structures. Employing a diffusion model and a Transformer model together with these geometry enhancement strategies, our framework can generate wall junctions, wall segments, and room polygons with structural and semantic information, resulting in structural graphs that accurately represent floorplans. Extensive experiments show that the proposed method surpasses existing techniques, enabling both free and constrained generation and marking a shift towards structure generation in architectural design.
UV-free Texture Generation with Denoising and Geodesic Heat Diffusions
Simone Foti, Stefanos Zafeiriou, Tolga Birdal
arXiv:2408.16762 (2024-08-29)
Seams, distortions, wasted UV space, vertex duplication, and varying resolution over the surface are the most prominent issues of standard UV-based texturing of meshes. These issues are particularly acute when automatic UV-unwrapping techniques are used. For this reason, instead of generating textures in automatically generated UV planes like most state-of-the-art methods, we propose to represent textures as coloured point clouds whose colours are generated by a denoising diffusion probabilistic model constrained to operate on the surface of 3D objects. Our sampling- and resolution-agnostic generative model relies heavily on heat diffusion over the surface of the meshes for spatial communication between points. To enable processing of arbitrarily sampled point-cloud textures and to ensure long-distance texture consistency, we introduce a fast re-sampling of the mesh spectral properties used during heat diffusion and a novel heat-diffusion-based self-attention mechanism. Our code and pre-trained models are available at github.com/simofoti/UV3-TeD.
G-Style: Stylized Gaussian Splatting
Áron Samuel Kovács, Pedro Hermosilla, Renata G. Raidou
arXiv:2408.15695 (2024-08-28)
We introduce G-Style, a novel algorithm designed to transfer the style of an image onto a 3D scene represented using Gaussian Splatting. Gaussian Splatting is a powerful 3D representation for novel view synthesis, as, compared to other approaches based on Neural Radiance Fields, it provides fast scene renderings and user control over the scene. Recent preprints have demonstrated that the style of Gaussian Splatting scenes can be modified using an image exemplar. However, since the scene geometry remains fixed during the stylization process, current solutions fall short of producing satisfactory results. Our algorithm addresses these limitations through a three-step process: In a pre-processing step, we remove undesirable Gaussians with large projection areas or highly elongated shapes. Subsequently, we combine several losses carefully designed to preserve different scales of the style in the image while maintaining the integrity of the original scene content as much as possible. During the stylization process, and following the original design of Gaussian Splatting, we split Gaussians where additional detail is necessary within the scene by tracking the gradient of the stylized color. Our experiments demonstrate that G-Style generates high-quality stylizations within just a few minutes, outperforming existing methods both qualitatively and quantitatively.
Micro and macro facial expressions by driven animations in realistic Virtual Humans
Rubens Halbig Montanha, Giovana Nascimento Raupp, Ana Carolina Policarpo Schmitt, Victor Flávio de Andrade Araujo, Soraia Raupp Musse
arXiv:2408.16110 (2024-08-28)
Advances in Computer Graphics (CG) have allowed the creation of more realistic Virtual Humans (VHs) through modern techniques for animating the VH body and face, thereby affecting perception. From traditional methods, such as blendshapes, to driven animations using facial and body tracking, these advancements can potentially enhance the perceived comfort and realism of VHs. Psychology has previously studied facial movements in humans, with some works separating expressions into macro- and micro-expressions. Some previous CG studies have analyzed how macro- and micro-expressions are perceived, replicating psychology studies with VHs, covering both realistic and cartoon VHs, and exploring different VH technologies. However, instead of using facial-tracking animation methods, these studies animated the VHs using blendshape interpolation. To understand how the facial-tracking technique alters the perception of VHs, this paper extends the study to macro- and micro-expressions, employing two datasets to transfer real facial expressions to VHs and analyzing how their expressions are perceived. Our findings suggest that transferring facial expressions from real actors to VHs significantly diminishes the accuracy of emotion perception compared to VH facial animations created by artists.
Evaluating and Comparing Crowd Simulations: Perspectives from a Crowd Authoring Tool
Gabriel Fonseca Silva, Paulo Ricardo Knob, Rubens Halbig Montanha, Soraia Raupp Musse
arXiv:2408.15762 (2024-08-28)
Crowd simulation is a research area widely used in diverse fields, including gaming and security, in which virtual agent movements are assessed through metrics such as time to reach goals, speed, trajectories, and densities. This is relevant for security applications, for instance, since different crowd configurations can determine how long people take to evacuate an environment. In this work, we extend WebCrowds, an authoring tool for crowd simulation, to allow users to build scenarios and evaluate them through a set of metrics. The aim is to provide a quantitative metric that can, based on simulation data, select the best crowd configuration for a given environment. We conduct experiments to validate the proposed metric in multiple crowd simulation scenarios and compare it with another metric from the literature. The results show that experts in the domain of crowd scenarios agree with our proposed quantitative metric.
OctFusion: Octree-based Diffusion Models for 3D Shape Generation
Bojun Xiong, Si-Tong Wei, Xin-Yang Zheng, Yan-Pei Cao, Zhouhui Lian, Peng-Shuai Wang
arXiv:2408.14732 (2024-08-27)
Diffusion models have emerged as a popular method for 3D generation. However, it is still challenging for diffusion models to efficiently generate diverse, high-quality 3D shapes. In this paper, we introduce OctFusion, which can generate 3D shapes at arbitrary resolutions in 2.5 seconds on a single Nvidia 4090 GPU, with the extracted meshes guaranteed to be continuous and manifold. The key components of OctFusion are an octree-based latent representation and the accompanying diffusion models. The representation combines the benefits of implicit neural representations and explicit spatial octrees and is learned with an octree-based variational autoencoder. The proposed diffusion model is a unified multi-scale U-Net that enables weight and computation sharing across different octree levels and avoids the complexity of widely used cascaded diffusion schemes. We verify the effectiveness of OctFusion on the ShapeNet and Objaverse datasets and achieve state-of-the-art performance on shape generation tasks. We demonstrate that OctFusion is extendable and flexible by generating high-quality color fields for textured mesh generation and high-quality 3D shapes conditioned on text prompts, sketches, or category labels. Our code and pre-trained models are available at https://github.com/octree-nn/octfusion.