This paper presents a new algorithm, Weighted Squared Volume Minimization (WSVM), for generating high-quality tetrahedral meshes from closed triangle meshes. Drawing inspiration from the principle of minimal surfaces that minimize squared surface area, WSVM employs a new energy function integrating weighted squared volumes for tetrahedral elements. When minimized with constant weights, this energy promotes uniform volumes among the tetrahedra; adjusting the weights to account for local geometry further achieves uniform dihedral angles within the mesh. The algorithm begins with an initial tetrahedral mesh generated via Delaunay tetrahedralization and proceeds by sequentially minimizing a volume-oriented and then a dihedral angle-oriented energy. At each stage, it iteratively alternates between optimizing vertex positions and refining mesh connectivity. The algorithm operates fully automatically and requires no parameter tuning. Evaluations on a variety of 3D models demonstrate that WSVM consistently produces tetrahedral meshes of higher quality than existing methods, with fewer slivers and enhanced uniformity. Further details are available at the project webpage: https://kaixinyu-hub.github.io/WSVM.github.io.
{"title":"Weighted Squared Volume Minimization (WSVM) for Generating Uniform Tetrahedral Meshes","authors":"Kaixin Yu, Yifu Wang, Peng Song, Xiangqiao Meng, Ying He, Jianjun Chen","doi":"arxiv-2409.05525","DOIUrl":"https://doi.org/arxiv-2409.05525","url":null,"abstract":"This paper presents a new algorithm, Weighted Squared Volume Minimization\u0000(WSVM), for generating high-quality tetrahedral meshes from closed triangle\u0000meshes. Drawing inspiration from the principle of minimal surfaces that\u0000minimize squared surface area, WSVM employs a new energy function integrating\u0000weighted squared volumes for tetrahedral elements. When minimized with constant\u0000weights, this energy promotes uniform volumes among the tetrahedra. Adjusting\u0000the weights to account for local geometry further achieves uniform dihedral\u0000angles within the mesh. The algorithm begins with an initial tetrahedral mesh\u0000generated via Delaunay tetrahedralization and proceeds by sequentially\u0000minimizing volume-oriented and then dihedral angle-oriented energies. At each\u0000stage, it alternates between optimizing vertex positions and refining mesh\u0000connectivity through the iterative process. The algorithm operates fully\u0000automatically and requires no parameter tuning. Evaluations on a variety of 3D\u0000models demonstrate that WSVM consistently produces tetrahedral meshes of higher\u0000quality, with fewer slivers and enhanced uniformity compared to existing\u0000methods. Check out further details at the project webpage:\u0000https://kaixinyu-hub.github.io/WSVM.github.io.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiangjun Tang, Linjun Wu, He Wang, Yiqian Wu, Bo Hu, Songnan Li, Xu Gong, Yuchen Liao, Qilong Kou, Xiaogang Jin
Motion style transfer changes the style of a motion while retaining its content and is useful in computer animation and games. Contact is an essential component of motion style transfer that should be controlled explicitly in order to express the style vividly while enhancing motion naturalness and quality. However, it is unknown how to decouple and control contact to achieve fine-grained control in motion style transfer. In this paper, we present a novel style transfer method for fine-grained control over contacts while achieving both motion naturalness and spatial-temporal variations of style. Based on our empirical evidence, we propose controlling contact indirectly through the hip velocity, which can be further decomposed into the trajectory and contact timing. To this end, we propose a new model that explicitly models the correlations between motions and trajectory/contact timing/style, allowing us to decouple and control each separately. Our approach is built around a motion manifold, where hip controls can be easily integrated into a Transformer-based decoder. It is versatile: it can generate motions directly and can also serve as post-processing for existing methods to improve quality and contact controllability. In addition, we propose a new metric that measures a correlation pattern of motions based on our empirical evidence and aligns well with human perception of motion naturalness. In extensive evaluations, our method outperforms existing methods in terms of style expressivity and motion quality.
{"title":"Decoupling Contact for Fine-Grained Motion Style Transfer","authors":"Xiangjun Tang, Linjun Wu, He Wang, Yiqian Wu, Bo Hu, Songnan Li, Xu Gong, Yuchen Liao, Qilong Kou, Xiaogang Jin","doi":"arxiv-2409.05387","DOIUrl":"https://doi.org/arxiv-2409.05387","url":null,"abstract":"Motion style transfer changes the style of a motion while retaining its\u0000content and is useful in computer animations and games. Contact is an essential\u0000component of motion style transfer that should be controlled explicitly in\u0000order to express the style vividly while enhancing motion naturalness and\u0000quality. However, it is unknown how to decouple and control contact to achieve\u0000fine-grained control in motion style transfer. In this paper, we present a\u0000novel style transfer method for fine-grained control over contacts while\u0000achieving both motion naturalness and spatial-temporal variations of style.\u0000Based on our empirical evidence, we propose controlling contact indirectly\u0000through the hip velocity, which can be further decomposed into the trajectory\u0000and contact timing, respectively. To this end, we propose a new model that\u0000explicitly models the correlations between motions and trajectory/contact\u0000timing/style, allowing us to decouple and control each separately. Our approach\u0000is built around a motion manifold, where hip controls can be easily integrated\u0000into a Transformer-based decoder. It is versatile in that it can generate\u0000motions directly as well as be used as post-processing for existing methods to\u0000improve quality and contact controllability. In addition, we propose a new\u0000metric that measures a correlation pattern of motions based on our empirical\u0000evidence, aligning well with human perception in terms of motion naturalness.\u0000Based on extensive evaluation, our method outperforms existing methods in terms\u0000of style expressivity and motion quality.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sheng Ye, Yuze He, Matthieu Lin, Jenny Sheng, Ruoyu Fan, Yiheng Han, Yubin Hu, Ran Yi, Yu-Hui Wen, Yong-Jin Liu, Wenping Wang
Neural implicit representations have revolutionized dense multi-view surface reconstruction, yet their performance diminishes significantly with sparse input views. A few pioneering works have sought to tackle the challenge of sparse-view reconstruction by leveraging additional geometric priors or multi-scene generalizability. However, they are still hindered by the imperfect choice of input views, relying on images captured under empirically determined viewpoints to provide considerable overlap. We propose PVP-Recon, a novel and effective sparse-view surface reconstruction method that progressively plans the next best views to form an optimal set of sparse viewpoints for image capturing. PVP-Recon starts surface reconstruction with as few as 3 views and progressively adds new views, which are selected based on a novel warping score that reflects the information gain of each newly added view. This progressive view planning process is interleaved with a neural SDF-based reconstruction module that utilizes multi-resolution hash features, enhanced by a progressive training scheme and a directional Hessian loss. Quantitative and qualitative experiments on three benchmark datasets show that our framework achieves high-quality reconstruction with a constrained input budget and outperforms existing baselines.
{"title":"PVP-Recon: Progressive View Planning via Warping Consistency for Sparse-View Surface Reconstruction","authors":"Sheng Ye, Yuze He, Matthieu Lin, Jenny Sheng, Ruoyu Fan, Yiheng Han, Yubin Hu, Ran Yi, Yu-Hui Wen, Yong-Jin Liu, Wenping Wang","doi":"arxiv-2409.05474","DOIUrl":"https://doi.org/arxiv-2409.05474","url":null,"abstract":"Neural implicit representations have revolutionized dense multi-view surface\u0000reconstruction, yet their performance significantly diminishes with sparse\u0000input views. A few pioneering works have sought to tackle the challenge of\u0000sparse-view reconstruction by leveraging additional geometric priors or\u0000multi-scene generalizability. However, they are still hindered by the imperfect\u0000choice of input views, using images under empirically determined viewpoints to\u0000provide considerable overlap. We propose PVP-Recon, a novel and effective\u0000sparse-view surface reconstruction method that progressively plans the next\u0000best views to form an optimal set of sparse viewpoints for image capturing.\u0000PVP-Recon starts initial surface reconstruction with as few as 3 views and\u0000progressively adds new views which are determined based on a novel warping\u0000score that reflects the information gain of each newly added view. This\u0000progressive view planning progress is interleaved with a neural SDF-based\u0000reconstruction module that utilizes multi-resolution hash features, enhanced by\u0000a progressive training scheme and a directional Hessian loss. Quantitative and\u0000qualitative experiments on three benchmark datasets show that our framework\u0000achieves high-quality reconstruction with a constrained input budget and\u0000outperforms existing baselines.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Congyi Zhang, Jinfan Yang, Eric Hedlin, Suzuran Takikawa, Nicholas Vining, Kwang Moo Yi, Wenping Wang, Alla Sheffer
Compressed representations of 3D shapes that are compact, accurate, and can be processed efficiently directly in compressed form are extremely useful for digital media applications. Recent approaches in this space focus on learned implicit or parametric representations. While implicits are well suited for tasks such as in-out queries, they lack a natural 2D parameterization, complicating tasks such as texture or normal mapping. Conversely, parametric representations support the latter tasks but are ill-suited for occupancy queries. We propose NESI, a novel learned alternative to these approaches, based on intersections of localized explicit, or height-field, surfaces. Since explicits can be trivially expressed both implicitly and parametrically, NESI directly supports a wider range of processing operations than implicit alternatives, including occupancy queries and parametric access. We represent input shapes using a collection of differently oriented height-field-bounded half-spaces combined using volumetric Boolean intersections. We first tightly bound each input using a pair of oppositely oriented height-fields, forming a Double Height-Field (DHF) Hull. We refine this hull by intersecting it with additional localized height-fields (HFs) that capture surface regions in its interior. We minimize the number of HFs necessary to accurately capture each input and compactly encode both the DHF hull and the local HFs as neural functions defined over subdomains of R^2. This reduced-dimensionality encoding delivers high-quality compact approximations. Given a similar parameter count, or storage capacity, NESI significantly reduces approximation error compared to the state of the art, especially at lower parameter counts.
{"title":"NESI: Shape Representation via Neural Explicit Surface Intersection","authors":"Congyi Zhang, Jinfan Yang, Eric Hedlin, Suzuran Takikawa, Nicholas Vining, Kwang Moo Yi, Wenping Wang, Alla Sheffer","doi":"arxiv-2409.06030","DOIUrl":"https://doi.org/arxiv-2409.06030","url":null,"abstract":"Compressed representations of 3D shapes that are compact, accurate, and can\u0000be processed efficiently directly in compressed form, are extremely useful for\u0000digital media applications. Recent approaches in this space focus on learned\u0000implicit or parametric representations. While implicits are well suited for\u0000tasks such as in-out queries, they lack natural 2D parameterization,\u0000complicating tasks such as texture or normal mapping. Conversely, parametric\u0000representations support the latter tasks but are ill-suited for occupancy\u0000queries. We propose a novel learned alternative to these approaches, based on\u0000intersections of localized explicit, or height-field, surfaces. Since explicits\u0000can be trivially expressed both implicitly and parametrically, NESI directly\u0000supports a wider range of processing operations than implicit alternatives,\u0000including occupancy queries and parametric access. We represent input shapes\u0000using a collection of differently oriented height-field bounded half-spaces\u0000combined using volumetric Boolean intersections. We first tightly bound each\u0000input using a pair of oppositely oriented height-fields, forming a Double\u0000Height-Field (DHF) Hull. We refine this hull by intersecting it with additional\u0000localized height-fields (HFs) that capture surface regions in its interior. We\u0000minimize the number of HFs necessary to accurately capture each input and\u0000compactly encode both the DHF hull and the local HFs as neural functions\u0000defined over subdomains of R^2. This reduced dimensionality encoding delivers\u0000high-quality compact approximations. Given similar parameter count, or storage\u0000capacity, NESI significantly reduces approximation error compared to the state\u0000of the art, especially at lower parameter counts.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benjamin Attal, Dor Verbin, Ben Mildenhall, Peter Hedman, Jonathan T. Barron, Matthew O'Toole, Pratul P. Srinivasan
State-of-the-art techniques for 3D reconstruction are largely based on volumetric scene representations, which require sampling multiple points to compute the color arriving along a ray. Using these representations for more general inverse rendering -- reconstructing geometry, materials, and lighting from observed images -- is challenging because recursively path-tracing such volumetric representations is expensive. Recent works alleviate this issue through the use of radiance caches: data structures that store the steady-state, infinite-bounce radiance arriving at any point from any direction. However, these solutions rely on approximations that introduce bias into the renderings and, more importantly, into the gradients used for optimization. We present a method that avoids these approximations while remaining computationally efficient. In particular, we leverage two techniques to reduce variance for unbiased estimators of the rendering equation: (1) an occlusion-aware importance sampler for incoming illumination and (2) a fast cache architecture that can be used as a control variate for the radiance from a high-quality, but more expensive, volumetric cache. We show that, by removing these biases, our approach improves the generality of radiance-cache-based inverse rendering and increases quality in the presence of challenging light transport effects such as specular reflections.
{"title":"Flash Cache: Reducing Bias in Radiance Cache Based Inverse Rendering","authors":"Benjamin Attal, Dor Verbin, Ben Mildenhall, Peter Hedman, Jonathan T. Barron, Matthew O'Toole, Pratul P. Srinivasan","doi":"arxiv-2409.05867","DOIUrl":"https://doi.org/arxiv-2409.05867","url":null,"abstract":"State-of-the-art techniques for 3D reconstruction are largely based on\u0000volumetric scene representations, which require sampling multiple points to\u0000compute the color arriving along a ray. Using these representations for more\u0000general inverse rendering -- reconstructing geometry, materials, and lighting\u0000from observed images -- is challenging because recursively path-tracing such\u0000volumetric representations is expensive. Recent works alleviate this issue\u0000through the use of radiance caches: data structures that store the\u0000steady-state, infinite-bounce radiance arriving at any point from any\u0000direction. However, these solutions rely on approximations that introduce bias\u0000into the renderings and, more importantly, into the gradients used for\u0000optimization. We present a method that avoids these approximations while\u0000remaining computationally efficient. In particular, we leverage two techniques\u0000to reduce variance for unbiased estimators of the rendering equation: (1) an\u0000occlusion-aware importance sampler for incoming illumination and (2) a fast\u0000cache architecture that can be used as a control variate for the radiance from\u0000a high-quality, but more expensive, volumetric cache. We show that by removing\u0000these biases our approach improves the generality of radiance cache based\u0000inverse rendering, as well as increasing quality in the presence of challenging\u0000light transport effects such as specular reflections.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Score Distillation Sampling (SDS) has emerged as a prevalent technique for text-to-3D generation, enabling 3D content creation by distilling view-dependent information from text-to-2D guidance. However, SDS-based methods frequently exhibit shortcomings such as over-saturated color and excess smoothness. In this paper, we conduct a thorough analysis of SDS and refine its formulation, finding that the core design is to model the distribution of rendered images. Following this insight, we introduce a novel strategy called Variational Distribution Mapping (VDM), which expedites the distribution modeling process by regarding the rendered images as instances of degradation from diffusion-based generation. This design enables efficient training of the variational distribution by skipping the calculation of the Jacobians in the diffusion U-Net. We also introduce timestep-dependent Distribution Coefficient Annealing (DCA) to further improve distillation precision. Leveraging VDM and DCA, we use Gaussian Splatting as the 3D representation and build a text-to-3D generation framework. Extensive experiments and evaluations demonstrate the capability of VDM and DCA to generate high-fidelity and realistic assets with optimization efficiency.
{"title":"DreamMapping: High-Fidelity Text-to-3D Generation via Variational Distribution Mapping","authors":"Zeyu Cai, Duotun Wang, Yixun Liang, Zhijing Shao, Ying-Cong Chen, Xiaohang Zhan, Zeyu Wang","doi":"arxiv-2409.05099","DOIUrl":"https://doi.org/arxiv-2409.05099","url":null,"abstract":"Score Distillation Sampling (SDS) has emerged as a prevalent technique for\u0000text-to-3D generation, enabling 3D content creation by distilling\u0000view-dependent information from text-to-2D guidance. However, they frequently\u0000exhibit shortcomings such as over-saturated color and excess smoothness. In\u0000this paper, we conduct a thorough analysis of SDS and refine its formulation,\u0000finding that the core design is to model the distribution of rendered images.\u0000Following this insight, we introduce a novel strategy called Variational\u0000Distribution Mapping (VDM), which expedites the distribution modeling process\u0000by regarding the rendered images as instances of degradation from\u0000diffusion-based generation. This special design enables the efficient training\u0000of variational distribution by skipping the calculations of the Jacobians in\u0000the diffusion U-Net. We also introduce timestep-dependent Distribution\u0000Coefficient Annealing (DCA) to further improve distilling precision. Leveraging\u0000VDM and DCA, we use Gaussian Splatting as the 3D representation and build a\u0000text-to-3D generation framework. Extensive experiments and evaluations\u0000demonstrate the capability of VDM and DCA to generate high-fidelity and\u0000realistic assets with optimization efficiency.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the advancement of computer vision, dynamic 3D reconstruction techniques have seen significant progress and found applications in various fields. However, these techniques generate large amounts of 3D data sequences, necessitating efficient storage and transmission methods. Existing 3D model compression methods primarily focus on static models and do not consider inter-frame information, limiting their ability to reduce data size. Temporal mesh compression, which has received less attention, often requires all input meshes to have the same topology, a condition rarely met in real-world applications. This research proposes a method to compress mesh sequences with arbitrary topology using temporal correspondence and mesh deformation. The method establishes temporal correspondence between consecutive frames, applies a deformation model to transform the mesh from one frame to subsequent frames, and replaces the original meshes with deformed ones if the quality meets a tolerance threshold. Extensive experiments demonstrate that this method achieves state-of-the-art compression performance. The contributions of this paper include a geometry- and motion-based model for establishing temporal correspondence between meshes, a mesh quality assessment for temporal mesh sequences, an entropy-based encoding and corner-table-based method for compressing mesh sequences, and extensive experiments showing the effectiveness of the proposed method. All code will be open-sourced at https://github.com/lszhuhaichao/ultron.
{"title":"Ultron: Enabling Temporal Geometry Compression of 3D Mesh Sequences using Temporal Correspondence and Mesh Deformation","authors":"Haichao Zhu","doi":"arxiv-2409.05151","DOIUrl":"https://doi.org/arxiv-2409.05151","url":null,"abstract":"With the advancement of computer vision, dynamic 3D reconstruction techniques\u0000have seen significant progress and found applications in various fields.\u0000However, these techniques generate large amounts of 3D data sequences,\u0000necessitating efficient storage and transmission methods. Existing 3D model\u0000compression methods primarily focus on static models and do not consider\u0000inter-frame information, limiting their ability to reduce data size. Temporal\u0000mesh compression, which has received less attention, often requires all input\u0000meshes to have the same topology, a condition rarely met in real-world\u0000applications. This research proposes a method to compress mesh sequences with\u0000arbitrary topology using temporal correspondence and mesh deformation. The\u0000method establishes temporal correspondence between consecutive frames, applies\u0000a deformation model to transform the mesh from one frame to subsequent frames,\u0000and replaces the original meshes with deformed ones if the quality meets a\u0000tolerance threshold. Extensive experiments demonstrate that this method can\u0000achieve state-of-the-art performance in terms of compression performance. The\u0000contributions of this paper include a geometry and motion-based model for\u0000establishing temporal correspondence between meshes, a mesh quality assessment\u0000for temporal mesh sequences, an entropy-based encoding and corner table-based\u0000method for compressing mesh sequences, and extensive experiments showing the\u0000effectiveness of the proposed method. All the code will be open-sourced at\u0000https://github.com/lszhuhaichao/ultron.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zimu Liao, Siyan Chen, Rong Fu, Yi Wang, Zhongling Su, Hao Luo, Linning Xu, Bo Dai, Hengjie Li, Zhilin Pei, Xingcheng Zhang
Recently, 3D Gaussian Splatting (3DGS) has garnered attention for its high fidelity and real-time rendering. However, adapting 3DGS to different camera models, particularly fisheye lenses, poses challenges due to the unique 3D-to-2D projection calculation. Additionally, there are inefficiencies in the tile-based splatting, especially for the extreme curvature and wide field of view of fisheye lenses, which are crucial for its broader real-life applications. To tackle these challenges, we introduce Fisheye-GS. This innovative method recalculates the projection transformation and its gradients for fisheye cameras. Our approach can be seamlessly integrated as a module into other efficient 3D rendering methods, emphasizing its extensibility, lightweight nature, and modular design. Since only the projection component is modified, the method can also be easily adapted for use with different camera models. Compared to methods that train after undistortion, our approach demonstrates a clear improvement in visual quality.
{"title":"Fisheye-GS: Lightweight and Extensible Gaussian Splatting Module for Fisheye Cameras","authors":"Zimu Liao, Siyan Chen, Rong Fu, Yi Wang, Zhongling Su, Hao Luo, Linning Xu, Bo Dai, Hengjie Li, Zhilin Pei, Xingcheng Zhang","doi":"arxiv-2409.04751","DOIUrl":"https://doi.org/arxiv-2409.04751","url":null,"abstract":"Recently, 3D Gaussian Splatting (3DGS) has garnered attention for its high\u0000fidelity and real-time rendering. However, adapting 3DGS to different camera\u0000models, particularly fisheye lenses, poses challenges due to the unique 3D to\u00002D projection calculation. Additionally, there are inefficiencies in the\u0000tile-based splatting, especially for the extreme curvature and wide field of\u0000view of fisheye lenses, which are crucial for its broader real-life\u0000applications. To tackle these challenges, we introduce Fisheye-GS.This\u0000innovative method recalculates the projection transformation and its gradients\u0000for fisheye cameras. Our approach can be seamlessly integrated as a module into\u0000other efficient 3D rendering methods, emphasizing its extensibility,\u0000lightweight nature, and modular design. Since we only modified the projection\u0000component, it can also be easily adapted for use with different camera models.\u0000Compared to methods that train after undistortion, our approach demonstrates a\u0000clear improvement in visual quality.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yotam Erel, Or Kozlovsky-Mordenfeld, Daisuke Iwai, Kosuke Sato, Amit H. Bermano
We present a technique for dynamically projecting 3D content onto human hands with short perceived motion-to-photon latency. Computing the pose and shape of human hands accurately and quickly is a challenging task due to their articulated and deformable nature. We combine a slower 3D coarse estimation of the hand pose with high-speed 2D correction steps that improve the alignment of the projection to the hands, increase the projected surface area, and reduce perceived latency. Since our approach leverages a full 3D reconstruction of the hands, any arbitrary texture or reasonably performant effect can be applied, which was not possible before. We conducted two user studies to assess the benefits of our method. The results show that subjects are less sensitive to latency artifacts and perform an associated task faster and with greater ease than with the naive approach of directly projecting frames rendered from the 3D pose estimate. We demonstrate several novel use cases and applications.
{"title":"Casper DPM: Cascaded Perceptual Dynamic Projection Mapping onto Hands","authors":"Yotam Erel, Or Kozlovsky-Mordenfeld, Daisuke Iwai, Kosuke Sato, Amit H. Bermano","doi":"arxiv-2409.04397","DOIUrl":"https://doi.org/arxiv-2409.04397","url":null,"abstract":"We present a technique for dynamically projecting 3D content onto human hands\u0000with short perceived motion-to-photon latency. Computing the pose and shape of\u0000human hands accurately and quickly is a challenging task due to their\u0000articulated and deformable nature. We combine a slower 3D coarse estimation of\u0000the hand pose with high speed 2D correction steps which improve the alignment\u0000of the projection to the hands, increase the projected surface area, and reduce\u0000perceived latency. Since our approach leverages a full 3D reconstruction of the\u0000hands, any arbitrary texture or reasonably performant effect can be applied,\u0000which was not possible before. We conducted two user studies to assess the\u0000benefits of using our method. The results show subjects are less sensitive to\u0000latency artifacts and perform faster and with more ease a given associated task\u0000over the naive approach of directly projecting rendered frames from the 3D pose\u0000estimation. We demonstrate several novel use cases and applications.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stephan Olbrich, Andreas Beckert, Cécile Michel, Christian Schroer, Samaneh Ehteram, Andreas Schropp, Philipp Paetzold
Cuneiform is the earliest known system of writing, first developed for the Sumerian language of southern Mesopotamia in the second half of the 4th millennium BC. Cuneiform signs are obtained by impressing a stylus on fresh clay tablets. For certain purposes, e.g. authentication by seal imprint, some cuneiform tablets were enclosed in clay envelopes, which cannot be opened without destroying them. The aim of our interdisciplinary project is the non-invasive study of clay tablets. A portable X-ray micro-CT scanner is developed to acquire density data of such artifacts on a high-resolution, regular 3D grid at collection sites. The resulting volume data is processed through feature-preserving denoising, extraction of high-accuracy surfaces using a manifold dual marching cubes algorithm, and extraction of local features by enhanced curvature rendering and ambient occlusion. For the non-invasive study of cuneiform inscriptions, the tablet is virtually separated from its envelope by curvature-based segmentation. The computation- and data-intensive algorithms are optimized for near-real-time offline usage with limited resources at collection sites. To visualize the complexity-reduced, octree-based compressed representation of surfaces, we develop and implement an interactive application. To facilitate the analysis of such clay tablets, we implement shape-based feature extraction algorithms to enhance cuneiform recognition. Our workflow supports innovative 3D display and interaction techniques such as autostereoscopic displays and gesture control.
{"title":"Efficient Analysis and Visualization of High-Resolution Computed Tomography Data for the Exploration of Enclosed Cuneiform Tablets","authors":"Stephan Olbrich, Andreas Beckert, Cécile Michel, Christian Schroer, Samaneh Ehteram, Andreas Schropp, Philipp Paetzold","doi":"arxiv-2409.04236","DOIUrl":"https://doi.org/arxiv-2409.04236","url":null,"abstract":"Cuneiform is the earliest known system of writing, first developed for the\u0000Sumerian language of southern Mesopotamia in the second half of the 4th\u0000millennium BC. Cuneiform signs are obtained by impressing a stylus on fresh\u0000clay tablets. For certain purposes, e.g. authentication by seal imprint, some\u0000cuneiform tablets were enclosed in clay envelopes, which cannot be opened\u0000without destroying them. The aim of our interdisciplinary project is the\u0000non-invasive study of clay tablets. A portable X-ray micro-CT scanner is\u0000developed to acquire density data of such artifacts on a high-resolution,\u0000regular 3D grid at collection sites. The resulting volume data is processed\u0000through feature-preserving denoising, extraction of high-accuracy surfaces\u0000using a manifold dual marching cubes algorithm and extraction of local features\u0000by enhanced curvature rendering and ambient occlusion. For the non-invasive\u0000study of cuneiform inscriptions, the tablet is virtually separated from its\u0000envelope by curvature-based segmentation. The computational- and data-intensive\u0000algorithms are optimized or near-real-time offline usage with limited resources\u0000at collection sites. To visualize the complexity-reduced and octree-based\u0000compressed representation of surfaces, we develop and implement an interactive\u0000application. To facilitate the analysis of such clay tablets, we implement\u0000shape-based feature extraction algorithms to enhance cuneiform recognition. Our\u0000workflow supports innovative 3D display and interaction techniques such as\u0000autostereoscopic displays and gesture control.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}