Point cloud denoising using a generalized error metric
Qun-Ce Xu, Yong-Liang Yang, Bailin Deng
Pub Date: 2024-03-18 · DOI: 10.1016/j.gmod.2024.101216 · Graphical Models, Volume 133, Article 101216
Effectively removing noise from raw point clouds while preserving geometric features is the key challenge in point cloud denoising. To address this problem, we propose a novel method that jointly optimizes point positions and normals. To preserve geometric features, our formulation uses a generalized robust error metric to enforce piecewise smoothness of the normal vector field as well as consistency between point positions and normals. By varying the parameter of the error metric, we gradually increase its non-convexity to guide the optimization towards a desirable solution. By combining alternating minimization with a majorization-minimization strategy, we develop a numerical solver for the optimization that guarantees convergence. The effectiveness of our method is demonstrated by extensive comparisons with previous works.
{"title":"Point cloud denoising using a generalized error metric","authors":"Qun-Ce Xu , Yong-Liang Yang , Bailin Deng","doi":"10.1016/j.gmod.2024.101216","DOIUrl":"10.1016/j.gmod.2024.101216","url":null,"abstract":"<div><p>Effective removal of noises from raw point clouds while preserving geometric features is the key challenge for point cloud denoising. To address this problem, we propose a novel method that jointly optimizes the point positions and normals. To preserve geometric features, our formulation uses a generalized robust error metric to enforce piecewise smoothness of the normal vector field as well as consistency between point positions and normals. By varying the parameter of the error metric, we gradually increase its non-convexity to guide the optimization towards a desirable solution. By combining alternating minimization with a majorization-minimization strategy, we develop a numerical solver for the optimization which guarantees convergence. The effectiveness of our method is demonstrated by extensive comparisons with previous works.</p></div>","PeriodicalId":55083,"journal":{"name":"Graphical Models","volume":"133 ","pages":"Article 101216"},"PeriodicalIF":1.7,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1524070324000043/pdfft?md5=48a1964c4abbec912ee9a17b6f0212cf&pid=1-s2.0-S1524070324000043-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140151694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Auxetic dihedral Escher tessellations
Xiaokang Liu, Lin Lu, Lingxin Cao, Oliver Deussen, Changhe Tu
Pub Date: 2024-03-13 · DOI: 10.1016/j.gmod.2024.101215 · Graphical Models, Volume 133, Article 101215
Auxetic structures exhibit an unconventional deployable mechanism: they expand in the transverse directions while being stretched longitudinally (a negative Poisson's ratio). This characteristic offers advantages in diverse fields such as structural engineering, flexible electronics, and medicine. The rotating (semi-)rigid structure, a typical auxetic structure, has been introduced into computer-aided design because of its well-defined motion patterns. Such structures find application as deployable structures for approximating and rapidly fabricating doubly-curved surfaces, mitigating the challenges of their production and transportation. Nevertheless, prior designs relying on basic geometric elements concentrate primarily on the inherent nature of the structure and often lack aesthetic appeal. To address this limitation, we propose a novel design and generation method inspired by dihedral Escher tessellations. By introducing a new metric function, we achieve efficient evaluation of shape deployability as well as filtering of tessellations, followed by a two-step deformation and edge-deployability optimization that enforces deployability constraints while preserving semantic meaning. Furthermore, we optimize the shape through physical simulation to guarantee deployability in actual manufacturing and to control Poisson's ratio to a certain extent. Our method yields structures that are both semantically meaningful and aesthetically pleasing, showing promising potential for auxetic applications.
{"title":"Auxetic dihedral Escher tessellations","authors":"Xiaokang Liu , Lin Lu , Lingxin Cao , Oliver Deussen , Changhe Tu","doi":"10.1016/j.gmod.2024.101215","DOIUrl":"https://doi.org/10.1016/j.gmod.2024.101215","url":null,"abstract":"<div><p>The auxetic structure demonstrates an unconventional deployable mechanism, expanding in transverse directions while being stretched longitudinally (exhibiting a negative Poisson’s ratio). This characteristic offers advantages in diverse fields such as structural engineering, flexible electronics, and medicine. The rotating (semi-)rigid structure, as a typical auxetic structure, has been introduced into the field of computer-aided design because of its well-defined motion patterns. These structures find application as deployable structures in various endeavors aiming to approximate and rapidly fabricate doubly-curved surfaces, thereby mitigating the challenges associated with their production and transportation. Nevertheless, prior designs relying on basic geometric elements primarily concentrate on exploring the inherent nature of the structure and often lack aesthetic appeal. To address this limitation, we propose a novel design and generation method inspired by dihedral Escher tessellations. By introducing a new metric function, we achieve efficient evaluation of shape deployability as well as filtering of tessellations, followed by a two-step deformation and edge-deployability optimization process to ensure compliance with deployability constraints while preserving semantic meanings. Furthermore, we optimize the shape through physical simulation to guarantee deployability in actual manufacturing and control Poisson’s ratio to a certain extent. Our method yields structures that are both semantically meaningful and aesthetically pleasing, showcasing promising potential for auxetic applications.</p></div>","PeriodicalId":55083,"journal":{"name":"Graphical Models","volume":"133 ","pages":"Article 101215"},"PeriodicalIF":1.7,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1524070324000031/pdfft?md5=ee39dfa2350ffc88d6645119c393baed&pid=1-s2.0-S1524070324000031-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140122375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IGF-Fit: Implicit gradient field fitting for point cloud normal estimation
Bowen Lyu, Li-Yong Shen, Chun-Ming Yuan
Pub Date: 2024-02-28 · DOI: 10.1016/j.gmod.2024.101214 · Graphical Models, Volume 133, Article 101214
We introduce IGF-Fit, a novel method for estimating surface normals from point clouds with varying noise and density. Unlike previous approaches that rely on point-wise weights and explicit representations, IGF-Fit employs a network that learns an implicit representation and uses derivatives to predict normals. The input patch serves as both a shape latent vector and query points for fitting the implicit representation. To handle noisy input, we introduce a novel noise transformation module with a training strategy for noise classification and latent vector bias prediction. Our experiments on synthetic and real-world scan datasets demonstrate the effectiveness of IGF-Fit, achieving state-of-the-art performance on both noise-free and density-varying data.
{"title":"IGF-Fit: Implicit gradient field fitting for point cloud normal estimation","authors":"Bowen Lyu , Li-Yong Shen , Chun-Ming Yuan","doi":"10.1016/j.gmod.2024.101214","DOIUrl":"https://doi.org/10.1016/j.gmod.2024.101214","url":null,"abstract":"<div><p>We introduce IGF-Fit, a novel method for estimating surface normals from point clouds with varying noise and density. Unlike previous approaches that rely on point-wise weights and explicit representations, IGF-Fit employs a network that learns an implicit representation and uses derivatives to predict normals. The input patch serves as both a shape latent vector and query points for fitting the implicit representation. To handle noisy input, we introduce a novel noise transformation module with a training strategy for noise classification and latent vector bias prediction. Our experiments on synthetic and real-world scan datasets demonstrate the effectiveness of IGF-Fit, achieving state-of-the-art performance on both noise-free and density-varying data.</p></div>","PeriodicalId":55083,"journal":{"name":"Graphical Models","volume":"133 ","pages":"Article 101214"},"PeriodicalIF":1.7,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S152407032400002X/pdfft?md5=49f2d24bca30ab2fb9811c74fa197c78&pid=1-s2.0-S152407032400002X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139993418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pixel art character generation as an image-to-image translation problem using GANs
Flávio Coutinho, Luiz Chaimowicz
Pub Date: 2024-02-02 · DOI: 10.1016/j.gmod.2024.101213 · Graphical Models, Volume 132, Article 101213
Asset creation in game development usually requires multiple iterations until a final version is achieved. This iterative process is even more pronounced for pixel art, in which the artist carefully places each pixel. We hypothesize that generating character sprites in a target pose (e.g., facing right) from a source pose (e.g., facing front) can be framed as an image-to-image translation task. We then present a deep generative architecture that takes as input an image of a character in one domain (pose) and transfers it to another. We approach the problem using generative adversarial networks (GANs), building on the Pix2Pix architecture while leveraging specific characteristics of the pixel art style. We evaluated the trained models using four small datasets (fewer than 1k) and a larger, more diverse one (12k). The models yielded promising results, and their generalization capacity varies with dataset size and variability. After training models to generate images among four domains (front, right, back, left), we present an early version of a mixed-initiative sprite editor that allows users to interact with the models and iterate on character sprites.
{"title":"Pixel art character generation as an image-to-image translation problem using GANs","authors":"Flávio Coutinho , Luiz Chaimowicz","doi":"10.1016/j.gmod.2024.101213","DOIUrl":"10.1016/j.gmod.2024.101213","url":null,"abstract":"<div><p>Asset creation in game development usually requires multiple iterations until a final version is achieved. This iterative process becomes more significant when the content is pixel art, in which the artist carefully places each pixel. We hypothesize that the problem of generating character sprites in a target pose (e.g., facing right) given a source (e.g., facing front) can be framed as an image-to-image translation task. Then, we present an architecture of deep generative models that takes as input an image of a character in one domain (pose) and transfers it to another. We approach the problem using generative adversarial networks (GANs) and build on Pix2Pix’s architecture while leveraging some specific characteristics of the pixel art style. We evaluated the trained models using four small datasets (less than 1k) and a more extensive and diverse one (12k). The models yielded promising results, and their generalization capacity varies according to the dataset size and variability. After training models to generate images among four domains (i.e., front, right, back, left), we present an early version of a mixed-initiative sprite editor that allows users to interact with them and iterate in creating character sprites.</p></div>","PeriodicalId":55083,"journal":{"name":"Graphical Models","volume":"132 ","pages":"Article 101213"},"PeriodicalIF":1.7,"publicationDate":"2024-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1524070324000018/pdfft?md5=d7948e383c160b41fc886121e68e438f&pid=1-s2.0-S1524070324000018-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139661295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating and comparing crowd simulations: Perspectives from a crowd authoring tool
Gabriel Fonseca Silva, Paulo Ricardo Knob, Rubens Halbig Montanha, Soraia Raupp Musse
Pub Date: 2024-01-03 · DOI: 10.1016/j.gmod.2023.101212 · Graphical Models, Volume 131, Article 101212
Crowd simulation is widely used in diverse fields, including gaming and security, where virtual agent movements are assessed through metrics such as time to reach a goal, speed, trajectories, and densities. This is relevant for security applications, for instance, because different crowd configurations can determine how long people take to evacuate an environment. In this work, we extend WebCrowds, an authoring tool for crowd simulation, to allow users to build scenarios and evaluate them through a set of metrics. The aim is to provide a quantitative metric that can, based on simulation data, select the best crowd configuration for a given environment. We conduct experiments to validate our proposed metric in multiple crowd simulation scenarios and compare it with another metric from the literature. The results show that experts in the domain of crowd scenarios agree with our proposed quantitative metric.
{"title":"Evaluating and comparing crowd simulations: Perspectives from a crowd authoring tool","authors":"Gabriel Fonseca Silva, Paulo Ricardo Knob, Rubens Halbig Montanha, Soraia Raupp Musse","doi":"10.1016/j.gmod.2023.101212","DOIUrl":"10.1016/j.gmod.2023.101212","url":null,"abstract":"<div><p>Crowd simulation is a research area widely used in diverse fields, including gaming and security, assessing virtual agent movements through metrics like time to reach their goals, speed, trajectories, and densities. This is relevant for security applications, for instance, as different crowd configurations can determine the time people spend in environments trying to evacuate them. In this work, we extend WebCrowds, an authoring tool for crowd simulation, to allow users to build scenarios and evaluate them through a set of metrics. The aim is to provide a quantitative metric that can, based on simulation data, select the best crowd configuration in a certain environment. We conduct experiments to validate our proposed metric in multiple crowd simulation scenarios and perform a comparison with another metric found in the literature. The results show that experts in the domain of crowd scenarios agree with our proposed quantitative metric.</p></div>","PeriodicalId":55083,"journal":{"name":"Graphical Models","volume":"131 ","pages":"Article 101212"},"PeriodicalIF":1.7,"publicationDate":"2024-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1524070323000425/pdfft?md5=99cc8b127e117c8937d599aa1f5ebafe&pid=1-s2.0-S1524070323000425-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139084586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial special issue on the 9th smart tools and applications in graphics conference (STAG 2022)","authors":"Daniela Cabiddu , Gianmarco Cherchi , Teseo Schneider","doi":"10.1016/j.gmod.2023.101203","DOIUrl":"10.1016/j.gmod.2023.101203","url":null,"abstract":"","PeriodicalId":55083,"journal":{"name":"Graphical Models","volume":"130 ","pages":"Article 101203"},"PeriodicalIF":1.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1524070323000334/pdfft?md5=5e8e5ee6713dd442b9a08e76744aae09&pid=1-s2.0-S1524070323000334-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135638180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High-performance Ellipsoidal Clipmaps
Aleksandar Dimitrijević, Dejan Rančić
Pub Date: 2023-12-01 · DOI: 10.1016/j.gmod.2023.101209 · Graphical Models, Volume 130, Article 101209
This paper presents performance improvements for Ellipsoid Clipmaps, an out-of-core, geodetically accurate algorithm for rendering planet-sized terrain. The improvements were achieved by eliminating unnecessarily dense levels, more accurate block culling in the geographic coordinate system, and more efficient rendering methods. Eliminating unnecessarily dense levels follows from analyzing and determining the optimal relative height of the viewer with respect to the most detailed level, which yields the most consistent triangle size across all visible levels. The proposed method for estimating block visibility based on view orientation allows rapid block-level view frustum culling in data space, before visualization and spatial transformation of the blocks. Using a modern geometry pipeline with task and mesh shaders forces the handling of extremely fine-grained blocks, but also shifts a significant part of the block culling from the CPU to the GPU. The described approach achieves high throughput and enables geodetically accurate rendering of terrain based on the WGS 84 reference ellipsoid at very high resolution and in real time, with tens of millions of triangles with an average area of about 0.5 pix² on a 1080p screen on mid-range graphics cards.
{"title":"High-performance Ellipsoidal Clipmaps","authors":"Aleksandar Dimitrijević, Dejan Rančić","doi":"10.1016/j.gmod.2023.101209","DOIUrl":"https://doi.org/10.1016/j.gmod.2023.101209","url":null,"abstract":"<div><p>This paper presents performance improvements for Ellipsoid Clipmaps, an out-of-core planet-sized geodetically accurate terrain rendering algorithm. The performance improvements were achieved by eliminating unnecessarily dense levels, more accurate block culling in the geographic coordinate system, and more efficient rendering methods. The elimination of unnecessarily dense levels is the result of analyzing and determining the optimal relative height of the viewer with respect to the most detailed level, resulting in the most consistent size of triangles across all visible levels. The proposed method for estimating the visibility of blocks based on view orientation allows rapid block-level view frustum culling performed in data space before visualization and spatial transformation of blocks. The use of a modern geometry pipeline through task and mesh shaders forced the handling of extremely fine granularity of blocks, but also shifted a significant part of the block culling process from CPU to the GPU. The approach described achieves high throughput and enables geodetically accurate rendering of the terrain based on the WGS 84 reference ellipsoid at very high resolution and in real time, with tens of millions of triangles with an average area of about 0.5 pix<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span> on a 1080p screen on mid-range graphics cards.</p></div>","PeriodicalId":55083,"journal":{"name":"Graphical Models","volume":"130 ","pages":"Article 101209"},"PeriodicalIF":1.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1524070323000395/pdfft?md5=26122c390b83d408f64d205c80bb4675&pid=1-s2.0-S1524070323000395-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138466486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling multi-style portrait relief from a single photograph
Yu-Wei Zhang, Hongguang Yang, Ping Luo, Zhi Li, Hui Liu, Zhongping Ji, Caiming Zhang
Pub Date: 2023-11-28 · DOI: 10.1016/j.gmod.2023.101210 · Graphical Models, Volume 130, Article 101210
This paper extends the method of Zhang et al. (2023) to produce not only portrait bas-reliefs from single photographs, but also high-depth reliefs with reasonable depth ordering. We cast this task as a problem of style-aware photo-to-depth translation, where the input is a photograph conditioned on a style vector and the output is a portrait relief with the desired depth style. To construct ground-truth data for network training, we first propose an optimization-based method to synthesize high-depth reliefs from 3D portraits. Then, we train a normal-to-depth network to learn the mapping from normal maps to relief depths. After that, we use the trained network to generate high-depth relief samples from the normal maps provided by Zhang et al. (2023). As each normal map is paired pixel-wise with a photograph, we are able to establish correspondences between photographs and high-depth reliefs. Taking the bas-reliefs of Zhang et al. (2023), the new high-depth reliefs, and their mixtures as target ground truths, we finally train an encoder-decoder network to achieve style-aware relief modeling. Specifically, the network is based on a U-shaped architecture consisting of Swin Transformer blocks that process hierarchical deep features. Extensive experiments demonstrate the effectiveness of the proposed method, and comparisons with previous works verify its flexibility and state-of-the-art performance.
A decomposition scheme for continuous Level of Detail, streaming and lossy compression of unordered point clouds
Jan Martens, Jörg Blankenbach
Pub Date: 2023-11-08 · DOI: 10.1016/j.gmod.2023.101208 · Graphical Models, Volume 130, Article 101208
Modern laser scanners, depth sensors, and Dense Image Matching techniques allow the capture of extensive point cloud datasets. While capturing has become more user-friendly, registered point clouds form large datasets that pose challenges for processing, storage, and visualization. This paper presents a decomposition scheme for unordered point clouds using oriented KD trees and the wavelet transform. Taking inspiration from image pyramids, the decomposition scheme comes with a Level of Detail representation in which higher levels are progressively reconstructed from lower ones, making it suitable for streaming and continuous Level of Detail. Furthermore, the decomposed representation allows common compression techniques to achieve higher compression ratios by modifying the underlying frequency data at the cost of geometric accuracy, thereby enabling flexible lossy compression. After introducing this novel decomposition scheme, results are discussed to show how it handles data captured from different sources.
{"title":"A decomposition scheme for continuous Level of Detail, streaming and lossy compression of unordered point clouds","authors":"Jan Martens, Jörg Blankenbach","doi":"10.1016/j.gmod.2023.101208","DOIUrl":"https://doi.org/10.1016/j.gmod.2023.101208","url":null,"abstract":"<div><p>Modern laser scanners, depth sensor devices and Dense Image Matching techniques allow for capturing of extensive point cloud datasets. While capturing has become more user-friendly, the size of registered point clouds results in large datasets which pose challenges for processing, storage and visualization. This paper presents a decomposition scheme using oriented KD trees and the wavelet transform for unordered point clouds. Taking inspiration from image pyramids, the decomposition scheme comes with a Level of Detail representation where higher-levels are progressively reconstructed from lower ones, thus making it suitable for streaming and continuous Level of Detail. Furthermore, the decomposed representation allows common compression techniques to achieve higher compression ratios by modifying the underlying frequency data at the cost of geometric accuracy and therefore allows for flexible lossy compression. After introducing this novel decomposition scheme, results are discussed to show how it deals with data captured from different sources.</p></div>","PeriodicalId":55083,"journal":{"name":"Graphical Models","volume":"130 ","pages":"Article 101208"},"PeriodicalIF":1.7,"publicationDate":"2023-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1524070323000383/pdfft?md5=acb2ab838184d4b7e97e6052e64a6ea6&pid=1-s2.0-S1524070323000383-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"92047097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vertex position estimation with spatial–temporal transformer for 3D human reconstruction
Xiangjun Zhang, Yinglin Zheng, Wenjin Deng, Qifeng Dai, Yuxin Lin, Wangzheng Shi, Ming Zeng
Pub Date: 2023-10-26 · DOI: 10.1016/j.gmod.2023.101207 · Graphical Models, Volume 130, Article 101207
Reconstructing 3D human pose and body shape from monocular images or videos is a fundamental task for comprehending human dynamics. Frame-based methods fall broadly into two categories: those regressing parametric model parameters (e.g., SMPL) and those exploring alternative representations (e.g., volumetric shapes, 3D coordinates). Non-parametric representations have demonstrated superior performance due to their enhanced flexibility. However, when applied to video data, these non-parametric frame-based methods tend to produce inconsistent and unsmooth results. To this end, we present a novel approach that directly regresses the 3D coordinates of mesh vertices and body joints with a spatial–temporal Transformer. We introduce a SpatioTemporal Learning Block (STLB) with a Spatial Learning Module (SLM) and a Temporal Learning Module (TLM), which leverages spatial and temporal information to model interactions at a finer granularity, specifically at the body token level. Our method outperforms previous state-of-the-art approaches on the Human3.6M and 3DPW benchmark datasets.
{"title":"Vertex position estimation with spatial–temporal transformer for 3D human reconstruction","authors":"Xiangjun Zhang, Yinglin Zheng, Wenjin Deng, Qifeng Dai, Yuxin Lin, Wangzheng Shi, Ming Zeng","doi":"10.1016/j.gmod.2023.101207","DOIUrl":"https://doi.org/10.1016/j.gmod.2023.101207","url":null,"abstract":"<div><p>Reconstructing 3D human pose and body shape from monocular images or videos is a fundamental task for comprehending human dynamics. Frame-based methods can be broadly categorized into two fashions: those regressing parametric model parameters (e.g., SMPL) and those exploring alternative representations (e.g., volumetric shapes, 3D coordinates). Non-parametric representations have demonstrated superior performance due to their enhanced flexibility. However, when applied to video data, these non-parametric frame-based methods tend to generate inconsistent and unsmooth results. To this end, we present a novel approach that directly regresses the 3D coordinates of the mesh vertices and body joints with a spatial–temporal Transformer. In our method, we introduce a SpatioTemporal Learning Block (STLB) with Spatial Learning Module (SLM) and Temporal Learning Module (TLM), which leverages spatial and temporal information to model interactions at a finer granularity, specifically at the body token level. Our method outperforms previous state-of-the-art approaches on Human3.6M and 3DPW benchmark datasets.</p></div>","PeriodicalId":55083,"journal":{"name":"Graphical Models","volume":"130 ","pages":"Article 101207"},"PeriodicalIF":1.7,"publicationDate":"2023-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1524070323000371/pdfft?md5=a920877b3ee3210b23f7a6444d151f50&pid=1-s2.0-S1524070323000371-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"92047096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}