Pub Date: 2024-06-22, DOI: 10.1016/j.cag.2024.103983
Daniel Martin, Diego Gutierrez, Belen Masia
Predicting the path followed by a viewer’s eyes when observing an image (a scanpath) is a challenging problem, particularly due to inter- and intra-observer variability and the spatio-temporal dependencies of the visual attention process. Most existing approaches progressively optimize the prediction of each gaze point given the previous ones. In this work we instead propose a probabilistic approach, which we call tSPM-Net, that accounts for observer variability by resorting to Bayesian deep learning. In addition, we optimize our model to jointly consider both the spatial and temporal dimensions of scanpaths using a novel spatio-temporal loss function based on a combination of Kullback–Leibler divergence and dynamic time warping. tSPM-Net yields results that outperform current state-of-the-art approaches and are closer to the human baseline, suggesting that our model generates scanpaths whose behavior closely resembles that of real ones.
{"title":"tSPM-Net: A probabilistic spatio-temporal approach for scanpath prediction","authors":"Daniel Martin, Diego Gutierrez, Belen Masia","doi":"10.1016/j.cag.2024.103983","DOIUrl":"https://doi.org/10.1016/j.cag.2024.103983","url":null,"abstract":"<div><p>Predicting the path followed by the viewer’s eyes when observing an image (a scanpath) is a challenging problem, particularly due to the inter- and intra-observer variability and the spatio-temporal dependencies of the visual attention process. Most existing approaches have focused on progressively optimizing the prediction of a gaze point given the previous ones. In this work we propose instead a probabilistic approach, which we call tSPM-Net. We build our method to account for observers’ variability by resorting to Bayesian deep learning and a probabilistic approach. Besides, we optimize our model to jointly consider both spatial and temporal dimensions of scanpaths using a novel spatio-temporal loss function based on a combination of Kullback–Leibler divergence and dynamic time warping. 
Our tSPM-Net yields results that outperform those of current state-of-the-art approaches, and are closer to the human baseline, suggesting that our model is able to generate scanpaths whose behavior closely resembles those of the real ones.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":null,"pages":null},"PeriodicalIF":2.5,"publicationDate":"2024-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0097849324001183/pdfft?md5=63aa8280628676f6a3b43ea567f229a9&pid=1-s2.0-S0097849324001183-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141482729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
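The spatio-temporal loss described in the abstract combines Kullback–Leibler divergence with dynamic time warping. A minimal NumPy sketch of those two ingredients follows; the function names, the histogram-based KL term, and the `alpha` balancing weight are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discretized gaze-position distributions."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def dtw_distance(a, b):
    """Dynamic time warping cost between two scanpaths (sequences of 2D points)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(np.asarray(a[i - 1]) - np.asarray(b[j - 1]))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def spatio_temporal_loss(pred_path, true_path, pred_hist, true_hist, alpha=0.5):
    """Weighted combination of a spatial (KL) and a temporal (DTW) term."""
    return alpha * kl_divergence(pred_hist, true_hist) + \
        (1.0 - alpha) * dtw_distance(pred_path, true_path)
```

Both terms vanish for a perfect prediction, which is what lets them be summed into a single training objective.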
Pub Date: 2024-06-21, DOI: 10.1016/j.cag.2024.103979
Tolga Yildiz , Ergun Akleman
In this paper, we present an algebraic framework that can be used to construct a large class of 3D shapes and structures that can potentially provide unusual material properties. We formalize this framework as a 3D generalization of the planar nonwoven textile structures that are used to mimic woven structures. Our extension is based on the fact that planar nonwoven textile structures extend straightforwardly into volumetric nonwoven textile structures, which we also call nonwoven volumetric fabrics. This property is essential because no such extension exists for planar woven structures. In other words, this approach makes it possible to easily produce volumetric structures that mimic fabric behavior as if they were planar nonwoven textile structures, something that cannot be achieved with woven structures. These volumetric structures also correspond to regular and semiregular frame structures and can represent previously unknown infinite regular polyhedra and flexible wood structures.
{"title":"Volumetric nonwoven structures: An algebraic framework for systematic design of infinite polyhedral frames using nonwoven fabric patterns","authors":"Tolga Yildiz , Ergun Akleman","doi":"10.1016/j.cag.2024.103979","DOIUrl":"https://doi.org/10.1016/j.cag.2024.103979","url":null,"abstract":"<div><p>In this paper, we present an algebraic framework that can be used to construct a large class of 3D shapes and structures that can potentially provide unusual material properties. We formalized this framework as a 3D generalization of planar nonwoven textile structures that are used to mimic the woven structures. Our extension is based on the fact that it is straightforward to extend planar nonwoven textile structures into volumetric nonwoven textile structures, which we also call nonwoven volumetric fabrics. This property is essential because such an extension is impossible with planar woven structures. In other words, using this approach, it can be possible to easily produce volumetric structures that mimic the fabric behavior as if they were planar nonwoven textile structures, which is impossible to produce. These volumetric structures also correspond to regular & semiregular frame structures and are capable of representing previously unknown infinite regular polyhedra and flexible wood structures.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":null,"pages":null},"PeriodicalIF":2.5,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141482727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-19, DOI: 10.1016/j.cag.2024.103980
Tengyao Cui , Yongfang Wang , Yingjie Yang , Yihan Wang
Reconstructing High Dynamic Range (HDR) video from an alternating-exposure Low Dynamic Range (LDR) sequence is an exceptionally challenging task. It not only demands the reliable reconstruction of information lost to occlusion or motion without introducing artifacts, but also requires balancing the exposure differences between frames to ensure a visually pleasing reconstructed HDR video. Unfortunately, existing methods are typically complex and struggle with unavoidable artifacts and noise, especially in low-exposed scenes. To tackle this challenge, we propose a two-stage HDR video reconstruction method that employs a global-to-local alignment strategy. First, we use iterative optical flow estimation and hybrid weighting to achieve global alignment, ensuring well-reconstructed results in the majority of areas. Second, a recursive refinement network addresses locally misaligned areas, reconstructing HDR frames from bottom to top and recursively refining them to yield faithful results. Extensive experimental results demonstrate that our method generates HDR video with fine details and superior visual quality, surpassing state-of-the-art methods across diverse scenes.
{"title":"GLHDR: HDR video reconstruction driven by global to local alignment strategy","authors":"Tengyao Cui , Yongfang Wang , Yingjie Yang , Yihan Wang","doi":"10.1016/j.cag.2024.103980","DOIUrl":"https://doi.org/10.1016/j.cag.2024.103980","url":null,"abstract":"<div><p>Reconstructing High Dynamic Range (HDR) video from alternating exposure Low Dynamic Range (LDR) sequence is an exceptionally challenging task. It not only demands the reliable reconstruction of missing information caused by occlusion or motion without introducing artifacts but also balances the exposure differences between frames to ensure a visually pleasing reconstructed HDR video. Unfortunately, existing methods are typically complex and struggle with unavoidable artifacts and noise, especially when dealing with low-exposed scenes. To tackle this formidable challenge, we propose a two-stage HDR video reconstruction method that employs a global to local alignment strategy. Firstly, we utilize iterative optical flow estimation and hybrid weighting to achieve global alignment, ensuring well-reconstructed in majority of areas. Secondly, the recursive refinement network further addresses locally misaligned areas, reconstructing HDR frames from bottom to top and recursively refining them to yield faithful reconstruction results. 
Extensive experimental results demonstrate that our method generates the HDR video with fine details and superior visually, surpassing the state-of-the-art method across diverse scenes.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":null,"pages":null},"PeriodicalIF":2.5,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141482730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
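Balancing exposure differences between alternating-exposure LDR frames is the classical starting point for any such pipeline. The sketch below shows the textbook gain-compensation-and-weighting idea only; the gamma inverse-CRF model and the trapezoid weight are illustrative assumptions, not GLHDR's learned components:

```python
import numpy as np

def ldr_to_radiance(ldr, exposure_time, gamma=2.2):
    """Map an LDR frame (values in [0,1]) to relative linear radiance.
    The gamma inverse-CRF is an assumption; real pipelines calibrate the CRF."""
    linear = np.clip(ldr, 0.0, 1.0) ** gamma
    return linear / exposure_time

def hybrid_weight(ldr, low=0.05, high=0.95):
    """Trapezoid confidence weight: down-weight under- and over-exposed pixels."""
    w = np.ones_like(ldr)
    w = np.where(ldr < low, ldr / low, w)
    w = np.where(ldr > high, (1.0 - ldr) / (1.0 - high), w)
    return np.clip(w, 0.0, 1.0)

def merge_frames(frames, exposures):
    """Weighted average of gain-compensated frames: a naive HDR estimate."""
    num = np.zeros_like(frames[0])
    den = np.zeros_like(frames[0])
    for f, t in zip(frames, exposures):
        w = hybrid_weight(f)
        num += w * ldr_to_radiance(f, t)
        den += w
    return num / np.maximum(den, 1e-8)
```

This naive merge ghosts under motion, which is exactly the gap the paper's flow-based global alignment and recursive local refinement are designed to close.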
Pub Date: 2024-06-19, DOI: 10.1016/j.cag.2024.103976
Michael J. Hua , Junjie Wu , Zichun Zhong
In order to facilitate robust and precise 3D vessel shape extraction and quantification from in-vivo Magnetic Resonance Imaging (MRI), this paper presents a novel multi-scale Knowledge Transfer Vision Transformer (KT-ViT) for 3D vessel shape segmentation. First, it uniquely integrates convolutional embeddings with transformers in a U-net architecture, which simultaneously responds to local receptive fields with convolution layers and to global contexts with transformer encoders in a multi-scale fashion. It therefore intrinsically enriches local vessel features while promoting global connectivity and continuity, yielding more accurate and reliable vessel shape segmentation. Furthermore, to enable segmenting fine-scale vessel shapes from relatively low-resolution (LR) images, a novel knowledge transfer network is designed to explore the inter-dependencies of the data and automatically transfer the knowledge gained from high-resolution (HR) data to the low-resolution network at multiple levels, including the multi-scale feature levels and the decision level, through an integration of multi-level loss functions. The capability of the HR image transformer network to model the distribution of fine-scale vessel shape data can thus be transferred to the LR image transformer to enhance its fine vessel shape segmentation. Extensive experimental results on public image datasets have demonstrated that our method outperforms all other state-of-the-art deep learning methods.
{"title":"Multi-scale Knowledge Transfer Vision Transformer for 3D vessel shape segmentation","authors":"Michael J. Hua , Junjie Wu , Zichun Zhong","doi":"10.1016/j.cag.2024.103976","DOIUrl":"https://doi.org/10.1016/j.cag.2024.103976","url":null,"abstract":"<div><p>In order to facilitate the robust and precise 3D vessel shape extraction and quantification from in-vivo Magnetic Resonance Imaging (MRI), this paper presents a novel multi-scale Knowledge Transfer Vision Transformer (i.e., KT-ViT) for 3D vessel shape segmentation. First, it uniquely integrates convolutional embeddings with transformer in a U-net architecture, which simultaneously responds to local receptive fields with convolution layers and global contexts with transformer encoders in a multi-scale fashion. Therefore, it intrinsically enriches local vessel feature and simultaneously promotes global connectivity and continuity for a more accurate and reliable vessel shape segmentation. Furthermore, to enable using relatively low-resolution (LR) images to segment fine scale vessel shapes, a novel knowledge transfer network is designed to explore the inter-dependencies of data and automatically transfer the knowledge gained from high-resolution (HR) data to the low-resolution handling network at multiple levels, including the multi-scale feature levels and the decision level, through an integration of multi-level loss functions. The modeling capability of fine-scale vessel shape data distribution, possessed by the HR image transformer network, can be transferred to the LR image transformer to enhance its knowledge for fine vessel shape segmentation. 
Extensive experimental results on public image datasets have demonstrated that our method outperforms all other state-of-the-art deep learning methods.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":null,"pages":null},"PeriodicalIF":2.5,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141482726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
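The multi-level transfer the abstract describes, matching features at each scale plus soft predictions at the decision level, can be sketched as a distillation loss. The MSE feature term, KL decision term, and weights below are assumptions, not the paper's exact loss functions:

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transfer_loss(lr_feats, hr_feats, lr_logits, hr_logits,
                  feat_w=1.0, dec_w=1.0):
    """Multi-level distillation: match per-scale features (feature level)
    and soft predictions (decision level); HR network acts as teacher."""
    feat_term = sum(mse(f_lr, f_hr) for f_lr, f_hr in zip(lr_feats, hr_feats))
    p_lr, p_hr = softmax(lr_logits), softmax(hr_logits)
    dec_term = float(np.sum(p_hr * (np.log(p_hr + 1e-12) -
                                    np.log(p_lr + 1e-12))))
    return feat_w * feat_term + dec_w * dec_term
```

The loss is zero when the LR student reproduces the HR teacher exactly, and each level can be weighted independently.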
Pub Date: 2024-06-18, DOI: 10.1016/j.cag.2024.103978
Immersive environments with head-mounted displays (HMDs) and hand-held controllers, in either Virtual or Augmented Reality (VR/AR), offer new possibilities for the creation of artistic 3D content. Some of these are exploited by mid-air drawing applications: the user’s hand trajectory generates a set of stylized curves or ribbons in space, giving the impression of painting or drawing in 3D. We propose a method that extends this approach to the sketching of surfaces with a VR controller. The idea is to favor shape exploration by offering a tool where the user creates a surface just by painting ribbons. These ribbons are not constrained, for example, to form patch boundaries or to completely cover the shape: they can be very sparse or disordered, and may or may not overlap or intersect. The shape is computed simultaneously, starting with the first piece of ribbon drawn by the user and continuing to evolve in real time as long as the user keeps sketching. Our method minimizes an energy function based on the projections of the ribbon strokes on a proxy surface, taking the controller’s orientations into account. The current implementation considers elevation surfaces. In addition to many examples, we evaluate the time performance of the dynamic shape modeling with respect to an increasing number of input ribbon strokes. Finally, we present images of an artistic creation, made by a professional artist, that combines stylized curve drawings in VR with our surface sketching tool.
{"title":"3D sketching in immersive environments: Shape from disordered ribbon strokes","authors":"","doi":"10.1016/j.cag.2024.103978","DOIUrl":"10.1016/j.cag.2024.103978","url":null,"abstract":"<div><p>Immersive environments with head mounted displays (HMD) and hand-held controllers, either in Virtual or Augmented Reality (VR/AR), offer new possibilities for the creation of artistic 3D content. Some of them are exploited by mid-air drawing applications: the user’s hand trajectory generates a set of stylized curves or ribbons in space, giving the impression of painting or drawing in 3D. We propose a method to extend this approach to the sketching of surfaces with a VR controller. The idea is to favor shape exploration, offering a tool, where the user creates a surface just by painting ribbons. These ribbons are not constrained to form patch boundaries for example or to completely cover the shape. They can be very sparse, disordered, overlap or not, intersect or not. The shape is computed simultaneously, starting with the first piece of ribbon drawn by the user and continuing to evolve in real-time as long as the user continues sketching. Our method involves minimizing an energy function based on the projections of the ribbon strokes on a proxy surface by taking the controller’s orientations into account. The current implementation considers elevation surfaces. In addition to many examples, we evaluate the time performance of the dynamic shape modeling with respect to an increasing number of input ribbon strokes. 
Finally, we present images of an artistic creation that combines stylized curve drawings in VR with our surface sketching tool created by a professional artist.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":null,"pages":null},"PeriodicalIF":2.5,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0097849324001134/pdfft?md5=b8947c31f44eb00638419d8108706433&pid=1-s2.0-S0097849324001134-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141571365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-15, DOI: 10.1016/j.cag.2024.103977
Qi Wang, Qing Fang, Xiaoya Zhai, Ligang Liu, Xiao-Ming Fu
We propose a novel method to design differentiable microstructures. Central to our algorithm is a new representation of the mapping from parameters to microstructures, formulated as anisotropic thermal diffusion. A metric field governs the anisotropic diffusion: the metric associated with each point is represented as a 2 × 2 symmetric positive definite (SPD) matrix that becomes the design variable. To alleviate the difficulties caused by the SPD constraint, we perform a singular value decomposition of the metric matrix so that the design variable comprises a rotation angle and a diagonal matrix. The positive definiteness constraint then reduces to requiring the two diagonal entries to be positive, which is easier to handle. The effectiveness of our algorithm is demonstrated through evaluations and comparisons over various examples.
{"title":"Differentiable microstructures design via anisotropic thermal diffusion","authors":"Qi Wang, Qing Fang, Xiaoya Zhai, Ligang Liu, Xiao-Ming Fu","doi":"10.1016/j.cag.2024.103977","DOIUrl":"10.1016/j.cag.2024.103977","url":null,"abstract":"<div><p>We propose a novel method to design differentiable microstructures. Central to our algorithm is a new representation of the mapping from the parameters to microstructures, formulated as the anisotropic thermal diffusion. A metric field governs the anisotropic diffusion. The metric associated with each point is represented as a 2 × 2 symmetric positive definite matrix that becomes the design variable. To alleviate the difficulties caused by symmetric positive definite constraints, we perform the singular value decomposition of the metric matrix so that the design variable includes a rotation angle and a diagonal matrix. Then, the positive definiteness is converted to requiring the two diagonal entries of the diagonal matrix to be positive, which is easier to deal with. The effectiveness of our algorithm is demonstrated through evaluations and comparisons over various examples.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":null,"pages":null},"PeriodicalIF":2.5,"publicationDate":"2024-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141409852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-13, DOI: 10.1016/j.cag.2024.103975
Wen Hao Png, Yichiet Aun, Ming Lee Gan
Text-conditioned image synthesis methods such as DALLE-2, IMAGEN, and Stable Diffusion have recently gained strong attention from the deep learning and art communities. Meanwhile, Image-to-Image (Img2Img) synthesis applications that emerged from the pioneering Neural Style Transfer (NST) approach have swiftly transitioned towards feed-forward Automatic Style Transfer (AST) methods, due to numerous constraints inherent in the former, including inconsistent synthesis outcomes and a sluggish optimization-based synthesis process. However, NST holds significant potential and remains relatively underexplored within this research domain. In this paper, we revisit the original NST method and uncover its potential to attain image quality comparable to AST synthesis methods across a diverse range of artistic styles. We propose a two-stage Feature-guided Style Transfer (FeaST) method which consists of (a) a pre-stylization step, called Sketching, that addresses the poor initialization issue, and (b) a Finetuning step that guides the synthesis process based on high-frequency (HF) and low-frequency (LF) guidance channels. By addressing the issues of inconsistent synthesis and slow convergence inherent in the original method, FeaST unlocks the full capabilities of NST and significantly enhances its efficiency.
{"title":"FeaST: Feature-guided Style Transfer for high-fidelity art synthesis","authors":"Wen Hao Png, Yichiet Aun, Ming Lee Gan","doi":"10.1016/j.cag.2024.103975","DOIUrl":"10.1016/j.cag.2024.103975","url":null,"abstract":"<div><p>Text-conditioned image synthesis methods such as DALLE-2, IMAGEN, and Stable Diffusion are gaining strong attention from deep learning and art communities recently. Meanwhile, Image-to-Image (Img2Img) synthesis applications that emerged from the pioneering Neural Style Transfer (NST) approach have swiftly transitioned towards the feed-forward Automatic Style Transfer (AST) methods, due to numerous constraints inherent in the former method, including inconsistent synthesis outcomes and sluggish optimization-based synthesis process. However, NST holds significant potential yet remains relatively underexplored within this research domain. In this paper, we revisited the original NST method and uncovered its potential to attain image quality comparable to the AST synthesis methods across a diverse range of artistic styles. We propose a two-stage Feature-guided Style Transfer (FeaST) which consists (a) pre-stylization step called <em>Sketching</em> to address the poor initialization issue, and (b) <em>Finetuning</em> to guide the synthesis process based on high-frequency (HF) and low-frequency (LF) guidance channels. 
By addressing the issues of inconsistent synthesis and slow convergence inherent in the original method, FeaST unlocks the full capabilities of NST and significantly enhances its efficiency.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":null,"pages":null},"PeriodicalIF":2.5,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141401078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
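HF/LF guidance channels presuppose a frequency split of the image. A minimal sketch using a separable Gaussian blur follows; the sigma value and the residual-based HF definition are illustrative assumptions, not FeaST's exact channels:

```python
import numpy as np

def gaussian_kernel1d(sigma, radius=None):
    """Normalized 1D Gaussian kernel."""
    radius = radius or int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur(img, sigma=2.0):
    """Separable Gaussian blur of a grayscale image (rows, then columns)."""
    k = gaussian_kernel1d(sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)

def split_frequencies(img, sigma=2.0):
    """LF channel = blurred image; HF channel = residual detail."""
    lf = blur(img, sigma)
    hf = img - lf
    return hf, lf
```

By construction the two channels sum back to the input, so guidance applied per channel cannot lose image content.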
Pub Date: 2024-06-13, DOI: 10.1016/j.cag.2024.103973
Yusheng Yang , Zhiyuan Gao , Jinghan Zhang , Wenbo Hui , Hang Shi , Yangmin Xie
Omnidirectional images, also known as spherical images, offer a significant advantage for the environmental sensing of mobile robots due to their wide field of view. However, previous attempts to construct convolutional neural networks on spherical images have been limited by non-uniform pixel sampling, leading to suboptimal performance in semantic segmentation. To address this issue, a novel pixel segmentation approach is proposed to achieve a near-uniform pixel distribution across the entire spherical surface. The corresponding convolution operation for the resulting image is designed as well, which extends the capabilities of spherical CNNs from semantic segmentation to more complex tasks such as instance segmentation. The method is evaluated on the Stanford 2D3DS dataset and shows superior performance compared to conventional spherical CNNs. Furthermore, the method also achieves impressive instance segmentation results on our experimental LiDAR data, demonstrating the general feasibility of our approach for common CNN tasks. The related code and dataset are released at: https://github.com/YoungRainy/UVS-U-Net.
{"title":"UVS-CNNs: Constructing general convolutional neural networks on quasi-uniform spherical images","authors":"Yusheng Yang , Zhiyuan Gao , Jinghan Zhang , Wenbo Hui , Hang Shi , Yangmin Xie","doi":"10.1016/j.cag.2024.103973","DOIUrl":"10.1016/j.cag.2024.103973","url":null,"abstract":"<div><p>Omnidirectional images, also known as spherical images, offer a significant advantage for the environmental sensing of mobile robots due to their wide field of view. However, previous studies of constructing convolutional neural networks on spherical images have been limited by non-uniform pixel sampling, leading to suboptimal performance in semantic segmentation. To address this issue, a novel pixel segmentation approach is proposed to achieve near-uniform pixel distribution across the entire spherical surface. The corresponding convolution operation for the resulting image is designed as well, which extends the capabilities of spherical CNNs from semantic segmentation to more complex tasks such as instance segmentation. The method is evaluated on the Stanford 2D3DS dataset and shows superior performance compared to conventional spherical CNNs. Furthermore, the method also achieves impressive instance segmentation results on our experimental LiDAR data, demonstrating the general feasibility of our approach for common CNN tasks. 
The related code and dataset are released in the following link: <span>https://github.com/YoungRainy/UVS-U-Net</span><svg><path></path></svg>.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":null,"pages":null},"PeriodicalIF":2.5,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141399702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
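For intuition on what "near-uniform pixel distribution on a sphere" means, the Fibonacci sphere lattice is a standard construction that spreads points almost uniformly by area. It is shown here only as an illustration of quasi-uniform spherical sampling, not as the paper's pixel segmentation scheme:

```python
import numpy as np

def fibonacci_sphere(n):
    """n near-uniformly distributed points on the unit sphere.
    Uniform spacing in z gives uniform spacing in area; successive points
    are rotated by the golden angle to avoid clustering in longitude."""
    i = np.arange(n)
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))
    z = 1.0 - 2.0 * (i + 0.5) / n
    r = np.sqrt(1.0 - z * z)
    theta = golden_angle * i
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)
```

Equirectangular grids, by contrast, oversample the poles heavily, which is the sampling bias the paper's approach is designed to remove.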
Pub Date: 2024-06-13, DOI: 10.1016/j.cag.2024.103974
Zhuoran Wang, Jianjun Yi, Lin Su, Yihan Pan
Point cloud registration methods based on Gaussian Mixture Models (GMMs) exhibit high robustness. However, a GMM cannot precisely depict a point cloud, because the Gaussian distribution is spatially symmetric while the local surfaces of point clouds are typically non-symmetric. In this paper, we propose a novel method for rigid point cloud registration, termed coherent point drift with Skewed Distribution (Skewed CPD). Our method employs an asymmetric distribution constructed from the local surface normals and curvature radii. Compared to the Gaussian distribution, this skewed distribution provides a more accurate spatial description of points on local surfaces. Additionally, we integrate an adaptive multiplier into the covariance, which reallocates the covariance weights of the different components of the probabilistic mixture model. We employ the EM algorithm, with GPU acceleration, to solve the resulting maximum likelihood estimation (MLE) problem. In the M-step, we adopt an unconstrained optimization technique rooted in Lie groups and Lie algebras to attain the optimal transformation. Experimental results indicate that our method outperforms state-of-the-art methods in both accuracy and robustness. Remarkably, even without loop closure detection, the cumulative error of our approach remains minimal.
{"title":"Coherent point drift with Skewed Distribution for accurate point cloud registration","authors":"Zhuoran Wang, Jianjun Yi, Lin Su, Yihan Pan","doi":"10.1016/j.cag.2024.103974","DOIUrl":"10.1016/j.cag.2024.103974","url":null,"abstract":"<div><p>Point cloud registration methods based on Gaussian Mixture Models (GMMs) exhibit high robustness. However, GMM cannot precisely depict point clouds, because the Gaussian distribution is spatially symmetric and local surfaces of point clouds are typically non-symmetric. In this paper, we propose a novel method for rigid point cloud registration, termed coherent point drift with Skewed Distribution (Skewed CPD). Our method employs an asymmetric distribution constructed from the local surface normals and curvature radii. Compared to the Gaussian distribution, this skewed distribution provides a more accurate spatial description of points on local surfaces. Additionally, we integrate an adaptive multiplier to the covariance, which reallocates the weight of the covariance for different components in the probabilistic mixture model. We employ the EM algorithm to address this maximum likelihood estimation (MLE) issue and leverage GPU acceleration. In the M-step, we adopt an unconstrained optimization technique rooted in a Lie group and Lie algebra to attain the optimal transformation. Experimental results indicate that our method outperforms state-of-the-art methods in both accuracy and robustness. 
Remarkably, even without loop closure detection, the cumulative error of our approach remains minimal.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":null,"pages":null},"PeriodicalIF":2.5,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141411340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
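The idea of skewing a symmetric density can be illustrated in 1D with Azzalini's skew-normal, which multiplies a Gaussian by a CDF-based skewing factor. The paper's 3D construction from surface normals and curvature radii is more involved; this scalar sketch is purely illustrative:

```python
import numpy as np
from math import erf

def skew_normal_pdf(x, loc=0.0, scale=1.0, alpha=0.0):
    """Azzalini's 1D skew-normal density: 2 * phi(z) * Phi(alpha * z) / scale,
    evaluated at a scalar x. alpha = 0 recovers the ordinary Gaussian;
    alpha > 0 shifts mass toward positive z (and vice versa)."""
    z = (x - loc) / scale
    phi = np.exp(-0.5 * z * z) / np.sqrt(2.0 * np.pi)  # standard normal pdf
    Phi = 0.5 * (1.0 + erf(alpha * z / np.sqrt(2.0)))  # standard normal cdf
    return 2.0 * phi * Phi / scale
```

Replacing each symmetric Gaussian component of a mixture with such an asymmetric density is what lets the model hug a one-sided local surface instead of spilling probability mass to both sides of it.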
Pub Date: 2024-06-12, DOI: 10.1016/j.cag.2024.103972
Yuping Ye , Juncheng Han , Jixin Liang , Di Wu , Zhan Song
Facial retargeting is a widely used technique in the game and film industries that replicates the expressions of a source facial model on a target model. Existing methods for facial retargeting rely on either hand-crafted uniform triangle meshes or sparse points obtained from motion capture (mocap). In this paper, we propose an end-to-end facial retargeting algorithm that copies facial expressions from unordered dense point clouds onto the target model. First, a correspondence-building method based on bi-harmonic functions is introduced to ensure that the template model and a cluster of point clouds share the same triangle topology. Second, a deformation transfer method is presented to transfer the computed deformation onto the target model. Several experiments on the SIAT-3DFE dataset demonstrate the accuracy and efficiency of our method.
{"title":"Retargeting of facial model for unordered dense point cloud","authors":"Yuping Ye , Juncheng Han , Jixin Liang , Di Wu , Zhan Song","doi":"10.1016/j.cag.2024.103972","DOIUrl":"10.1016/j.cag.2024.103972","url":null,"abstract":"<div><p>Facial retargeting is a widely used technique in the game and film industries that replicates the expressions of a source facial model onto a target model. Existing methods for facial retargeting rely on either hand-crafted uniform triangle meshes or sparse points obtained from motion capture(mocap). In this paper, we propose an end-to-end facial retargeting algorithm that copies facial expressions from unordered dense point clouds onto the target model. First, a corresponding building method based on bi-harmonic function is introduced to ensure that the template model and a cluster of point clouds share the same triangle topology. Second, a deformation transferring method is presented to transfer the calculated deformation onto the target model. Several experiments are conducted on the SIAT-3DFE dataset to demonstrate the accuracy and efficiency of our method.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":null,"pages":null},"PeriodicalIF":2.5,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141402940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}