Foreword to the special section on Spanish Computer Graphics Conference 2024
Ana Serrano, Gustavo Patow, Julio Marco
Pub Date: 2024-07-06 | DOI: 10.1016/j.cag.2024.103995
Dynamics simulation-based packing of irregular 3D objects
Qiubing Zhuang, Zhonggui Chen, Keyu He, Juan Cao, Wenping Wang
Pub Date: 2024-07-06 | DOI: 10.1016/j.cag.2024.103996
The 3D packing problem has a wide range of applications. However, the complex geometry of irregular objects leads to a sharp increase in the number of placement combinations, making the problem challenging. In this paper, we propose a packing pipeline based on rigid-body dynamics simulation that addresses two types of 3D packing problems. The first is the variant bin packing problem, which involves placing as many objects as possible into a container of given dimensions to maximize space utilization. The second is the open dimension problem, where the goal is to minimize the size of the container that accommodates all objects. We first use heuristic placement strategies and a fast collision detection algorithm to efficiently obtain an initial packing. We then simulate shaking of the container according to dynamics principles; combined with a vacant-space-filling operation, the shaking drives the objects in the container into a more compact arrangement. For the open dimension problem, the container height is optimized by adjusting the simulation constraints in the basic pipeline. Experimental results show that our method outperforms existing methods in both speed and packing density.
{"title":"Dynamics simulation-based packing of irregular 3D objects","authors":"Qiubing Zhuang , Zhonggui Chen , Keyu He , Juan Cao , Wenping Wang","doi":"10.1016/j.cag.2024.103996","DOIUrl":"10.1016/j.cag.2024.103996","url":null,"abstract":"<div><p>The 3D packing problem has a wide range of applications. However, the complex geometry of irregular objects leads to a sharp increase in the number of placement combinations, making it a challenging problem. In this paper, we propose a packing pipeline based on rigid body dynamics simulation to deal with two types of 3D packing problems. One is the variant bin packing problem, which involves placing more objects into a container of given dimensions to maximize space utilization. The other is the open dimension problem, where the goal is to minimize the container that can accommodate all objects. We first use heuristic placement strategies and a fast collision detection algorithm to efficiently obtain initial packing results. Then, we simulate the shaking of the container according to the dynamic principle. Combined with the vacant space filling operation, shaking the container drives the movement of objects in the container to make the arrangement of objects more compact. For the open dimension packing, the container height is optimized by adjusting the constraints of simulation in the basic pipeline. Experimental results show that our method has advantages over existing methods in both speed and packing density.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"123 ","pages":"Article 103996"},"PeriodicalIF":2.5,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141709975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fine-tuning 3D foundation models for geometric object retrieval
Jarne Van den Herrewegen, Tom Tourwé, Maks Ovsjanikov, Francis wyffels
Pub Date: 2024-07-03 | DOI: 10.1016/j.cag.2024.103993
Foundation models such as ULIP-2 (Xue et al., 2023) have recently propelled the field of 3D deep learning forward. These models are trained on significantly more data and show superior representation learning capacity in many downstream tasks, such as 3D shape classification and few-shot part segmentation.
A particular characteristic of recent 3D foundation models is that they are typically multi-modal, involving image (2D) as well as caption (text) branches. This leads to an intricate interplay that benefits all modalities. At the same time, the nature of the 3D encoders involved in these foundation models is, on its own, not well understood. Specifically, there is little analysis of either the utility of the pre-trained 3D features these models provide or their capacity to adapt to new downstream 3D data. Furthermore, existing studies typically focus on label-oriented downstream tasks, such as shape classification, and ignore other critical applications, such as 3D content-based object retrieval.
In this paper, we fill this gap and show, for the first time, how 3D foundation models can be leveraged for strong 3D-to-3D retrieval performance on seven different datasets, on par with state-of-the-art view-based architectures. We evaluate both the pre-trained foundation models and their versions fine-tuned on downstream data, and we compare supervised fine-tuning using classification labels against two self-supervised, label-free fine-tuning methods. Importantly, we introduce and describe a fine-tuning methodology, which we found to be crucial for making transfer learning from 3D foundation models work stably.
{"title":"Fine-tuning 3D foundation models for geometric object retrieval","authors":"Jarne Van den Herrewegen , Tom Tourwé , Maks Ovsjanikov , Francis wyffels","doi":"10.1016/j.cag.2024.103993","DOIUrl":"https://doi.org/10.1016/j.cag.2024.103993","url":null,"abstract":"<div><p>Foundation models, such as ULIP-2 (Xue et al., 2023) recently projected forward the field of 3D deep learning. These models are trained with significantly more data and show superior representation learning capacity in many downstream tasks like 3D shape classification and few-shot part segmentation.</p><p>A particular characteristic of the recent 3D foundation models is that they are typically <em>multi-modal</em>, and involve image (2D) as well as caption (text) branches. This leads to an intricate interplay that benefits all modalities. At the same time, the nature of the <em>3D</em> encoders alone, involved in these foundation models is not well-understood. Specifically, there is little analysis on the utility of both pre-trained 3D features provided by these models, or their capacity to adapt to new downstream 3D data. Furthermore, existing studies typically focus on label-oriented downstream tasks, such as shape classification, and ignore other critical applications, such as 3D content-based object retrieval.</p><p>In this paper, we fill this gap and show, for the first time, how 3D foundation models can be leveraged for strong 3D-to-3D retrieval performance on seven different datasets, on par with state-of-the-art view-based architectures. We evaluate both the pre-trained foundation models, as well as their fine-tuned versions using downstream data. We compare supervised fine-tuning using classification labels against two self-supervised label-free fine-tuning methods. Importantly, we introduce and describe a methodology for fine-tuning, as we found this to be crucial to make transfer learning from 3D foundation models work in a stable manner.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"122 ","pages":"Article 103993"},"PeriodicalIF":2.5,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0097849324001286/pdfft?md5=9cb01c40df89ca64e783dcd0f63e3f33&pid=1-s2.0-S0097849324001286-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141592876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wall-bounded flow simulation on vortex dynamics
Rui Tao, Xianku Zhang, Hongxiang Ren, Xiao Yang, Yi Zhou
Pub Date: 2024-07-01 | DOI: 10.1016/j.cag.2024.103990
Vortical flow animation has attracted considerable attention in computer graphics. Since boundaries are the source of vorticity, we introduce a novel approach for wall-bounded flow simulation within the vortex dynamics framework. We enhance the traditional Lagrangian vortex particle method in terms of boundary treatment and boundary-layer viscosity computation in order to support simulating the vortex shedding that occurs behind moving solids. We extend a boundary treatment strategy based on vortex generation with a placement algorithm for the generated vortex elements that better satisfies the impermeability of solids. Furthermore, we modify the particle strength exchange method at solid boundaries to capture the momentum transfer of moving solids. We demonstrate the efficacy of our approach by simulating a series of wall-bounded flows, such as the wake behind a delta wing and the vortex shedding behind a rotating sphere.
{"title":"Wall-bounded flow simulation on vortex dynamics","authors":"Rui Tao , Xianku Zhang , Hongxiang Ren , Xiao Yang , Yi Zhou","doi":"10.1016/j.cag.2024.103990","DOIUrl":"https://doi.org/10.1016/j.cag.2024.103990","url":null,"abstract":"<div><p>Vortical flow animation has attracted considerable attention within the realm of computer graphics. Given that boundaries are the source of vorticity, we introduce a novel approach for the wall-bounded flow simulation in the vortex dynamics framework. We enhance the traditional Lagrangian vortex particle method in terms of boundary treatment and viscosity computation for boundary layer to furnish support for the simulation of the vortex shedding phenomenon behind the moving solids. We extend the boundary treatment strategy based on the idea of vortex generation with a reasonable placement algorithm for the generated vortex element to better satisfy the impermeability of solids. Furthermore, we modify the particle strength exchange method at solid boundaries to capture momentum transfer of moving solids. We demonstrate the efficacy of our approach by simulating a series of wall-bounded flows, such as the wake behind a delta wing and the vortex shedding behind the rotating sphere.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"122 ","pages":"Article 103990"},"PeriodicalIF":2.5,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141543198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards multi-view consistency in neural ray fields using parametric medial surfaces
Peder Bergebakken Sundt, Theoharis Theoharis
Pub Date: 2024-06-28 | DOI: 10.1016/j.cag.2024.103991
Deep learning methods are revolutionizing solutions to visual computing problems such as shape retrieval and generative shape modeling, but they require novel shape representations that are both fast and differentiable. Neural ray fields, with their improved rendering performance, are promising in this regard, but they struggle with reduced fidelity and multi-view consistency compared to the more widely studied coordinate-based methods, which are, however, slower to train and evaluate. We propose PMARF, an improved ray field that explicitly models the skeleton of the target shape as a set of (zero-thickness) parametric medial surfaces. This formulation reduces, by construction, the degrees of freedom available in the reconstruction domain, improving multi-view consistency even from sparse training views. This in turn improves fidelity while facilitating a reduction in network size.
{"title":"Towards multi-view consistency in neural ray fields using parametric medial surfaces","authors":"Peder Bergebakken Sundt, Theoharis Theoharis","doi":"10.1016/j.cag.2024.103991","DOIUrl":"10.1016/j.cag.2024.103991","url":null,"abstract":"<div><p>Deep learning methods are revolutionizing the solutions to visual computing problems, such as shape retrieval and generative shape modeling, but require novel shape representations that are both fast and differentiable. Neural ray fields and their improved rendering performance are promising in this regard, but struggle with a reduced fidelity and multi-view consistency when compared to the more studied coordinate-based methods which, however, are slower in training and evaluation. We propose PMARF, an improved ray field which explicitly models the skeleton of the target shape as a set of (0-thickness) parametric medial surfaces. This formulation reduces by construction the degrees-of-freedom available in the reconstruction domain, improving multi-view consistency even from sparse training views. This in turn improves fidelity while facilitating a reduction in the network size.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"123 ","pages":"Article 103991"},"PeriodicalIF":2.5,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0097849324001262/pdfft?md5=ae7ee888d75c89ab09e33803aeb2e531&pid=1-s2.0-S0097849324001262-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141637685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Topic modelling for spatial insights: Uncovering space use from movement data
Gennady Andrienko, Natalia Andrienko, Dirk Hecker
Pub Date: 2024-06-28 | DOI: 10.1016/j.cag.2024.103989
We present a novel approach to understanding space use by moving entities based on repeated patterns of place visits and transitions. Our approach represents trajectories as text documents consisting of sequences of place visits or transitions and applies topic modelling to the corpus of these documents. The resulting topics represent combinations of places or transitions, respectively, that repeatedly co-occur in trips. Visualising the results in the spatial context reveals regions of places connected through movements and the major channels used to traverse the space. This enables an understanding of how space is used as a medium for movement. We compare the possibilities provided by topic modelling with alternative approaches that exploit a numeric measure of pairwise connectedness. We have extensively explored the potential of topic modelling by applying our approach to multiple real-world movement data sets with different data collection procedures and varying spatial and temporal properties: GPS road traffic of cars, unconstrained movement on a football pitch, and episodic movement data reflecting social media posting events. The approach successfully uncovered meaningful patterns and interesting insights. We thoroughly discuss different aspects of the approach and share the knowledge and experience we have gained with those who might be interested in analysing movement data by means of topic modelling methods.
{"title":"Topic modelling for spatial insights: Uncovering space use from movement data","authors":"Gennady Andrienko , Natalia Andrienko , Dirk Hecker","doi":"10.1016/j.cag.2024.103989","DOIUrl":"https://doi.org/10.1016/j.cag.2024.103989","url":null,"abstract":"<div><p>We present a novel approach to understanding space use by moving entities based on repeated patterns of place visits and transitions. Our approach represents trajectories as text documents consisting of sequences of place visits or transitions and applies topic modelling to the corpus of these documents. The resulting topics represent combinations of places or transitions, respectively, that repeatedly co-occur in trips. Visualisation of the results in the spatial context reveals the regions of place connectivity through movements and the major channels used to traverse the space. This enables understanding of the use of space as a medium for movement. We compare the possibilities provided by topic modelling to alternative approaches exploiting a numeric measure of pairwise connectedness. We have extensively explored the potential of utilising topic modelling by applying our approach to multiple real-world movement data sets with different data collection procedures and varying spatial and temporal properties: GPS road traffic of cars, unconstrained movement on a football pitch, and episodic movement data reflecting social media posting events. The approach successfully demonstrated the ability to uncover meaningful patterns and interesting insights. We thoroughly discuss different aspects of the approach and share the knowledge and experience we have gained with people who might be potentially interested in analysing movement data by means of topic modelling methods.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"122 ","pages":"Article 103989"},"PeriodicalIF":2.5,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0097849324001249/pdfft?md5=ee1686b8eaee02c4296da70e08390e4b&pid=1-s2.0-S0097849324001249-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141592875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the shape description of general solids using Morse theory
Juan Pareja-Corcho, Diego Montoya-Zapata, Aitor Moreno, Carlos Cadavid, Jorge Posada, Ketzare Arenas-Tobon, Oscar Ruiz-Salguero
Pub Date: 2024-06-28 | DOI: 10.1016/j.cag.2024.103994
The automatic shape description of solids is a problem of interest in manufacturing engineering, among other related areas. This description can be either geometrical or topological in nature and can be applied to either surfaces or solids (embedded manifolds). Topological descriptions are especially interesting for the problem of shape comparison and retrieval, where one wants to know whether a given shape resembles some other known shape. Some popular topological descriptions use Morse theory to study the topology of manifolds and encode their shape characteristics. A Morse function f is defined on the manifold, and the manifold's shape is studied indirectly through the behavior of the critical points of f. This family of methods is well defined for surfaces but does not consider the case of solids. In this paper we address the topological description of solids using Morse theory. Our methodology considers three cases: solids without internal boundaries, solids with internal boundaries, and thin-walled solids. We present an algorithm to identify topological changes in these solids using the principle of shape decomposition by Morse handles. The presented algorithm deals with Morse functions that produce parallel planar level sets. Future endeavors should consider other candidate functions.
Interactive tools for explaining multidimensional projections for high-dimensional tabular data
Julian Thijssen, Zonglin Tian, Alexandru Telea
Pub Date: 2024-06-28 | DOI: 10.1016/j.cag.2024.103987
We present a set of interactive visual analysis techniques that aim to explain data patterns in multidimensional projections. Our novel techniques include a global value-based encoding that highlights point groups having outlier values in any dimension, as well as several local tools that provide details on the statistics of all dimensions for a user-selected projection area. Our techniques apply generically to any projection algorithm and scale computationally to hundreds of thousands of points and hundreds of dimensions. We describe a user study showing that our visual tools can be quickly learned and applied by users to obtain non-trivial insights into real-world multidimensional datasets. We also show how our techniques can help in understanding a real-world dataset containing quantitative, ordinal, and categorical attributes.
{"title":"Interactive tools for explaining multidimensional projections for high-dimensional tabular data","authors":"Julian Thijssen, Zonglin Tian, Alexandru Telea","doi":"10.1016/j.cag.2024.103987","DOIUrl":"https://doi.org/10.1016/j.cag.2024.103987","url":null,"abstract":"<div><p>We present a set of interactive visual analysis techniques aiming at explaining data patterns in multidimensional projections. Our novel techniques include a global value-based encoding that highlights point groups having outlier values in any dimension as well as several local tools that provide details on the statistics of all dimensions for a user-selected projection area. Our techniques generically apply to any projection algorithm and scale computationally well to hundreds of thousands of points and hundreds of dimensions. We describe a user study that shows that our visual tools can be quickly learned and applied by users to obtain non-trivial insights in real-world multidimensional datasets. We also show how our techniques can help understanding a real-world dataset containing quantitative, ordinal, and categorical attributes.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"122 ","pages":"Article 103987"},"PeriodicalIF":2.5,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0097849324001225/pdfft?md5=bc29f3eeeac8ab279efad0c0b08c66dc&pid=1-s2.0-S0097849324001225-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141543197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TDG-Diff: Advancing customized text-to-image synthesis with two-stage diffusion guidance
Hong Lin, Qi Chen, Chun Liu, Jingsong Hu
Pub Date: 2024-06-28 | DOI: 10.1016/j.cag.2024.103986
Customized text-to-image synthesis based on diffusion models has recently attracted widespread attention and seen significant progress. However, reconstructing multiple concepts in the same scene remains highly challenging. We therefore propose a novel framework called TDG-Diff, which employs two-stage diffusion guidance to achieve customized image synthesis with multiple concepts. TDG-Diff focuses on improving the sampling process of the diffusion model. Specifically, it subdivides the sampling process into two key stages, attribute separation and appearance refinement, introducing spatial constraints and concept representations to guide sampling. In the attribute separation stage, TDG-Diff introduces a novel attention modulation method that effectively separates the attributes of different concepts based on spatial constraint information, reducing the risk of entanglement between the attributes of different concepts. In the appearance refinement stage, TDG-Diff proposes a fusion sampling approach that combines global text descriptions and concept representations to optimize and enhance the model's ability to capture and represent fine-grained concept details. Extensive qualitative and quantitative results demonstrate the effectiveness of TDG-Diff in customized text-to-image synthesis.
{"title":"TDG-Diff: Advancing customized text-to-image synthesis with two-stage diffusion guidance","authors":"Hong Lin, Qi Chen, Chun Liu, Jingsong Hu","doi":"10.1016/j.cag.2024.103986","DOIUrl":"https://doi.org/10.1016/j.cag.2024.103986","url":null,"abstract":"<div><p>Recently, there has been widespread attention and significant progress in customized text-to-image synthesis based on diffusion models. However, reconstructing multiple concepts in the same scene remains highly challenging. Therefore, we propose a novel framework called TDG-Diff, which employs a two-stage diffusion guidance to achieve customized image synthesis with multiple concepts. TDG-Diff focuses on improving the sampling process of the diffusion model. Specifically, TDG-Diff subdivides the sampling process into two key stages: attribute separation and appearance refinement, introducing spatial constraints and concept representations for sampling guidance. In the attribute separation stage, TDG-Diff introduces a novel attention modulation method. This method effectively separates the attributes of different concepts based on spatial constraint information, reducing the risk of entanglement between attributes of different concepts. In the appearance refinement stage, TDG-Diff proposes a fusion sampling approach, which combines global text descriptions and concept representations to optimize and enhance the model’s ability to capture and represent fine-grained details of concepts. Extensive qualitative and quantitative results demonstrate the effectiveness of TDG-Diff in customized text-to-image synthesis.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"122 ","pages":"Article 103986"},"PeriodicalIF":2.5,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141543199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AnaConDaR: Anatomically-Constrained Data-Adaptive Facial Retargeting
Nicolas Wagner, Ulrich Schwanecke, Mario Botsch
Pub Date: 2024-06-27 | DOI: 10.1016/j.cag.2024.103988
Offline facial retargeting, i.e., transferring facial expressions from a source to a target character, is a common production task that still regularly poses considerable algorithmic challenges. The task can be roughly dissected into the transfer of sequential facial animations and non-sequential blendshape personalization. Both problems are typically solved by data-driven methods that require an extensive corpus of costly target examples. Geometrically motivated approaches, on the other hand, do not require intensive data collection but cannot account for character-specific deformations and are known to cause a variety of visual artifacts.
We present AnaConDaR, a novel method for offline facial retargeting, as a hybrid of data-driven and geometry-driven methods that incorporates anatomical constraints through a physics-based simulation. As a result, our approach combines the advantages of both paradigms while balancing out the respective disadvantages. In contrast to other recent concepts, AnaConDaR achieves substantially individualized results even when only a handful of target examples are available. At the same time, we do not make the common assumption that for each target example a matching source expression must be known. Instead, AnaConDaR establishes correspondences between the source and the target character by a data-driven embedding of the target examples in the source domain. We evaluate our offline facial retargeting algorithm visually, quantitatively, and in two user studies.
{"title":"AnaConDaR: Anatomically-Constrained Data-Adaptive Facial Retargeting","authors":"Nicolas Wagner , Ulrich Schwanecke , Mario Botsch","doi":"10.1016/j.cag.2024.103988","DOIUrl":"https://doi.org/10.1016/j.cag.2024.103988","url":null,"abstract":"<div><p>Offline facial retargeting, i.e., transferring facial expressions from a source to a target character, is a common production task that still regularly leads to considerable algorithmic challenges. This task can be roughly dissected into the transfer of sequential facial animations and non-sequential blendshape personalization. Both problems are typically solved by data-driven methods that require an extensive corpus of costly target examples. Other than that, geometrically motivated approaches do not require intensive data collection but cannot account for character-specific deformations and are known to cause manifold visual artifacts.</p><p>We present AnaConDaR, a novel method for offline facial retargeting, as a hybrid of data-driven and geometry-driven methods that incorporates anatomical constraints through a physics-based simulation. As a result, our approach combines the advantages of both paradigms while balancing out the respective disadvantages. In contrast to other recent concepts, AnaConDaR achieves substantially individualized results even when only a handful of target examples are available. At the same time, we do not make the common assumption that for each target example a matching source expression must be known. Instead, AnaConDaR establishes correspondences between the source and the target character by a data-driven embedding of the target examples in the source domain. We evaluate our offline facial retargeting algorithm visually, quantitatively, and in two user studies.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"122 ","pages":"Article 103988"},"PeriodicalIF":2.5,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0097849324001237/pdfft?md5=832061b3ec358e11c3e9bfb879ea3d28&pid=1-s2.0-S0097849324001237-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141543194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}