Pub Date: 2024-12-01 | DOI: 10.1016/j.cag.2024.104127
Tao Ku, Sam Galanakis, Bas Boom, Remco C. Veltkamp, Darshan Bangera, Shankar Gangisetty, Nikolaos Stagakis, Gerasimos Arvanitis, Konstantinos Moustakas
This article has been retracted: please see Elsevier Policy on Article Withdrawal (https://www.elsevier.com/locate/withdrawalpolicy).
This article has been retracted at the request of the author and Editor-in-Chief.
The authors identified an error in the software accompanying the original paper, which was made publicly available on GitHub: testing was accidentally carried out on the training set instead of the correct test set, and the published test results are therefore invalid.
Other minor inaccuracies in the paper were also identified.
The authors intend to correct the errors and resubmit the paper.
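The failure mode described in the notice (evaluating on the training split) can be guarded against mechanically. Below is a minimal sketch of a disjointness check one might run before reporting results; the scene identifiers are hypothetical, not from the SHREC 2021 dataset:

```python
def assert_disjoint_split(train_ids, test_ids):
    """Raise if any sample identifier appears in both splits."""
    overlap = set(train_ids) & set(test_ids)
    if overlap:
        raise ValueError(
            f"train/test leak: {len(overlap)} shared ids, e.g. {sorted(overlap)[:3]}"
        )

# Hypothetical scene ids for illustration.
train = ["scene_000", "scene_001", "scene_002"]
test = ["scene_100", "scene_101"]
assert_disjoint_split(train, test)  # passes: the splits share no ids
```

Running such a check in the evaluation script, rather than relying on file naming conventions, would have surfaced the error before publication.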
Retraction notice to “SHREC 2021: 3D point cloud change detection for street scenes”. Computers & Graphics, vol. 125, Article 104127.
Pub Date: 2024-12-01 | DOI: 10.1016/j.cag.2024.104110
Ayça Ceylan, Evrim Korkmaz Özay, Burcu Tunga
Converting a color image into a grayscale image is a complex problem that hinges on preserving color contrast, sharpness, and luminance. In this paper, a novel image decolorization algorithm is proposed using High Dimensional Model Representation (HDMR) together with an optimization procedure that retains color content and contrast. In the proposed algorithm, a color image is first decomposed into HDMR components, which are then categorized as colored or colorless. The image is subsequently reconstructed by merging the weighted colored and colorless HDMR components, with the weight coefficients determined by an optimization process. The proposed algorithm is compared both visually and quantitatively with state-of-the-art methods from the literature using various performance evaluation metrics. Across all obtained results, the HDMR-based image decolorization algorithm performs better overall. Most importantly, the algorithm has a flexible structure: it can produce different grayscale images for different thresholds of visible color contrast, and, to the best of our knowledge, it is the only method in the literature that accomplishes this.
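The decompose-weight-merge pipeline can be sketched in a much simplified form. The decomposition below is not actual HDMR: it stands in a luminance image for the colorless component and per-channel residuals for the colored components, and picks weights by a naive grid search over a contrast objective. All function names and the weight parameterization are illustrative assumptions:

```python
import numpy as np

def decolorize(img, weights):
    """Merge a colorless (luminance) component and colored (per-channel
    residual) components into one grayscale image, weighted per component."""
    lum = img.mean(axis=2)                  # colorless stand-in component
    residuals = img - lum[..., None]        # colored stand-in components (R, G, B)
    w0, wc = weights[0], np.asarray(weights[1:])
    gray = w0 * lum + np.tensordot(residuals, wc, axes=([2], [0]))
    return np.clip(gray, 0.0, 1.0)

def fit_weights(img, grid=np.linspace(-0.5, 0.5, 11)):
    """Grid-search component weights that maximize grayscale contrast (std.)."""
    best, best_w = -1.0, None
    for wr in grid:
        for wg in grid:
            w = (1.0, wr, wg, -wr - wg)     # roughly luminance-preserving
            c = decolorize(img, w).std()
            if c > best:
                best, best_w = c, w
    return best_w

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))               # toy color image in [0, 1]
w = fit_weights(img)
gray = decolorize(img, w)
print(gray.shape)  # (32, 32)
```

The paper's optimization of the weight coefficients is presumably more principled than this grid search; the sketch only shows how reweighting the colored components changes the contrast of the merged result.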
Contrast and content preserving HDMR-based color-to-gray conversion. Computers & Graphics, vol. 125, Article 104110.
Pub Date: 2024-11-28 | DOI: 10.1016/j.cag.2024.104137
Rita Borgo, João Luiz Dihl Comba
Foreword to the special section on Conference on Graphics, Patterns, and Images (SIBGRAPI 2024). Computers & Graphics, vol. 126, Article 104137.
Pub Date: 2024-11-21 | DOI: 10.1016/j.cag.2024.104109
Carolina Pereira, Tomás Alves, Sandra Gama
Recent research focuses on understanding what triggers cognitive biases and how to alleviate them in the context of visualization use. Given its role in decision-making in other research fields, the Phantom Effect may hold exciting prospects among known biases. The Phantom Effect belongs to the category of decoy effects, where the decoy is an optimal yet unavailable alternative. We conducted a hybrid-design experiment (N = 76) in which participants performed decision tasks based on information represented in different visualization idioms, with varying delays before the phantom alternative was presented as unavailable. We measured participants’ perceptual speed and visual working memory to study their impact on the expression of the Phantom Effect. Results show that visualization usually triggers the Phantom Effect, but two-sided bar charts mitigate this bias more effectively. We also found that waiting until the participant decides before presenting the decoy as unavailable helps alleviate the Phantom Effect. Although we did not find measurable effects, results also suggest that visual working memory and visualization literacy play a role in bias susceptibility. Our findings extend prior research on visualization-based decoy effects and are a first step toward understanding the role of individual differences in susceptibility to cognitive bias in visualization contexts.
The phantom effect in information visualization. Computers & Graphics, vol. 125, Article 104109.
Pub Date: 2024-11-20 | DOI: 10.1016/j.cag.2024.104125
Daeun Kang, Hyunah Park, Taesoo Kwon
We propose an accelerated inverse-kinematics (IK) solving method aimed at reconstructing the pose of a 3D model from the positions of surface markers or feature points. The model comprises a skeletal structure of joints and a triangular mesh constituting its external surface. A mesh-based IK solving method optimizes the joint configurations to achieve the desired surface pose, assuming that surface markers are attached to the joints using linear-blended skinning and that target positions for these surface markers are provided. In the conventional IK solving method, the final position of a given joint is determined by iteratively computing error gradients based on the target marker positions, typically implemented using a 3-nested loop structure. In this paper, we streamline the standard IK computation by precomputing all redundant terms for future use, leading to a significant reduction in asymptotic time complexity. We experimentally show that our accelerated IK solving method exhibits increasingly superior performance gains as the number of markers increases. Our pose reconstruction tests show performance improvements ranging from 34% up to threefold compared to a highly optimized implementation of the conventional method.
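The precomputation idea can be illustrated on a generic marker-gradient accumulation; this is the general pattern (hoist terms that do not depend on the loop variable), not the paper's actual derivation. The naive 3-nested loop recomputes the per-marker residual for every joint, while the optimized version computes all residuals once and performs a single contraction:

```python
import numpy as np

rng = np.random.default_rng(1)
n_markers, n_joints, dim = 50, 10, 3
J = rng.random((n_markers, n_joints, dim))  # per-marker Jacobian blocks (toy data)
x = rng.random((n_markers, dim))            # current marker positions
t = rng.random((n_markers, dim))            # target marker positions

def grad_naive():
    """3-nested loop: the residual x[m] - t[m] is recomputed for every joint."""
    g = np.zeros(n_joints)
    for j in range(n_joints):
        for m in range(n_markers):
            for d in range(dim):
                g[j] += J[m, j, d] * (x[m, d] - t[m, d])
    return g

def grad_precomputed():
    """Residuals computed once up front, then one contraction over markers."""
    r = x - t
    return np.einsum('mjd,md->j', J, r)

assert np.allclose(grad_naive(), grad_precomputed())
```

Both routines produce the same gradient; the second avoids recomputing shared terms per joint, which is the kind of saving that grows with the number of markers.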
Efficient inverse-kinematics solver for precise pose reconstruction of skinned 3D models. Computers & Graphics, vol. 125, Article 104125.
Pub Date: 2024-11-19 | DOI: 10.1016/j.cag.2024.104113
Xiang Li, Jin-Du Wang, John J. Dudley, Per Ola Kristensson
The theory of swarm control shows promise for controlling multiple objects; however, its scalability is hindered by cost constraints such as hardware and infrastructure. Virtual Reality (VR) can overcome these limitations, but research on swarm interaction in VR is limited. This paper introduces a novel swarm manipulation technique and compares it with two baseline techniques: Virtual Hand and Controller (ray-casting). We evaluated these techniques in a user study (N = 12) on three tasks (selection, rotation, and resizing) across five conditions. Our results indicate that swarm manipulation yielded superior performance, with significantly faster speeds in most conditions across the three tasks. It notably reduced resizing size deviations but introduced a trade-off between speed and accuracy in the rotation task. Additionally, we conducted a follow-up user study (N = 6) using swarm manipulation in two complex VR scenarios and obtained insights through semi-structured interviews, shedding light on optimized swarm control mechanisms and perceptual changes induced by this interaction paradigm. These results demonstrate the potential of the swarm manipulation technique to enhance usability and user experience in VR compared to conventional manipulation techniques. In future studies, we aim to understand and improve swarm interaction via internal swarm particle cooperation.
Swarm manipulation: An efficient and accurate technique for multi-object manipulation in virtual reality. Computers & Graphics, vol. 125, Article 104113.
Pub Date: 2024-11-15 | DOI: 10.1016/j.cag.2024.104123
Henry Ehlers, Daniel Pahr, Velitchko Filipov, Hsiang-Yun Wu, Renata G. Raidou
From social networks to brain connectivity, ego networks are a simple yet powerful approach to visualizing parts of a larger graph, i.e., those related to a selected focal node — the so-called “ego”. While surveys and comparisons of general graph visualization approaches exist in the literature, we note (i) the many conflicting results of comparisons of adjacency matrices and node-link diagrams, thus motivating further study, as well as (ii) the absence of such systematic comparisons for ego networks specifically. In this paper, we propose the development of empirical recommendations for ego network visualization strategies. First, we survey the literature across application domains and collect examples of network visualizations to identify the most common visual encodings, namely straight-line, radial, and layered node-link diagrams, as well as adjacency matrices. These representations are then applied to a representative, intermediate-sized network and subsequently compared in a large-scale, crowd-sourced user study in a mixed-methods analysis setup to investigate their impact on both user experience and performance. Within the limits of this study, and contrary to previous comparative investigations of adjacency matrices and node-link diagrams (outside of ego networks specifically), participants performed systematically worse when using adjacency matrices than when using node-link diagrammatic representations. Similar to previous comparisons of different node-link diagrams, we do not detect any notable differences in participant performance between the three node-link diagrams. Lastly, our quantitative and qualitative results indicate that participants found adjacency matrices harder to learn, use, and understand than node-link diagrams. We conclude that, in terms of both participant experience and performance, a layered node-link diagrammatic representation appears preferable for ego network visualization.
Me! Me! Me! Me! A study and comparison of ego network representations. Computers & Graphics, vol. 125, Article 104123.
Pub Date: 2024-11-10 | DOI: 10.1016/j.cag.2024.104121
Ignacio Pérez-Messina, Davide Ceneda, Silvia Miksch
Enhancing Visual Analytics (VA) systems with guidance, such as the automated provision of data-driven suggestions and answers to the user’s task, is becoming increasingly important and common. However, how to design such systems remains challenging. We present a methodology to aid and structure the design of guidance for enhancing VA solutions, consisting of four steps: (S1) defining the target of analysis, (S2) identifying the user tasks, (S3) describing the guidance tasks, and (S4) placing guidance. In summary, our proposed methodology specifies a space of possible user tasks and maps them to the corresponding space of guidance tasks, using recent abstract task typologies for guidance and visualization. We exemplify this methodology through two case studies from the literature: Overview, a system for exploring and labeling document collections aimed at journalists, and DoRIAH, a system for historical imagery analysis. We show how our methodology enriches existing VA solutions with guidance and provides a structured way to design guidance in complex VA scenarios.
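The four steps can be sketched as a plain mapping from an analysis target through user tasks to guidance tasks and their placement. The task names and placements below are hypothetical illustrations, not taken from the Overview or DoRIAH case studies:

```python
# S1: define the target of analysis (hypothetical example).
analysis_target = "document collection labeling"

# S2: identify the user tasks.
user_tasks = ["explore topics", "label documents"]

# S3: describe the guidance tasks, one per user task.
guidance_tasks = {
    "explore topics": "suggest under-explored clusters",
    "label documents": "propose candidate labels",
}

# S4: place guidance, i.e. decide where each guidance task surfaces in the UI.
placement = {
    "suggest under-explored clusters": "scatterplot overlay",
    "propose candidate labels": "side panel",
}

for task in user_tasks:
    g = guidance_tasks[task]
    print(f"{analysis_target}: {task!r} -> {g!r} @ {placement[g]}")
```

The point of the structure is that each user task (S2) gets an explicit guidance counterpart (S3) and an explicit placement (S4), so gaps in guidance coverage become visible as missing dictionary entries.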
Enhancing Visual Analytics systems with guidance: A task-driven methodology. Computers & Graphics, vol. 125, Article 104121.
Pub Date: 2024-11-07 | DOI: 10.1016/j.cag.2024.104118
Huan Fu, Guoqing Cheng
Text-to-image synthesis is a challenging task that aims to generate realistic and diverse images from natural language descriptions. However, existing text-to-image diffusion models (e.g., Stable Diffusion) sometimes fail to satisfy the semantic descriptions of the users, especially when the prompts contain multiple concepts or modifiers such as colors. By visualizing the cross-attention maps of the Stable Diffusion model during the denoising process, we find that one of the concepts has a very scattered attention map, which cannot form a whole and gradually gets ignored. Moreover, the attention maps of the modifiers are hard to overlap with the corresponding concepts, resulting in incorrect semantic mapping. To address this issue, we propose a Gather-and-Bind method that intervenes in the cross-attention maps during the denoising process to alleviate the catastrophic forgetting and attribute binding problems without any pre-training. Specifically, we first use information entropy to measure the dispersion degree of the cross-attention maps and construct an information entropy loss to gather these scattered attention maps, which eventually captures all the concepts in the generated output. Furthermore, we construct an attribute binding loss that minimizes the distance between the attention maps of the attributes and their corresponding concepts, which enables the model to establish correct semantic mapping and significantly improves the performance of the baseline model. We conduct extensive experiments on public datasets and demonstrate that our method can better capture the semantic information of the input prompts. Code is available at https://github.com/huan085128/Gather-and-Bind.
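The two losses can be sketched on toy attention maps. The exact definitions here (Shannon entropy of the normalized map for gathering, L1 distance between normalized maps for binding) are plausible stand-ins, not necessarily the paper's formulations:

```python
import numpy as np

def entropy_loss(attn):
    """Shannon entropy of a normalized attention map: high when the map is
    scattered, so minimizing it 'gathers' the attention."""
    p = attn / attn.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def binding_loss(attr_map, concept_map):
    """Distance between an attribute's map and its concept's map: minimizing
    it pulls the attribute's attention onto the concept."""
    a = attr_map / attr_map.sum()
    c = concept_map / concept_map.sum()
    return float(np.abs(a - c).sum())

rng = np.random.default_rng(0)
scattered = rng.random((16, 16))            # diffuse attention over the image
focused = np.zeros((16, 16))
focused[4:8, 4:8] = 1.0                     # attention concentrated on one region

# A gathered (focused) map has lower entropy than a scattered one.
assert entropy_loss(focused) < entropy_loss(scattered)
```

In the actual method these losses would steer the latent during denoising via their gradients; the sketch only shows what each loss measures on a cross-attention map.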
Enhancing semantic mapping in text-to-image diffusion via Gather-and-Bind. Computers & Graphics, vol. 125, Article 104118.
Pub Date: 2024-11-07 | DOI: 10.1016/j.cag.2024.104119
Prachi Kudeshia, Muhammad Altaf Agowun, Jiju Poovvancheri
Geometry and topology are vital elements in discerning and describing the shape of an object. Geometric complexes constructed on the point cloud of a 3D object capture the geometric as well as topological features of the underlying shape space. Leveraging this aspect of geometric complexes, we present an attention-based dual-stream graph neural network (DS-GNN) for 3D shape classification. In the first stream of DS-GNN, we introduce a spiked skeleton complex (SSC) for learning shape patterns through comprehensive feature integration of the point cloud’s core structure. SSC is a novel and concise geometric complex comprising principal-plane-based cluster centroids complemented with per-centroid spatial locality information. The second stream of DS-GNN consists of an alpha complex, which facilitates the learning of geometric patterns embedded in the object shapes via higher-dimensional simplicial attention. To evaluate the model’s response to different shape topologies, we perform a persistent-homology-based object segregation that groups the objects based on the underlying topological space characteristics quantified through the second Betti number. Our experimental study on benchmark datasets such as ModelNet40 and ScanObjectNN shows the potential of the proposed GNN for classifying 3D shapes with different topologies and offers an alternative to current evaluation practices in this domain.
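The Betti-number segregation amounts to grouping objects by their second Betti number b2 (the number of enclosed voids). The values below are illustrative placeholders; in practice they would come from a persistent-homology computation on each point cloud, e.g. via a library such as GUDHI:

```python
# Hypothetical second Betti numbers: a closed surface (sphere, torus, hollow
# box) encloses one void (b2 = 1); solid or open shapes enclose none (b2 = 0).
betti2 = {"sphere": 1, "torus": 1, "cup": 0, "plane": 0, "hollow_box": 1}

# Segregate objects into topological classes keyed by b2.
groups = {}
for obj, b2 in betti2.items():
    groups.setdefault(b2, []).append(obj)

print(groups)
```

Each resulting group contains objects that share the same void structure, so a classifier can then be evaluated per topological class rather than only on the pooled dataset.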
Learning geometric complexes for 3D shape classification. Computers & Graphics, vol. 125, Article 104119.