
Latest publications in Computational Visual Media

Symmetrization of quasi-regular patterns with periodic tilting of regular polygons
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-04-27 | DOI: 10.1007/s41095-023-0359-z
Zhengzheng Yin, Yao Jin, Zhijian Fang, Yun Zhang, Huaxiong Zhang, Jiu Zhou, Lili He

Computer-generated aesthetic patterns are widely used as design materials in various fields. The most common methods use fractals or dynamical systems as basic tools to create various patterns. To enhance aesthetics and controllability, some researchers have introduced symmetric layouts along with these tools. One popular strategy employs dynamical systems compatible with symmetries that construct functions with the desired symmetries. However, these are typically confined to simple planar symmetries. The other generates symmetrical patterns under the constraints of tilings. Although it is slightly more flexible, it is restricted to small ranges of tilings and lacks textural variations. Thus, we proposed a new approach for generating aesthetic patterns by symmetrizing quasi-regular patterns using general k-uniform tilings. We adopted a unified strategy to construct invariant mappings for k-uniform tilings that can eliminate texture seams across the tiling edges. Furthermore, we constructed three types of symmetries associated with the patterns: dihedral, rotational, and reflection symmetries. The proposed method can be easily implemented using GPU shaders and is highly efficient and suitable for complicated tiling with regular polygons. Experiments demonstrated the advantages of our method over state-of-the-art methods in terms of flexibility in controlling the generation of patterns with various parameters as well as the diversity of textures and styles.
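For illustration only: one generic way to obtain a function with a prescribed planar symmetry (not the paper's invariant-mapping construction for k-uniform tilings) is to average an arbitrary seed texture over the elements of a dihedral group D_n, which yields an n-fold rotation- and reflection-invariant pattern by construction. All function names and constants in the sketch below are made up.

```python
import numpy as np

def seed_pattern(z: np.ndarray) -> np.ndarray:
    """Arbitrary seed texture over the complex plane (hypothetical)."""
    return np.sin(3.1 * z.real + 1.7 * z.imag) + 0.5 * np.cos(2.3 * np.abs(z))

def symmetrize_dihedral(z: np.ndarray, n: int = 6) -> np.ndarray:
    """Average the seed over the dihedral group D_n, producing an n-fold
    rotation- and reflection-symmetric pattern."""
    acc = np.zeros_like(z.real)
    for k in range(n):
        rot = z * np.exp(2j * np.pi * k / n)   # rotation by 2*pi*k/n
        acc += seed_pattern(rot)               # rotated copy
        acc += seed_pattern(np.conj(rot))      # reflected copy
    return acc / (2 * n)

# Evaluate on a pixel grid, e.g., to save as an image.
xs = np.linspace(-2, 2, 512)
grid = xs[None, :] + 1j * xs[:, None]
pattern = symmetrize_dihedral(grid, n=6)
```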

Citations: 0
Joint training with local soft attention and dual cross-neighbor label smoothing for unsupervised person re-identification
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-04-27 | DOI: 10.1007/s41095-023-0354-4
Qing Han, Longfei Li, Weidong Min, Qi Wang, Qingpeng Zeng, Shimiao Cui, Jiongjin Chen

Existing unsupervised person re-identification approaches fail to fully capture the fine-grained features of local regions, which can result in people with similar appearances and different identities being assigned the same label after clustering. The identity-independent information contained in different local regions leads to different levels of local noise. To address these challenges, joint training with local soft attention and dual cross-neighbor label smoothing (DCLS) is proposed in this study. First, the joint training is divided into global and local parts, whereby a soft attention mechanism is proposed for the local branch to accurately capture the subtle differences in local regions, which improves the ability of the re-identification model in identifying a person’s local significant features. Second, DCLS is designed to progressively mitigate label noise in different local regions. The DCLS uses global and local similarity metrics to semantically align the global and local regions of the person and further determines the proximity association between local regions through the cross information of neighboring regions, thereby achieving label smoothing of the global and local regions throughout the training process. In extensive experiments, the proposed method outperformed existing methods under unsupervised settings on several standard person re-identification datasets.
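The abstract does not give the exact DCLS formulation; the sketch below only illustrates the generic building block it relies on, namely smoothing one-hot cluster pseudo-labels with the label distribution of feature-space neighbors. All names and hyperparameters are hypothetical.

```python
import numpy as np

def smooth_labels_by_neighbors(features: np.ndarray,
                               pseudo_labels: np.ndarray,
                               num_classes: int,
                               k: int = 5,
                               alpha: float = 0.3) -> np.ndarray:
    """Blend each sample's one-hot pseudo-label with the label distribution
    of its k nearest neighbors in feature space."""
    n = features.shape[0]
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T                      # cosine similarity matrix
    np.fill_diagonal(sim, -np.inf)             # exclude self-similarity

    one_hot = np.eye(num_classes)[pseudo_labels]
    smoothed = np.empty_like(one_hot)
    for i in range(n):
        nbrs = np.argsort(-sim[i])[:k]         # k most similar samples
        w = np.exp(sim[i, nbrs])
        w /= w.sum()
        neighbor_dist = (w[:, None] * one_hot[nbrs]).sum(axis=0)
        smoothed[i] = (1 - alpha) * one_hot[i] + alpha * neighbor_dist
    return smoothed

feats = np.random.randn(100, 64).astype(np.float32)
labels = np.random.randint(0, 10, size=100)
soft_targets = smooth_labels_by_neighbors(feats, labels, num_classes=10)
```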

Citations: 0
DepthGAN: GAN-based depth generation from semantic layouts
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-04-27 | DOI: 10.1007/s41095-023-0350-8
Yidi Li, Jun Xiao, Yiqun Wang, Zhengda Lu

Existing GAN-based generative methods are typically used for semantic image synthesis. We pose the question of whether GAN-based architectures can generate plausible depth maps and find that existing methods have difficulty in generating depth maps which reasonably represent 3D scene structure due to the lack of global geometric correlations. Thus, we propose DepthGAN, a novel method of generating a depth map using a semantic layout as input to aid construction, and manipulation of well-structured 3D scene point clouds. Specifically, we first build a feature generation model with a cascade of semantically-aware transformer blocks to obtain depth features with global structural information. For our semantically aware transformer block, we propose a mixed attention module and a semantically aware layer normalization module to better exploit semantic consistency for depth features generation. Moreover, we present a novel semantically weighted depth synthesis module, which generates adaptive depth intervals for the current scene. We generate the final depth map by using a weighted combination of semantically aware depth weights for different depth ranges. In this manner, we obtain a more accurate depth map. Extensive experiments on indoor and outdoor datasets demonstrate that DepthGAN achieves superior results both quantitatively and visually for the depth generation task.
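One common way to realize a "weighted combination ... for different depth ranges" is soft binning, where per-pixel weights over a set of depth intervals are collapsed into a continuous depth value. The sketch below shows only that generic mechanism, not DepthGAN's semantically weighted synthesis module; the bin range and count are arbitrary.

```python
import torch

def depth_from_bin_weights(logits: torch.Tensor,
                           d_min: float = 0.5,
                           d_max: float = 10.0) -> torch.Tensor:
    """logits: (B, K, H, W) per-pixel scores for K depth intervals.
    Returns a (B, 1, H, W) depth map as the softmax-weighted sum of bin centers."""
    b, k, h, w = logits.shape
    centers = torch.linspace(d_min, d_max, k, device=logits.device)  # depth bin centers
    weights = torch.softmax(logits, dim=1)                           # per-pixel bin weights
    depth = (weights * centers.view(1, k, 1, 1)).sum(dim=1, keepdim=True)
    return depth

logits = torch.randn(2, 64, 128, 128)
depth = depth_from_bin_weights(logits)   # (2, 1, 128, 128)
```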

Citations: 0
Physics-based fluid simulation in computer graphics: Survey, research trends, and challenges
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-04-27 | DOI: 10.1007/s41095-023-0368-y
Xiaokun Wang, Yanrui Xu, Sinuo Liu, Bo Ren, Jiří Kosinka, Alexandru C. Telea, Jiamin Wang, Chongming Song, Jian Chang, Chenfeng Li, Jian Jun Zhang, Xiaojuan Ban

Physics-based fluid simulation has played an increasingly important role in the computer graphics community. Recent methods in this area have greatly improved the generation of complex visual effects and its computational efficiency. Novel techniques have emerged to deal with complex boundaries, multiphase fluids, gas–liquid interfaces, and fine details. The parallel use of machine learning, image processing, and fluid control technologies has brought many interesting and novel research perspectives. In this survey, we provide an introduction to theoretical concepts underpinning physics-based fluid simulation and their practical implementation, with the aim for it to serve as a guide for both newcomers and seasoned researchers to explore the field of physics-based fluid simulation, with a focus on developments in the last decade. Driven by the distribution of recent publications in the field, we structure our survey to cover physical background; discretization approaches; computational methods that address scalability; fluid interactions with other materials and interfaces; and methods for expressive aspects of surface detail and control. From a practical perspective, we give an overview of existing implementations available for the above methods.
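As a concrete taste of the discretization approaches such a survey covers, the classic SPH density estimate rho_i = sum_j m_j * W(|x_i - x_j|, h) can be written in a few lines. The brute-force sketch below uses the standard poly6 kernel with arbitrary particle data; it is only a primer, not anything specific to this survey.

```python
import numpy as np

def poly6_kernel(r2: np.ndarray, h: float) -> np.ndarray:
    """Poly6 smoothing kernel evaluated on squared distances r2."""
    coeff = 315.0 / (64.0 * np.pi * h**9)
    return coeff * np.clip(h**2 - r2, 0.0, None) ** 3

def sph_density(positions: np.ndarray, masses: np.ndarray, h: float = 0.1) -> np.ndarray:
    """rho_i = sum_j m_j * W(|x_i - x_j|, h), brute force O(n^2) over all particles."""
    diff = positions[:, None, :] - positions[None, :, :]
    r2 = (diff ** 2).sum(axis=-1)
    return (masses[None, :] * poly6_kernel(r2, h)).sum(axis=1)

pos = np.random.rand(200, 3) * 0.5
rho = sph_density(pos, np.full(200, 0.02))
```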

Citations: 0
Learning to compose diversified prompts for image emotion classification
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-04-26 | DOI: 10.1007/s41095-023-0389-6
Sinuo Deng, Lifang Wu, Ge Shi, Lehao Xing, Meng Jian, Ye Xiang, Ruihai Dong

Image emotion classification (IEC) aims to extract the abstract emotions evoked in images. Recently, language-supervised methods such as contrastive language-image pretraining (CLIP) have demonstrated superior performance in image understanding. However, the underexplored task of IEC presents three major challenges: a tremendous training objective gap between pretraining and IEC, shared suboptimal prompts, and invariant prompts for all instances. In this study, we propose a general framework that effectively exploits the language-supervised CLIP method for the IEC task. First, a prompt-tuning method that mimics the pretraining objective of CLIP is introduced, to exploit the rich image and text semantics associated with CLIP. Subsequently, instance-specific prompts are automatically composed, conditioning them on the categories and image content of instances, diversifying the prompts, and thus avoiding suboptimal problems. Evaluations on six widely used affective datasets show that the proposed method significantly outperforms state-of-the-art methods (up to 9.29% accuracy gain on the EmotionROI dataset) on IEC tasks with only a few trained parameters. The code is publicly available at https://github.com/dsn0w/PT-DPC/ for research purposes.
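The authors' code is linked above; purely as a generic illustration of instance-conditional prompt tuning (shared learnable context vectors shifted by an image feature before being paired with class embeddings), a sketch follows. Every module name, size, and shape here is hypothetical and not taken from PT-DPC.

```python
import torch
import torch.nn as nn

class InstancePromptComposer(nn.Module):
    """Compose instance-specific prompt tokens from shared learnable context
    vectors plus a projection of the image feature (illustrative only)."""
    def __init__(self, n_ctx: int = 8, dim: int = 512):
        super().__init__()
        self.context = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)   # shared learnable prompts
        self.meta_net = nn.Sequential(                                # image-conditioned shift
            nn.Linear(dim, dim // 4), nn.ReLU(), nn.Linear(dim // 4, dim)
        )

    def forward(self, image_feat: torch.Tensor, class_embed: torch.Tensor) -> torch.Tensor:
        # image_feat: (B, dim); class_embed: (C, dim) frozen class-name embeddings
        shift = self.meta_net(image_feat)                     # (B, dim)
        ctx = self.context.unsqueeze(0) + shift.unsqueeze(1)  # (B, n_ctx, dim)
        b, c = image_feat.size(0), class_embed.size(0)
        ctx = ctx.unsqueeze(1).expand(b, c, -1, -1)           # one prompt per class per image
        cls = class_embed.unsqueeze(0).unsqueeze(2).expand(b, c, 1, -1)
        return torch.cat([ctx, cls], dim=2)                   # (B, C, n_ctx + 1, dim)

composer = InstancePromptComposer()
prompts = composer(torch.randn(4, 512), torch.randn(6, 512))  # 4 images, 6 emotion classes
```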

Citations: 0
CTSN: Predicting cloth deformation for skeleton-based characters with a two-stream skinning network
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-04-19 | DOI: 10.1007/s41095-023-0344-6
Yudi Li, Min Tang, Yun Yang, Ruofeng Tong, Shuangcai Yang, Yao Li, Bailin An, Qilong Kou

We present a novel learning method using a two-stream network to predict cloth deformation for skeleton-based characters. The characters processed in our approach are not limited to humans, and can be other targets with skeleton-based representations such as fish or pets. We use a novel network architecture which consists of skeleton-based and mesh-based residual networks to learn the coarse features and wrinkle features forming the overall residual from the template cloth mesh. Our network may be used to predict the deformation for loose or tight-fitting clothing. The memory footprint of our network is low, thereby resulting in reduced computational requirements. In practice, a prediction for a single cloth mesh for a skeleton-based character takes about 7 ms on an nVidia GeForce RTX 3090 GPU. Compared to prior methods, our network can generate finer deformation results with details and wrinkles.
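As a rough, illustrative reading of the two-stream idea (a skeleton-driven coarse residual plus a mesh-driven wrinkle residual added to the template mesh), one might write something like the sketch below. All layer sizes, vertex counts, and input encodings are invented for the sketch and do not reflect CTSN's actual architecture.

```python
import torch
import torch.nn as nn

class TwoStreamClothDeformer(nn.Module):
    """Predict per-vertex cloth displacements as the sum of a coarse, skeleton-driven
    residual and a fine, mesh-driven wrinkle residual (illustrative only)."""
    def __init__(self, n_joints: int = 24, n_verts: int = 4000, feat: int = 256):
        super().__init__()
        self.skeleton_stream = nn.Sequential(         # coarse residual from joint transforms
            nn.Linear(n_joints * 12, feat), nn.ReLU(), nn.Linear(feat, n_verts * 3)
        )
        self.mesh_stream = nn.Sequential(             # wrinkle residual from template vertices
            nn.Linear(n_verts * 3, feat), nn.ReLU(), nn.Linear(feat, n_verts * 3)
        )

    def forward(self, joint_transforms: torch.Tensor, template_verts: torch.Tensor) -> torch.Tensor:
        b = joint_transforms.size(0)
        coarse = self.skeleton_stream(joint_transforms.flatten(1)).view(b, -1, 3)
        wrinkle = self.mesh_stream(template_verts.flatten(1)).view(b, -1, 3)
        return template_verts + coarse + wrinkle      # deformed cloth vertices

model = TwoStreamClothDeformer()
deformed = model(torch.randn(2, 24, 12), torch.randn(2, 4000, 3))  # (2, 4000, 3)
```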

Citations: 0
Multi3D: 3D-aware multimodal image synthesis
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-04-03 | DOI: 10.1007/s41095-024-0422-4

3D-aware image synthesis has attained high quality and robust 3D consistency. Existing 3D controllable generative models are designed to synthesize 3D-aware images through a single modality, such as 2D segmentation or sketches, but lack the ability to finely control generated content, such as texture and age. In pursuit of enhancing user-guided controllability, we propose Multi3D, a 3D-aware controllable image synthesis model that supports multi-modal input. Our model can govern the geometry of the generated image using a 2D label map, such as a segmentation or sketch map, while concurrently regulating the appearance of the generated image through a textual description. To demonstrate the effectiveness of our method, we have conducted experiments on multiple datasets, including CelebAMask-HQ, AFHQ-cat, and shapenet-car. Qualitative and quantitative evaluations show that our method outperforms existing state-of-the-art methods.

Citations: 0
Active self-training for weakly supervised 3D scene semantic segmentation
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-03-22 | DOI: 10.1007/s41095-022-0311-7
Gengxin Liu, Oliver van Kaick, Hui Huang, Ruizhen Hu

Since the preparation of labeled data for training semantic segmentation networks of point clouds is a time-consuming process, weakly supervised approaches have been introduced to learn from only a small fraction of data. These methods are typically based on learning with contrastive losses while automatically deriving per-point pseudo-labels from a sparse set of user-annotated labels. In this paper, our key observation is that the selection of which samples to annotate is as important as how these samples are used for training. Thus, we introduce a method for weakly supervised segmentation of 3D scenes that combines self-training with active learning. Active learning selects points for annotation that are likely to result in improvements to the trained model, while self-training makes efficient use of the user-provided labels for learning the model. We demonstrate that our approach leads to an effective method that provides improvements in scene segmentation over previous work and baselines, while requiring only a few user annotations.
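The abstract does not specify the acquisition function; one standard choice compatible with "points ... likely to result in improvements" is uncertainty sampling by predictive entropy, paired with confidence-thresholded pseudo-labels for the self-training step. The sketch below shows only that generic combination with made-up data, not the paper's actual criteria.

```python
import numpy as np

def select_points_for_annotation(probs: np.ndarray, budget: int) -> np.ndarray:
    """probs: (N, C) per-point class probabilities from the current model.
    Return indices of the `budget` points with highest predictive entropy."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(-entropy)[:budget]

def pseudo_label_confident_points(probs: np.ndarray, threshold: float = 0.9):
    """Self-training step: keep predictions whose max probability exceeds a
    confidence threshold as pseudo-labels for the next training round."""
    conf = probs.max(axis=1)
    keep = conf >= threshold
    return np.where(keep)[0], probs.argmax(axis=1)[keep]

probs = np.random.dirichlet(np.ones(13), size=100000)   # e.g., 13 indoor classes
to_annotate = select_points_for_annotation(probs, budget=200)
idx, pseudo = pseudo_label_confident_points(probs)
```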

Citations: 0
Class-conditional domain adaptation for semantic segmentation
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-03-22 | DOI: 10.1007/s41095-023-0362-4
Yue Wang, Yuke Li, James H. Elder, Runmin Wu, Huchuan Lu

Semantic segmentation is an important sub-task for many applications. However, pixel-level ground-truth labeling is costly, and there is a tendency to overfit to training data, thereby limiting the generalization ability. Unsupervised domain adaptation can potentially address these problems by allowing systems trained on labelled datasets from the source domain (including less expensive synthetic domain) to be adapted to a novel target domain. The conventional approach involves automatic extraction and alignment of the representations of source and target domains globally. One limitation of this approach is that it tends to neglect the differences between classes: representations of certain classes can be more easily extracted and aligned between the source and target domains than others, limiting the adaptation over all classes. Here, we address this problem by introducing a Class-Conditional Domain Adaptation (CCDA) method. This incorporates a class-conditional multi-scale discriminator and class-conditional losses for both segmentation and adaptation. Together, they measure the segmentation, shift the domain in a class-conditional manner, and equalize the loss over classes. Experimental results demonstrate that the performance of our CCDA method matches, and in some cases, surpasses that of state-of-the-art methods.
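One possible (hypothetical) reading of "class-conditional losses ... equalize the loss over classes" is to weight a per-pixel domain-adversarial loss by the predicted class posteriors and then average class-wise, so that rare classes contribute as much as frequent ones. The sketch below illustrates only that generic idea, not CCDA's actual discriminator or losses; all shapes and names are assumptions.

```python
import torch
import torch.nn.functional as F

def class_conditional_domain_loss(domain_logits: torch.Tensor,
                                  seg_probs: torch.Tensor,
                                  is_source: bool) -> torch.Tensor:
    """domain_logits: (B, C, H, W) per-class domain scores from a class-conditional
    discriminator; seg_probs: (B, C, H, W) softmax segmentation predictions.
    The per-pixel binary domain loss is weighted by the class posterior and then
    averaged per class, equalizing frequent and rare classes."""
    target = torch.ones_like(domain_logits) if is_source else torch.zeros_like(domain_logits)
    per_pixel = F.binary_cross_entropy_with_logits(domain_logits, target, reduction="none")
    weighted = per_pixel * seg_probs                       # condition the loss on the class
    per_class = weighted.sum(dim=(0, 2, 3)) / (seg_probs.sum(dim=(0, 2, 3)) + 1e-6)
    return per_class.mean()                                # equal weight per class

loss = class_conditional_domain_loss(torch.randn(2, 19, 64, 128),
                                     torch.softmax(torch.randn(2, 19, 64, 128), dim=1),
                                     is_source=True)
```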

Citations: 0
Geometry-aware 3D pose transfer using transformer autoencoder
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-03-22 | DOI: 10.1007/s41095-023-0379-8
Shanghuan Liu, Shaoyan Gai, Feipeng Da, Fazal Waris

3D pose transfer over unorganized point clouds is a challenging generation task, which transfers a source’s pose to a target shape and keeps the target’s identity. Recent deep models have learned deformations and used the target’s identity as a style to modulate the combined features of two shapes or the aligned vertices of the source shape. However, all operations in these models are point-wise and independent and ignore the geometric information on the surface and structure of the input shapes. This disadvantage severely limits the generation and generalization capabilities. In this study, we propose a geometry-aware method based on a novel transformer autoencoder to solve this problem. An efficient self-attention mechanism, that is, cross-covariance attention, was utilized across our framework to perceive the correlations between points at different distances. Specifically, the transformer encoder extracts the target shape’s local geometry details for identity attributes and the source shape’s global geometry structure for pose information. Our transformer decoder efficiently learns deformations and recovers identity properties by fusing and decoding the extracted features in a geometry attentional manner, which does not require corresponding information or modulation steps. The experiments demonstrated that the geometry-aware method achieved state-of-the-art performance in a 3D pose transfer task. The implementation code and data are available at https://github.com/SEULSH/Geometry-Aware-3D-Pose-Transfer-Using-Transformer-Autoencoder.
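Cross-covariance attention computes the attention map between feature channels (d x d) instead of between points (N x N), so its cost grows linearly with the number of points. The simplified single-head sketch below follows the common formulation with a learnable temperature; it is not the paper's implementation, and all sizes are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossCovarianceAttention(nn.Module):
    """Simplified single-head cross-covariance attention: attention is computed
    between feature channels rather than between points."""
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.temperature = nn.Parameter(torch.ones(1))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, d) point features
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = F.normalize(q, dim=1)                          # unit norm along the point axis
        k = F.normalize(k, dim=1)
        attn = (k.transpose(1, 2) @ q) * self.temperature  # (B, d, d) channel covariance
        attn = attn.softmax(dim=-1)
        out = v @ attn                                     # (B, N, d)
        return self.proj(out)

xca = CrossCovarianceAttention(dim=128)
features = xca(torch.randn(2, 1024, 128))                  # 1024 points, 128-dim features
```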

Citations: 0