
Computational Visual Media: Latest Publications

Message from the Best Paper Award Committee
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 Computer Science | Pub Date: 2024-05-14 | DOI: 10.1007/s41095-024-0435-z
Ming C. Lin, Baoquan Chen, Ying He, Wenping Wang, Kun Zhou, Ralph Martin
{"title":"Message from the Best Paper Award Committee","authors":"Ming C. Lin, Baoquan Chen, Ying He, Wenping Wang, Kun Zhou, Ralph Martin","doi":"10.1007/s41095-024-0435-z","DOIUrl":"https://doi.org/10.1007/s41095-024-0435-z","url":null,"abstract":"","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":null,"pages":null},"PeriodicalIF":6.9,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140980993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Foundation models meet visualizations: Challenges and opportunities
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 Computer Science | Pub Date: 2024-05-02 | DOI: 10.1007/s41095-023-0393-x
Weikai Yang, Mengchen Liu, Zheng Wang, Shixia Liu

Recent studies have indicated that foundation models, such as BERT and GPT, excel at adapting to various downstream tasks. This adaptability has made them a dominant force in building artificial intelligence (AI) systems. Moreover, a new research paradigm has emerged as visualization techniques are incorporated into these models. This study divides these intersections into two research areas: visualization for foundation model (VIS4FM) and foundation model for visualization (FM4VIS). In terms of VIS4FM, we explore the primary role of visualizations in understanding, refining, and evaluating these intricate foundation models. VIS4FM addresses the pressing need for transparency, explainability, fairness, and robustness. Conversely, in terms of FM4VIS, we highlight how foundation models can be used to advance the visualization field itself. The intersection of foundation models with visualizations is promising but also introduces a set of challenges. By highlighting these challenges and promising opportunities, this study aims to provide a starting point for the continued exploration of this research avenue.

Citations: 0
Learning layout generation for virtual worlds
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 Computer Science | Pub Date: 2024-05-02 | DOI: 10.1007/s41095-023-0365-1
Weihao Cheng, Ying Shan

The emergence of the metaverse has led to the rapidly increasing demand for the generation of extensive 3D worlds. We consider that an engaging world is built upon a rational layout of multiple land-use areas (e.g., forest, meadow, and farmland). To this end, we propose a generative model of land-use distribution that learns from geographic data. The model is based on a transformer architecture that generates a 2D map of the land-use layout, which can be conditioned on spatial and semantic controls, depending on whether either one or both are provided. This model enables diverse layout generation with user control and layout expansion by extending borders with partial inputs. To generate high-quality and satisfactory layouts, we devise a geometric objective function that supervises the model to perceive layout shapes and regularize generations using geometric priors. Additionally, we devise a planning objective function that supervises the model to perceive progressive composition demands and suppress generations deviating from controls. To evaluate the spatial distribution of the generations, we train an autoencoder to embed land-use layouts into vectors to enable comparison between the real and generated data using the Wasserstein metric, which is inspired by the Fréchet inception distance.
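To make the evaluation idea concrete, here is a minimal sketch (not the authors' code) of the Fréchet-style comparison the abstract describes: fit a Gaussian to real and generated layout embeddings and compute the 2-Wasserstein distance between them. The embedding arrays are assumed to come from a separately trained autoencoder; the random data below is only a stand-in.

```python
import numpy as np
from scipy import linalg

def frechet_distance(real_emb: np.ndarray, gen_emb: np.ndarray) -> float:
    """2-Wasserstein (Frechet) distance between Gaussians fitted to two
    sets of layout embeddings, each of shape (num_layouts, emb_dim)."""
    mu_r, mu_g = real_emb.mean(axis=0), gen_emb.mean(axis=0)
    cov_r = np.cov(real_emb, rowvar=False)
    cov_g = np.cov(gen_emb, rowvar=False)

    diff = mu_r - mu_g
    # Matrix square root of the product of the two covariance matrices.
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerics

    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))

# Toy usage with random stand-ins for autoencoder embeddings of layouts.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(256, 64))
fake = rng.normal(0.1, 1.1, size=(256, 64))
print(frechet_distance(real, fake))
```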

Citations: 0
AdaPIP: Adaptive picture-in-picture guidance for 360° film watching
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 Computer Science | Pub Date: 2024-05-02 | DOI: 10.1007/s41095-023-0347-3
Yi-Xiao Li, Guan Luo, Yi-Ke Xu, Yu He, Fang-Lue Zhang, Song-Hai Zhang

360° videos enable viewers to watch freely from different directions but inevitably prevent them from perceiving all the helpful information. To mitigate this problem, picture-in-picture (PIP) guidance was proposed using preview windows to show regions of interest (ROIs) outside the current view range. We identify several drawbacks of this representation and propose a new method for 360° film watching called AdaPIP. AdaPIP enhances traditional PIP by adaptively arranging preview windows with changeable view ranges and sizes. In addition, AdaPIP incorporates the advantage of arrow-based guidance by presenting circular windows with arrows attached to them to help users locate the corresponding ROIs more efficiently. We also adapted AdaPIP and Outside-In to HMD-based immersive virtual reality environments to demonstrate the usability of PIP-guided approaches beyond 2D screens. Comprehensive user experiments on 2D screens, as well as in VR environments, indicate that AdaPIP is superior to alternative methods in terms of visual experiences while maintaining a comparable degree of immersion.
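The guidance problem the abstract describes has a simple geometric core: deciding whether a region of interest lies outside the current view range and, if so, which way an on-screen arrow should point. The sketch below illustrates that sub-problem only; the function names, field-of-view values, and yaw/pitch convention are assumptions, not the AdaPIP implementation.

```python
import math

def wrap_angle(a: float) -> float:
    """Wrap an angle in degrees to the range (-180, 180]."""
    return (a + 180.0) % 360.0 - 180.0

def roi_guidance(view_yaw, view_pitch, roi_yaw, roi_pitch,
                 fov_h=90.0, fov_v=60.0):
    """Return (visible, arrow_angle_deg) for a region of interest.

    Angles are in degrees. If the ROI falls outside the current view
    frustum, arrow_angle_deg is the direction (0 deg = right, 90 deg = up)
    in which an on-screen arrow should point to guide the viewer.
    """
    d_yaw = wrap_angle(roi_yaw - view_yaw)      # horizontal offset
    d_pitch = roi_pitch - view_pitch            # vertical offset
    visible = abs(d_yaw) <= fov_h / 2 and abs(d_pitch) <= fov_v / 2
    arrow = math.degrees(math.atan2(d_pitch, d_yaw))
    return visible, arrow

# Viewer looks at yaw=0, pitch=0; ROI sits 120 deg to the left and slightly up.
print(roi_guidance(0.0, 0.0, -120.0, 10.0))   # (False, ~175 deg, i.e., arrow points left)
```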

Citations: 0
Symmetrization of quasi-regular patterns with periodic tilting of regular polygons
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 Computer Science | Pub Date: 2024-04-27 | DOI: 10.1007/s41095-023-0359-z
Zhengzheng Yin, Yao Jin, Zhijian Fang, Yun Zhang, Huaxiong Zhang, Jiu Zhou, Lili He

Computer-generated aesthetic patterns are widely used as design materials in various fields. The most common methods use fractals or dynamical systems as basic tools to create various patterns. To enhance aesthetics and controllability, some researchers have introduced symmetric layouts along with these tools. One popular strategy employs dynamical systems compatible with symmetries to construct functions with the desired symmetries; however, these are typically confined to simple planar symmetries. The other strategy generates symmetrical patterns under the constraints of tilings; although slightly more flexible, it is restricted to a small range of tilings and lacks textural variation. Thus, we propose a new approach for generating aesthetic patterns by symmetrizing quasi-regular patterns using general k-uniform tilings. We adopt a unified strategy to construct invariant mappings for k-uniform tilings that eliminate texture seams across the tiling edges. Furthermore, we construct three types of symmetries associated with the patterns: dihedral, rotational, and reflection symmetries. The proposed method can be easily implemented using GPU shaders and is highly efficient and suitable for complicated tilings with regular polygons. Experiments demonstrate the advantages of our method over state-of-the-art methods in terms of flexibility in controlling pattern generation with various parameters, as well as the diversity of textures and styles.
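As a rough illustration of what an invariant mapping achieves, the sketch below symmetrizes an arbitrary base pattern by averaging it over a dihedral group, which makes the result exactly invariant under that group. This is a simplifying stand-in for a single symmetry group, not the paper's k-uniform-tiling construction; the base pattern is arbitrary.

```python
import numpy as np

def dihedral_group(n: int):
    """2x2 matrices of the dihedral group D_n: n rotations and n reflections."""
    mats = []
    for k in range(n):
        t = 2.0 * np.pi * k / n
        rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
        mats.append(rot)
        mats.append(rot @ np.diag([1.0, -1.0]))  # reflect about the x-axis, then rotate
    return mats

def base_pattern(p: np.ndarray) -> np.ndarray:
    """An arbitrary quasi-regular-looking scalar field (stand-in texture)."""
    x, y = p[..., 0], p[..., 1]
    return np.sin(3.0 * x + 1.3) * np.cos(2.0 * y) + np.sin(x * y)

def symmetrized_pattern(p: np.ndarray, n: int = 6) -> np.ndarray:
    """Average the base pattern over D_n, yielding an exactly D_n-invariant field."""
    group = dihedral_group(n)
    return np.mean([base_pattern(p @ g.T) for g in group], axis=0)

# Sample the symmetrized field on a small grid.
xs = np.linspace(-2.0, 2.0, 256)
grid = np.stack(np.meshgrid(xs, xs), axis=-1)      # shape (256, 256, 2)
field = symmetrized_pattern(grid, n=6)
print(field.shape, field.min(), field.max())
```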

Citations: 0
Joint training with local soft attention and dual cross-neighbor label smoothing for unsupervised person re-identification
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 Computer Science | Pub Date: 2024-04-27 | DOI: 10.1007/s41095-023-0354-4
Qing Han, Longfei Li, Weidong Min, Qi Wang, Qingpeng Zeng, Shimiao Cui, Jiongjin Chen

Existing unsupervised person re-identification approaches fail to fully capture the fine-grained features of local regions, which can result in people with similar appearances and different identities being assigned the same label after clustering. The identity-independent information contained in different local regions leads to different levels of local noise. To address these challenges, joint training with local soft attention and dual cross-neighbor label smoothing (DCLS) is proposed in this study. First, the joint training is divided into global and local parts, whereby a soft attention mechanism is proposed for the local branch to accurately capture the subtle differences in local regions, which improves the ability of the re-identification model in identifying a person’s local significant features. Second, DCLS is designed to progressively mitigate label noise in different local regions. The DCLS uses global and local similarity metrics to semantically align the global and local regions of the person and further determines the proximity association between local regions through the cross information of neighboring regions, thereby achieving label smoothing of the global and local regions throughout the training process. In extensive experiments, the proposed method outperformed existing methods under unsupervised settings on several standard person re-identification datasets.
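A minimal PyTorch sketch of the kind of local soft attention the abstract refers to: spatial positions in a local feature strip are weighted by learned scores before pooling, so subtle local cues are not averaged away. Module and tensor names are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SoftAttentionPool(nn.Module):
    """Spatial soft attention over a local feature map.

    Produces a single feature vector as an attention-weighted average of
    spatial positions, so salient local regions dominate the descriptor."""
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-position score

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        attn = torch.softmax(self.score(x).view(b, 1, h * w), dim=-1)  # (B,1,HW)
        feats = x.view(b, c, h * w)                                    # (B,C,HW)
        return (feats * attn).sum(dim=-1)                              # (B,C)

# Toy usage: a horizontal stripe of a backbone feature map as the "local region".
fmap = torch.randn(4, 256, 6, 16)           # (batch, channels, height, width)
local_strip = fmap[:, :, 0:2, :]            # top stripe, e.g., head/shoulder region
pooled = SoftAttentionPool(256)(local_strip)
print(pooled.shape)                         # torch.Size([4, 256])
```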

Citations: 0
DepthGAN: GAN-based depth generation from semantic layouts
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 Computer Science | Pub Date: 2024-04-27 | DOI: 10.1007/s41095-023-0350-8
Yidi Li, Jun Xiao, Yiqun Wang, Zhengda Lu

Existing GAN-based generative methods are typically used for semantic image synthesis. We pose the question of whether GAN-based architectures can generate plausible depth maps, and find that existing methods have difficulty in generating depth maps which reasonably represent 3D scene structure due to the lack of global geometric correlations. Thus, we propose DepthGAN, a novel method of generating a depth map using a semantic layout as input, to aid the construction and manipulation of well-structured 3D scene point clouds. Specifically, we first build a feature generation model with a cascade of semantically aware transformer blocks to obtain depth features with global structural information. For our semantically aware transformer block, we propose a mixed attention module and a semantically aware layer normalization module to better exploit semantic consistency for depth feature generation. Moreover, we present a novel semantically weighted depth synthesis module, which generates adaptive depth intervals for the current scene. We generate the final depth map by using a weighted combination of semantically aware depth weights for different depth ranges. In this manner, we obtain a more accurate depth map. Extensive experiments on indoor and outdoor datasets demonstrate that DepthGAN achieves superior results both quantitatively and visually for the depth generation task.
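The "semantically aware layer normalization" idea can be sketched as a SPADE-style conditional normalization: normalize the depth features, then modulate them with per-pixel scale and shift maps predicted from the semantic layout. The sketch below is a hedged approximation; class names and layer sizes are illustrative, and the actual DepthGAN module may differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticLayerNorm(nn.Module):
    """Normalize features, then modulate them with per-pixel scale and shift
    predicted from a semantic layout (one-hot label map)."""
    def __init__(self, feat_channels: int, num_classes: int, hidden: int = 64):
        super().__init__()
        self.norm = nn.GroupNorm(1, feat_channels, affine=False)  # layer-norm style
        self.shared = nn.Sequential(
            nn.Conv2d(num_classes, hidden, 3, padding=1), nn.ReLU(inplace=True))
        self.to_gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        self.to_beta = nn.Conv2d(hidden, feat_channels, 3, padding=1)

    def forward(self, x: torch.Tensor, layout: torch.Tensor) -> torch.Tensor:
        # Resize the layout to the feature resolution before predicting modulation.
        layout = F.interpolate(layout, size=x.shape[-2:], mode="nearest")
        h = self.shared(layout)
        return self.norm(x) * (1.0 + self.to_gamma(h)) + self.to_beta(h)

# Toy usage: 16x16 features modulated by a 15-class semantic layout.
x = torch.randn(2, 128, 16, 16)
labels = torch.randint(0, 15, (2, 64, 64))
layout = F.one_hot(labels, num_classes=15).permute(0, 3, 1, 2).float()
print(SemanticLayerNorm(128, 15)(x, layout).shape)   # torch.Size([2, 128, 16, 16])
```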

Citations: 0
Physics-based fluid simulation in computer graphics: Survey, research trends, and challenges
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 Computer Science | Pub Date: 2024-04-27 | DOI: 10.1007/s41095-023-0368-y
Xiaokun Wang, Yanrui Xu, Sinuo Liu, Bo Ren, Jiří Kosinka, Alexandru C. Telea, Jiamin Wang, Chongming Song, Jian Chang, Chenfeng Li, Jian Jun Zhang, Xiaojuan Ban

Physics-based fluid simulation has played an increasingly important role in the computer graphics community. Recent methods in this area have greatly improved the generation of complex visual effects and its computational efficiency. Novel techniques have emerged to deal with complex boundaries, multiphase fluids, gas–liquid interfaces, and fine details. The parallel use of machine learning, image processing, and fluid control technologies has brought many interesting and novel research perspectives. In this survey, we provide an introduction to the theoretical concepts underpinning physics-based fluid simulation and their practical implementation, with the aim of serving as a guide for both newcomers and seasoned researchers exploring the field, with a focus on developments in the last decade. Driven by the distribution of recent publications in the field, we structure our survey to cover physical background; discretization approaches; computational methods that address scalability; fluid interactions with other materials and interfaces; and methods for expressive aspects of surface detail and control. From a practical perspective, we give an overview of existing implementations available for the above methods.
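As one concrete example of the discretization approaches such a survey covers, the sketch below estimates per-particle density with the standard SPH poly6 kernel using a brute-force neighbor search. The particle spacing, mass, and smoothing radius are toy values chosen only for illustration.

```python
import numpy as np

def poly6(r: np.ndarray, h: float) -> np.ndarray:
    """Standard SPH poly6 smoothing kernel in 3D, zero outside radius h."""
    w = np.zeros_like(r)
    mask = r < h
    w[mask] = (315.0 / (64.0 * np.pi * h**9)) * (h**2 - r[mask]**2) ** 3
    return w

def sph_density(positions: np.ndarray, mass: float, h: float) -> np.ndarray:
    """Per-particle density: rho_i = sum_j m * W(|x_i - x_j|, h).
    Brute-force O(n^2) neighbor search, fine for a small demo."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return mass * poly6(dist, h).sum(axis=1)

# A small block of fluid particles on a regular grid (2 cm spacing).
side = np.arange(0.0, 0.1, 0.02)                                  # 5 samples per axis
pts = np.stack(np.meshgrid(side, side, side), -1).reshape(-1, 3)  # 125 particles
rho = sph_density(pts, mass=0.008, h=0.045)
print(rho.shape, rho.max())   # peak (interior) density comes out near water, ~1000 kg/m^3
```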

Citations: 0
Learning to compose diversified prompts for image emotion classification
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 Computer Science | Pub Date: 2024-04-26 | DOI: 10.1007/s41095-023-0389-6
Sinuo Deng, Lifang Wu, Ge Shi, Lehao Xing, Meng Jian, Ye Xiang, Ruihai Dong

Image emotion classification (IEC) aims to extract the abstract emotions evoked in images. Recently, language-supervised methods such as contrastive language-image pretraining (CLIP) have demonstrated superior performance in image understanding. However, the underexplored task of IEC presents three major challenges: a tremendous training objective gap between pretraining and IEC, shared suboptimal prompts, and invariant prompts for all instances. In this study, we propose a general framework that effectively exploits the language-supervised CLIP method for the IEC task. First, a prompt-tuning method that mimics the pretraining objective of CLIP is introduced, to exploit the rich image and text semantics associated with CLIP. Subsequently, instance-specific prompts are automatically composed, conditioning them on the categories and image content of instances, diversifying the prompts, and thus avoiding suboptimal problems. Evaluations on six widely used affective datasets show that the proposed method significantly outperforms state-of-the-art methods (up to 9.29% accuracy gain on the EmotionROI dataset) on IEC tasks with only a few trained parameters. The code is publicly available at https://github.com/dsn0w/PT-DPC/ for research purposes.
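For context, the fixed, handcrafted prompts that learned instance-specific prompt composition is meant to replace look like the zero-shot CLIP baseline sketched below. The emotion labels, prompt template, and image path are illustrative assumptions; only the CLIP calls themselves follow the public openai/CLIP package.

```python
import torch
import clip                      # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Handcrafted, shared, instance-invariant prompts -- exactly the kind of
# suboptimal prompt that learned, instance-specific composition replaces.
emotions = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]        # illustrative taxonomy
texts = clip.tokenize([f"a photo that evokes {e}" for e in emotions]).to(device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder path

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(texts)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print(dict(zip(emotions, probs[0].tolist())))
```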

Citations: 0
CTSN: Predicting cloth deformation for skeleton-based characters with a two-stream skinning network
IF 6.9 | CAS Tier 3 (Computer Science) | Q1 Computer Science | Pub Date: 2024-04-19 | DOI: 10.1007/s41095-023-0344-6
Yudi Li, Min Tang, Yun Yang, Ruofeng Tong, Shuangcai Yang, Yao Li, Bailin An, Qilong Kou

We present a novel learning method using a two-stream network to predict cloth deformation for skeleton-based characters. The characters processed in our approach are not limited to humans, and can be other targets with skeleton-based representations such as fish or pets. We use a novel network architecture which consists of skeleton-based and mesh-based residual networks to learn the coarse features and wrinkle features forming the overall residual from the template cloth mesh. Our network may be used to predict the deformation for loose or tight-fitting clothing. The memory footprint of our network is low, thereby resulting in reduced computational requirements. In practice, a prediction for a single cloth mesh for a skeleton-based character takes about 7 ms on an nVidia GeForce RTX 3090 GPU. Compared to prior methods, our network can generate finer deformation results with details and wrinkles.
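A heavily simplified PyTorch sketch of the two-stream idea, under the assumption of flattened pose and vertex tensors: one stream predicts a coarse per-vertex displacement from the skeleton pose, a second predicts a wrinkle-level residual, and both are added to the template cloth mesh. Layer sizes and names are illustrative, not the CTSN architecture.

```python
import torch
import torch.nn as nn

class TwoStreamClothSkinning(nn.Module):
    """Predict per-vertex displacements of a template cloth mesh from a
    skeleton pose: a coarse stream plus a wrinkle-level residual stream."""
    def __init__(self, num_joints: int, num_verts: int, hidden: int = 256):
        super().__init__()
        pose_dim = num_joints * 4                       # per-joint rotation quaternion
        self.num_verts = num_verts
        self.coarse = nn.Sequential(                    # low-frequency deformation
            nn.Linear(pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_verts * 3))
        self.wrinkle = nn.Sequential(                   # high-frequency residual
            nn.Linear(pose_dim + num_verts * 3, hidden), nn.ReLU(),
            nn.Linear(hidden, num_verts * 3))

    def forward(self, pose: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
        b = pose.shape[0]
        coarse = self.coarse(pose).view(b, self.num_verts, 3)
        wrinkle = self.wrinkle(
            torch.cat([pose, coarse.view(b, -1)], dim=-1)).view(b, self.num_verts, 3)
        return template.unsqueeze(0) + coarse + wrinkle  # deformed cloth vertices

# Toy usage: 24-joint pose, 5000-vertex template cloth.
net = TwoStreamClothSkinning(num_joints=24, num_verts=5000)
pose = torch.randn(2, 24 * 4)
template = torch.randn(5000, 3)
print(net(pose, template).shape)                         # torch.Size([2, 5000, 3])
```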

Citations: 0