
Latest Publications in Graphical Models

Pixel art character generation as an image-to-image translation problem using GANs
IF 1.7 | CAS Tier 4, Computer Science | JCR Q2, Computer Science, Software Engineering | Pub Date: 2024-02-02 | DOI: 10.1016/j.gmod.2024.101213
Flávio Coutinho, Luiz Chaimowicz

Asset creation in game development usually requires multiple iterations until a final version is achieved. This iterative process becomes more significant when the content is pixel art, in which the artist carefully places each pixel. We hypothesize that the problem of generating character sprites in a target pose (e.g., facing right) given a source (e.g., facing front) can be framed as an image-to-image translation task. Then, we present an architecture of deep generative models that takes as input an image of a character in one domain (pose) and transfers it to another. We approach the problem using generative adversarial networks (GANs) and build on Pix2Pix’s architecture while leveraging some specific characteristics of the pixel art style. We evaluated the trained models using four small datasets (less than 1k) and a more extensive and diverse one (12k). The models yielded promising results, and their generalization capacity varies according to the dataset size and variability. After training models to generate images among four domains (i.e., front, right, back, left), we present an early version of a mixed-initiative sprite editor that allows users to interact with them and iterate in creating character sprites.
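To make the pose-to-pose framing concrete, the sketch below assembles (source, target) sprite pairs for every ordered pair of the four poses, which is the kind of paired data an image-to-image translation model consumes. The function name and data layout are illustrative assumptions, not the paper's code.

```python
import itertools
import numpy as np

POSES = ["front", "right", "back", "left"]

def make_translation_pairs(sprites):
    """Build (source, target) samples for every ordered pose pair.

    `sprites` maps a character id to a dict {pose: HxWx4 RGBA array}.
    Each ordered pair of distinct poses becomes one training sample,
    mirroring the image-to-image translation framing of the abstract.
    """
    pairs = []
    for char_id, by_pose in sprites.items():
        for src, dst in itertools.permutations(POSES, 2):
            if src in by_pose and dst in by_pose:
                pairs.append((char_id, src, dst, by_pose[src], by_pose[dst]))
    return pairs

# toy example: one character with all four poses as 8x8 RGBA sprites
sprites = {"hero": {p: np.zeros((8, 8, 4), dtype=np.uint8) for p in POSES}}
pairs = make_translation_pairs(sprites)
print(len(pairs))  # 4 poses -> 12 ordered (source, target) pairs
```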

Graphical Models, Volume 132, Article 101213.
Citations: 0
Evaluating and comparing crowd simulations: Perspectives from a crowd authoring tool
IF 1.7 | CAS Tier 4, Computer Science | JCR Q2, Computer Science, Software Engineering | Pub Date: 2024-01-03 | DOI: 10.1016/j.gmod.2023.101212
Gabriel Fonseca Silva, Paulo Ricardo Knob, Rubens Halbig Montanha, Soraia Raupp Musse

Crowd simulation is a research area widely used in diverse fields, including gaming and security, assessing virtual agent movements through metrics like time to reach their goals, speed, trajectories, and densities. This is relevant for security applications, for instance, as different crowd configurations can determine the time people spend in environments trying to evacuate them. In this work, we extend WebCrowds, an authoring tool for crowd simulation, to allow users to build scenarios and evaluate them through a set of metrics. The aim is to provide a quantitative metric that can, based on simulation data, select the best crowd configuration in a certain environment. We conduct experiments to validate our proposed metric in multiple crowd simulation scenarios and perform a comparison with another metric found in the literature. The results show that experts in the domain of crowd scenarios agree with our proposed quantitative metric.
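A minimal sketch of how per-metric measurements (evacuation time, speed, and so on) could be collapsed into a single score for ranking crowd configurations. The paper's actual metric is not reproduced here; the function, weights, and normalization are illustrative assumptions.

```python
import numpy as np

def rank_configurations(metrics, weights, lower_is_better):
    """Score crowd-simulation configurations from per-metric measurements.

    `metrics` is an (n_configs, n_metrics) array, e.g. columns for
    evacuation time and mean speed. Each column is min-max normalized;
    metrics where lower values are better are flipped so that a higher
    normalized score is always better. Returns one score per config.
    """
    m = np.asarray(metrics, dtype=float)
    lo, hi = m.min(axis=0), m.max(axis=0)
    norm = (m - lo) / np.where(hi > lo, hi - lo, 1.0)  # guard flat columns
    norm[:, lower_is_better] = 1.0 - norm[:, lower_is_better]
    return norm @ np.asarray(weights, dtype=float)

# three candidate configurations x two metrics (evacuation time, mean speed)
scores = rank_configurations(
    [[120.0, 1.2], [90.0, 1.0], [150.0, 1.4]],
    weights=[0.7, 0.3],
    lower_is_better=np.array([True, False]),
)
print(int(np.argmax(scores)))  # config 1: fastest evacuation dominates
```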

Graphical Models, Volume 131, Article 101212.
Citations: 0
Editorial special issue on the 9th smart tools and applications in graphics conference (STAG 2022)
IF 1.7 | CAS Tier 4, Computer Science | JCR Q2, Computer Science, Software Engineering | Pub Date: 2023-12-01 | DOI: 10.1016/j.gmod.2023.101203
Daniela Cabiddu, Gianmarco Cherchi, Teseo Schneider
Graphical Models, Volume 130, Article 101203.
Citations: 0
High-performance Ellipsoidal Clipmaps
IF 1.7 | CAS Tier 4, Computer Science | JCR Q2, Computer Science, Software Engineering | Pub Date: 2023-12-01 | DOI: 10.1016/j.gmod.2023.101209
Aleksandar Dimitrijević, Dejan Rančić

This paper presents performance improvements for Ellipsoid Clipmaps, an out-of-core planet-sized geodetically accurate terrain rendering algorithm. The performance improvements were achieved by eliminating unnecessarily dense levels, more accurate block culling in the geographic coordinate system, and more efficient rendering methods. The elimination of unnecessarily dense levels is the result of analyzing and determining the optimal relative height of the viewer with respect to the most detailed level, resulting in the most consistent size of triangles across all visible levels. The proposed method for estimating the visibility of blocks based on view orientation allows rapid block-level view frustum culling performed in data space before visualization and spatial transformation of blocks. The use of a modern geometry pipeline through task and mesh shaders forced the handling of extremely fine granularity of blocks, but also shifted a significant part of the block culling process from the CPU to the GPU. The approach described achieves high throughput and enables geodetically accurate rendering of the terrain based on the WGS 84 reference ellipsoid at very high resolution and in real time, with tens of millions of triangles with an average area of about 0.5 pix² on a 1080p screen on mid-range graphics cards.
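The "eliminating unnecessarily dense levels" idea can be sketched as a screen-space criterion: a clipmap level whose texels project to less than one pixel adds no visible detail at the current viewer height. The function, the one-pixel threshold, and the projection formula below are illustrative assumptions, not the paper's exact criterion.

```python
import math

def coarsest_useful_level(viewer_height, finest_texel, fov_rad, screen_px):
    """Pick the finest clipmap level still worth rendering.

    A level's texels are finest_texel * 2**level metres wide. A texel
    directly below the viewer projects to roughly
    screen_px * texel / (2 * viewer_height * tan(fov/2)) pixels, so
    levels whose texels fall under one projected pixel are skipped.
    """
    def projected_px(texel):
        return screen_px * texel / (2.0 * viewer_height * math.tan(fov_rad / 2.0))

    level = 0
    while projected_px(finest_texel * 2 ** level) < 1.0:
        level += 1
    return level

# at 1000 m over 1 m texels with a 60-degree FOV on a 1080-px-tall screen,
# level 0 is sub-pixel and gets dropped
print(coarsest_useful_level(1000.0, 1.0, math.radians(60), 1080))  # -> 1
```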

Graphical Models, Volume 130, Article 101209.
Citations: 0
Modeling multi-style portrait relief from a single photograph
IF 1.7 | CAS Tier 4, Computer Science | JCR Q2, Computer Science, Software Engineering | Pub Date: 2023-11-28 | DOI: 10.1016/j.gmod.2023.101210
Yu-Wei Zhang, Hongguang Yang, Ping Luo, Zhi Li, Hui Liu, Zhongping Ji, Caiming Zhang

This paper aims at extending the method of Zhang et al. (2023) to produce not only portrait bas-reliefs from single photographs, but also high-depth reliefs with reasonable depth ordering. We cast this task as a problem of style-aware photo-to-depth translation, where the input is a photograph conditioned by a style vector and the output is a portrait relief with the desired depth style. To construct ground-truth data for network training, we first propose an optimization-based method to synthesize high-depth reliefs from 3D portraits. Then, we train a normal-to-depth network to learn the mapping from normal maps to relief depths. After that, we use the trained network to generate high-depth relief samples using the provided normal maps from Zhang et al. (2023). As each normal map has a pixel-aligned photograph, we are able to establish correspondences between photographs and high-depth reliefs. By taking the bas-reliefs of Zhang et al. (2023), the new high-depth reliefs and their mixtures as target ground-truths, we finally train an encoder-to-decoder network to achieve style-aware relief modeling. Specifically, the network is based on a U-shaped architecture, consisting of Swin Transformer blocks to process hierarchical deep features. Extensive experiments have demonstrated the effectiveness of the proposed method. Comparisons with previous works have verified its flexibility and state-of-the-art performance.
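One common way to realize "a photograph conditioned by a style vector" is to broadcast each style component into a constant feature plane and stack it onto the image channels before the encoder. The sketch below shows that mechanism only; the paper's exact conditioning scheme may differ, and the function name is an assumption.

```python
import numpy as np

def condition_on_style(photo, style_vec):
    """Concatenate a style vector to a photo as constant feature planes.

    Each style component becomes an HxW plane appended to the image
    channels, so the network sees the style at every pixel.
    """
    h, w, _ = photo.shape
    planes = np.broadcast_to(style_vec, (h, w, len(style_vec)))
    return np.concatenate([photo, planes], axis=2)

# a 64x64 RGB photo conditioned on a 3-way one-hot depth-style vector
x = condition_on_style(np.zeros((64, 64, 3)), np.array([1.0, 0.0, 0.0]))
print(x.shape)  # (64, 64, 6)
```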

Graphical Models, Volume 130, Article 101210.
Citations: 0
A decomposition scheme for continuous Level of Detail, streaming and lossy compression of unordered point clouds
IF 1.7 | CAS Tier 4, Computer Science | JCR Q2, Computer Science, Software Engineering | Pub Date: 2023-11-08 | DOI: 10.1016/j.gmod.2023.101208
Jan Martens, Jörg Blankenbach

Modern laser scanners, depth sensor devices and Dense Image Matching techniques allow for the capture of extensive point cloud datasets. While capturing has become more user-friendly, the size of registered point clouds results in large datasets which pose challenges for processing, storage and visualization. This paper presents a decomposition scheme using oriented KD trees and the wavelet transform for unordered point clouds. Taking inspiration from image pyramids, the decomposition scheme comes with a Level of Detail representation where higher levels are progressively reconstructed from lower ones, thus making it suitable for streaming and continuous Level of Detail. Furthermore, the decomposed representation allows common compression techniques to achieve higher compression ratios by modifying the underlying frequency data at the cost of geometric accuracy and therefore allows for flexible lossy compression. After introducing this novel decomposition scheme, results are discussed to show how it deals with data captured from different sources.
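The wavelet side of the scheme can be illustrated with a single Haar step in 1D: pair averages form the coarser level, pair differences are the detail coefficients needed to restore the finer one, which is exactly the progressive-reconstruction property the abstract describes. A minimal sketch (the actual method operates on KD-tree-ordered 3D points, not a plain 1D array):

```python
import numpy as np

def haar_decompose(coords):
    """One Haar wavelet step over a 1-D coordinate array of even length.

    Averages of neighbouring pairs form the coarser level (half the
    samples); differences are the detail coefficients needed to
    reconstruct the finer level.
    """
    c = np.asarray(coords, dtype=float)
    approx = (c[0::2] + c[1::2]) / 2.0
    detail = (c[0::2] - c[1::2]) / 2.0
    return approx, detail

def haar_reconstruct(approx, detail):
    """Invert one step: interleave approx+detail and approx-detail."""
    out = np.empty(approx.size * 2)
    out[0::2] = approx + detail
    out[1::2] = approx - detail
    return out

pts = np.array([0.0, 1.0, 4.0, 6.0])
a, d = haar_decompose(pts)       # a = [0.5, 5.0], d = [-0.5, -1.0]
print(haar_reconstruct(a, d))    # recovers [0. 1. 4. 6.]
```

Quantizing or dropping the `detail` coefficients is what trades geometric accuracy for compression ratio in the lossy setting.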

Graphical Models, Volume 130, Article 101208.
Citations: 0
Vertex position estimation with spatial–temporal transformer for 3D human reconstruction
IF 1.7 | CAS Tier 4, Computer Science | JCR Q2, Computer Science, Software Engineering | Pub Date: 2023-10-26 | DOI: 10.1016/j.gmod.2023.101207
Xiangjun Zhang, Yinglin Zheng, Wenjin Deng, Qifeng Dai, Yuxin Lin, Wangzheng Shi, Ming Zeng

Reconstructing 3D human pose and body shape from monocular images or videos is a fundamental task for comprehending human dynamics. Frame-based methods can be broadly categorized into two families: those regressing parametric model parameters (e.g., SMPL) and those exploring alternative representations (e.g., volumetric shapes, 3D coordinates). Non-parametric representations have demonstrated superior performance due to their enhanced flexibility. However, when applied to video data, these non-parametric frame-based methods tend to generate inconsistent and unsmooth results. To this end, we present a novel approach that directly regresses the 3D coordinates of the mesh vertices and body joints with a spatial–temporal Transformer. In our method, we introduce a SpatioTemporal Learning Block (STLB) with Spatial Learning Module (SLM) and Temporal Learning Module (TLM), which leverages spatial and temporal information to model interactions at a finer granularity, specifically at the body token level. Our method outperforms previous state-of-the-art approaches on Human3.6M and 3DPW benchmark datasets.
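In spirit, the SLM/TLM split factorizes attention over a (frames x joints) token grid: spatial attention mixes joints within a frame, temporal attention mixes the same joint across frames. The sketch below shows only that factorization with plain scaled dot-product attention; dimensions and the shared Q=K=V simplification are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention over one token sequence."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)  # softmax over keys
    return w @ v

# body-token layout: T frames x J joints x D-dim features
T, J, D = 4, 17, 8
tokens = np.random.default_rng(0).normal(size=(T, J, D))

# spatial: attend across joints within each frame (SLM-like)
spatial = np.stack([attention(f, f, f) for f in tokens])
# temporal: attend across frames for each joint (TLM-like)
temporal = np.stack([attention(j, j, j) for j in tokens.transpose(1, 0, 2)])
print(spatial.shape, temporal.shape)  # (4, 17, 8) (17, 4, 8)
```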

Graphical Models, Volume 130, Article 101207.
Citations: 0
A systematic approach for enhancement of homogeneous background images using structural information
IF 1.7 | CAS Tier 4, Computer Science | JCR Q2, Computer Science, Software Engineering | Pub Date: 2023-10-25 | DOI: 10.1016/j.gmod.2023.101206
D. Vijayalakshmi, Malaya Kumar Nath

Image enhancement is an indispensable pre-processing step for several image processing applications. In particular, histogram equalization is a widespread technique used by various researchers to improve image quality by stretching pixel values to fill the entire dynamic grayscale range. It can result in visual artifacts, structural information loss near edges (due to many-to-one mapping), and a shift of the average luminance to a higher value. This paper proposes an enhancement algorithm based on structural information for homogeneous background images. The intensities are divided into two segments using the median value to preserve the average luminance. Unlike traditional techniques, this algorithm incorporates spatial locations in the equalization process instead of the number of occurrences of each intensity value. The occurrences of each intensity with respect to their spatial locations are combined using Rényi entropy to enumerate a discrete function. An adaptive clipping limit is applied to the discrete function to control the enhancement rate. Then histogram equalization is performed on each segment separately, and the equalized segments are integrated to produce an enhanced image. The algorithm’s effectiveness is validated by evaluating the proposed method on the CEED, CSIQ, LOL, and TID2013 databases. Experimental results reveal that the proposed method improves contrast while preserving structural information, detail information, and average luminance. This is quantified by the high contrast improvement index, structural similarity index, and discrete entropy, and the low average mean brightness error of the proposed method when compared with methods available in the literature, including deep learning architectures.
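The median-split core of the algorithm can be sketched with plain histograms: split at the median grey level, equalize each segment within its own range, and the median stays in place, which is what limits the luminance shift. A simplified numpy sketch; the Rényi-entropy spatial weighting and the adaptive clipping limit from the abstract are deliberately omitted.

```python
import numpy as np

def median_split_equalize(img):
    """Simplified median-based bi-histogram equalization.

    The image is split at its median grey level; each segment is
    histogram-equalized within its own intensity range, so lower-half
    pixels stay below the median and upper-half pixels stay above it.
    """
    img = np.asarray(img, dtype=np.uint8)
    med = int(np.median(img))
    out = np.empty_like(img)

    def equalize(mask, lo, hi):
        vals = img[mask]
        if vals.size == 0:
            return
        hist = np.bincount(vals - lo, minlength=hi - lo + 1)
        cdf = np.cumsum(hist) / vals.size
        out[mask] = lo + np.round(cdf[vals - lo] * (hi - lo)).astype(np.uint8)

    equalize(img <= med, 0, med)       # lower segment -> [0, median]
    equalize(img > med, med + 1, 255)  # upper segment -> [median+1, 255]
    return out

img = np.array([[10, 10, 20, 20], [30, 30, 200, 220]], dtype=np.uint8)
eq = median_split_equalize(img)  # median 25 separates the two segments
```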

Graphical Models, Volume 130, Article 101206.
Citations: 0
Jrender: An efficient differentiable rendering library based on Jittor
IF 1.7 | CAS Tier 4, Computer Science | JCR Q2, Computer Science, Software Engineering | Pub Date: 2023-10-18 | DOI: 10.1016/j.gmod.2023.101202
Hanggao Xin, Chenzhong Xiang, Wenyang Zhou, Dun Liang

Differentiable rendering has been proven as a powerful tool to bridge 2D images and 3D models. With the aid of differentiable rendering, tasks in computer vision and computer graphics could be solved more elegantly and accurately. To address challenges in the implementations of differentiable rendering methods, we present an efficient and modular differentiable rendering library named Jrender based on Jittor. Jrender supports surface rendering for 3D meshes and volume rendering for 3D volumes. Compared with previous differentiable renderers, Jrender exhibits a significant improvement in both performance and rendering quality. Due to the modular design, various rendering effects such as PBR materials shading, ambient occlusions, soft shadows, global illumination, and subsurface scattering could be easily supported in Jrender, which are not available in other differentiable rendering libraries. To validate our library, we integrate Jrender into applications such as 3D object reconstruction and NeRF, which show that our implementations could achieve the same quality with higher performance.

Differentiable rendering has been proven to be a powerful tool for bridging 2D images and 3D models. With its aid, tasks in computer vision and computer graphics can be solved more elegantly and accurately. To address challenges in implementing differentiable rendering methods, we present Jrender, an efficient and modular differentiable rendering library based on Jittor. Jrender supports surface rendering for 3D meshes and volume rendering for 3D volumes. Compared with previous differentiable renderers, Jrender shows a significant improvement in both performance and rendering quality. Thanks to its modular design, Jrender easily supports rendering effects such as PBR material shading, ambient occlusion, soft shadows, global illumination, and subsurface scattering, which are unavailable in other differentiable rendering libraries. To validate the library, we integrate Jrender into applications such as 3D object reconstruction and NeRF, showing that our implementations achieve the same quality with higher performance.
{"title":"Jrender: An efficient differentiable rendering library based on Jittor","authors":"Hanggao Xin,&nbsp;Chenzhong Xiang,&nbsp;Wenyang Zhou,&nbsp;Dun Liang","doi":"10.1016/j.gmod.2023.101202","DOIUrl":"https://doi.org/10.1016/j.gmod.2023.101202","url":null,"abstract":"<div><p>Differentiable rendering has been proven as a powerful tool to bridge 2D images and 3D models. With the aid of differentiable rendering, tasks in computer vision and computer graphics could be solved more elegantly and accurately. To address challenges in the implementations of differentiable rendering methods, we present an efficient and modular differentiable rendering library named Jrender based on Jittor. Jrender supports surface rendering for 3D meshes and volume rendering for 3D volumes. Compared with previous differentiable renderers, Jrender exhibits a significant improvement in both performance and rendering quality. Due to the modular design, various rendering effects such as PBR materials shading, ambient occlusions, soft shadows, global illumination, and subsurface scattering could be easily supported in Jrender, which are not available in other differentiable rendering libraries. To validate our library, we integrate Jrender into applications such as 3D object reconstruction and NeRF, which show that our implementations could achieve the same quality with higher performance.</p></div>","PeriodicalId":55083,"journal":{"name":"Graphical Models","volume":"130 ","pages":"Article 101202"},"PeriodicalIF":1.7,"publicationDate":"2023-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49889744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Packing problems on generalised regular grid: Levels of abstraction using integer linear programming
IF 1.7 CAS Tier 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING Pub Date: 2023-10-07 DOI: 10.1016/j.gmod.2023.101205
Hao Hua , Benjamin Dillenburger

Packing a designated set of shapes on a regular grid is an important class of operations research problems that has been intensively studied for more than six decades. Representing a d-dimensional discrete grid as Zd, we formalise the generalised regular grid (GRG) as a surjective function from Zd to a geometric tessellation in a physical space, for example, the cube coordinates of a hexagonal grid or a quasilattice. This study employs 0-1 integer linear programming (ILP) to formulate the polyomino tiling problem with adjacency constraints. Rotation & reflection invariance in adjacency are considered. We separate the formal ILP from the topology & geometry of various grids, such as Ammann-Beenker tiling, Penrose tiling and periodic hypercube. Based on cutting-edge solvers, we reveal an intuitive correspondence between the integer program (a pattern of algebraic rules) and the computer codes. Models of packing problems in the GRG have wide applications in production system, facility layout planning, and architectural design. Two applications in planning high-rise residential apartments are illustrated.
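The 0-1 formulation described above assigns a binary variable x_p to every candidate placement of a shape and imposes one exact-cover constraint per grid cell. The miniature below illustrates that structure for dominoes on a 2×3 board; it brute-forces all 0-1 assignments instead of calling an ILP solver, and is a hypothetical teaching example rather than the paper's model:

```python
from itertools import product

ROWS, COLS = 2, 3
cells = [(r, c) for r in range(ROWS) for c in range(COLS)]

# Candidate placements: every way to drop a 1x2 domino on the grid.
placements = []
for r, c in cells:
    if c + 1 < COLS:
        placements.append(((r, c), (r, c + 1)))   # horizontal
    if r + 1 < ROWS:
        placements.append(((r, c), (r + 1, c)))   # vertical

# The 0-1 program: x_p in {0,1} per placement, and for every cell the
# sum of x_p over placements covering it must equal 1 (exact cover).
def is_tiling(chosen):
    covered = [cell for p in chosen for cell in p]
    return sorted(covered) == cells

count = sum(
    1
    for x in product([0, 1], repeat=len(placements))
    if is_tiling([p for p, xi in zip(placements, x) if xi])
)
print(count)  # → 3 domino tilings of a 2x3 board
```

In the paper's setting the same pattern of algebraic rules is handed to an ILP solver, with the cell set coming from the GRG map rather than a rectangular board, and with extra adjacency constraints between placements.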

Packing a designated set of shapes on a regular grid is an important class of operations research problems that has been studied intensively for more than six decades. Representing a d-dimensional discrete grid as Z^d, we formalise the generalised regular grid (GRG) as a surjective function from Z^d to a geometric tessellation in a physical space, for example the cube coordinates of a hexagonal grid or a quasilattice. This study employs 0-1 integer linear programming (ILP) to formulate the polyomino tiling problem with adjacency constraints. Rotation and reflection invariance in adjacency are considered. We separate the formal ILP from the topology and geometry of various grids, such as Ammann-Beenker tiling, Penrose tiling, and the periodic hypercube. Based on state-of-the-art solvers, we reveal an intuitive correspondence between the integer program (a pattern of algebraic rules) and the computer code. Models of packing problems on the GRG have wide applications in production systems, facility layout planning, and architectural design. Two applications in planning high-rise residential apartments are illustrated.
{"title":"Packing problems on generalised regular grid: Levels of abstraction using integer linear programming","authors":"Hao Hua ,&nbsp;Benjamin Dillenburger","doi":"10.1016/j.gmod.2023.101205","DOIUrl":"https://doi.org/10.1016/j.gmod.2023.101205","url":null,"abstract":"<div><p>Packing a designated set of shapes on a regular grid is an important class of operations research problems that has been intensively studied for more than six decades. Representing a <span><math><mi>d</mi></math></span>-dimensional discrete grid as <span><math><msup><mrow><mi>Z</mi></mrow><mrow><mi>d</mi></mrow></msup></math></span>, we formalise the generalised regular grid (GRG) as a surjective function from <span><math><msup><mrow><mi>Z</mi></mrow><mrow><mi>d</mi></mrow></msup></math></span> to a geometric tessellation in a physical space, for example, the cube coordinates of a hexagonal grid or a quasilattice. This study employs 0-1 integer linear programming (ILP) to formulate the polyomino tiling problem with adjacency constraints. Rotation &amp; reflection invariance in adjacency are considered. We separate the formal ILP from the topology &amp; geometry of various grids, such as Ammann-Beenker tiling, Penrose tiling and periodic hypercube. Based on cutting-edge solvers, we reveal an intuitive correspondence between the integer program (a pattern of algebraic rules) and the computer codes. Models of packing problems in the GRG have wide applications in production system, facility layout planning, and architectural design. 
Two applications in planning high-rise residential apartments are illustrated.</p></div>","PeriodicalId":55083,"journal":{"name":"Graphical Models","volume":"130 ","pages":"Article 101205"},"PeriodicalIF":1.7,"publicationDate":"2023-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49889742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0