BoolSurf: Boolean Operations on Surfaces
Marzia Riso, Giacomo Nazzaro, E. Puppo, Alec Jacobson, Qingnan Zhou, F. Pellacini
We port Boolean set operations between 2D shapes to surfaces of any genus, with any number of open boundaries. We combine shapes bounded by sets of freely intersecting loops, consisting of geodesic lines and cubic Bézier splines lying on a surface. We compute the arrangement of shapes directly on the surface and assign integer labels to the cells of such arrangement. Differently from the Euclidean case, some arrangements on a manifold may be inconsistent. We detect inconsistent arrangements and help the user to resolve them. Also, we extend to the manifold setting recent work on Boundary-Sampled Halfspaces, thus supporting operations more general than standard Booleans, which are well defined on inconsistent arrangements, too. Our implementation discretizes the input shapes into polylines at an arbitrary resolution, independent of the level of resolution of the underlying mesh. We resolve the arrangement inside each triangle of the mesh independently and combine the results to reconstruct both the boundaries and the interior of each cell in the arrangement. We reconstruct the control points of curves bounding cells, in order to free the result from discretization and provide an output in vector format. We support interactive usage, editing shapes consisting of up to 100k line segments on meshes of up to 1M triangles.
{"title":"BoolSurf: Boolean Operations on Surfaces","authors":"Marzia Riso, Giacomo Nazzaro, E. Puppo, Alec Jacobson, Qingnan Zhou, F. Pellacini","doi":"10.1145/3550454.3555466","DOIUrl":"https://doi.org/10.1145/3550454.3555466","url":null,"abstract":"We port Boolean set operations between 2D shapes to surfaces of any genus, with any number of open boundaries. We combine shapes bounded by sets of freely intersecting loops, consisting of geodesic lines and cubic Bézier splines lying on a surface. We compute the arrangement of shapes directly on the surface and assign integer labels to the cells of such arrangement. Differently from the Euclidean case, some arrangements on a manifold may be inconsistent. We detect inconsistent arrangements and help the user to resolve them. Also, we extend to the manifold setting recent work on Boundary-Sampled Halfspaces, thus supporting operations more general than standard Booleans, which are well defined on inconsistent arrangements, too. Our implementation discretizes the input shapes into polylines at an arbitrary resolution, independent of the level of resolution of the underlying mesh. We resolve the arrangement inside each triangle of the mesh independently and combine the results to reconstruct both the boundaries and the interior of each cell in the arrangement. We reconstruct the control points of curves bounding cells, in order to free the result from discretization and provide an output in vector format. We support interactive usage, editing shapes consisting up to 100k line segments on meshes of up to 1M triangles.","PeriodicalId":7121,"journal":{"name":"ACM Trans. Graph.","volume":"7 1","pages":"1 - 13"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74455810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SkinMixer: Blending 3D Animated Models
S. Nuvoli, N. Pietroni, Paolo Cignoni, R. Scateni, M. Tarini
We propose a novel technique to compose new 3D animated models, such as videogame characters, by combining pieces from existing ones. Our method works on production-ready rigged, skinned, and animated 3D models to reassemble new ones. We exploit mix-and-match operations on the skeletons to trigger the automatic creation of a new mesh, linked to the new skeleton by a set of skinning weights and complete with a set of animations. The resulting model preserves the quality of the input meshings (which can be quad-dominant and semi-regular), skinning weights (inducing believable deformation), and animations, featuring coherent movements of the new skeleton. Our method enables content creators to reuse valuable, carefully designed assets by assembling new ready-to-use characters while preserving most of the hand-crafted subtleties of models authored by digital artists. As shown in the accompanying video, it allows for drastically cutting the time needed to obtain the final result.
{"title":"SkinMixer: Blending 3D Animated Models","authors":"S. Nuvoli, N. Pietroni, Paolo Cignoni, R. Scateni, M. Tarini","doi":"10.1145/3550454.3555503","DOIUrl":"https://doi.org/10.1145/3550454.3555503","url":null,"abstract":"We propose a novel technique to compose new 3D animated models, such as videogame characters, by combining pieces from existing ones. Our method works on production-ready rigged, skinned, and animated 3D models to reassemble new ones. We exploit mix-and-match operations on the skeletons to trigger the automatic creation of a new mesh, linked to the new skeleton by a set of skinning weights and complete with a set of animations. The resulting model preserves the quality of the input meshings (which can be quad-dominant and semi-regular), skinning weights (inducing believable deformation), and animations, featuring coherent movements of the new skeleton. Our method enables content creators to reuse valuable, carefully designed assets by assembling new ready-to-use characters while preserving most of the hand-crafted subtleties of models authored by digital artists. As shown in the accompanying video, it allows for drastically cutting the time needed to obtain the final result.","PeriodicalId":7121,"journal":{"name":"ACM Trans. Graph.","volume":"13 1","pages":"1 - 15"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74580144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LuisaRender: A High-Performance Rendering Framework with Layered and Unified Interfaces on Stream Architectures
Shao-feng Zheng, Zhiqian Zhou, Xin Chen, Difei Yan, Chuyan Zhang, Yuefeng Geng, Yan Gu, Kun Xu
The advancements in hardware have drawn more attention than ever to high-quality offline rendering with modern stream processors, in both industry and research. However, graphics APIs are fragmented and existing shading languages lack high-level constructs such as polymorphism, which adds complexity to developing and maintaining cross-platform high-performance renderers. We present LuisaRender, a high-performance rendering framework for modern stream-architecture hardware. Our main contribution is an expressive C++-embedded DSL for kernel programming with JIT code generation and compilation. We also implement a unified runtime layer with resource wrappers and an optimized Monte Carlo renderer. Experiments on test scenes show that LuisaRender achieves much higher performance than existing research renderers on modern graphics hardware, e.g., 5--11× faster than PBRT-v4 and 4--16× faster than Mitsuba 3.
{"title":"LuisaRender: A High-Performance Rendering Framework with Layered and Unified Interfaces on Stream Architectures","authors":"Shao-feng Zheng, Zhiqian Zhou, Xin Chen, Difei Yan, Chuyan Zhang, Yuefeng Geng, Yan Gu, Kun Xu","doi":"10.1145/3550454.3555463","DOIUrl":"https://doi.org/10.1145/3550454.3555463","url":null,"abstract":"The advancements in hardware have drawn more attention than ever to high-quality offline rendering with modern stream processors, both in the industry and in research fields. However, the graphics APIs are fragmented and existing shading languages lack high-level constructs such as polymorphism, which adds complexity to developing and maintaining cross-platform high-performance renderers. We present LuisaRender1, a high-performance rendering framework for modern stream-architecture hardware. Our main contribution is an expressive C++-embedded DSL for kernel programming with JIT code generation and compilation. We also implement a unified runtime layer with resource wrappers and an optimized Monte Carlo renderer. Experiments on test scenes show that LuisaRender achieves much higher performance than existing research renderers on modern graphics hardware, e.g., 5--11× faster than PBRT-v4 and 4--16× faster than Mitsuba 3.","PeriodicalId":7121,"journal":{"name":"ACM Trans. Graph.","volume":"60 1","pages":"1 - 19"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74055386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
QuadStream: A Quad-Based Scene Streaming Architecture for Novel Viewpoint Reconstruction
Jozef Hladky, Michael Stengel, Nicholas Vining, B. Kerbl, H. Seidel, M. Steinberger
Streaming rendered 3D content over a network to a thin client device, such as a phone or a VR/AR headset, brings high-fidelity graphics to platforms where it would not normally be possible due to thermal, power, or cost constraints. Streamed 3D content must be transmitted with a representation that is robust to both latency and potential network dropouts. Transmitting a video stream and reprojecting to correct for changing viewpoints fails in the presence of disocclusion events; streaming scene geometry and performing high-quality rendering on the client is not possible on limited-power mobile GPUs. To balance the competing goals of disocclusion robustness and minimal client workload, we introduce QuadStream, a new streaming content representation that reduces motion-to-photon latency by allowing clients to efficiently render novel views without artifacts caused by disocclusion events. Motivated by traditional macroblock approaches to video codec design, we decompose the scene seen from positions in a view cell into a series of quad proxies, or view-aligned quads from multiple views. By operating on a rasterized G-Buffer, our approach is independent of the representation used for the scene itself; the resulting QuadStream is an approximate geometric representation of the scene that can be reconstructed by a thin client to render both the current view and nearby adjacent views. Our technical contributions are an efficient parallel quad generation, merging, and packing strategy for proxy views covering potential client movement in a scene; a packing and encoding strategy that allows masked quads with depth information to be transmitted as a frame-coherent stream; and an efficient rendering approach for rendering our QuadStream representation into entirely novel views on thin clients. We show that our approach achieves superior quality compared both to video data streaming methods, and to geometry-based streaming.
{"title":"QuadStream: A Quad-Based Scene Streaming Architecture for Novel Viewpoint Reconstruction","authors":"Jozef Hladky, Michael Stengel, Nicholas Vining, B. Kerbl, H. Seidel, M. Steinberger","doi":"10.1145/3550454.3555524","DOIUrl":"https://doi.org/10.1145/3550454.3555524","url":null,"abstract":"Streaming rendered 3D content over a network to a thin client device, such as a phone or a VR/AR headset, brings high-fidelity graphics to platforms where it would not normally possible due to thermal, power, or cost constraints. Streamed 3D content must be transmitted with a representation that is both robust to latency and potential network dropouts. Transmitting a video stream and reprojecting to correct for changing viewpoints fails in the presence of disocclusion events; streaming scene geometry and performing high-quality rendering on the client is not possible on limited-power mobile GPUs. To balance the competing goals of disocclusion robustness and minimal client workload, we introduce QuadStream, a new streaming content representation that reduces motion-to-photon latency by allowing clients to efficiently render novel views without artifacts caused by disocclusion events. Motivated by traditional macroblock approaches to video codec design, we decompose the scene seen from positions in a view cell into a series of quad proxies, or view-aligned quads from multiple views. By operating on a rasterized G-Buffer, our approach is independent of the representation used for the scene itself; the resulting QuadStream is an approximate geometric representation of the scene that can be reconstructed by a thin client to render both the current view and nearby adjacent views. Our technical contributions are an efficient parallel quad generation, merging, and packing strategy for proxy views covering potential client movement in a scene; a packing and encoding strategy that allows masked quads with depth information to be transmitted as a frame-coherent stream; and an efficient rendering approach for rendering our QuadStream representation into entirely novel views on thin clients. We show that our approach achieves superior quality compared both to video data streaming methods, and to geometry-based streaming.","PeriodicalId":7121,"journal":{"name":"ACM Trans. Graph.","volume":"81 1","pages":"1 - 13"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81634420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PopStage: The Generation of Stage Cross-Editing Video Based on Spatio-Temporal Matching
Dawon Lee, Jung Eun Yoo, Kyungmin Cho, Bumki Kim, Gyeonghun Im, Jun-yong Noh
StageMix is a mixed video that is created by concatenating the segments from various performance videos of an identical song in a visually smooth manner by matching the main subject's silhouette presented in the frame. We introduce PopStage, which allows users to generate a StageMix automatically. PopStage is designed based on the StageMix Editing Guideline that we established by interviewing creators as well as observing their workflows. PopStage consists of two main steps: finding an editing path and generating a transition effect at a transition point. Using a reward function that favors visual connection and the optimality of transition timing across the videos, we obtain the optimal path that maximizes the sum of rewards through dynamic programming. Given the optimal path, PopStage then aligns the silhouettes of the main subject from the transitioning video pair to enhance the visual connection at the transition point. The virtual camera view is next optimized to remove the black areas that are often created due to the transformation needed for silhouette alignment, while reducing pixel loss. In this process, we enforce the view to be the maximum size while maintaining the temporal continuity across the frames. Experimental results show that PopStage can generate a StageMix of a similar quality to those produced by professional creators in a highly reduced production time.
{"title":"PopStage: The Generation of Stage Cross-Editing Video Based on Spatio-Temporal Matching","authors":"Dawon Lee, Jung Eun Yoo, Kyungmin Cho, Bumki Kim, Gyeonghun Im, Jun-yong Noh","doi":"10.1145/3550454.3555467","DOIUrl":"https://doi.org/10.1145/3550454.3555467","url":null,"abstract":"StageMix is a mixed video that is created by concatenating the segments from various performance videos of an identical song in a visually smooth manner by matching the main subject's silhouette presented in the frame. We introduce PopStage, which allows users to generate a StageMix automatically. PopStage is designed based on the StageMix Editing Guideline that we established by interviewing creators as well as observing their workflows. PopStage consists of two main steps: finding an editing path and generating a transition effect at a transition point. Using a reward function that favors visual connection and the optimality of transition timing across the videos, we obtain the optimal path that maximizes the sum of rewards through dynamic programming. Given the optimal path, PopStage then aligns the silhouettes of the main subject from the transitioning video pair to enhance the visual connection at the transition point. The virtual camera view is next optimized to remove the black areas that are often created due to the transformation needed for silhouette alignment, while reducing pixel loss. In this process, we enforce the view to be the maximum size while maintaining the temporal continuity across the frames. Experimental results show that PopStage can generate a StageMix of a similar quality to those produced by professional creators in a highly reduced production time.","PeriodicalId":7121,"journal":{"name":"ACM Trans. Graph.","volume":"25 1","pages":"1 - 13"},"PeriodicalIF":0.0,"publicationDate":"2022-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75877979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VToonify: Controllable High-Resolution Portrait Video Style Transfer
Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change Loy
Generating high-quality artistic portrait videos is an important and desirable task in computer graphics and vision. Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious limitations when applied to videos, such as the fixed frame size, the requirement of face alignment, missing non-facial details and temporal inconsistency. In this work, we investigate the challenging task of controllable high-resolution portrait video style transfer by introducing a novel VToonify framework. Specifically, VToonify leverages the mid- and high-resolution layers of StyleGAN to render high-quality artistic portraits based on the multi-scale content features extracted by an encoder to better preserve the frame details. The resulting fully convolutional architecture accepts non-aligned faces in videos of variable size as input, contributing to complete face regions with natural motions in the output. Our framework is compatible with existing StyleGAN-based image toonification models to extend them to video toonification, and inherits appealing features of these models for flexible style control on color and intensity. This work presents two instantiations of VToonify built upon Toonify and DualStyleGAN for collection-based and exemplar-based portrait video style transfer, respectively. Extensive experimental results demonstrate the effectiveness of our proposed VToonify framework over existing methods in generating high-quality and temporally-coherent artistic portrait videos with flexible style controls.
{"title":"VToonify: Controllable High-Resolution Portrait Video Style Transfer","authors":"Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change Loy","doi":"10.48550/arXiv.2209.11224","DOIUrl":"https://doi.org/10.48550/arXiv.2209.11224","url":null,"abstract":"Generating high-quality artistic portrait videos is an important and desirable task in computer graphics and vision. Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious limitations when applied to videos, such as the fixed frame size, the requirement of face alignment, missing non-facial details and temporal inconsistency. In this work, we investigate the challenging controllable high-resolution portrait video style transfer by introducing a novel VToonify framework. Specifically, VToonify leverages the mid- and high-resolution layers of StyleGAN to render high-quality artistic portraits based on the multi-scale content features extracted by an encoder to better preserve the frame details. The resulting fully convolutional architecture accepts non-aligned faces in videos of variable size as input, contributing to complete face regions with natural motions in the output. Our framework is compatible with existing StyleGAN-based image toonification models to extend them to video toonification, and inherits appealing features of these models for flexible style control on color and intensity. This work presents two instantiations of VToonify built upon Toonify and DualStyleGAN for collection-based and exemplar-based portrait video style transfer, respectively. Extensive experimental results demonstrate the effectiveness of our proposed VToonify framework over existing methods in generating high-quality and temporally-coherent artistic portrait videos with flexible style controls.","PeriodicalId":7121,"journal":{"name":"ACM Trans. Graph.","volume":"03 1","pages":"203:1-203:15"},"PeriodicalIF":0.0,"publicationDate":"2022-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86099233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Loki: a unified multiphysics simulation framework for production
Steve Lesser, A. Stomakhin, Gilles Daviet, J. Wretborn, Johan Edholm, Noh-Hoon Lee, Eston Schweickart, Xiao Zhai, S. Flynn, Andrew Moffat
We introduce Loki, a new framework for robust simulation of fluid, rigid, and deformable objects with non-compromising fidelity on any single element, and capabilities for coupling and representation transitions across multiple elements. Loki adapts multiple best-in-class solvers into a unified framework driven by a declarative state machine where users declare 'what' is simulated but not 'when,' so an automatic scheduling system takes care of mixing any combination of objects. This leads to intuitive setups for coupled simulations such as hair in the wind or objects transitioning from one representation to another, for example bulk water FLIP particles to SPH spray particles to volumetric mist. We also provide a consistent treatment for components used in several domains, such as unified collision and attachment constraints across 1D, 2D, 3D deforming and rigid objects. Distribution over MPI, custom linear equation solvers, and aggressive application of sparse techniques keep performance within production requirements. We demonstrate a variety of solvers within the framework and their interactions, including FLIP-style liquids, spatially adaptive volumetric fluids, SPH, MPM, and mesh-based solids, including but not limited to discrete elastic rods, elastons, and FEM with state-of-the-art constitutive models. Our framework has proven powerful and intuitive enough for voluntary artist adoption and has delivered creature and FX simulations for multiple major movie productions in the preceding four years.
{"title":"Loki: a unified multiphysics simulation framework for production","authors":"Steve Lesser, A. Stomakhin, Gilles Daviet, J. Wretborn, Johan Edholm, Noh-Hoon Lee, Eston Schweickart, Xiao Zhai, S. Flynn, Andrew Moffat","doi":"10.1145/3528223.3530058","DOIUrl":"https://doi.org/10.1145/3528223.3530058","url":null,"abstract":"We introduce Loki, a new framework for robust simulation of fluid, rigid, and deformable objects with non-compromising fidelity on any single element, and capabilities for coupling and representation transitions across multiple elements. Loki adapts multiple best-in-class solvers into a unified framework driven by a declarative state machine where users declare 'what' is simulated but not 'when,' so an automatic scheduling system takes care of mixing any combination of objects. This leads to intuitive setups for coupled simulations such as hair in the wind or objects transitioning from one representation to another, for example bulk water FLIP particles to SPH spray particles to volumetric mist. We also provide a consistent treatment for components used in several domains, such as unified collision and attachment constraints across 1D, 2D, 3D deforming and rigid objects. Distribution over MPI, custom linear equation solvers, and aggressive application of sparse techniques keep performance within production requirements. We demonstrate a variety of solvers within the framework and their interactions, including FLIPstyle liquids, spatially adaptive volumetric fluids, SPH, MPM, and mesh-based solids, including but not limited to discrete elastic rods, elastons, and FEM with state-of-the-art constitutive models. Our framework has proven powerful and intuitive enough for voluntary artist adoption and has delivered creature and FX simulations for multiple major movie productions in the preceding four years.","PeriodicalId":7121,"journal":{"name":"ACM Trans. Graph.","volume":"29 1","pages":"1 - 20"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81903376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions
Difan Liu, Sandesh Shetty, T. Hinz, Matthew Fisher, Richard Zhang, Taesung Park, E. Kalogerakis
We present ASSET, a neural architecture for automatically modifying an input high-resolution image according to a user's edits on its semantic segmentation map. Our architecture is based on a transformer with a novel attention mechanism. Our key idea is to sparsify the transformer's attention matrix at high resolutions, guided by dense attention extracted at lower image resolutions. While previous attention mechanisms are computationally too expensive for handling high-resolution images or are overly constrained within specific image regions hampering long-range interactions, our novel attention mechanism is both computationally efficient and effective. Our sparsified attention mechanism is able to capture long-range interactions and context, leading to synthesizing interesting phenomena in scenes, such as reflections of landscapes onto water or flora consistent with the rest of the landscape, that were not possible to generate reliably with previous convnets and transformer approaches. We present qualitative and quantitative results, along with user studies, demonstrating the effectiveness of our method.
{"title":"ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions","authors":"Difan Liu, Sandesh Shetty, T. Hinz, Matthew Fisher, Richard Zhang, Taesung Park, E. Kalogerakis","doi":"10.48550/arXiv.2205.12231","DOIUrl":"https://doi.org/10.48550/arXiv.2205.12231","url":null,"abstract":"We present ASSET, a neural architecture for automatically modifying an input high-resolution image according to a user's edits on its semantic segmentation map. Our architecture is based on a transformer with a novel attention mechanism. Our key idea is to sparsify the transformer's attention matrix at high resolutions, guided by dense attention extracted at lower image resolutions. While previous attention mechanisms are computationally too expensive for handling high-resolution images or are overly constrained within specific image regions hampering long-range interactions, our novel attention mechanism is both computationally efficient and effective. Our sparsified attention mechanism is able to capture long-range interactions and context, leading to synthesizing interesting phenomena in scenes, such as reflections of landscapes onto water or flora consistent with the rest of the landscape, that were not possible to generate reliably with previous convnets and transformer approaches. We present qualitative and quantitative results, along with user studies, demonstrating the effectiveness of our method.","PeriodicalId":7121,"journal":{"name":"ACM Trans. Graph.","volume":"2 1","pages":"74:1-74:12"},"PeriodicalIF":0.0,"publicationDate":"2022-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86026223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WallPlan: synthesizing floorplans by learning to generate wall graphs
Jiahui Sun, Wenming Wu, Ligang Liu, Wenjie Min, Gaofeng Zhang, Liping Zheng
Floorplan generation has drawn widespread interest in the community. Recent learning-based methods for generating realistic floorplans have made significant progress, while a complex heuristic post-processing is still necessary to obtain desired results. In this paper, we propose a novel wall-oriented method, called WallPlan, for automatically and efficiently generating plausible floorplans from various design constraints. We pioneer the representation of the floorplan as a wall graph with room labels and consider the floorplan generation as a graph generation. Given the boundary as input, we first initialize the boundary with windows predicted by WinNet. Then a graph generation network GraphNet and semantics prediction network LabelNet are coupled to generate the wall graph progressively by imitating graph traversal. WallPlan can be applied for practical architectural designs, especially the wall-based constraints. We conduct ablation experiments, qualitative evaluations, quantitative comparisons, and perceptual studies to evaluate our method's feasibility, efficacy, and versatility. Intensive experiments demonstrate our method requires no post-processing, producing higher quality floorplans than state-of-the-art techniques.
{"title":"WallPlan: synthesizing floorplans by learning to generate wall graphs","authors":"Jiahui Sun, Wenming Wu, Ligang Liu, Wenjie Min, Gaofeng Zhang, Liping Zheng","doi":"10.1145/3528223.3530135","DOIUrl":"https://doi.org/10.1145/3528223.3530135","url":null,"abstract":"Floorplan generation has drawn widespread interest in the community. Re- cent learning-based methods for generating realistic floorplans have made significant progress while a complex heuristic post-processing is still neces- sary to obtain desired results. In this paper, we propose a novel wall-oriented method, called WallPlan , for automatically and efficiently generating plausi- blefloorplansfromvariousdesignconstraints.Wepioneertherepresentation ofthefloorplanasawallgraphwithroomlabelsandconsiderthefloorplangenerationasagraphgeneration.Giventheboundaryasinput,wefirst initializetheboundarywithwindowspredictedbyWinNet.ThenagraphgenerationnetworkGraphNetandsemanticspredictionnetworkLabelNet arecoupledtogeneratethewallgraphprogressivelybyimitatinggraphtra-versal. WallPlan can be applied for practical architectural designs, especially the wall-based constraints. We conduct ablation experiments, qualitative evaluations, quantitative comparisons, and perceptual studies to evaluate our method’s feasibility, efficacy, and versatility. Intensive experiments demon- strate our method requires no post-processing, producing higher quality floorplans than state-of-the-art techniques.","PeriodicalId":7121,"journal":{"name":"ACM Trans. Graph.","volume":"97 1","pages":"92:1-92:14"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85944373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
stelaCSF: a unified model of contrast sensitivity as the function of spatio-temporal frequency, eccentricity, luminance and area
Rafał K. Mantiuk, Alexandre Chapiro
The existing CSFs typically account for a subset of relevant dimensions describing a stimulus, limiting the use of such functions to either static or foveal content but not both. In this paper, we propose a unified CSF, stelaCSF, which accounts for all major dimensions of the stimulus: spatial and temporal frequency, eccentricity, luminance, and area. To model the 5-dimensional space of contrast sensitivity, we combined data from 11 papers, each of which studied a subset of this space. While previously proposed CSFs were fitted to a single dataset, stelaCSF can predict the data from all these studies using the same set of parameters. The predictions are accurate in the entire domain, including low frequencies. In addition, stelaCSF relies on psychophysical models and experimental evidence to explain the major interactions between the 5 dimensions of the CSF. We demonstrate the utility of our new CSF in a flicker detection metric and in foveated rendering.
{"title":"stelaCSF: a unified model of contrast sensitivity as the function of spatio-temporal frequency, eccentricity, luminance and area","authors":"Rafał K. Mantiuk, Alexandre Chapiro, Alexandre Chapiro","doi":"10.1145/3528223.3530115","DOIUrl":"https://doi.org/10.1145/3528223.3530115","url":null,"abstract":"contrast many The existing CSFs typically account for a subset of relevant dimensions describing a stimulus, limiting the use of such functions to either static or foveal content but not both. In this paper, we propose a unified CSF, stelaCSF, which accounts for all major dimensions of the stimulus: spatial and temporal frequency, eccentricity, luminance, and area. To model the 5- dimensional space of contrast sensitivity, we combined data from 11 papers, each of which studied a subset of this space. While previously proposed CSFs were fitted to a single dataset, stelaCSF can predict the data from all these studies using the same set of parameters. The predictions are accurate in the entire domain, including low frequencies. In addition, stelaCSF relies on psychophysical models and experimental evidence to explain the major interactions between the 5 dimensions of the CSF. We demonstrate the utility of our new CSF in a flicker detection metric and in foveated rendering.","PeriodicalId":7121,"journal":{"name":"ACM Trans. Graph.","volume":"52 1","pages":"145:1-145:16"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84864902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}