
Computational Visual Media: Latest Publications

Learning physically based material and lighting decompositions for face editing
IF 6.9 · CAS Tier 3, Computer Science · Q1 Computer Science · Pub Date: 2024-01-03 · DOI: 10.1007/s41095-022-0309-1
Qian Zhang, Vikas Thamizharasan, James Tompkin

Lighting is crucial for portrait photography, yet the complex interactions between the skin and incident light are expensive to model computationally in graphics and difficult to reconstruct analytically via computer vision. To allow fast and controllable reflectance and lighting editing, we instead developed a physically based decomposition using deep learned priors from path-traced portrait images. Previous approaches that used simplified material models or low-frequency or low-dynamic-range lighting struggled to model specular reflections or to relight directly without an intermediate decomposition. In contrast, we estimate the surface normal, skin albedo and roughness, and high-frequency HDRI maps, and propose an architecture to estimate both diffuse and specular reflectance components. In our experiments, we show that this approach can represent the true appearance function more effectively than simpler baseline methods, leading to better generalization and higher-quality editing.
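
As a rough illustration of what such a decomposition enables, the sketch below recombines a per-pixel normal, albedo, roughness, and a light color into diffuse and specular terms under a single directional light. The Lambertian diffuse term, the Blinn-Phong-style specular lobe, and every function and variable name here are assumptions made for illustration; the paper's actual physically based model and network architecture are not reproduced.

```python
import numpy as np

def shade_pixel(normal, albedo, roughness, light_dir, view_dir, light_rgb):
    """Recombine decomposed per-pixel quantities into diffuse + specular shading
    under one directional light (illustrative shading model, not the paper's)."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)
    h = (l + v) / np.linalg.norm(l + v)                # half vector

    n_dot_l = max(float(np.dot(n, l)), 0.0)
    diffuse = albedo * n_dot_l                         # Lambertian diffuse term
    shininess = 2.0 / max(roughness ** 2, 1e-4)        # rougher skin -> broader highlight
    specular = max(float(np.dot(n, h)), 0.0) ** shininess * n_dot_l

    return (diffuse + specular) * light_rgb            # edit light_rgb / roughness to relight

# Example edit: relight a pixel by swapping the light color.
pixel = shade_pixel(np.array([0.0, 0.0, 1.0]), np.array([0.8, 0.6, 0.5]), 0.4,
                    np.array([0.3, 0.3, 1.0]), np.array([0.0, 0.0, 1.0]),
                    light_rgb=np.array([1.0, 0.95, 0.9]))
```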

Citations: 0
APF-GAN: Exploring asymmetric pre-training and fine-tuning strategy for conditional generative adversarial network
IF 6.9 · CAS Tier 3, Computer Science · Q1 Computer Science · Pub Date: 2023-11-30 · DOI: 10.1007/s41095-023-0357-1
Yuxuan Li, Lingfeng Yang, Xiang Li
{"title":"APF-GAN: Exploring asymmetric pre-training and fine-tuning strategy for conditional generative adversarial network","authors":"Yuxuan Li, Lingfeng Yang, Xiang Li","doi":"10.1007/s41095-023-0357-1","DOIUrl":"https://doi.org/10.1007/s41095-023-0357-1","url":null,"abstract":"","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":null,"pages":null},"PeriodicalIF":6.9,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139078694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A unified multi-view multi-person tracking framework
IF 6.9 · CAS Tier 3, Computer Science · Q1 Computer Science · Pub Date: 2023-11-30 · DOI: 10.1007/s41095-023-0334-8
Fan Yang, Shigeyuki Odashima, Sosuke Yamao, Hiroaki Fujimoto, Shoichi Masui, Shan Jiang

Despite significant developments in 3D multi-view multi-person (3D MM) tracking, current frameworks separately target footprint tracking, or pose tracking. Frameworks designed for the former cannot be used for the latter, because they directly obtain 3D positions on the ground plane via a homography projection, which is inapplicable to 3D poses above the ground. In contrast, frameworks designed for pose tracking generally isolate multi-view and multi-frame associations and may not be sufficiently robust for footprint tracking, which utilizes fewer key points than pose tracking, weakening multi-view association cues in a single frame. This study presents a unified multi-view multi-person tracking framework to bridge the gap between footprint tracking and pose tracking. Without additional modifications, the framework can adopt monocular 2D bounding boxes and 2D poses as its input to produce robust 3D trajectories for multiple persons. Importantly, multi-frame and multi-view information are jointly employed to improve association and triangulation. Our framework is shown to provide state-of-the-art performance on the Campus and Shelf datasets for 3D pose tracking, with comparable results on the WILDTRACK and MMPTRACK datasets for 3D footprint tracking.
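
One step such a framework depends on is multi-view triangulation: given 2D detections of the same keypoint (or footprint) from several calibrated cameras, a 3D point can be recovered by standard linear (DLT) triangulation, sketched below. This is textbook geometry rather than the paper's specific association scheme, and the function and variable names are illustrative.

```python
import numpy as np

def triangulate_point(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point.
    projections: list of 3x4 camera projection matrices, one per view.
    points_2d:   list of (x, y) detections of the same keypoint in each view.
    Returns the 3D point minimizing the algebraic reprojection error."""
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        rows.append(x * P[2] - P[0])    # from x = (P0 . X) / (P2 . X)
        rows.append(y * P[2] - P[1])    # from y = (P1 . X) / (P2 . X)
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]                          # null-space direction of the stacked system
    return X[:3] / X[3]                 # dehomogenize
```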

Citations: 0
Hierarchical vectorization for facial images
IF 6.9 · CAS Tier 3, Computer Science · Q1 Computer Science · Pub Date: 2023-11-30 · DOI: 10.1007/s41095-022-0314-4
Qian Fu, Linlin Liu, Fei Hou, Ying He

The explosive growth of social media means portrait editing and retouching are in high demand. While portraits are commonly captured and stored as raster images, editing raster images is non-trivial and requires the user to be highly skilled. Aiming at developing intuitive and easy-to-use portrait editing tools, we propose a novel vectorization method that can automatically convert raster images into a 3-tier hierarchical representation. The base layer consists of a set of sparse diffusion curves (DCs) which characterize salient geometric features and low-frequency colors, providing a means for semantic color transfer and facial expression editing. The middle level encodes specular highlights and shadows as large, editable Poisson regions (PRs) and allows the user to directly adjust illumination by tuning the strength and changing the shapes of PRs. The top level contains two types of pixel-sized PRs for high-frequency residuals and fine details such as pimples and pigmentation. We train a deep generative model that can produce high-frequency residuals automatically. Thanks to the inherent meaning in vector primitives, editing portraits becomes easy and intuitive. In particular, our method supports color transfer, facial expression editing, highlight and shadow editing, and automatic retouching. To quantitatively evaluate the results, we extend the commonly used FLIP metric (which measures color and feature differences between two images) to consider illumination. The new metric, illumination-sensitive FLIP, can effectively capture salient changes in color transfer results, and is more consistent with human perception than FLIP and other quality measures for portrait images. We evaluate our method on the FFHQR dataset and show it to be effective for common portrait editing tasks, such as retouching, light editing, color transfer, and expression editing.
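
A heavily hedged sketch of how the three rasterized tiers might be recombined, with an illumination edit acting only on the Poisson-region tier, is shown below. The additive blending and the `pr_strength` knob are assumptions for illustration only; the paper's actual vector primitives and editing operators are not specified here.

```python
import numpy as np

def compose_portrait(base_layer, poisson_regions, residual_layer, pr_strength=1.0):
    """Compose the rasterized tiers: base (diffusion curves), middle (Poisson
    regions for highlights/shadows), top (pixel-sized residuals).
    Additive blending and the pr_strength knob are illustrative assumptions;
    scaling pr_strength mimics editing illumination on the middle tier only."""
    return np.clip(base_layer + pr_strength * poisson_regions + residual_layer, 0.0, 1.0)
```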

Citations: 0
MusicFace: Music-driven expressive singing face synthesis
IF 6.9 · CAS Tier 3, Computer Science · Q1 Computer Science · Pub Date: 2023-11-30 · DOI: 10.1007/s41095-023-0343-7
Pengfei Liu, Wenjin Deng, Hengda Li, Jintai Wang, Yinglin Zheng, Yiwei Ding, Xiaohu Guo, Ming Zeng

It remains an interesting and challenging problem to synthesize a vivid and realistic singing face driven by music. In this paper, we present a method for this task with natural motions for the lips, facial expression, head pose, and eyes. Due to the coupling of mixed information for the human voice and backing music in common music audio signals, we design a decouple-and-fuse strategy to tackle the challenge. We first decompose the input music audio into a human voice stream and a backing music stream. Due to the implicit and complicated correlation between the two-stream input signals and the dynamics of the facial expressions, head motions, and eye states, we model their relationship with an attention scheme, where the effects of the two streams are fused seamlessly. Furthermore, to improve the expressiveness of the generated results, we decompose head movement generation in terms of speed and direction, and decompose eye state generation into short-term blinking and long-term eye closing, modeling them separately. We have also built a novel dataset, SingingFace, to support training and evaluation of models for this task, including future work on this topic. Extensive experiments and a user study show that our proposed method is capable of synthesizing vivid singing faces, qualitatively and quantitatively better than the prior state-of-the-art.
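
The decouple-and-fuse idea can be illustrated with a minimal dot-product attention step in which the separated voice stream queries the backing-music stream and the attended context is folded back into the voice features. The shapes, the scaled dot-product form, and the additive fusion are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_streams(voice_feats, music_feats):
    """Fuse a voice stream (T1, D) with a backing-music stream (T2, D).
    The voice stream queries the music stream via scaled dot-product attention,
    and the attended music context is added back to the voice features."""
    d = voice_feats.shape[-1]
    attn = softmax(voice_feats @ music_feats.T / np.sqrt(d))   # (T1, T2) weights
    music_context = attn @ music_feats                         # per-frame music context
    return voice_feats + music_context                         # fused driving features
```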

Citations: 0
3D hand pose and shape estimation from monocular RGB via efficient 2D cues
IF 6.9 · CAS Tier 3, Computer Science · Q1 Computer Science · Pub Date: 2023-11-30 · DOI: 10.1007/s41095-023-0346-4
Fenghao Zhang, Lin Zhao, Shengling Li, Wanjuan Su, Liman Liu, Wenbing Tao

Estimating 3D hand shape from a single-view RGB image is important for many applications. However, the diversity of hand shapes and postures, depth ambiguity, and occlusion may result in pose errors and noisy hand meshes. Making full use of 2D cues such as 2D pose can effectively improve the quality of 3D human hand shape estimation. In this paper, we use 2D joint heatmaps to obtain spatial details for robust pose estimation. We also introduce a depth-independent 2D mesh to avoid depth ambiguity in mesh regression for efficient hand-image alignment. Our method has four cascaded stages: 2D cue extraction, pose feature encoding, initial reconstruction, and reconstruction refinement. Specifically, we first encode the image to determine semantic features during 2D cue extraction; this is also used to predict hand joints and for segmentation. Then, during the pose feature encoding stage, we use a hand joints encoder to learn spatial information from the joint heatmaps. Next, a coarse 3D hand mesh and 2D mesh are obtained in the initial reconstruction step; a mesh squeeze-and-excitation block is used to fuse different hand features to enhance perception of 3D hand structures. Finally, a global mesh refinement stage learns non-local relations between vertices of the hand mesh from the predicted 2D mesh, to predict an offset hand mesh to fine-tune the reconstruction results. Quantitative and qualitative results on the FreiHAND benchmark dataset demonstrate that our approach achieves state-of-the-art performance.
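
For reference, a standard squeeze-and-excitation gate, the kind of channel reweighting the name "mesh squeeze-and-excitation block" suggests, is sketched below on per-vertex features. Applying it this way, the reduction ratio, and the 778-vertex example (the MANO hand-mesh size) are illustrative assumptions rather than the paper's exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excite(features, w1, w2):
    """Standard squeeze-and-excitation channel gating on (N, C) features.
    Squeeze: average over the N vertices; excite: two small fully connected
    layers with ReLU then sigmoid; the per-channel gate rescales the input."""
    squeezed = features.mean(axis=0)                     # (C,) global context
    gate = sigmoid(np.maximum(squeezed @ w1, 0.0) @ w2)  # (C,) channel weights
    return features * gate                               # reweighted features

# Toy usage with random weights (reduction ratio r = 4); 778 is used here
# only as a plausible hand-mesh vertex count.
C, r = 64, 4
feats = np.random.randn(778, C)
out = squeeze_excite(feats, np.random.randn(C, C // r), np.random.randn(C // r, C))
```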

Citations: 0
A causal convolutional neural network for multi-subject motion modeling and generation
IF 6.9 · CAS Tier 3, Computer Science · Q1 Computer Science · Pub Date: 2023-11-30 · DOI: 10.1007/s41095-022-0307-3
Shuaiying Hou, Congyi Wang, Wenlin Zhuang, Yu Chen, Yangang Wang, Hujun Bao, Jinxiang Chai, Weiwei Xu

Inspired by the success of WaveNet in multi-subject speech synthesis, we propose a novel neural network based on causal convolutions for multi-subject motion modeling and generation. The network can capture the intrinsic characteristics of the motion of different subjects, such as the influence of skeleton scale variation on motion style. Moreover, after fine-tuning the network using a small motion dataset for a novel skeleton that is not included in the training dataset, it is able to synthesize high-quality motions with a personalized style for the novel skeleton. The experimental results demonstrate that our network can model the intrinsic characteristics of motions well and can be applied to various motion modeling and synthesis tasks.
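
The WaveNet-style building block referenced here is the dilated causal convolution: the output at frame t depends only on frames at or before t. A minimal single-layer sketch follows; the channel counts, the explicit loop, and the weight layout are illustrative choices, not the paper's network.

```python
import numpy as np

def causal_conv1d(x, weights, dilation=1):
    """Dilated causal 1D convolution over a sequence.
    x:       (T, C_in) input frames.
    weights: (K, C_in, C_out) filter taps; weights[k] acts on the frame
             k * dilation steps in the past, so output frame t never sees
             frames later than t (the causality WaveNet-style models rely on)."""
    K, _, c_out = weights.shape
    pad = (K - 1) * dilation
    xp = np.concatenate([np.zeros((pad, x.shape[1])), x], axis=0)   # left padding only
    out = np.zeros((x.shape[0], c_out))
    for t in range(x.shape[0]):
        for k in range(K):
            out[t] += xp[pad + t - k * dilation] @ weights[k]
    return out
```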

Citations: 0
A visual modeling method for spatiotemporal and multidimensional features in epidemiological analysis: Applied COVID-19 aggregated datasets
IF 6.9 · CAS Tier 3, Computer Science · Q1 Computer Science · Pub Date: 2023-11-30 · DOI: 10.1007/s41095-023-0353-5
Yu Dong, Christy Jie Liang, Yi Chen, Jie Hua

The visual modeling method enables flexible interactions with rich graphical depictions of data and supports the exploration of the complexities of epidemiological analysis. However, most epidemiology visualizations do not support the combined analysis of objective factors that might influence the transmission situation, resulting in a lack of quantitative and qualitative evidence. To address this issue, we developed a portrait-based visual modeling method called +msRNAer. This method considers the spatiotemporal features of virus transmission patterns and multidimensional features of objective risk factors in communities, enabling portrait-based exploration and comparison in epidemiological analysis. We applied +msRNAer to aggregate COVID-19-related datasets in New South Wales, Australia, combining COVID-19 case number trends, geo-information, intervention events, and expert-supervised risk factors extracted from local government area-based censuses. We perfected the +msRNAer workflow with collaborative views and evaluated its feasibility, effectiveness, and usefulness through one user study and three subject-driven case studies. Positive feedback from experts indicates that +msRNAer provides a general analytical understanding: it not only compares relationships between time-varying case numbers and risk factors through portraits, but also supports navigation across fundamental geographical, timeline, and other factor comparisons. By adopting interactions, experts discovered functional and practical implications for potential patterns of long-standing community factors regarding vulnerability during the pandemic. Experts confirmed that +msRNAer is expected to deliver visual modeling benefits with spatiotemporal and multidimensional features in other epidemiological analysis scenarios.

Citations: 0
6DOF pose estimation of a 3D rigid object based on edge-enhanced point pair features
IF 6.9 · CAS Tier 3, Computer Science · Q1 Computer Science · Pub Date: 2023-11-30 · DOI: 10.1007/s41095-022-0308-2
Chenyi Liu, Fei Chen, Lu Deng, Renjiao Yi, Lintao Zheng, Chenyang Zhu, Jia Wang, Kai Xu

The point pair feature (PPF) is widely used for 6D pose estimation. In this paper, we propose an efficient 6D pose estimation method based on the PPF framework. We introduce a well-targeted down-sampling strategy that focuses on edge areas for efficient feature extraction for complex geometry. A pose hypothesis validation approach is proposed to resolve ambiguity due to symmetry by calculating the edge matching degree. We perform evaluations on two challenging datasets and one real-world collected dataset, demonstrating the superiority of our method for pose estimation for geometrically complex, occluded, symmetrical objects. We further validate our method by applying it to simulated punctures.
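
For context, the classic point pair feature of Drost et al., on which PPF-based pose estimation builds, describes two oriented points (p1, n1) and (p2, n2) by F = (||d||, ∠(n1, d), ∠(n2, d), ∠(n1, n2)) with d = p2 − p1. The sketch below computes this standard descriptor; the edge-enhanced sampling strategy described in the abstract would presumably restrict which point pairs are used, which is not shown here.

```python
import numpy as np

def angle(u, v):
    """Angle in [0, pi] between two vectors."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

def point_pair_feature(p1, n1, p2, n2):
    """Classic PPF of two oriented points (p1, n1), (p2, n2) with d = p2 - p1:
    F = (||d||, angle(n1, d), angle(n2, d), angle(n1, n2))."""
    d = p2 - p1
    return np.array([np.linalg.norm(d), angle(n1, d), angle(n2, d), angle(n1, n2)])
```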

Citations: 0
A survey on facial image deblurring
IF 6.9 · CAS Tier 3, Computer Science · Q1 Computer Science · Pub Date: 2023-11-30 · DOI: 10.1007/s41095-023-0336-6
Bingnan Wang, Fanjiang Xu, Quan Zheng

When a facial image is blurred, it significantly affects high-level vision tasks such as face recognition. The purpose of facial image deblurring is to recover a clear image from a blurry input image, which can improve the recognition accuracy, etc. However, general deblurring methods do not perform well on facial images. Therefore, some face deblurring methods have been proposed to improve performance by adding semantic or structural information as specific priors according to the characteristics of the facial images. In this paper, we survey and summarize recently published methods for facial image deblurring, most of which are based on deep learning. First, we provide a brief introduction to the modeling of image blurring. Next, we summarize face deblurring methods into two categories: model-based methods and deep learning-based methods. Furthermore, we summarize the datasets, loss functions, and performance evaluation metrics commonly used in the neural network training process. We show the performance of classical methods on these datasets and metrics and provide a brief discussion on the differences between model-based and learning-based methods. Finally, we discuss the current challenges and possible future research directions.
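
The "modeling of image blurring" the survey introduces is conventionally the degradation model B = K ⊛ I + N: a blurry image is the convolution of a sharp image with a blur kernel plus noise. The sketch below synthesizes such a blurry image; the Gaussian kernel and noise level are illustrative choices, not anything prescribed by the survey.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(size=9, sigma=2.0):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def synthesize_blur(sharp, kernel, noise_sigma=0.01):
    """Degradation model B = K * I + N: convolve a sharp grayscale image with a
    blur kernel and add Gaussian noise (a common way to create training pairs)."""
    blurred = convolve2d(sharp, kernel, mode="same", boundary="symm")
    return blurred + np.random.normal(0.0, noise_sigma, sharp.shape)
```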

Citations: 0