Ancient Painting to Natural Image: A New Solution for Painting Processing (WACV 2019, doi: 10.1109/WACV.2019.00061)
Tingting Qiao, Weijing Zhang, Miao Zhang, Zixuan Ma, Duanqing Xu
Collecting a large-scale, well-annotated dataset for image processing has become common practice in computer vision. In the ancient painting domain, however, this is impractical: the number of paintings is limited and their styles are highly diverse. We therefore propose a novel solution to the problems that come with ancient painting processing: use domain transfer to convert ancient paintings into photo-realistic natural images. In this way, "ancient painting processing problems" become "natural image processing problems", and models trained on natural images can be applied directly to the transferred paintings. Specifically, we focus on Chinese ancient flower, bird, and landscape paintings. A novel Domain Style Transfer Network (DSTN) is proposed to transfer ancient paintings to natural images; it employs a compound loss to ensure that the transferred paintings maintain the color composition and content of the input paintings. The experimental results show that the transferred paintings generated by the DSTN perform better than those of other state-of-the-art methods in both a human perceptual test and downstream image processing tasks, indicating the authenticity of the transferred paintings and the superiority of the proposed method.
TextCaps: Handwritten Character Recognition With Very Small Datasets (WACV 2019, doi: 10.1109/WACV.2019.00033)
Vinoj Jayasundara, S. Jayasekara, Hirunima Jayasekara, Jathushan Rajasegaran, Suranga Seneviratne, R. Rodrigo
Many localized languages struggle to reap the benefits of recent advancements in character recognition systems due to the lack of a substantial amount of labeled training data. This stems from the difficulty of generating large amounts of labeled data for such languages and the inability of deep learning techniques to learn properly from a small number of training samples. We solve this problem by introducing a technique for generating new training samples from existing ones, with realistic augmentations that reflect the actual variations present in human handwriting, by adding random controlled noise to their corresponding instantiation parameters. With a mere 200 training samples per class, our results surpass existing character recognition results on the EMNIST-letters dataset while matching existing results on the three datasets EMNIST-balanced, EMNIST-digits, and MNIST. We also develop a strategy for effectively combining loss functions to improve reconstructions. Our system is useful for character recognition in localized languages that lack much labeled training data, and even in related, more general contexts such as object recognition.
High-Speed Video from Asynchronous Camera Array (WACV 2019, doi: 10.1109/WACV.2019.00237)
Si Lu
This paper presents a method for capturing high-speed video using an asynchronous camera array. Our method sequentially fires each sensor in the array with a small time offset and assembles the captured frames into a high-speed video according to their time stamps. The resulting video, however, suffers from parallax jitter caused by the viewpoint differences among the sensors. To address this problem, we develop a dedicated novel view synthesis algorithm that transforms the video frames as if they had been captured by a single reference sensor. For any frame from a non-reference sensor, we find the two temporally neighboring frames captured by the reference sensor. Using these three frames, we render a new frame with the same time stamp as the non-reference frame but from the viewpoint of the reference sensor: we segment the frames into superpixels, apply local content-preserving warping to warp them into the new frame, and employ a multi-label Markov Random Field method to blend the warped frames. Our experiments show that our method produces high-quality, high-speed video for a wide variety of scenes with large parallax, scene dynamics, and camera motion, and outperforms several baseline and state-of-the-art approaches.
Attentive Conditional Channel-Recurrent Autoencoding for Attribute-Conditioned Face Synthesis (WACV 2019, doi: 10.1109/WACV.2019.00168)
Wenling Shang, Kihyuk Sohn
Attribute-conditioned face synthesis has many potential use cases, such as aiding the identification of a suspect or a missing person. Building on a conditional version of VAE-GAN, we augment the pathways connecting the latent space with a channel-recurrent architecture in order to provide not only improved generation quality but also interpretable high-level features. In particular, to better achieve the latter, we further propose an attention mechanism over each attribute to indicate the specific latent subset responsible for its modulation. Thanks to the latent semantics formed via the channel recurrence, we envision a tool that takes the desired attributes as inputs and then performs a two-stage, general-to-specific generation of diverse and realistic faces. Lastly, we incorporate a progressive-growth training scheme into the inference, generation, and discriminator networks of our model to facilitate higher-resolution outputs. Evaluations are performed through both qualitative visual examination and quantitative metrics, namely inception scores, human preferences, and attribute classification accuracy.
VelocityGAN: Subsurface Velocity Image Estimation Using Conditional Adversarial Networks (WACV 2019, doi: 10.1109/WACV.2019.00080)
Zhongping Zhang, Yue Wu, Zheng Zhou, Youzuo Lin
Acoustic- and elastic-waveform inversion is an important and widely used method for reconstructing subsurface velocity images. Waveform inversion is a typical non-linear, ill-posed inverse problem. Existing physics-driven computational methods for waveform inversion suffer from cycle skipping and local minima, and solving the inversion is computationally expensive. In this paper, we develop a real-time, data-driven technique, VelocityGAN, to accurately reconstruct subsurface velocities. VelocityGAN is an end-to-end framework that generates high-quality velocity images directly from raw seismic waveform data. A series of experiments on synthetic seismic reflection data evaluates the effectiveness and efficiency of VelocityGAN. We compare it not only with existing physics-driven approaches but also with several deep learning frameworks as data-driven baselines. The results show that VelocityGAN outperforms the physics-driven waveform inversion methods and achieves state-of-the-art performance among the data-driven baselines.
Taylor Convolutional Networks for Image Classification (WACV 2019, doi: 10.1109/WACV.2019.00140)
Xiaodi Wang, Ce Li, Yipeng Mou, Baochang Zhang, J. Han, Jianzhuang Liu
This paper provides a new perspective for understanding CNNs based on the Taylor expansion, leading to new Taylor Convolutional Networks (TaylorNets) for image classification. We introduce a principled combination of high-frequency (i.e., detailed) information and low-frequency information in end-to-end TaylorNets, based on a nonlinear combination of the convolutional feature maps. The steerable module developed in TaylorNets is generic: it can be easily integrated into well-known deep architectures and learned within the same back-propagation pipeline, yielding a higher representation capacity for CNNs. Extensive experimental results demonstrate the capability of TaylorNets, which improve widely used CNN architectures, such as conventional CNNs and ResNet, in terms of object classification accuracy on well-known benchmarks. The code will be made publicly available.
On Measuring the Iconicity of a Face (WACV 2019, doi: 10.1109/WACV.2019.00231)
Prithviraj Dhar, C. Castillo, R. Chellappa
For a given identity in a face dataset, certain iconic images are more representative of the subject than others. In this paper, we explore the problem of computing the iconicity of a face. The premise of the proposed approach is as follows: for an identity containing a mixture of iconic and non-iconic images, if a given face cannot be successfully matched with any other face of the same identity, then the iconicity of that face image is low. Using this information, we train a Siamese multi-layer perceptron network such that each of its twins predicts the iconicity score of one image in the input feature pair. We observe how the obtained scores vary with covariates such as blur, yaw, pitch, roll, and occlusion, demonstrating that they effectively predict image quality, and compare them with other existing metrics. Furthermore, we use these scores to weight features for template-based face verification and compare this with media averaging of features.
Improving 3D Human Pose Estimation Via 3D Part Affinity Fields (WACV 2019, doi: 10.1109/WACV.2019.00112)
Ding Liu, Zixu Zhao, Xinchao Wang, Yuxiao Hu, Lei Zhang, Thomas Huang
3D human pose estimation from monocular images has recently become an active area of computer vision. For years, most deep-neural-network-based approaches have been either end-to-end or two-stage. An end-to-end network typically estimates 3D human poses directly from 2D input images, but it suffers from the shortage of 3D human pose data, and it is hard to tell whether inaccuracies stem from limited visual understanding or from the 2D-to-3D mapping. A two-stage approach, in contrast, uses an existing network for 2D keypoint detection and then lifts the 2D keypoints directly to 3D space, but it tends to ignore useful contextual hints in the raw 2D image pixels. In this paper, we introduce a two-stage architecture that eliminates the main disadvantages of both approaches. In the first stage, we use an existing state-of-the-art detector to estimate 2D poses. To add contextual information that helps lift 2D poses to 3D, we propose 3D Part Affinity Fields (3D-PAFs). We use 3D-PAFs to infer 3D limb vectors and combine them with the 2D poses to regress the 3D coordinates. We trained and tested the proposed framework on Human3.6M, the most popular 3D human pose benchmark dataset. Our approach achieves state-of-the-art performance, showing that with the right choice of contextual information, a simple regression model can be very powerful for estimating 3D poses.
Euclidean Invariant Recognition of 2D Shapes Using Histograms of Magnitudes of Local Fourier-Mellin Descriptors (WACV 2019, doi: 10.1109/WACV.2019.00038)
Xinhua Zhang, L. Williams
Because the magnitudes of inner products with its basis functions are invariant to rotation and scale change, the Fourier-Mellin transform has long been used as a component in Euclidean invariant 2D shape recognition systems. Yet Fourier-Mellin transform magnitudes are only invariant to rotation and scale changes about a known center point, and fully Euclidean invariant shape recognition is not possible unless this center point can be identified consistently and accurately. In this paper, we describe a system in which a Fourier-Mellin transform is computed at every point in the image. The spatial support of the Fourier-Mellin basis functions is made local by multiplying them with a polynomial envelope. Significantly, the magnitudes of convolutions with these complex filters at isolated points are not (by themselves) used as features for Euclidean invariant shape recognition, because reliable discrimination would require filters with spatial support large enough to fully encompass the shapes. Instead, we rely on the fact that normalized histograms of magnitudes are fully Euclidean invariant. We demonstrate a system based on the VLAD machine learning method that performs Euclidean invariant recognition of 2D shapes and requires an order of magnitude less training data than comparable methods based on convolutional neural networks.
On the Importance of Feature Aggregation for Face Reconstruction (WACV 2019, doi: 10.1109/WACV.2019.00103)
Xiang Xu, Ha A. Le, I. Kakadiaris
The goal of this work is to seek principles for designing a deep neural network for 3D face reconstruction from a single image. To keep the evaluation simple, we generated a synthetic dataset and used it for evaluation. We conducted extensive experiments with an end-to-end face reconstruction algorithm (E2FAR) and its variations, and analyzed why it can be successfully applied to 3D face reconstruction. From these comparative studies, we conclude that aggregating features from different layers is key to training better neural networks for 3D face reconstruction. Based on these observations, we propose a face reconstruction feature aggregation network (FR-FAN), which obtains significant improvements over the baselines on the synthetic validation set. We also evaluate our model on popular existing indoor and in-the-wild 2D-3D datasets. Extensive experiments demonstrate that FR-FAN performs 16.50% and 9.54% better than E2FAR on BU-3DFE and JNU-3D, respectively. Finally, a sensitivity analysis on controlled datasets demonstrates that our network is robust to large variations in pose, illumination, and expression.