Probabilistic Joint Face-Skull Modelling for Facial Reconstruction
Dennis Madsen, M. Lüthi, Andreas Schneider, T. Vetter
We present a novel method for co-registration of two independent statistical shape models. We solve the problem of aligning a face model to a skull model with stochastic optimization based on Markov Chain Monte Carlo (MCMC). We create a probabilistic joint face-skull model and show how to obtain a distribution of plausible face shapes given a skull shape. Due to environmental and genetic factors, there exists a distribution of possible face shapes arising from the same skull. We therefore pose facial reconstruction as a conditional distribution of plausible face shapes given a skull shape. Because it is very difficult to obtain this distribution directly from MRI or CT data, we create a dataset of artificial face-skull pairs. To do this, we propose to combine three data sources of independent origin to model the joint face-skull distribution: a face shape model, a skull shape model and tissue depth marker information. For a given skull, we compute the posterior distribution of faces matching the tissue depth distribution using Metropolis-Hastings sampling, and we estimate the joint face-skull distribution from samples of this posterior. To find faces matching an unknown skull, we estimate the probability of each face under the joint face-skull model. To our knowledge, we are the first to provide a whole distribution of plausible faces arising from a skull instead of only a single reconstruction. We show how the face-skull model can be used to rank a face dataset and, on average, successfully identify the correct match within the top 30%. The face ranking even works when the face shapes are obtained from 2D images. We furthermore show how the face-skull model can be used to estimate the skull position in an MR image.
{"title":"Probabilistic Joint Face-Skull Modelling for Facial Reconstruction","authors":"Dennis Madsen, M. Lüthi, Andreas Schneider, T. Vetter","doi":"10.1109/CVPR.2018.00555","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00555","url":null,"abstract":"We present a novel method for co-registration of two independent statistical shape models. We solve the problem of aligning a face model to a skull model with stochastic optimization based on Markov Chain Monte Carlo (MCMC). We create a probabilistic joint face-skull model and show how to obtain a distribution of plausible face shapes given a skull shape. Due to environmental and genetic factors, there exists a distribution of possible face shapes arising from the same skull. We pose facial reconstruction as a conditional distribution of plausible face shapes given a skull shape. Because it is very difficult to obtain the distribution directly from MRI or CT data, we create a dataset of artificial face-skull pairs. To do this, we propose to combine three data sources of independent origin to model the joint face-skull distribution: a face shape model, a skull shape model and tissue depth marker information. For a given skull, we compute the posterior distribution of faces matching the tissue depth distribution with Metropolis-Hastings. We estimate the joint face-skull distribution from samples of the posterior. To find faces matching to an unknown skull, we estimate the probability of the face under the joint face-skull model. To our knowledge, we are the first to provide a whole distribution of plausible faces arising from a skull instead of only a single reconstruction. We show how the face-skull model can be used to rank a face dataset and on average successfully identify the correct match in top 30%. The face ranking even works when obtaining the face shapes from 2D images. We furthermore show how the face-skull model can be useful to estimate the skull position in an MR-image.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":"5295-5303"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73181318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interpretable Video Captioning via Trajectory Structured Localization
X. Wu, Guanbin Li, Qingxing Cao, Qingge Ji, Liang Lin
Automatically describing open-domain videos with natural language is attracting increasing interest in the field of artificial intelligence. Most existing methods simply borrow ideas from image captioning and obtain a compact video representation from an ensemble of global image features before feeding it to an RNN decoder that outputs a sentence of variable length. However, not only is it arduous for the generator to focus on specific salient objects at different times given the global video representation, it is even more formidable to capture the fine-grained motion information and the relations between moving instances needed for more subtle linguistic descriptions. In this paper, we propose a Trajectory Structured Attentional Encoder-Decoder (TSA-ED) neural network framework for more elaborate video captioning, which works by integrating local spatial-temporal representations at the trajectory level through a structured attention mechanism. Our proposed method builds on an LSTM-based encoder-decoder framework and incorporates an attention modeling scheme to adaptively learn the correlation between sentence structure and the moving objects in videos, consequently generating more accurate and detailed descriptions in the decoding stage. Experimental results demonstrate that the trajectory-cluster-based feature representation and structured attention mechanism efficiently capture the local motion information in the video, helping to generate more fine-grained video descriptions, and achieve state-of-the-art performance on the well-known Charades and MSVD datasets.
{"title":"Interpretable Video Captioning via Trajectory Structured Localization","authors":"X. Wu, Guanbin Li, Qingxing Cao, Qingge Ji, Liang Lin","doi":"10.1109/CVPR.2018.00714","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00714","url":null,"abstract":"Automatically describing open-domain videos with natural language are attracting increasing interest in the field of artificial intelligence. Most existing methods simply borrow ideas from image captioning and obtain a compact video representation from an ensemble of global image feature before feeding to an RNN decoder which outputs a sentence of variable length. However, it is not only arduous for the generator to focus on specific salient objects at different time given the global video representation, it is more formidable to capture the fine-grained motion information and the relation between moving instances for more subtle linguistic descriptions. In this paper, we propose a Trajectory Structured Attentional Encoder-Decoder (TSA-ED) neural network framework for more elaborate video captioning which works by integrating local spatial-temporal representation at trajectory level through structured attention mechanism. Our proposed method is based on a LSTM-based encoder-decoder framework, which incorporates an attention modeling scheme to adaptively learn the correlation between sentence structure and the moving objects in videos, and consequently generates more accurate and meticulous statement description in the decoding stage. Experimental results demonstrate that the feature representation and structured attention mechanism based on the trajectory cluster can efficiently obtain the local motion information in the video to help generate a more fine-grained video description, and achieve the state-of-the-art performance on the well-known Charades and MSVD datasets.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"220 1","pages":"6829-6837"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79862382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiple Granularity Group Interaction Prediction
Taiping Yao, Minsi Wang, Bingbing Ni, Huawei Wei, Xiaokang Yang
Most human activity analysis works (i.e., recognition or prediction) focus on only a single granularity: either modelling global motion based on coarse-level movement such as human trajectories, or forecasting future detailed actions based on body-part movement such as skeleton motion. In contrast, in this work we propose a multi-granularity interaction prediction network which integrates both global motion and detailed local action. Built on a bidirectional LSTM network, the proposed method possesses links between granularities which encourage feature sharing as well as cross-feature consistency between the global and local granularities (e.g., trajectory and local action), and in turn predicts the long-term global location and local dynamics of each individual. We validate our method on several public datasets with promising performance.
{"title":"Multiple Granularity Group Interaction Prediction","authors":"Taiping Yao, Minsi Wang, Bingbing Ni, Huawei Wei, Xiaokang Yang","doi":"10.1109/CVPR.2018.00239","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00239","url":null,"abstract":"Most human activity analysis works (i.e., recognition or prediction) only focus on a single granularity, i.e., either modelling global motion based on the coarse level movement such as human trajectories or forecasting future detailed action based on body parts' movement such as skeleton motion. In contrast, in this work, we propose a multi-granularity interaction prediction network which integrates both global motion and detailed local action. Built on a bidirectional LSTM network, the proposed method possesses between granularities links which encourage feature sharing as well as cross-feature consistency between both global and local granularity (e.g., trajectory or local action), and in turn predict long-term global location and local dynamics of each individual. We validate our method on several public datasets with promising performance.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"8 1","pages":"2246-2254"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82369546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Crowd Counting via Adversarial Cross-Scale Consistency Pursuit
Zan Shen, Yi Xu, Bingbing Ni, Minsi Wang, Jianguo Hu, Xiaokang Yang
Crowd counting, or density estimation, is a challenging task in computer vision due to large scale variations, perspective distortions, serious occlusions, etc. Existing methods generally suffer from two issues: 1) the model averaging effects in multi-scale CNNs induced by the widely adopted L2 regression loss; and 2) inconsistent estimation across differently scaled inputs. To explicitly address these issues, we propose a novel crowd counting (density estimation) framework called Adversarial Cross-Scale Consistency Pursuit (ACSCP). On one hand, a U-net structured generation network is designed to generate a density map from an input patch, and an adversarial loss is directly employed to shrink the solution onto a realistic subspace, thus attenuating the blurry effects of density map estimation. On the other hand, we design a novel scale-consistency regularizer which enforces that the sum of the crowd counts from local patches (i.e., small scale) is coherent with the overall count of their region union (i.e., large scale). These losses are integrated via a joint training scheme, so as to boost density estimation performance by further exploring the collaboration between the two objectives. Extensive experiments on four benchmarks demonstrate the effectiveness of the proposed innovations as well as superior performance over prior art.
{"title":"Crowd Counting via Adversarial Cross-Scale Consistency Pursuit","authors":"Zan Shen, Yi Xu, Bingbing Ni, Minsi Wang, Jianguo Hu, Xiaokang Yang","doi":"10.1109/CVPR.2018.00550","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00550","url":null,"abstract":"Crowd counting or density estimation is a challenging task in computer vision due to large scale variations, perspective distortions and serious occlusions, etc. Existing methods generally suffer from two issues: 1) the model averaging effects in multi-scale CNNs induced by the widely adopted $$ regression loss; and 2) inconsistent estimation across different scaled inputs. To explicitly address these issues, we propose a novel crowd counting (density estimation) framework called Adversarial Cross-Scale Consistency Pursuit (ACSCP). On one hand, a U-net structured generation network is designed to generate density map from input patch, and an adversarial loss is directly employed to shrink the solution onto a realistic subspace, thus attenuating the blurry effects of density map estimation. On the other hand, we design a novel scale-consistency regularizer which enforces that the sum up of the crowd counts from local patches (i.e., small scale) is coherent with the overall count of their region union (i.e., large scale). The above losses are integrated via a joint training scheme, so as to help boost density estimation performance by further exploring the collaboration between both objectives. Extensive experiments on four benchmarks have well demonstrated the effectiveness of the proposed innovations as well as the superior performance over prior art.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"10 1","pages":"5245-5254"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82061516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation
Xiaojuan Qi, Renjie Liao, Zhengzhe Liu, R. Urtasun, Jiaya Jia
In this paper, we propose the Geometric Neural Network (GeoNet) to jointly predict depth and surface normal maps from a single image. Building on top of two-stream CNNs, our GeoNet incorporates the geometric relation between depth and surface normals via new depth-to-normal and normal-to-depth networks. The depth-to-normal network exploits the least-squares solution for surface normals from depth and improves its quality with a residual module. The normal-to-depth network, conversely, refines the depth map based on the constraints from the surface normals through a kernel regression module, which has no parameters to learn. These two networks constrain the underlying model to efficiently predict depth and surface normals with high consistency and corresponding accuracy. Our experiments on the NYU v2 dataset verify that GeoNet is able to predict geometrically consistent depth and normal maps. It achieves top performance on surface normal estimation and is on par with state-of-the-art depth estimation methods.
{"title":"GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation","authors":"Xiaojuan Qi, Renjie Liao, Zhengzhe Liu, R. Urtasun, Jiaya Jia","doi":"10.1109/CVPR.2018.00037","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00037","url":null,"abstract":"In this paper, we propose Geometric Neural Network (GeoNet) to jointly predict depth and surface normal maps from a single image. Building on top of two-stream CNNs, our GeoNet incorporates geometric relation between depth and surface normal via the new depth-to-normal and normal-to-depth networks. Depth-to-normal network exploits the least square solution of surface normal from depth and improves its quality with a residual module. Normal-to-depth network, contrarily, refines the depth map based on the constraints from the surface normal through a kernel regression module, which has no parameter to learn. These two networks enforce the underlying model to efficiently predict depth and surface normal for high consistency and corresponding accuracy. Our experiments on NYU v2 dataset verify that our GeoNet is able to predict geometrically consistent depth and normal maps. It achieves top performance on surface normal estimation and is on par with state-of-the-art depth estimation methods.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"42 1","pages":"283-291"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82334131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anatomical Priors in Convolutional Networks for Unsupervised Biomedical Segmentation
Adrian V. Dalca, J. Guttag, M. Sabuncu
We consider the problem of segmenting a biomedical image into anatomical regions of interest. We specifically address the frequent scenario where we have no paired training data containing images and their manual segmentations. Instead, we employ unpaired segmentation images to build an anatomical prior. Critically, these segmentations can be derived from imaging data from a different dataset and imaging modality than the current task. We introduce a generative probabilistic model that employs the learned prior through a convolutional neural network to compute segmentations in an unsupervised setting. We conducted an empirical analysis of the proposed approach in the context of structural brain MRI segmentation, using a multi-study dataset of more than 14,000 scans. Our results show that an anatomical prior enables fast unsupervised segmentation, which is typically not possible using standard convolutional networks. The integration of anatomical priors can facilitate CNN-based anatomical segmentation in a range of novel clinical problems where few or no annotations are available and standard networks are thus not trainable. The code, model definitions and model weights are freely available at http://github.com/adalca/neuron.
{"title":"Anatomical Priors in Convolutional Networks for Unsupervised Biomedical Segmentation","authors":"Adrian V. Dalca, J. Guttag, M. Sabuncu","doi":"10.1109/CVPR.2018.00968","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00968","url":null,"abstract":"We consider the problem of segmenting a biomedical image into anatomical regions of interest. We specifically address the frequent scenario where we have no paired training data that contains images and their manual segmentations. Instead, we employ unpaired segmentation images that we use to build an anatomical prior. Critically these segmentations can be derived from imaging data from a different dataset and imaging modality than the current task. We introduce a generative probabilistic model that employs the learned prior through a convolutional neural network to compute segmentations in an unsupervised setting. We conducted an empirical analysis of the proposed approach in the context of structural brain MRI segmentation, using a multi-study dataset of more than 14,000 scans. Our results show that an anatomical prior enables fast unsupervised segmentation which is typically not possible using standard convolutional networks. The integration of anatomical priors can facilitate CNN-based anatomical segmentation in a range of novel clinical problems, where few or no annotations are available and thus standard networks are not trainable. The code, model definitions and model weights are freely available at http://github.com/adalca/neuron.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"16 1","pages":"9290-9299"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76386875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wrapped Gaussian Process Regression on Riemannian Manifolds
Anton Mallasto, Aasa Feragen
Gaussian process (GP) regression is a powerful tool in non-parametric regression providing uncertainty estimates. However, it is limited to data in vector spaces. In fields such as shape analysis and diffusion tensor imaging, the data often lies on a manifold, making GP regression nonviable, as the resulting predictive distribution does not live in the correct geometric space. We tackle the problem by defining wrapped Gaussian processes (WGPs) on Riemannian manifolds, using the probabilistic setting to generalize GP regression to the context of manifold-valued targets. The method is validated empirically on diffusion weighted imaging (DWI) data, directional data on the sphere, and data in the Kendall shape space, endorsing WGP regression as an efficient and flexible tool for manifold-valued regression.
{"title":"Wrapped Gaussian Process Regression on Riemannian Manifolds","authors":"Anton Mallasto, Aasa Feragen","doi":"10.1109/CVPR.2018.00585","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00585","url":null,"abstract":"Gaussian process (GP) regression is a powerful tool in non-parametric regression providing uncertainty estimates. However, it is limited to data in vector spaces. In fields such as shape analysis and diffusion tensor imaging, the data often lies on a manifold, making GP regression nonviable, as the resulting predictive distribution does not live in the correct geometric space. We tackle the problem by defining wrapped Gaussian processes (WGPs) on Riemannian manifolds, using the probabilistic setting to generalize GP regression to the context of manifold-valued targets. The method is validated empirically on diffusion weighted imaging (DWI) data, directional data on the sphere and in the Kendall shape space, endorsing WGP regression as an efficient and flexible tool for manifold-valued regression.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"359 1","pages":"5580-5588"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76417580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Constrained Deep Neural Network for Ordinal Regression
Yanzhu Liu, A. Kong, C. Goh
Ordinal regression is a supervised learning problem aiming to classify instances into ordinal categories. It is challenging to automatically extract high-level features that represent intraclass information and the interclass ordinal relationship simultaneously. This paper proposes a constrained optimization formulation for the ordinal regression problem which minimizes the negative log-likelihood for multiple categories, constrained by the order relationship between instances. Mathematically, it is equivalent to an unconstrained formulation with a pairwise regularizer. An implementation based on the CNN framework is proposed to solve the problem, such that high-level features can be extracted automatically and the optimal solution can be learned through traditional back-propagation. The proposed pairwise constraints make the algorithm work even on small datasets, and a proposed efficient implementation makes it scalable to large datasets. Experimental results on four real-world benchmarks demonstrate that the proposed algorithm outperforms traditional deep learning approaches and other state-of-the-art approaches based on hand-crafted features.
{"title":"A Constrained Deep Neural Network for Ordinal Regression","authors":"Yanzhu Liu, A. Kong, C. Goh","doi":"10.1109/CVPR.2018.00093","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00093","url":null,"abstract":"Ordinal regression is a supervised learning problem aiming to classify instances into ordinal categories. It is challenging to automatically extract high-level features for representing intraclass information and interclass ordinal relationship simultaneously. This paper proposes a constrained optimization formulation for the ordinal regression problem which minimizes the negative loglikelihood for multiple categories constrained by the order relationship between instances. Mathematically, it is equivalent to an unconstrained formulation with a pairwise regularizer. An implementation based on the CNN framework is proposed to solve the problem such that high-level features can be extracted automatically, and the optimal solution can be learned through the traditional back-propagation method. The proposed pairwise constraints make the algorithm work even on small datasets, and a proposed efficient implementation make it be scalable for large datasets. Experimental results on four real-world benchmarks demonstrate that the proposed algorithm outperforms the traditional deep learning approaches and other state-of-the-art approaches based on hand-crafted features.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"18 1","pages":"831-839"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76062685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Best of Both Worlds: Combining CNNs and Geometric Constraints for Hierarchical Motion Segmentation
Pia Bideau, Aruni RoyChowdhury, Rakesh R Menon, E. Learned-Miller
Traditional methods of motion segmentation use powerful geometric constraints to understand motion, but fail to leverage the semantics of high-level image understanding. Modern CNN methods of motion analysis, on the other hand, excel at identifying well-known structures, but may not precisely characterize well-known geometric constraints. In this work, we build a new statistical model of rigid motion flow based on classical perspective projection constraints. We then combine piecewise rigid motions into complex deformable and articulated objects, guided by semantic segmentation from CNNs and a second "object-level" statistical model. This combination of classical geometric knowledge with the pattern recognition abilities of CNNs yields excellent performance on a wide range of motion segmentation benchmarks, from complex geometric scenes to camouflaged animals.
{"title":"The Best of Both Worlds: Combining CNNs and Geometric Constraints for Hierarchical Motion Segmentation","authors":"Pia Bideau, Aruni RoyChowdhury, Rakesh R Menon, E. Learned-Miller","doi":"10.1109/CVPR.2018.00060","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00060","url":null,"abstract":"Traditional methods of motion segmentation use powerful geometric constraints to understand motion, but fail to leverage the semantics of high-level image understanding. Modern CNN methods of motion analysis, on the other hand, excel at identifying well-known structures, but may not precisely characterize well-known geometric constraints. In this work, we build a new statistical model of rigid motion flow based on classical perspective projection constraints. We then combine piecewise rigid motions into complex deformable and articulated objects, guided by semantic segmentation from CNNs and a second \"object-level\" statistical model. This combination of classical geometric knowledge combined with the pattern recognition abilities of CNNs yields excellent performance on a wide range of motion segmentation benchmarks, from complex geometric scenes to camouflaged animals.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":"508-517"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76095395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weakly Supervised Facial Action Unit Recognition Through Adversarial Training
Guozhu Peng, Shangfei Wang
Current works on facial action unit (AU) recognition typically require fully AU-annotated facial images for supervised AU classifier training. AU annotation is a time-consuming, expensive, and error-prone process. While AUs are hard to annotate, facial expression is relatively easy to label. Furthermore, there exist strong probabilistic dependencies between expressions and AUs, as well as dependencies among AUs. Such dependencies are referred to as domain knowledge. In this paper, we propose a novel AU recognition method that learns AU classifiers from domain knowledge and expression-annotated facial images through adversarial training. Specifically, we first generate pseudo AU labels according to the probabilistic dependencies between expressions and AUs, as well as the correlations among AUs summarized from domain knowledge. Then we propose a weakly supervised AU recognition method via an adversarial process, in which we simultaneously train two models: a recognition model R, which learns AU classifiers, and a discrimination model D, which estimates the probability that a set of AU labels was generated from domain knowledge rather than recognized by R. The training procedure for R maximizes the probability of D making a mistake. By leveraging this adversarial mechanism, the distribution of recognized AUs is kept close to the AU prior distribution derived from domain knowledge. Furthermore, the proposed weakly supervised AU recognition can be extended to semi-supervised learning scenarios with partially AU-annotated images. Experimental results on three benchmark databases demonstrate that the proposed method successfully leverages the summarized domain knowledge for weakly supervised AU classifier learning through an adversarial process, and thus achieves state-of-the-art performance.
{"title":"Weakly Supervised Facial Action Unit Recognition Through Adversarial Training","authors":"Guozhu Peng, Shangfei Wang","doi":"10.1109/CVPR.2018.00233","DOIUrl":"https://doi.org/10.1109/CVPR.2018.00233","url":null,"abstract":"Current works on facial action unit (AU) recognition typically require fully AU-annotated facial images for supervised AU classifier training. AU annotation is a time-consuming, expensive, and error-prone process. While AUs are hard to annotate, facial expression is relatively easy to label. Furthermore, there exist strong probabilistic dependencies between expressions and AUs as well as dependencies among AUs. Such dependencies are referred to as domain knowledge. In this paper, we propose a novel AU recognition method that learns AU classifiers from domain knowledge and expression-annotated facial images through adversarial training. Specifically, we first generate pseudo AU labels according to the probabilistic dependencies between expressions and AUs as well as correlations among AUs summarized from domain knowledge. Then we propose a weakly supervised AU recognition method via an adversarial process, in which we simultaneously train two models: a recognition model R, which learns AU classifiers, and a discrimination model D, which estimates the probability that AU labels generated from domain knowledge rather than the recognized AU labels from R. The training procedure for R maximizes the probability of D making a mistake. By leveraging the adversarial mechanism, the distribution of recognized AUs is closed to AU prior distribution from domain knowledge. Furthermore, the proposed weakly supervised AU recognition can be extended to semi-supervised learning scenarios with partially AU-annotated images. Experimental results on three benchmark databases demonstrate that the proposed method successfully leverages the summarized domain knowledge to weakly supervised AU classifier learning through an adversarial process, and thus achieves state-of-the-art performance.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"71 1","pages":"2188-2196"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87995762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}