Pub Date: 2019-06-01. Epub Date: 2020-01-09. DOI: 10.1109/cvpr.2019.00253
Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Batmanghelich, Kun Zhang, Dacheng Tao
Unsupervised domain mapping aims to learn a function G_XY that translates domain X to domain Y in the absence of paired examples. Finding the optimal G_XY without paired data is an ill-posed problem, so appropriate constraints are required to obtain reasonable solutions. While prominent constraints such as cycle consistency and distance preservation successfully restrict the solution space, they overlook a special property of images: simple geometric transformations do not change an image's semantic structure. Based on this property, we develop a geometry-consistent generative adversarial network (GcGAN) that enables one-sided unsupervised domain mapping. GcGAN takes the original image and its counterpart transformed by a predefined geometric transformation as inputs and generates two images in the new domain, coupled with the corresponding geometry-consistency constraint. The geometry-consistency constraint reduces the space of possible solutions while keeping the correct solutions in the search space. Quantitative and qualitative comparisons with the baseline (GAN alone) and state-of-the-art methods, including CycleGAN [66] and DistanceGAN [5], demonstrate the effectiveness of our method.
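For concreteness, here is a minimal sketch of what the geometry-consistency term could look like, using a 90° rotation as the predefined transformation: translating a rotated input should agree with rotating the translated original. The generator `G` and the L1 penalty are illustrative assumptions, not the authors' released implementation.

```python
import torch

def rot90(x):
    # Rotate a batch of images (N, C, H, W) by 90 degrees.
    return torch.rot90(x, k=1, dims=[2, 3])

def rot90_inv(x):
    # Inverse rotation (−90 degrees).
    return torch.rot90(x, k=-1, dims=[2, 3])

def geometry_consistency_loss(G, x):
    # Translate the original and the geometrically transformed input,
    # then penalize disagreement in both directions of the transformation.
    y_hat = G(x)
    y_hat_rot = G(rot90(x))
    return (rot90_inv(y_hat_rot) - y_hat).abs().mean() + \
           (rot90(y_hat) - y_hat_rot).abs().mean()
```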
{"title":"Geometry-Consistent Generative Adversarial Networks for One-Sided Unsupervised Domain Mapping.","authors":"Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Batmanghelich, Kun Zhang, Dacheng Tao","doi":"10.1109/cvpr.2019.00253","DOIUrl":"10.1109/cvpr.2019.00253","url":null,"abstract":"<p><p>Unsupervised domain mapping aims to learn a function G<sub>XY</sub> to translate domain <math><mi>X</mi></math> to <math><mi>Y</mi></math> in the absence of paired examples. Finding the optimal <i>G</i> <sub><i>XY</i></sub> without paired data is an ill-posed problem, so appropriate constraints are required to obtain reasonable solutions. While some prominent constraints such as cycle consistency and distance preservation successfully constrain the solution space, they overlook the special properties of images that simple geometric transformations do not change the image's semantic structure. Based on this special property, we develop a geometry-consistent generative adversarial network (<i>Gc-GAN</i>), which enables one-sided unsupervised domain mapping. <i>GcGAN</i> takes the original image and its counterpart image transformed by a predefined geometric transformation as inputs and generates two images in the new domain coupled with the corresponding geometry-consistency constraint. The geometry-consistency constraint reduces the space of possible solutions while keep the correct solutions in the search space. Quantitative and qualitative comparisons with the baseline (<i>GAN alone</i>) and the state-of-the-art methods including <i>CycleGAN</i> [66] and <i>DistanceGAN</i> [5] demonstrate the effectiveness of our method.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2019 ","pages":"2422-2431"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7030214/pdf/nihms-1037392.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37658933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-06-01. Epub Date: 2020-01-09. DOI: 10.1109/cvpr.2019.00435
Zhengyang Shen, Xu Han, Zhenlin Xu, Marc Niethammer
We introduce an end-to-end deep-learning framework for 3D medical image registration. In contrast to existing approaches, our framework combines two registration methods: an affine registration and a vector momentum-parameterized stationary velocity field (vSVF) model. Specifically, it consists of three stages. In the first stage, a multi-step affine network predicts affine transform parameters. In the second stage, we use a U-Net-like network to generate a momentum field, from which a velocity field is computed via smoothing. Finally, in the third stage, we employ a self-iterable map-based vSVF component to provide a non-parametric refinement based on the current estimate of the transformation map. Once the model is trained, a registration is completed in one forward pass. To evaluate performance, we conducted longitudinal and cross-subject experiments on 3D magnetic resonance images (MRI) of the knee from the Osteoarthritis Initiative (OAI) dataset. Results show that our framework achieves performance comparable to state-of-the-art medical image registration approaches while being much faster, offering better control of transformation regularity (including the ability to produce approximately symmetric transformations), and combining affine and non-parametric registration.
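The vSVF stage exponentiates a stationary velocity field into a deformation map; a standard way to do this is scaling and squaring. The sketch below assumes 2D inputs and displacement fields in normalized `grid_sample` coordinates with channels in (x, y) order; it is illustrative rather than the authors' map-based implementation.

```python
import torch
import torch.nn.functional as F

def identity_grid(n, h, w):
    # Normalized identity sampling grid in [-1, 1], shape (N, H, W, 2).
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1)  # grid_sample expects (x, y) order
    return grid.unsqueeze(0).expand(n, h, w, 2)

def exp_velocity_field(v, n_steps=6):
    # Scaling and squaring: phi = exp(v) for a stationary velocity field
    # v of shape (N, 2, H, W), given as normalized displacements.
    n, _, h, w = v.shape
    grid = identity_grid(n, h, w)
    disp = v / (2 ** n_steps)                 # scale: many small steps
    for _ in range(n_steps):                  # square: d <- d + d o (id + d)
        warped = F.grid_sample(disp, grid + disp.permute(0, 2, 3, 1),
                               align_corners=True)
        disp = disp + warped
    return grid + disp.permute(0, 2, 3, 1)    # final map for grid_sample
```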
{"title":"Networks for Joint Affine and Non-parametric Image Registration.","authors":"Zhengyang Shen, Xu Han, Zhenlin Xu, Marc Niethammer","doi":"10.1109/cvpr.2019.00435","DOIUrl":"10.1109/cvpr.2019.00435","url":null,"abstract":"<p><p>We introduce an end-to-end deep-learning framework for 3D medical image registration. In contrast to existing approaches, our framework combines two registration methods: an affine registration and a vector momentum-parameterized stationary velocity field (vSVF) model. Specifically, it consists of three stages. In the first stage, a multi-step affine network predicts affine transform parameters. In the second stage, we use a U-Net-like network to generate a momentum, from which a velocity field can be computed via smoothing. Finally, in the third stage, we employ a self-iterable map-based vSVF component to provide a non-parametric refinement based on the current estimate of the transformation map. Once the model is trained, a registration is completed in one forward pass. To evaluate the performance, we conducted longitudinal and cross-subject experiments on 3D magnetic resonance images (MRI) of the knee of the Osteoarthritis Initiative (OAI) dataset. Results show that our framework achieves comparable performance to state-of-the-art medical image registration approaches, but it is much faster, with a better control of transformation regularity including the ability to produce approximately symmetric transformations, and combining affine as well as non-parametric registration.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2019 ","pages":"4219-4228"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7286599/pdf/nihms-1033312.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38036232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-06-01. Epub Date: 2020-01-09. DOI: 10.1109/CVPR.2019.00873
Le Hou, Ayush Agarwal, Dimitris Samaras, Tahsin M Kurc, Rajarsi R Gupta, Joel H Saltz
Detection, segmentation, and classification of nuclei are fundamental analysis operations in digital pathology. Existing state-of-the-art approaches demand extensive amounts of supervised training data from pathologists and may still perform poorly on images from unseen tissue types. We propose an unsupervised approach for histopathology image segmentation that synthesizes heterogeneous sets of training image patches for every tissue type. Although our synthetic patches are not always of high quality, we harness this motley crew of generated samples through a generally applicable importance-sampling method. The proposed approach, for the first time, re-weights the training loss over synthetic data so that the ideal (unbiased) generalization loss over the true data distribution is minimized. This enables us to use a random polygon generator to synthesize approximate cellular structures (i.e., nuclear masks) for which no real examples exist in many tissue types, and for which GAN-based methods are therefore not suited. In addition, we propose a hybrid synthesis pipeline that utilizes textures in real histopathology patches together with GAN models to tackle heterogeneity in tissue textures. Compared with existing state-of-the-art supervised models, our approach generalizes significantly better to cancer types without training data. Even on cancer types with training data, our approach achieves the same performance without the cost of supervision. We release code and segmentation results on over 5,000 whole-slide images (WSIs) from The Cancer Genome Atlas (TCGA) repository, a dataset orders of magnitude larger than what is available today.
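The loss re-weighting can be sketched with a classical density-ratio trick: a discriminator trained to separate real from synthetic patches estimates P(real | x), and the ratio D/(1-D) serves as an importance weight on each synthetic sample's loss. The estimator and normalization here are assumptions for illustration; the paper's exact scheme may differ.

```python
import torch
import torch.nn.functional as F

def importance_weighted_loss(seg_logits, masks, disc_real_prob, eps=1e-6):
    # Per-patch segmentation loss on synthetic data, re-weighted so its
    # expectation approximates the loss under the true data distribution.
    per_sample = F.cross_entropy(seg_logits, masks, reduction="none")
    per_sample = per_sample.flatten(1).mean(dim=1)     # one value per patch
    w = disc_real_prob / (1.0 - disc_real_prob + eps)  # density ratio D/(1-D)
    w = w / w.mean()                                   # keep the loss scale stable
    return (w.detach() * per_sample).mean()
```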
{"title":"Robust Histopathology Image Analysis: to Label or to Synthesize?","authors":"Le Hou, Ayush Agarwal, Dimitris Samaras, Tahsin M Kurc, Rajarsi R Gupta, Joel H Saltz","doi":"10.1109/CVPR.2019.00873","DOIUrl":"10.1109/CVPR.2019.00873","url":null,"abstract":"<p><p>Detection, segmentation and classification of nuclei are fundamental analysis operations in digital pathology. Existing state-of-the-art approaches demand extensive amount of supervised training data from pathologists and may still perform poorly in images from unseen tissue types. We propose an unsupervised approach for histopathology image segmentation that synthesizes heterogeneous sets of training image patches, of every tissue type. Although our synthetic patches are not always of high quality, we harness the motley crew of generated samples through a generally applicable importance sampling method. This proposed approach, for the first time, re-weighs the training loss over synthetic data so that the ideal (unbiased) generalization loss over the true data distribution is minimized. This enables us to use a random polygon generator to synthesize approximate cellular structures (i.e., nuclear masks) for which no real examples are given in many tissue types, and hence, GAN-based methods are not suited. In addition, we propose a hybrid synthesis pipeline that utilizes textures in real histopathology patches and GAN models, to tackle heterogeneity in tissue textures. Compared with existing state-of-the-art supervised models, our approach generalizes significantly better on cancer types without training data. Even in cancer types with training data, our approach achieves the same performance without supervision cost. We release code and segmentation results on over 5000 Whole Slide Images (WSI) in The Cancer Genome Atlas (TCGA) repository, a dataset that would be orders of magnitude larger than what is available today.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2019 ","pages":"8533-8542"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8139403/pdf/nihms-1025751.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39010307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-06-01. Epub Date: 2020-04-09. DOI: 10.1109/cvprw.2019.00142
Yang Jiao, Mo Weng, Mei Yang
3D fluorescence microscopy of living organisms has increasingly become an essential and powerful tool in biomedical research and diagnosis. An exploding amount of imaging data has been collected, while efficient and effective computational tools to extract information from these data still lag behind. This is largely due to the challenges of analyzing biological data: interesting biological structures are not only small but often morphologically irregular and highly dynamic. Although tracking cells in live organisms has been studied for years, existing tracking methods for cells are not effective for subcellular structures, such as protein complexes, which undergo continuous morphological changes, including splitting and merging, in addition to fast migration and complex motion. In this paper, we first define the multi-object portion tracking problem to model the protein-object tracking process. We then propose a multi-object tracking method with portion matching based on 3D segmentation results. The proposed method distills deep feature maps from deep networks, then recognizes and matches objects' portions using an extended search. Experimental results confirm that the proposed method achieves 2.96% higher consistent-tracking accuracy and 35.48% higher event-identification accuracy than state-of-the-art methods.
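As a hedged sketch of the matching step: portions detected in consecutive frames can be paired by solving an assignment problem over a feature-distance cost matrix, with splits and merges surfacing as unmatched rows and columns. The cosine cost and threshold below are illustrative, not the paper's extended-search procedure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_portions(feats_t, feats_t1, max_cost=0.5):
    # feats_t: (M, D), feats_t1: (N, D) deep feature descriptors,
    # one row per segmented portion in frames t and t+1.
    a = feats_t / np.linalg.norm(feats_t, axis=1, keepdims=True)
    b = feats_t1 / np.linalg.norm(feats_t1, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T                        # cosine distance
    rows, cols = linear_sum_assignment(cost)    # optimal one-to-one matching
    # Pairs above max_cost stay unmatched; together with unassigned
    # portions they are candidates for split/merge/appear/disappear events.
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= max_cost]
```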
{"title":"Multi-Object Portion Tracking in 4D Fluorescence Microscopy Imagery with Deep Feature Maps.","authors":"Yang Jiao, Mo Weng, Mei Yang","doi":"10.1109/cvprw.2019.00142","DOIUrl":"10.1109/cvprw.2019.00142","url":null,"abstract":"<p><p>3D fluorescence microscopy of living organisms has increasingly become an essential and powerful tool in biomedical research and diagnosis. An exploding amount of imaging data has been collected, whereas efficient and effective computational tools to extract information from them are still lagging behind. This is largely due to the challenges in analyzing biological data. Interesting biological structures are not only small, but are often morphologically irregular and highly dynamic. Although tracking cells in live organisms has been studied for years, existing tracking methods for cells are not effective in tracking subcellular structures, such as protein complexes, which feature in continuous morphological changes including split and merge, in addition to fast migration and complex motion. In this paper, we first define the problem of multi-object portion tracking to model the protein object tracking process. A multi-object tracking method with portion matching is proposed based on 3D segmentation results. The proposed method distills deep feature maps from deep networks, then recognizes and matches objects' portions using an extended search. Experimental results confirm that the proposed method achieves 2.96% higher on consistent tracking accuracy and 35.48% higher on event identification accuracy than the state-of-art methods.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2019 ","pages":"1087-1096"},"PeriodicalIF":0.0,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7304548/pdf/nihms-1043641.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38067872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-06-01. Epub Date: 2018-12-17. DOI: 10.1109/CVPR.2018.00970
Juan C Caicedo, Claire McQuin, Allen Goodman, Shantanu Singh, Anne E Carpenter
We study the problem of learning representations for single cells in microscopy images to discover biological relationships between their experimental conditions. Many new applications in drug discovery and functional genomics require capturing the morphology of individual cells as comprehensively as possible. Deep convolutional neural networks (CNNs) can learn powerful visual representations, but require ground truth for training; this is rarely available in biomedical profiling experiments. While we do not know which experimental treatments produce cells that look alike, we do know that cells exposed to the same experimental treatment should generally look similar. Thus, we explore training CNNs using a weakly supervised approach that uses this information for feature learning. In addition, the training stage is regularized to control for unwanted variations using mixup or RNNs. We conduct experiments on two different datasets; the proposed approach yields single-cell embeddings that are more accurate than the widely adopted classical features, and are competitive with previously proposed transfer learning approaches.
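A minimal sketch of the weak-supervision recipe described above: the experimental treatment acts as a surrogate class label, mixup regularizes training, and embeddings are later read off the penultimate layer. `model`, `alpha`, and the loss wiring are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

def weakly_supervised_step(model, images, treatment_labels, n_classes, alpha=0.2):
    # Mixup: train on convex combinations of images and of their
    # (surrogate) treatment labels to control for unwanted variation.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed = lam * images + (1 - lam) * images[perm]
    targets = F.one_hot(treatment_labels, n_classes).float()
    targets = lam * targets + (1 - lam) * targets[perm]
    logits = model(mixed)
    return -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```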
{"title":"Weakly Supervised Learning of Single-Cell Feature Embeddings.","authors":"Juan C Caicedo, Claire McQuin, Allen Goodman, Shantanu Singh, Anne E Carpenter","doi":"10.1109/CVPR.2018.00970","DOIUrl":"10.1109/CVPR.2018.00970","url":null,"abstract":"<p><p><i>We study the problem of learning representations for single cells in microscopy images to discover biological relationships between their experimental conditions. Many new applications in drug discovery and functional genomics require capturing the morphology of individual cells as comprehensively as possible. Deep convolutional neural networks (CNNs) can learn powerful visual representations, but require ground truth for training; this is rarely available in biomedical profiling experiments. While we do not know which experimental treatments produce cells that look alike, we do know that cells exposed to the same experimental treatment should generally look similar. Thus, we explore training CNNs using a weakly supervised approach that uses this information for feature learning. In addition, the training stage is regularized to control for unwanted variations using</i> mixup <i>or RNNs. We conduct experiments on two different datasets; the proposed approach yields single-cell embeddings that are more accurate than the widely adopted classical features, and are competitive with previously proposed transfer learning approaches.</i></p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2018 ","pages":"9309-9318"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6432648/pdf/nihms-1018562.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37271663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-06-01. Epub Date: 2018-12-17. DOI: 10.1109/CVPR.2018.00223
Kaili Zhao, Wen-Sheng Chu, Aleix M Martinez
We present a scalable weakly supervised clustering approach to learn facial action units (AUs) from large collections of freely available web images. Unlike most existing methods (e.g., CNNs) that rely on fully annotated data, our method exploits web images with inaccurate annotations. Specifically, we derive a weakly supervised spectral algorithm that learns an embedding space coupling image appearance and semantics. The algorithm has an efficient gradient update and scales to large quantities of images with a stochastic extension. With the learned embedding space, we adopt rank-order clustering to identify groups of visually and semantically similar images, and we re-annotate these groups for training AU classifiers. Evaluation on the one-million-image EmotioNet dataset demonstrates the effectiveness of our approach: (1) our learned annotations reach 91.3% agreement with human annotations, averaged over 7 common AUs; (2) classifiers trained with re-annotated images perform comparably to, and sometimes better than, their supervised CNN-based counterparts; and (3) our method offers intuitive outlier/noise pruning instead of forcing an annotation onto every image. Code is available.
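For orientation, here is the classical backbone such a pipeline builds on: a spectral embedding from an affinity graph over images, on which rank-order or k-means style clustering can then operate. This is the textbook computation only, not the paper's weakly supervised objective coupling appearance and semantics.

```python
import numpy as np

def spectral_embedding(affinity, dim=16):
    # Rows of the bottom eigenvectors of the symmetric normalized
    # Laplacian L = I - D^{-1/2} W D^{-1/2} give the embedding space.
    d = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    lap = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)     # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]         # skip the trivial constant eigenvector
```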
{"title":"Learning Facial Action Units from Web Images with Scalable Weakly Supervised Clustering.","authors":"Kaili Zhao, Wen-Sheng Chu, Aleix M Martinez","doi":"10.1109/CVPR.2018.00223","DOIUrl":"10.1109/CVPR.2018.00223","url":null,"abstract":"<p><p>We present a scalable weakly supervised clustering approach to learn facial action units (AUs) from large, freely available web images. Unlike most existing methods (e.g., CNNs) that rely on fully annotated data, our method exploits web images with inaccurate annotations. Specifically, we derive a weakly-supervised spectral algorithm that learns an embedding space to couple image appearance and semantics. The algorithm has efficient gradient update, and scales up to large quantities of images with a stochastic extension. With the learned embedding space, we adopt rank-order clustering to identify groups of visually and semantically similar images, and re-annotate these groups for training AU classifiers. Evaluation on the 1 millon EmotioNet dataset demonstrates the effectiveness of our approach: (1) our learned annotations reach on average 91.3% agreement with human annotations on 7 common AUs, (2) classifiers trained with re-annotated images perform comparably to, sometimes even better than, its supervised CNN-based counterpart, and (3) our method offers intuitive outlier/noise pruning instead of forcing one annotation to every image. Code is available.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2018 ","pages":"2090-2099"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6594709/pdf/nihms-995319.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37373174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-06-01. Epub Date: 2018-12-17. DOI: 10.1109/CVPR.2018.00214
Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Batmanghelich, Dacheng Tao
Monocular depth estimation, which plays a crucial role in understanding 3D scene geometry, is an ill-posed problem. Recent methods have gained significant improvement by exploiting image-level information and hierarchical features from deep convolutional neural networks (DCNNs). These methods model depth estimation as a regression problem and train the regression networks by minimizing mean squared error, which suffers from slow convergence and unsatisfactory local solutions. Moreover, existing depth estimation networks employ repeated spatial pooling operations, resulting in undesirably low-resolution feature maps. To obtain high-resolution depth maps, skip connections or multilayer deconvolution networks are required, which complicates network training and demands much more computation. To eliminate, or at least largely reduce, these problems, we introduce a spacing-increasing discretization (SID) strategy that discretizes depth and recasts depth network learning as an ordinal regression problem. By training the network with an ordinal regression loss, our method achieves both much higher accuracy and faster convergence. Furthermore, we adopt a multi-scale network structure that avoids unnecessary spatial pooling and captures multi-scale information in parallel. The proposed deep ordinal regression network (DORN) achieves state-of-the-art results on three challenging benchmarks, i.e., KITTI [16], Make3D [49], and NYU Depth v2 [41], and outperforms existing methods by a large margin.
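The SID strategy admits a compact closed form: the depth range [α, β] is split into K intervals whose edges grow exponentially, t_i = exp(log α + i · log(β/α)/K). The sketch below implements these thresholds and a per-pixel ordinal label; the label construction is a plausible reading of the ordinal setup, not the released code.

```python
import numpy as np

def sid_thresholds(alpha, beta, K):
    # Spacing-increasing discretization of [alpha, beta] into K bins:
    # t_i = exp(log(alpha) + i * log(beta / alpha) / K), i = 0..K.
    i = np.arange(K + 1)
    return np.exp(np.log(alpha) + i * np.log(beta / alpha) / K)

def ordinal_labels(depth, thresholds):
    # Ordinal label = number of interior thresholds the depth exceeds,
    # so the network answers K-1 binary "is depth > t_i?" questions
    # instead of regressing a continuous value.
    return (depth[..., None] >= thresholds[1:-1]).sum(axis=-1)
```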
{"title":"Deep Ordinal Regression Network for Monocular Depth Estimation.","authors":"Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Batmanghelich, Dacheng Tao","doi":"10.1109/CVPR.2018.00214","DOIUrl":"10.1109/CVPR.2018.00214","url":null,"abstract":"<p><p>Monocular depth estimation, which plays a crucial role in understanding 3D scene geometry, is an ill-posed problem. Recent methods have gained significant improvement by exploring image-level information and hierarchical features from deep convolutional neural networks (DCNNs). These methods model depth estimation as a regression problem and train the regression networks by minimizing mean squared error, which suffers from slow convergence and unsatisfactory local solutions. Besides, existing depth estimation networks employ repeated spatial pooling operations, resulting in undesirable low-resolution feature maps. To obtain high-resolution depth maps, skip-connections or multilayer deconvolution networks are required, which complicates network training and consumes much more computations. To eliminate or at least largely reduce these problems, we introduce a spacing-increasing discretization (SID) strategy to discretize depth and recast depth network learning as an ordinal regression problem. By training the network using an ordinary regression loss, our method achieves much higher accuracy and faster convergence in synch. Furthermore, we adopt a multi-scale network structure which avoids unnecessary spatial pooling and captures multi-scale information in parallel. The proposed deep ordinal regression network (DORN) achieves state-of-the-art results on three challenging benchmarks, i.e., KITTI [16], Make3D [49], and NYU Depth v2 [41], and outperforms existing methods by a large margin.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2018 ","pages":"2002-2011"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/CVPR.2018.00214","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37119005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-05-29. DOI: 10.1109/CVPR.2018.00355
Jochen Gast, S. Roth
Even though probabilistic treatments of neural networks have a long history, they have not found widespread use in practice. Sampling approaches are often already too slow for simple networks, and the size of the inputs and the depth of typical CNN architectures in computer vision only compound this problem. Uncertainty in neural networks has thus been largely ignored in practice, despite the fact that it may provide important information about the reliability of predictions and the inner workings of the network. In this paper, we introduce two lightweight approaches to making supervised learning with probabilistic deep networks practical. First, we suggest probabilistic output layers for classification and regression that require only minimal changes to existing networks. Second, we employ assumed density filtering and show that activation uncertainties can be propagated through the entire network in a practical fashion, again with minor changes. Both probabilistic networks retain the predictive power of their deterministic counterparts but yield uncertainties that correlate well with the empirical error induced by their predictions. Moreover, robustness to adversarial examples is significantly increased.
Title: Lightweight Probabilistic Deep Networks. Pages: 3369-3378.
Pub Date: 2018-02-06. DOI: 10.1016/b978-0-08-051581-6.50072-6
A. Hummel
{"title":"Representations Based on Zero-Crossing in Scale-Space-M","authors":"A. Hummel","doi":"10.1016/b978-0-08-051581-6.50072-6","DOIUrl":"https://doi.org/10.1016/b978-0-08-051581-6.50072-6","url":null,"abstract":"","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1016/b978-0-08-051581-6.50072-6","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43326551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2017-07-01. Epub Date: 2017-11-09. DOI: 10.1109/CVPR.2017.533
Won Hwa Kim, Mona Jalal, Seongjae Hwang, Sterling C Johnson, Vikas Singh
The adoption of "human-in-the-loop" paradigms in computer vision and machine learning is leading to various applications where the actual data acquisition (e.g., human supervision) and the underlying inference algorithms are closely intertwined. While classical work in active learning provides effective solutions when the learning module involves classification and regression tasks, many practical issues, such as partially observed measurements, financial constraints, and additional distributional or structural aspects of the data, typically fall outside the scope of this treatment. For instance, with sequential acquisition of partial measurements of data that manifest as a matrix (or tensor), novel strategies for completing (or collaboratively filtering) the remaining entries have only recently been studied. Motivated by vision problems where we seek to annotate a large dataset of images via a crowdsourced platform, or alternatively to complement results from a state-of-the-art object detector using human feedback, we study the "completion" problem defined on graphs, where requests for additional measurements must be made sequentially. We design the optimization model in the Fourier domain of the graph and describe how ideas based on adaptive submodularity yield algorithms that work well in practice. On a large set of images collected from Imgur, we see promising results on images that are otherwise difficult to categorize. We also show an application to an experimental design problem in neuroimaging.
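To make the graph-Fourier setting concrete: if the signal to recover is assumed bandlimited in the graph Fourier basis (the low-frequency Laplacian eigenvectors), completion from a few sampled vertices reduces to least squares. This sketch shows only that recovery step; the adaptive-submodular policy for choosing which vertices to query next is the paper's contribution and is not reproduced here.

```python
import numpy as np

def complete_graph_signal(W, sampled_idx, sampled_vals, bandwidth=10):
    # W: symmetric adjacency matrix. Assume the signal lies in the span
    # of the first `bandwidth` eigenvectors of the combinatorial
    # Laplacian L = D - W (the low graph frequencies).
    L = np.diag(W.sum(axis=1)) - W
    _, U = np.linalg.eigh(L)                   # eigenvectors, ascending
    B = U[:, :bandwidth]
    coeffs, *_ = np.linalg.lstsq(B[sampled_idx], sampled_vals, rcond=None)
    return B @ coeffs                          # signal on every vertex
```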
{"title":"Online Graph Completion: Multivariate Signal Recovery in Computer Vision.","authors":"Won Hwa Kim, Mona Jalal, Seongjae Hwang, Sterling C Johnson, Vikas Singh","doi":"10.1109/CVPR.2017.533","DOIUrl":"10.1109/CVPR.2017.533","url":null,"abstract":"<p><p>The adoption of \"human-in-the-loop\" paradigms in computer vision and machine learning is leading to various applications where the actual data acquisition (e.g., human supervision) and the underlying inference algorithms are closely interwined. While classical work in active learning provides effective solutions when the learning module involves classification and regression tasks, many practical issues such as partially observed measurements, financial constraints and even additional distributional or structural aspects of the data typically fall outside the scope of this treatment. For instance, with sequential acquisition of partial measurements of data that manifest as a matrix (or tensor), novel strategies for completion (or collaborative filtering) of the remaining entries have only been studied recently. Motivated by vision problems where we seek to annotate a large dataset of images via a crowdsourced platform or alternatively, complement results from a state-of-the-art object detector using human feedback, we study the \"completion\" problem defined on graphs, where requests for additional measurements must be made sequentially. We design the optimization model in the Fourier domain of the graph describing how ideas based on adaptive submodularity provide algorithms that work well in practice. On a large set of images collected from Imgur, we see promising results on images that are otherwise difficult to categorize. We also show applications to an experimental design problem in neuroimaging.</p>","PeriodicalId":74560,"journal":{"name":"Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition","volume":"2017 ","pages":"5019-5027"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5798491/pdf/nihms914460.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35807727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}