Pub Date: 2021-06-01  DOI: 10.1109/CVPR46437.2021.00940
Towards Unified Surgical Skill Assessment
Daochang Liu, Qiyue Li, Tingting Jiang, Yizhou Wang, R. Miao, F. Shan, Ziyu Li
Surgical skills have a great influence on surgical safety and patients’ well-being. Traditional assessment of surgical skills involves strenuous manual effort and lacks efficiency and repeatability. We therefore attempt to automatically predict how well a surgery is performed from the surgical video. In this paper, a unified multi-path framework for automatic surgical skill assessment is proposed, which accounts for multiple constituent aspects of surgical skill, including surgical tool usage, intraoperative event pattern, and other skill proxies. The dependency relationships among these different aspects are modeled by a dedicated path dependency module in the framework. We conduct extensive experiments on the JIGSAWS dataset of simulated surgical tasks and a new clinical dataset of real laparoscopic surgeries. The proposed framework achieves promising results on both datasets, advancing the state of the art on the simulated dataset from 0.71 to 0.80 in Spearman’s correlation. It is also shown that combining multiple skill aspects yields better performance than relying on a single aspect.
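The abstract describes several skill "paths" (tool usage, event pattern, other proxies) whose predictions are fused into one score and evaluated with Spearman's correlation. The sketch below is only an illustration of that multi-path idea, not the authors' architecture; the feature dimensions, encoders, and concatenation-based fusion are assumptions.

```python
# Hypothetical multi-path skill regressor: one encoder per skill aspect, fused into a score.
import torch
import torch.nn as nn
from scipy.stats import spearmanr

class MultiPathSkillRegressor(nn.Module):
    def __init__(self, dims=(64, 32, 16), hidden=32):
        super().__init__()
        # One small encoder per assumed skill aspect (tool usage, event pattern, proxy).
        self.encoders = nn.ModuleList(nn.Sequential(nn.Linear(d, hidden), nn.ReLU())
                                      for d in dims)
        self.head = nn.Linear(hidden * len(dims), 1)  # fuse all paths into one skill score

    def forward(self, tool_feat, event_feat, proxy_feat):
        parts = [enc(x) for enc, x in zip(self.encoders, (tool_feat, event_feat, proxy_feat))]
        return self.head(torch.cat(parts, dim=1)).squeeze(1)

model = MultiPathSkillRegressor()
scores = model(torch.randn(8, 64), torch.randn(8, 32), torch.randn(8, 16))
# Evaluation metric used in the paper: Spearman's rank correlation against expert ratings.
rho, _ = spearmanr(scores.detach().numpy(), torch.rand(8).numpy())
print(f"Spearman's rho: {rho:.2f}")
```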
{"title":"Towards Unified Surgical Skill Assessment","authors":"Daochang Liu, Qiyue Li, Tingting Jiang, Yizhou Wang, R. Miao, F. Shan, Ziyu Li","doi":"10.1109/CVPR46437.2021.00940","DOIUrl":"https://doi.org/10.1109/CVPR46437.2021.00940","url":null,"abstract":"Surgical skills have a great influence on surgical safety and patients’ well-being. Traditional assessment of surgical skills involves strenuous manual efforts, which lacks efficiency and repeatability. Therefore, we attempt to automatically predict how well the surgery is performed using the surgical video. In this paper, a unified multi-path framework for automatic surgical skill assessment is proposed, which takes care of multiple composing aspects of surgical skills, including surgical tool usage, intraoperative event pattern, and other skill proxies. The dependency relationships among these different aspects are specially modeled by a path dependency module in the framework. We conduct extensive experiments on the JIGSAWS dataset of simulated surgical tasks, and a new clinical dataset of real laparoscopic surgeries. The proposed framework achieves promising results on both datasets, with the state-of-the-art on the simulated dataset advanced from 0.71 Spearman’s correlation to 0.80. It is also shown that combining multiple skill aspects yields better performance than relying on a single aspect.","PeriodicalId":339646,"journal":{"name":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"221 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122605275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-01  DOI: 10.1109/CVPR46437.2021.01065
Flow Guided Transformable Bottleneck Networks for Motion Retargeting
Jian Ren, Menglei Chai, Oliver J. Woodford, Kyle Olszewski, S. Tulyakov
Human motion retargeting aims to transfer the motion of one person in a "driving" video or set of images to another person. Existing efforts leverage a long training video from each target person to train a subject-specific motion transfer model. However, the scalability of such methods is limited, as each model can only generate videos for the given target subject, and such training videos are labor-intensive to acquire and process. Few-shot motion transfer techniques, which only require one or a few images from a target, have recently drawn considerable attention. Methods addressing this task generally use either 2D or explicit 3D representations to transfer motion, and in doing so, sacrifice either accurate geometric modeling or the flexibility of an end-to-end learned representation. Inspired by the Transformable Bottleneck Network, which renders novel views and manipulations of rigid objects, we propose an approach based on an implicit volumetric representation of the image content, which can then be spatially manipulated using volumetric flow fields. We address the challenging question of how to aggregate information across different body poses, learning flow fields that allow for combining content from the appropriate regions of input images of highly non-rigid human subjects performing complex motions into a single implicit volumetric representation. This allows us to learn our 3D representation solely from videos of moving people. Armed with both 3D object understanding and end-to-end learned rendering, this categorically novel representation delivers state-of-the-art image generation quality, as shown by our quantitative and qualitative evaluations.
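The core operation described here is spatially manipulating an implicit feature volume with a volumetric flow field. The sketch below shows one standard way to warp a 3D feature volume with a dense flow using trilinear sampling; the volume size, the flow source, and the voxel-offset normalization are assumptions, not the paper's implementation.

```python
# Warping a volumetric feature tensor by a 3D flow field via grid sampling.
import torch
import torch.nn.functional as F

def warp_volume(volume, flow):
    """volume: (N, C, D, H, W); flow: (N, 3, D, H, W) given as voxel offsets (x, y, z)."""
    n, _, d, h, w = volume.shape
    # Base sampling grid in normalized [-1, 1] coordinates, ordered (x, y, z) for grid_sample.
    zs, ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, d), torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
        indexing="ij")
    base = torch.stack((xs, ys, zs), dim=-1).unsqueeze(0).expand(n, -1, -1, -1, -1)
    # Convert voxel offsets to normalized offsets and add them to the base grid.
    scale = torch.tensor([2.0 / max(w - 1, 1), 2.0 / max(h - 1, 1), 2.0 / max(d - 1, 1)])
    offset = flow.permute(0, 2, 3, 4, 1) * scale
    return F.grid_sample(volume, base + offset, align_corners=True)

vol = torch.randn(1, 16, 8, 32, 32)    # toy implicit volumetric features
flow = torch.zeros(1, 3, 8, 32, 32)    # zero flow should give an identity warp
assert torch.allclose(warp_volume(vol, flow), vol, atol=1e-5)
```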
{"title":"Flow Guided Transformable Bottleneck Networks for Motion Retargeting","authors":"Jian Ren, Menglei Chai, Oliver J. Woodford, Kyle Olszewski, S. Tulyakov","doi":"10.1109/CVPR46437.2021.01065","DOIUrl":"https://doi.org/10.1109/CVPR46437.2021.01065","url":null,"abstract":"Human motion retargeting aims to transfer the motion of one person in a \"driving\" video or set of images to another person. Existing efforts leverage a long training video from each target person to train a subject-specific motion transfer model. However, the scalability of such methods is limited, as each model can only generate videos for the given target subject, and such training videos are labor-intensive to acquire and process. Few-shot motion transfer techniques, which only require one or a few images from a target, have recently drawn considerable attention. Methods addressing this task generally use either 2D or explicit 3D representations to transfer motion, and in doing so, sacrifice either accurate geometric modeling or the flexibility of an end-to-end learned representation. Inspired by the Transformable Bottleneck Network, which renders novel views and manipulations of rigid objects, we propose an approach based on an implicit volumetric representation of the image content, which can then be spatially manipulated using volumetric flow fields. We address the challenging question of how to aggregate information across different body poses, learning flow fields that allow for combining content from the appropriate regions of input images of highly non-rigid human subjects performing complex motions into a single implicit volumetric representation. This allows us to learn our 3D representation solely from videos of moving people. Armed with both 3D object understanding and end-to-end learned rendering, this categorically novel representation delivers state-of-the-art image generation quality, as shown by our quantitative and qualitative evaluations.","PeriodicalId":339646,"journal":{"name":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"168 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122620670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-01  DOI: 10.1109/CVPR46437.2021.00710
PSD: Principled Synthetic-to-Real Dehazing Guided by Physical Priors
Zeyuan Chen, Yangchao Wang, Yang Yang, Dong Liu
Deep learning-based methods have achieved remarkable performance in image dehazing. However, previous studies mostly focus on training models with synthetic hazy images, which incurs a performance drop when the models are applied to real-world hazy images. We propose a Principled Synthetic-to-real Dehazing (PSD) framework to improve the generalization performance of dehazing. Starting from a dehazing model backbone pre-trained on synthetic data, PSD exploits real hazy images to fine-tune the model in an unsupervised fashion. For the fine-tuning, we leverage several well-grounded physical priors and combine them into a prior loss committee. PSD can adopt most existing dehazing models as its backbone, and the combination of multiple physical priors boosts dehazing performance significantly. Through extensive experiments, we demonstrate that our PSD framework establishes new state-of-the-art performance for real-world dehazing in terms of visual quality, assessed by no-reference quality metrics as well as subjective evaluation and downstream task performance indicators.
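One classic physical prior that a "prior loss committee" of this kind could draw on is the dark channel prior: haze-free outdoor images tend to have near-zero dark channels. The sketch below illustrates such a prior as an unsupervised loss on the dehazed output; the patch size, clamping, and use of this particular prior are assumptions for illustration, not the paper's exact committee.

```python
# Dark channel prior as an unsupervised fine-tuning loss (illustrative).
import torch
import torch.nn.functional as F

def dark_channel(img, patch=15):
    """img: (N, 3, H, W) in [0, 1]. Returns the local dark channel, shape (N, 1, H, W)."""
    min_rgb = img.min(dim=1, keepdim=True).values                 # minimum over color channels
    return -F.max_pool2d(-min_rgb, patch, stride=1, padding=patch // 2)  # local spatial minimum

def dark_channel_prior_loss(dehazed):
    # Penalize large dark-channel values so the output behaves like a haze-free image.
    return dark_channel(dehazed.clamp(0, 1)).mean()

dehazed = torch.rand(2, 3, 64, 64, requires_grad=True)   # stand-in for a dehazing model's output
loss = dark_channel_prior_loss(dehazed)                   # would be one member of the committee
loss.backward()
```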
{"title":"PSD: Principled Synthetic-to-Real Dehazing Guided by Physical Priors","authors":"Zeyuan Chen, Yangchao Wang, Yang Yang, Dong Liu","doi":"10.1109/CVPR46437.2021.00710","DOIUrl":"https://doi.org/10.1109/CVPR46437.2021.00710","url":null,"abstract":"Deep learning-based methods have achieved remarkable performance for image dehazing. However, previous studies are mostly focused on training models with synthetic hazy images, which incurs performance drop when the models are used for real-world hazy images. We propose a Principled Synthetic-to-real Dehazing (PSD) framework to improve the generalization performance of dehazing. Starting from a dehazing model backbone that is pre-trained on synthetic data, PSD exploits real hazy images to fine-tune the model in an unsupervised fashion. For the fine-tuning, we leverage several well-grounded physical priors and combine them into a prior loss committee. PSD allows for most of the existing dehazing models as its backbone, and the combination of multiple physical priors boosts dehazing significantly. Through extensive experiments, we demonstrate that our PSD framework establishes the new state-of-the-art performance for real-world dehazing, in terms of visual quality assessed by no-reference quality metrics as well as subjective evaluation and downstream task performance indicator.","PeriodicalId":339646,"journal":{"name":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114152380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-01  DOI: 10.1109/CVPR46437.2021.01013
Learning Progressive Point Embeddings for 3D Point Cloud Generation
Cheng Wen, Baosheng Yu, D. Tao
Generative models for 3D point clouds are extremely important for scene/object reconstruction applications in autonomous driving and robotics. Despite the recent success of deep representation learning, it remains a great challenge for deep neural networks to synthesize or reconstruct high-fidelity point clouds, because of the difficulties in 1) learning effective pointwise representations and 2) generating realistic point clouds from complex distributions. In this paper, we devise a dual-generator framework for point cloud generation, which generalizes the vanilla generative adversarial learning framework in a progressive manner. Specifically, the first generator aims to learn effective point embeddings in a breadth-first manner, while the second generator refines the generated point cloud based on a depth-first point embedding to produce a robust and uniform point cloud. The proposed dual-generator framework is thus able to progressively learn effective point embeddings for accurate point cloud generation. Experimental results on a variety of object categories from the most popular point cloud generation dataset, ShapeNet, demonstrate the state-of-the-art performance of the proposed method for accurate point cloud generation.
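As a very rough sketch of the two-stage idea (a first generator producing point embeddings, a second refining them into coordinates), the code below chains two tiny generators. The layer sizes, point count, and refinement design are placeholders; this is not the paper's architecture or training scheme.

```python
# Hypothetical dual-generator pipeline: noise -> point embeddings -> refined 3D points.
import torch
import torch.nn as nn

class CoarseGenerator(nn.Module):
    def __init__(self, z_dim=128, n_points=256, emb_dim=64):
        super().__init__()
        self.n_points, self.emb_dim = n_points, emb_dim
        self.net = nn.Sequential(nn.Linear(z_dim, 512), nn.ReLU(),
                                 nn.Linear(512, n_points * emb_dim))

    def forward(self, z):                        # z: (N, z_dim)
        return self.net(z).view(-1, self.n_points, self.emb_dim)

class RefineGenerator(nn.Module):
    def __init__(self, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, emb):                      # emb: (N, P, emb_dim)
        return self.net(emb)                     # per-point 3D coordinates, (N, P, 3)

g1, g2 = CoarseGenerator(), RefineGenerator()
points = g2(g1(torch.randn(4, 128)))             # (4, 256, 3) generated point clouds
print(points.shape)
```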
{"title":"Learning Progressive Point Embeddings for 3D Point Cloud Generation","authors":"Cheng Wen, Baosheng Yu, D. Tao","doi":"10.1109/CVPR46437.2021.01013","DOIUrl":"https://doi.org/10.1109/CVPR46437.2021.01013","url":null,"abstract":"Generative models for 3D point clouds are extremely important for scene/object reconstruction applications in autonomous driving and robotics. Despite recent success of deep learning-based representation learning, it remains a great challenge for deep neural networks to synthesize or reconstruct high-fidelity point clouds, because of the difficulties in 1) learning effective pointwise representations; and 2) generating realistic point clouds from complex distributions. In this paper, we devise a dual-generators framework for point cloud generation, which generalizes vanilla generative adversarial learning framework in a progressive manner. Specifically, the first generator aims to learn effective point embeddings in a breadth-first manner, while the second generator is used to refine the generated point cloud based on a depth-first point embedding to generate a robust and uniform point cloud. The proposed dual-generators framework thus is able to progressively learn effective point embeddings for accurate point cloud generation. Experimental results on a variety of object categories from the most popular point cloud generation dataset, ShapeNet, demonstrate the state-of-the-art performance of the proposed method for accurate point cloud generation.","PeriodicalId":339646,"journal":{"name":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114436053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-01  DOI: 10.1109/CVPR46437.2021.01610
Intrinsic Image Harmonization
Zonghui Guo, Haiyong Zheng, Yufeng Jiang, Zhaorui Gu, Bing Zheng
Compositing an image almost inevitably suffers from the inharmony problem, which is mainly caused by the incompatibility of a foreground and a background taken from two different images with distinct surfaces and lights. These correspond to material-dependent and light-dependent characteristics, namely the reflectance and illumination intrinsic images, respectively. Therefore, we seek to solve image harmonization via separable harmonization of reflectance and illumination, i.e., intrinsic image harmonization. Our method is based on an autoencoder that disentangles the composite image into reflectance and illumination for separate harmonization. Specifically, we harmonize reflectance through a material-consistency penalty, while we harmonize illumination by learning and transferring light from the background to the foreground. Moreover, we model patch relations between the foreground and background of composite images in an inharmony-free learning way, to adaptively guide our intrinsic image harmonization. Both extensive experiments and ablation studies demonstrate the power of our method as well as the efficacy of each component. We also contribute a new challenging dataset for benchmarking illumination harmonization. Code and dataset are at https://github.com/zhenglab/IntrinsicHarmony.
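The abstract builds on the standard intrinsic decomposition I = R ⊙ L (image = reflectance times illumination). The sketch below shows a toy autoencoder with two decoder heads that reconstructs the composite from the predicted R and L; the architecture, activations, and single reconstruction loss are simplified placeholders, not the authors' network or full objective.

```python
# Toy intrinsic autoencoder: encode the composite, decode reflectance and illumination,
# and reconstruct the input as their element-wise product.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntrinsicAutoencoder(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.dec_reflectance = nn.Sequential(nn.Conv2d(ch, 3, 3, padding=1), nn.Sigmoid())
        self.dec_illumination = nn.Sequential(nn.Conv2d(ch, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, img):
        feat = self.enc(img)
        r, l = self.dec_reflectance(feat), self.dec_illumination(feat)
        return r, l, r * l                       # reflectance, illumination, reconstruction

model = IntrinsicAutoencoder()
composite = torch.rand(2, 3, 64, 64)
r, l, recon = model(composite)
recon_loss = F.l1_loss(recon, composite)         # one term of a fuller harmonization objective
```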
{"title":"Intrinsic Image Harmonization","authors":"Zonghui Guo, Haiyong Zheng, Yufeng Jiang, Zhaorui Gu, Bing Zheng","doi":"10.1109/CVPR46437.2021.01610","DOIUrl":"https://doi.org/10.1109/CVPR46437.2021.01610","url":null,"abstract":"Compositing an image usually inevitably suffers from inharmony problem that is mainly caused by incompatibility of foreground and background from two different images with distinct surfaces and lights, corresponding to material-dependent and light-dependent characteristics, namely, reflectance and illumination intrinsic images, respectively. Therefore, we seek to solve image harmonization via separable harmonization of reflectance and illumination, i.e., intrinsic image harmonization. Our method is based on an autoencoder that disentangles composite image into reflectance and illumination for further separate harmonization. Specifically, we harmonize reflectance through material-consistency penalty, while harmonize illumination by learning and transferring light from background to foreground, moreover, we model patch relations between foreground and background of composite images in an inharmony-free learning way, to adaptively guide our intrinsic image harmonization. Both extensive experiments and ablation studies demonstrate the power of our method as well as the efficacy of each component. We also contribute a new challenging dataset for benchmarking illumination harmonization. Code and dataset are at https://github.com/zhenglab/IntrinsicHarmony.","PeriodicalId":339646,"journal":{"name":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"190 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114469724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-01  DOI: 10.1109/CVPR46437.2021.00203
Pseudo Facial Generation with Extreme Poses for Face Recognition
Guoli Wang, Jiaqi Ma, Qian Zhang, Jiwen Lu, Jie Zhou
Although face recognition has achieved great success in recent years, it is still challenging to recognize facial images with extreme poses. Traditional methods treat this as a domain gap problem. Many of them address it by generating fake frontal faces from extreme ones, but such approaches struggle to maintain identity information and suffer from high computational consumption and uncontrolled disturbances. Our experimental analysis shows a dramatic precision drop with extreme poses. Meanwhile, these extreme poses exhibit only minor visual differences after small rotations. Derived from this insight, we attempt to relieve this huge precision drop by making minor changes to the input images without modifying existing discriminators. A novel lightweight pseudo facial generation method is proposed to relieve the problem of extreme poses without generating any frontal facial image. It can depict the facial contour information and make appropriate modifications to preserve the critical identity information. Specifically, the proposed method reconstructs pseudo profile faces by minimizing the pixel-wise differences with the original profile faces while simultaneously maintaining identity-consistent information from their corresponding frontal faces. The proposed framework can improve existing discriminators and obtains notable improvements on several benchmark datasets.
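The abstract names two training signals: a pixel-wise term that keeps the pseudo profile close to the original, and an identity term that keeps its embedding consistent with the frontal face. The sketch below combines the two with an L1 term and a cosine identity term; the tiny generator, the embedder (which would normally be a frozen pretrained face recognizer), and the loss weight are placeholders, not the paper's models.

```python
# Hypothetical combined loss: pixel fidelity to the profile + identity consistency with the frontal face.
import torch
import torch.nn as nn
import torch.nn.functional as F

generator = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid())   # lightweight stand-in G
embedder = nn.Sequential(nn.AdaptiveAvgPool2d(8), nn.Flatten(),
                         nn.Linear(3 * 8 * 8, 128))                        # placeholder face embedder

def pseudo_face_loss(profile, frontal, lam=0.5):
    pseudo = generator(profile)
    pixel_term = F.l1_loss(pseudo, profile)                                # stay close to the input profile
    id_term = 1 - F.cosine_similarity(embedder(pseudo), embedder(frontal)).mean()  # identity consistency
    return pixel_term + lam * id_term

loss = pseudo_face_loss(torch.rand(4, 3, 64, 64), torch.rand(4, 3, 64, 64))
loss.backward()
```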
{"title":"Pseudo Facial Generation with Extreme Poses for Face Recognition","authors":"Guoli Wang, Jiaqi Ma, Qian Zhang, Jiwen Lu, Jie Zhou","doi":"10.1109/CVPR46437.2021.00203","DOIUrl":"https://doi.org/10.1109/CVPR46437.2021.00203","url":null,"abstract":"Face recognition has achieved a great success in recent years, it is still challenging to recognize those facial images with extreme poses. Traditional methods consider it as a domain gap problem. Many of them settle it by generating fake frontal faces from extreme ones, whereas they are tough to maintain the identity information with high computational consumption and uncontrolled disturbances. Our experimental analysis shows a dramatic precision drop with extreme poses. Meanwhile, those extreme poses just exist minor visual differences after small rotations. Derived from this insight, we attempt to relieve such a huge precision drop by making minor changes to the input images without modifying existing discriminators. A novel lightweight pseudo facial generation is proposed to relieve the problem of extreme poses without generating any frontal facial image. It can depict the facial contour information and make appropriate modifications to preserve the critical identity information. Specifically, the proposed method reconstructs pseudo profile faces by minimizing the pixel-wise differences with original profile faces and maintaining the identity consistent information from their corresponding frontal faces simultaneously. The proposed framework can improve existing discriminators and obtain a great promotion on several benchmark datasets.","PeriodicalId":339646,"journal":{"name":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114566551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-01  DOI: 10.1109/CVPR46437.2021.01462
StyleMix: Separating Content and Style for Enhanced Data Augmentation
Minui Hong, Jinwoo Choi, Gunhee Kim
In spite of the great success of deep neural networks on many challenging classification tasks, the learned networks are vulnerable to overfitting and adversarial attacks. Recently, mixup-based augmentation methods have been actively studied as a practical remedy for these drawbacks. However, these approaches do not distinguish between the content and style features of the image, but simply mix or cut-and-paste the images. We propose StyleMix and StyleCutMix as the first mixup methods that separately manipulate the content and style information of input image pairs. By carefully mixing up the content and style of images, we can create more abundant and robust samples, which eventually enhance the generalization of model training. We also develop an automatic scheme to decide the degree of style mixing according to the pair’s class distance, to prevent messy mixed images from pairs with overly different styles. Our experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets show that StyleMix achieves better or comparable performance to state-of-the-art mixup methods and learns classifiers that are more robust to adversarial attacks.
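One common way to separate "style" from "content" is via channel-wise feature statistics (as in AdaIN): the normalized signal acts as content and the mean/std act as style, so the two can be mixed with different ratios. The sketch below illustrates that idea; it is not the paper's exact StyleMix procedure, and the mixing ratios and input tensors are arbitrary.

```python
# Illustrative content/style mixing via channel statistics (AdaIN-style), not the paper's algorithm.
import torch

def channel_stats(x, eps=1e-5):
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True) + eps
    return mu, sigma

def style_mix(x1, x2, content_ratio=0.7, style_ratio=0.3):
    """x1, x2: (N, C, H, W) images or feature maps from the pair to mix."""
    mu1, s1 = channel_stats(x1)
    mu2, s2 = channel_stats(x2)
    # Mix content: blend the normalized (style-removed) signals.
    content = content_ratio * (x1 - mu1) / s1 + (1 - content_ratio) * (x2 - mu2) / s2
    # Mix style: blend the channel statistics and re-apply them.
    mu = style_ratio * mu1 + (1 - style_ratio) * mu2
    sigma = style_ratio * s1 + (1 - style_ratio) * s2
    return content * sigma + mu

mixed = style_mix(torch.rand(8, 3, 32, 32), torch.rand(8, 3, 32, 32))
# Training labels would typically be mixed with ratios tied to the content/style ratios above.
```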
{"title":"StyleMix: Separating Content and Style for Enhanced Data Augmentation","authors":"Minui Hong, Jinwoo Choi, Gunhee Kim","doi":"10.1109/CVPR46437.2021.01462","DOIUrl":"https://doi.org/10.1109/CVPR46437.2021.01462","url":null,"abstract":"In spite of the great success of deep neural networks for many challenging classification tasks, the learned networks are vulnerable to overfitting and adversarial attacks. Recently, mixup based augmentation methods have been actively studied as one practical remedy for these drawbacks. However, these approaches do not distinguish between the content and style features of the image, but mix or cut-and-paste the images. We propose StyleMix and StyleCutMix as the first mixup method that separately manipulates the content and style information of input image pairs. By carefully mixing up the content and style of images, we can create more abundant and robust samples, which eventually enhance the generalization of model training. We also develop an automatic scheme to decide the degree of style mixing according to the pair’s class distance, to prevent messy mixed images from too differently styled pairs. Our experiments on CIFAR-10, CIFAR-100 and ImageNet datasets show that StyleMix achieves better or comparable performance to state of the art mixup methods and learns more robust classifiers to adversarial attacks.","PeriodicalId":339646,"journal":{"name":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121908133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-01  DOI: 10.1109/CVPR46437.2021.00268
Positive-Unlabeled Data Purification in the Wild for Object Detection
Jianyuan Guo, Kai Han, Han Wu, Chao Zhang, Xinghao Chen, Chunjing Xu, Chang Xu, Yunhe Wang
Deep learning based object detection approaches have achieved great progress, benefiting from large amounts of labeled images. However, image annotation remains a laborious, time-consuming, and error-prone process. To further improve the performance of detectors, we seek to exploit all available labeled data and excavate useful samples from massive unlabeled images in the wild, which has rarely been discussed before. In this paper, we present a positive-unlabeled learning based scheme that expands the training data by purifying valuable images from massive unlabeled ones, where the original training data are viewed as positive data and the unlabeled images in the wild are unlabeled data. To effectively utilize these purified data, we propose a self-distillation algorithm based on hint learning and ground-truth-bounded knowledge distillation. Experimental results verify that the proposed positive-unlabeled data purification can strengthen the original detector by mining the massive unlabeled data. In particular, our method boosts the mAP of FPN by +2.0% on the COCO benchmark.
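Of the two distillation components named above, hint learning is the more standard one: a student feature map is passed through an adaptation layer and regressed onto the teacher's feature map. The sketch below shows that component only; the channel sizes and the 1x1-conv adapter are assumptions, and the ground-truth-bounded distillation term is omitted.

```python
# Hint-learning loss: adapt the student feature and match it to the (detached) teacher feature.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    def __init__(self, student_ch=128, teacher_ch=256):
        super().__init__()
        self.adapter = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)  # match channel dims

    def forward(self, student_feat, teacher_feat):
        return F.mse_loss(self.adapter(student_feat), teacher_feat.detach())

hint = HintLoss()
loss = hint(torch.randn(2, 128, 32, 32), torch.randn(2, 256, 32, 32))
loss.backward()
```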
{"title":"Positive-Unlabeled Data Purification in the Wild for Object Detection","authors":"Jianyuan Guo, Kai Han, Han Wu, Chao Zhang, Xinghao Chen, Chunjing Xu, Chang Xu, Yunhe Wang","doi":"10.1109/CVPR46437.2021.00268","DOIUrl":"https://doi.org/10.1109/CVPR46437.2021.00268","url":null,"abstract":"Deep learning based object detection approaches have achieved great progress with the benefit from large amount of labeled images. However, image annotation remains a laborious, time-consuming and error-prone process. To further improve the performance of detectors, we seek to exploit all available labeled data and excavate useful samples from massive unlabeled images in the wild, which is rarely discussed before. In this paper, we present a positive-unlabeled learning based scheme to expand training data by purifying valuable images from massive unlabeled ones, where the original training data are viewed as positive data and the unlabeled images in the wild are unlabeled data. To effectively utilized these purified data, we propose a self-distillation algorithm based on hint learning and ground truth bounded knowledge distillation. Experimental results verify that the proposed positive-unlabeled data purification can strengthen the original detector by mining the massive unlabeled data. In particular, our method boosts the mAP of FPN by +2.0% on COCO benchmark.","PeriodicalId":339646,"journal":{"name":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129844255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-01  DOI: 10.1109/CVPR46437.2021.00719
Transferable Query Selection for Active Domain Adaptation
Bo Fu, Zhangjie Cao, Jianmin Wang, Mingsheng Long
Unsupervised domain adaptation (UDA) enables transferring knowledge from a related source domain to a fully unlabeled target domain. Despite the significant advances in UDA, the performance gap remains quite large between UDA and supervised learning with fully labeled target data. Active domain adaptation (ADA) mitigates this gap at minimal annotation cost by selecting a small quota of target samples to annotate and incorporating them into training. Due to the domain shift, the query selection criteria of prior active learning methods may be ineffective at selecting the most informative target samples for annotation. In this paper, we propose Transferable Query Selection (TQS), which selects the most informative samples under domain shift by an ensemble of three new criteria: transferable committee, transferable uncertainty, and transferable domainness. We further develop a randomized selection algorithm to enhance the diversity of the selected samples. Experiments show that TQS remarkably outperforms previous UDA and ADA methods on several domain adaptation datasets. Deeper analyses demonstrate that TQS can select the most informative target samples under domain shift.
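To make the three-criteria ensemble concrete, the sketch below scores unlabeled target samples by committee disagreement (variance across classifier heads), predictive uncertainty (entropy of the mean prediction), and a domainness score from a domain discriminator, then queries the top-scoring samples. The specific formulas and the equal weighting are simplifications for illustration, not the paper's exact criteria.

```python
# Schematic query scoring for active domain adaptation.
import torch
import torch.nn.functional as F

def selection_scores(committee_logits, domain_logits):
    """committee_logits: (K, N, C) from K classifier heads; domain_logits: (N,) from a domain discriminator."""
    probs = F.softmax(committee_logits, dim=-1)                  # (K, N, C)
    mean_probs = probs.mean(dim=0)                               # (N, C)
    disagreement = probs.var(dim=0).sum(dim=-1)                  # spread of predictions across the committee
    uncertainty = -(mean_probs * torch.log(mean_probs + 1e-8)).sum(dim=-1)  # entropy of the mean prediction
    domainness = torch.sigmoid(domain_logits)                    # how "target-like" the sample looks
    return disagreement + uncertainty + domainness               # higher = more informative

scores = selection_scores(torch.randn(3, 100, 10), torch.randn(100))
query_idx = scores.topk(5).indices                               # samples sent for annotation
```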
{"title":"Transferable Query Selection for Active Domain Adaptation","authors":"Bo Fu, Zhangjie Cao, Jianmin Wang, Mingsheng Long","doi":"10.1109/CVPR46437.2021.00719","DOIUrl":"https://doi.org/10.1109/CVPR46437.2021.00719","url":null,"abstract":"Unsupervised domain adaptation (UDA) enables transferring knowledge from a related source domain to a fully unlabeled target domain. Despite the significant advances in UDA, the performance gap remains quite large between UDA and supervised learning with fully labeled target data. Active domain adaptation (ADA) mitigates the gap under minimal annotation cost by selecting a small quota of target samples to annotate and incorporating them into training. Due to the domain shift, the query selection criteria of prior active learning methods may be ineffective to select the most informative target samples for annotation. In this paper, we propose Transferable Query Selection (TQS), which selects the most informative samples under domain shift by an ensemble of three new criteria: transferable committee, transferable uncertainty, and transferable domainness. We further develop a randomized selection algorithm to enhance the diversity of the selected samples. Experiments show that TQS remarkably outperforms previous UDA and ADA methods on several domain adaptation datasets. Deeper analyses demonstrate that TQS can select the most informative target samples under the domain shift.","PeriodicalId":339646,"journal":{"name":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128789884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2021-06-01  DOI: 10.1109/CVPR46437.2021.01209
Self-Supervised Wasserstein Pseudo-Labeling for Semi-Supervised Image Classification
Fariborz Taherkhani, Ali Dabouei, Sobhan Soleymani, J. Dawson, N. Nasrabadi
The goal is to use the Wasserstein metric to provide pseudo labels for unlabeled images and thereby train a Convolutional Neural Network (CNN) in a Semi-Supervised Learning (SSL) manner for the classification task. The basic premise of our method is that the discrepancy between two discrete empirical measures (e.g., clusters) that come from the same or similar distributions is expected to be smaller than when these measures come from two completely different distributions. In our proposed method, we first pre-train the CNN using a self-supervised learning method to make a cluster assumption on the unlabeled images. Next, inspired by the Wasserstein metric, which considers the geometry of the metric space to provide a natural notion of similarity between discrete empirical measures, we leverage it to cluster the unlabeled images and then match the clusters to their most similar classes of labeled images, providing a pseudo label for the data within each cluster. We have evaluated and compared our method with state-of-the-art SSL methods on standard datasets to demonstrate its effectiveness.
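The cluster-to-class matching step can be illustrated with an exact optimal-transport cost between two empirical measures: each cluster of unlabeled features is assigned the label of the class whose labeled features are closest in Wasserstein distance. Using the POT library, uniform weights, and toy Gaussian features below is an assumption for illustration, not the paper's implementation.

```python
# Assigning a pseudo label to a cluster by Wasserstein distance to each labeled class.
import numpy as np
import ot  # POT: Python Optimal Transport

def wasserstein_cost(X, Y):
    """Exact OT cost between two empirical measures with uniform weights."""
    M = ot.dist(X, Y)                        # pairwise squared-Euclidean cost matrix
    return ot.emd2(ot.unif(len(X)), ot.unif(len(Y)), M)

def pseudo_label_cluster(cluster_feats, class_feats):
    """cluster_feats: (m, d); class_feats: dict mapping label -> (n_k, d) labeled features."""
    dists = {label: wasserstein_cost(cluster_feats, feats)
             for label, feats in class_feats.items()}
    return min(dists, key=dists.get)         # pseudo label shared by every image in the cluster

rng = np.random.default_rng(0)
labeled = {0: rng.normal(0, 1, (50, 16)), 1: rng.normal(3, 1, (50, 16))}
cluster = rng.normal(3, 1, (40, 16))
print(pseudo_label_cluster(cluster, labeled))   # expected: 1
```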
{"title":"Self-Supervised Wasserstein Pseudo-Labeling for Semi-Supervised Image Classification","authors":"Fariborz Taherkhani, Ali Dabouei, Sobhan Soleymani, J. Dawson, N. Nasrabadi","doi":"10.1109/CVPR46437.2021.01209","DOIUrl":"https://doi.org/10.1109/CVPR46437.2021.01209","url":null,"abstract":"The goal is to use Wasserstein metric to provide pseudo labels for the unlabeled images to train a Convolutional Neural Networks (CNN) in a Semi-Supervised Learning (SSL) manner for the classification task. The basic premise in our method is that the discrepancy between two discrete empirical measures (e.g., clusters) which come from the same or similar distribution is expected to be less than the case where these measures come from completely two different distributions. In our proposed method, we first pre-train our CNN using a self-supervised learning method to make a cluster assumption on the unlabeled images. Next, inspired by the Wasserstein metric which considers the geometry of the metric space to provide a natural notion of similarity between discrete empirical measures, we leverage it to cluster the unlabeled images and then match the clusters to their similar class of labeled images to provide a pseudo label for the data within each cluster. We have evaluated and compared our method with state-of-the-art SSL methods on the standard datasets to demonstrate its effectiveness.","PeriodicalId":339646,"journal":{"name":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129263317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}