
Latest publications in BMVC: Proceedings of the British Machine Vision Conference

Biologically Plausible Variational Policy Gradient with Spiking Recurrent Winner-Take-All Networks
Zhile Yang, Shangqi Guo, Ying Fang, Jian K. Liu
One stream of reinforcement learning research explores biologically plausible models and algorithms, both to simulate biological intelligence and to suit neuromorphic hardware. Among them, reward-modulated spike-timing-dependent plasticity (R-STDP) is a recent branch with good potential for energy efficiency. However, current R-STDP methods rely on heuristic designs of local learning rules and therefore require task-specific expert knowledge. In this paper, we consider a spiking recurrent winner-take-all network and propose a new R-STDP method, spiking variational policy gradient (SVPG), whose local learning rules are derived from the global policy gradient, eliminating the need for heuristic designs. In experiments on MNIST classification and Gym InvertedPendulum, our SVPG achieves good training performance and shows better robustness to various kinds of noise than conventional methods.
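For context, the R-STDP family combines a local, Hebbian-style eligibility trace with a single global reward signal. The NumPy sketch below illustrates only that generic idea; it does not reproduce the paper's variational derivation, and all names, shapes and constants are placeholders.
```python
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post = 100, 10
w = rng.normal(0.0, 0.1, size=(n_pre, n_post))   # synaptic weights
trace = np.zeros_like(w)                          # eligibility traces
lr, trace_decay = 1e-3, 0.9

def r_stdp_step(pre_spikes, post_spikes, reward):
    """pre_spikes: (n_pre,) 0/1, post_spikes: (n_post,) 0/1, reward: scalar."""
    global w, trace
    # local term: coincidence of pre- and post-synaptic spikes
    trace = trace_decay * trace + np.outer(pre_spikes, post_spikes)
    # global term: one scalar reward gates every local update (the "R" in R-STDP)
    w += lr * reward * trace
```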
Citations: 0
FIND: An Unsupervised Implicit 3D Model of Articulated Human Feet
Oliver Boyne, James Charles, R. Cipolla
In this paper we present a high-fidelity, articulated 3D human foot model. The model is parameterised by a disentangled latent code in terms of shape, texture and articulated pose. While high-fidelity models are typically created with strong supervision such as 3D keypoint correspondences or pre-registration, we focus on the difficult case of little to no annotation. To this end, we make the following contributions: (i) we develop a Foot Implicit Neural Deformation field model, named FIND, capable of tailoring explicit meshes at any resolution, i.e. for low- or high-powered devices; (ii) an approach for training our model in various modes of weak supervision, with progressively better disentanglement as more labels, such as pose categories, are provided; (iii) a novel unsupervised part-based loss for fitting our model to 2D images, which is better than traditional photometric or silhouette losses; (iv) finally, we release a new dataset of high-resolution 3D human foot scans, Foot3D. On this dataset, we show that our model outperforms a strong PCA implementation trained on the same data in terms of shape quality and part correspondences, and that our novel unsupervised part-based loss improves inference on images.
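The central ingredient, an implicit deformation field conditioned on disentangled codes, can be pictured as a coordinate MLP that maps a template-surface point plus shape and pose codes to a displaced point. The toy sketch below is only an illustration of that idea; layer sizes, code dimensions and the use of a template are assumptions, not the released FIND model.
```python
import torch
import torch.nn as nn

class FootDeformationField(nn.Module):
    """Toy implicit deformation field: (3D template point, shape code, pose code) -> deformed point."""
    def __init__(self, shape_dim=64, pose_dim=16, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + shape_dim + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),                      # per-point displacement
        )

    def forward(self, pts, shape_code, pose_code):
        # pts: (N, 3); the latent codes are shared across all query points
        codes = torch.cat([shape_code, pose_code], dim=-1).expand(pts.shape[0], -1)
        return pts + self.net(torch.cat([pts, codes], dim=-1))
```
Because such a field can be queried at arbitrary 3D points, an explicit mesh can in principle be extracted at whatever resolution the target device affords, which matches contribution (i) above.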
Citations: 1
Face Pyramid Vision Transformer
Khawar Islam, M. Zaheer, Arif Mahmood
A novel Face Pyramid Vision Transformer (FPVT) is proposed to learn discriminative multi-scale facial representations for face recognition and verification. In FPVT, Face Spatial Reduction Attention (FSRA) and Dimensionality Reduction (FDR) layers are employed to make the feature maps compact, thus reducing the computations. An Improved Patch Embedding (IPE) algorithm is proposed to exploit the benefits of CNNs in ViTs (e.g., shared weights, local context, and receptive fields) to model features ranging from low-level edges to higher-level semantic primitives. Within the FPVT framework, a Convolutional Feed-Forward Network (CFFN) is proposed that extracts locality information to learn low-level facial information. The proposed FPVT is evaluated on seven benchmark datasets and compared with ten existing state-of-the-art methods, including CNNs, pure ViTs, and Convolutional ViTs. Despite having fewer parameters, FPVT demonstrates excellent performance over the compared methods. The project page is available at https://khawar-islam.github.io/fpvt/
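The idea of shrinking the key/value feature maps before attention follows the general pyramid-ViT recipe. The sketch below is a generic PVT-style spatial-reduction attention block under assumed dimensions, not the authors' exact FSRA/FDR implementation.
```python
import torch
import torch.nn as nn

class SpatialReductionAttention(nn.Module):
    """Keys/values are computed on a spatially downsampled map, cutting attention
    cost roughly by sr_ratio**2 while queries keep full resolution."""
    def __init__(self, dim=256, num_heads=8, sr_ratio=2):
        super().__init__()
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, H, W):
        # x: (B, H*W, dim) token sequence coming from an H x W feature map
        B, N, C = x.shape
        kv = x.transpose(1, 2).reshape(B, C, H, W)
        kv = self.sr(kv).flatten(2).transpose(1, 2)    # (B, H*W / sr_ratio**2, C)
        out, _ = self.attn(query=x, key=kv, value=kv)
        return out
```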
Citations: 1
G2NetPL: Generic Game-Theoretic Network for Partial-Label Image Classification
R. Abdelfattah, Xin Zhang, M. Fouda, Xiaofeng Wang, Song Wang
Multi-label image classification aims to predict all possible labels in an image. It is usually formulated as a partial-label learning problem, since it can be expensive in practice to annotate all the labels in every training image. Existing works on partial-label learning focus on the case where each training image is labeled with only a subset of its positive/negative labels. To effectively address partial-label classification, this paper proposes an end-to-end Generic Game-theoretic Network (G2NetPL) for partial-label learning, which can be applied to most partial-label settings, including a very challenging but annotation-efficient case where only a subset of the training images is labeled, each with only one positive label, while the rest of the training images remain unlabeled. In G2NetPL, each unobserved label is associated with a soft pseudo label, which, together with the network, forms a two-player non-zero-sum non-cooperative game. The objective of the network is to minimize the loss function with the given pseudo labels, while the pseudo labels seek convergence to 1 (positive) or 0 (negative) under a penalty for deviating from the labels predicted by the network. In addition, we introduce a confidence-aware scheduler into the loss of the network to adaptively perform easy-to-hard learning for different labels. Extensive experiments demonstrate that our proposed G2NetPL outperforms many state-of-the-art multi-label classification methods under various partial-label settings on three different datasets.
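The game described above alternates between the network, which fits the current pseudo labels, and the pseudo labels themselves, which drift toward 0 or 1 while being penalised for leaving the network's prediction. A toy update for the pseudo-label player is sketched below; the functional forms and weights are assumptions for illustration, not the paper's exact objective.
```python
import torch
import torch.nn.functional as F

def pseudo_label_step(y, p, lam=0.5, lr=0.1):
    """One gradient step for a soft pseudo label y in [0, 1].
    y: current pseudo label, p: the network's predicted probability."""
    y = y.clone().requires_grad_(True)
    polarisation = -((y - 0.5) ** 2).mean()     # pushes y toward 0 or 1
    deviation = F.mse_loss(y, p.detach())       # penalty for leaving the prediction
    (polarisation + lam * deviation).backward()
    with torch.no_grad():
        y = (y - lr * y.grad).clamp(0.0, 1.0)
    return y.detach()
```
In the other half of the alternation, the network player would simply minimise its (e.g. binary cross-entropy) loss against the current soft pseudo labels.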
Citations: 2
Analysing Training-Data Leakage from Gradients through Linear Systems and Gradient Matching
Cangxiong Chen, N. Campbell
Recent works have demonstrated that it is possible to reconstruct training images and their labels from the gradients of an image-classification model when its architecture is known. Unfortunately, there is still an incomplete theoretical understanding of the efficacy and failure of these gradient-leakage attacks. In this paper, we propose a novel framework to analyse training-data leakage from gradients that draws insights from both analytic and optimisation-based gradient-leakage attacks. We formulate the reconstruction problem as solving a linear system for each layer iteratively, accompanied by corrections using gradient matching. Under this framework, we claim that the solvability of the reconstruction problem is primarily determined by that of the linear system at each layer. As a result, we are able to partially attribute the leakage of the training data in a deep network to its architecture. We also propose a metric to measure the level of security of a deep learning model against gradient-based attacks on the training data.
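As a reference point, the optimisation-based branch of gradient-leakage attacks analysed here reduces to a gradient-matching loop in the style of "deep leakage from gradients". The sketch below assumes the labels are known and uses placeholder hyper-parameters; the paper's layer-wise linear-system analysis is not reproduced.
```python
import torch

def gradient_matching_attack(model, target_grads, x_shape, y, steps=200, lr=0.1):
    """Recover an input whose gradients match the observed (leaked) gradients."""
    x = torch.randn(x_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        # gradients of the training loss w.r.t. the model parameters, for the dummy input
        dummy_grads = torch.autograd.grad(loss_fn(model(x), y),
                                          model.parameters(), create_graph=True)
        # match them to the leaked gradients and back-propagate into the dummy input
        match = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, target_grads))
        match.backward()
        opt.step()
    return x.detach()
```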
Citations: 0
Stating Comparison Score Uncertainty and Verification Decision Confidence Towards Transparent Face Recognition
Marco Huber, P. Terhorst, Florian Kirchbuchner, N. Damer, Arjan Kuijper
Face Recognition (FR) is increasingly used in critical verification decisions, and thus there is a need to assess the trustworthiness of such decisions. The confidence of a decision is often based on the overall performance of the model or on the image quality. We propose to propagate model uncertainties to scores and decisions in an effort to increase the transparency of verification decisions. This work presents two contributions. First, we propose an approach to estimate the uncertainty of face comparison scores. Second, we introduce a confidence measure of the system's decision to provide insights into the verification decision. The suitability of the comparison-score uncertainties and the verification-decision confidences has been experimentally demonstrated on three face recognition models and two datasets.
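One common way to turn a deterministic comparison score into a score plus uncertainty is to sample stochastic embeddings (e.g. with dropout kept active at test time) and report the spread of the resulting cosine scores. The snippet below sketches that generic recipe, not the authors' estimator; `embed_fn` is a hypothetical stochastic embedding function.
```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def score_with_uncertainty(embed_fn, img_a, img_b, n_samples=20):
    """Mean and standard deviation of the cosine comparison score
    under repeated stochastic embeddings of the same image pair."""
    scores = []
    for _ in range(n_samples):
        ea = F.normalize(embed_fn(img_a), dim=-1)
        eb = F.normalize(embed_fn(img_b), dim=-1)
        scores.append((ea * eb).sum(dim=-1))
    scores = torch.stack(scores)
    return scores.mean(dim=0), scores.std(dim=0)
```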
Citations: 5
Pseudo-Label Noise Suppression Techniques for Semi-Supervised Semantic Segmentation
S. Scherer, Robin Schön, R. Lienhart
Semi-supervised learning (SSL) can reduce the need for large labelled datasets by incorporating unlabelled data into the training. This is particularly interesting for semantic segmentation, where labelling data is very costly and time-consuming. Current SSL approaches use an initially supervised trained model to generate predictions for unlabelled images, called pseudo-labels, which are subsequently used for training a new model from scratch. Since the predictions usually do not come from an error-free neural network, they are naturally full of errors. However, training with partially incorrect labels often reduces the final model performance. Thus, it is crucial to manage the errors/noise of pseudo-labels wisely. In this work, we use three mechanisms to control pseudo-label noise and errors: (1) We construct a solid base framework by mixing images with cow-patterns on unlabelled images to reduce the negative impact of wrong pseudo-labels. Nevertheless, wrong pseudo-labels still have a negative impact on performance. Therefore, (2) we propose a simple and effective loss weighting scheme for pseudo-labels, defined by the feedback of the model trained on these pseudo-labels. This allows us to soft-weight the pseudo-label training examples based on their determined confidence score during training. (3) We also study the common practice of ignoring pseudo-labels with low confidence and empirically analyse the influence of pseudo-labels with different confidence ranges on SSL, as well as the contribution of pseudo-label filtering to the achievable performance gains. We show that our method performs better than state-of-the-art alternatives on various datasets. Furthermore, we show that our findings also transfer to other tasks such as human pose estimation. Our code is available at https://github.com/ChristmasFan/SSL_Denoising_Segmentation.
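Mechanisms (2) and (3), soft-weighting pseudo-labelled pixels by a confidence score and filtering out low-confidence ones, can both be expressed as a weighted cross-entropy. The sketch below is a plausible rendering under assumed tensor shapes and a made-up threshold, not the authors' exact weighting scheme.
```python
import torch
import torch.nn.functional as F

def weighted_pseudo_label_loss(logits, pseudo_labels, confidence, threshold=0.5):
    """logits: (B, C, H, W), pseudo_labels: (B, H, W) int64,
    confidence: (B, H, W) in [0, 1], e.g. the teacher's max softmax probability."""
    ce = F.cross_entropy(logits, pseudo_labels, reduction='none')     # per-pixel loss
    weight = confidence * (confidence >= threshold).float()           # soft weight + hard filter
    return (weight * ce).sum() / weight.sum().clamp(min=1e-6)         # avoid division by zero
```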
Citations: 0
Rethinking Prototypical Contrastive Learning through Alignment, Uniformity and Correlation
Shentong Mo, Zhun Sun, Chao Li
Contrastive self-supervised learning (CSL) with a prototypical regularization has been introduced to learn meaningful representations for downstream tasks that require strong semantic information. However, optimizing CSL with a loss that performs the prototypical regularization aggressively, e.g., the ProtoNCE loss, might cause the "coagulation" of examples in the embedding space. That is, the intra-prototype diversity of samples collapses to trivial solutions once their prototype is well separated from the others. Motivated by previous works, we propose to mitigate this phenomenon by learning Prototypical representation through Alignment, Uniformity and Correlation (PAUC). Specifically, the ordinary ProtoNCE loss is revised with: (1) an alignment loss that pulls together embeddings associated with the same positive prototype; (2) a uniformity loss that distributes the prototypical-level features uniformly; (3) a correlation loss that increases the diversity and discriminability between prototypical-level features. We conduct extensive experiments on various benchmarks, and the results demonstrate the effectiveness of our method in improving the quality of prototypical contrastive representations. In particular, in downstream classification tasks with linear probes, our proposed method outperforms the state-of-the-art instance-wise and prototypical contrastive learning methods on the ImageNet-100 dataset by 2.96% and on the ImageNet-1K dataset by 2.46% under the same settings of batch size and epochs.
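The three terms have well-known generic counterparts: an alignment term that pulls an embedding toward its positive prototype, a uniformity term that spreads prototypes on the unit hypersphere, and a decorrelation term over prototype feature dimensions. The sketch below uses those standard forms as a stand-in; the paper's exact losses and weightings may differ.
```python
import torch

def alignment_loss(z, proto):
    """Pull each (normalised) embedding toward its positive prototype; z, proto: (N, D)."""
    return (z - proto).pow(2).sum(dim=1).mean()

def uniformity_loss(protos, t=2.0):
    """Encourage prototypes to spread out uniformly on the unit hypersphere."""
    d2 = torch.pdist(protos, p=2).pow(2)
    return d2.mul(-t).exp().mean().log()

def correlation_loss(protos, eps=1e-6):
    """Penalise off-diagonal correlations between prototype feature dimensions."""
    p = (protos - protos.mean(dim=0)) / (protos.std(dim=0) + eps)
    c = (p.T @ p) / protos.shape[0]
    off_diag = c - torch.diag(torch.diag(c))
    return off_diag.pow(2).sum()
```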
Citations: 7
A Tri-Layer Plugin to Improve Occluded Detection
Guanqi Zhan, Weidi Xie, Andrew Zisserman
Detecting occluded objects remains a challenge for state-of-the-art object detectors. The objective of this work is to improve the detection of such objects, and thereby improve the overall performance of a modern object detector. To this end we make the following four contributions: (1) We propose a simple 'plugin' module for the detection head of two-stage object detectors to improve the recall of partially occluded objects. The module predicts a tri-layer of segmentation masks for the target object, the occluder and the occludee, and by doing so is able to better predict the mask of the target object. (2) We propose a scalable pipeline for generating training data for the module by using amodal completion of existing object detection and instance segmentation training datasets to establish occlusion relationships. (3) We also establish a COCO evaluation dataset to measure the recall performance of partially occluded and separated objects. (4) We show that the plugin module inserted into a two-stage detector can boost performance significantly by fine-tuning only the detection head, with additional improvements if the entire architecture is fine-tuned. COCO results are reported for Mask R-CNN with Swin-T or Swin-S backbones, and Cascade Mask R-CNN with a Swin-B backbone.
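The plugin can be thought of as a Mask R-CNN-style mask head whose output has three channels, one per layer (occluder, target object, occludee), so the target mask is predicted jointly with its occlusion context. Below is a hedged toy version; channel counts and RoI sizes are assumptions, not the authors' configuration.
```python
import torch.nn as nn
import torch.nn.functional as F

class TriLayerMaskHead(nn.Module):
    """Toy mask head predicting three mask logits per RoI:
    occluder, target object and occludee."""
    def __init__(self, in_ch=256):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
        )
        self.deconv = nn.ConvTranspose2d(256, 256, 2, stride=2)
        self.predict = nn.Conv2d(256, 3, 1)            # one output channel per layer

    def forward(self, roi_feats):                      # roi_feats: (N, in_ch, 14, 14)
        x = F.relu(self.deconv(self.convs(roi_feats)))
        return self.predict(x)                         # (N, 3, 28, 28) mask logits
```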
Citations: 6
Dual-Curriculum Teacher for Domain-Inconsistent Object Detection in Autonomous Driving
L. Yu, Yifan Zhang, Lanqing Hong, Fei Chen, Zhenguo Li
Object detection for autonomous vehicles has received increasing attention in recent years; labeled data are often expensive while unlabeled data can be collected readily, calling for research on semi-supervised learning in this area. Existing semi-supervised object detection (SSOD) methods usually assume that the labeled and unlabeled data come from the same data distribution. In autonomous driving, however, data are usually collected from different scenarios, such as different weather conditions or different times of day. Motivated by this, we study a novel but challenging domain-inconsistent SSOD problem. It involves two kinds of distribution shifts among different domains: (1) data distribution discrepancy, and (2) class distribution shifts, which leave existing SSOD methods with inaccurate pseudo-labels and hurt model performance. To address this problem, we propose a novel method, namely Dual-Curriculum Teacher (DucTeacher). Specifically, DucTeacher consists of two curriculums: (1) a domain-evolving curriculum that learns from the data progressively to handle data distribution discrepancy by estimating the similarity between domains, and (2) a distribution-matching curriculum that estimates the class distribution for each unlabeled domain to handle class distribution shifts. In this way, DucTeacher can calibrate biased pseudo-labels and handle the domain-inconsistent SSOD problem effectively. DucTeacher shows its advantages on SODA10M, the largest public semi-supervised autonomous driving dataset, and on COCO, a widely used SSOD benchmark. Experiments show that DucTeacher achieves new state-of-the-art performance on SODA10M with a 2.2 mAP improvement and on COCO with a 0.8 mAP improvement.
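The distribution-matching idea can be illustrated by a simple alignment term between the class histogram predicted on an unlabeled domain and the class prior observed on labeled data. The snippet below is only an illustration of that idea under assumed inputs, not the DucTeacher objective.
```python
import torch
import torch.nn.functional as F

def class_distribution_alignment(unlabeled_logits, labeled_class_prior, eps=1e-8):
    """KL divergence between the average predicted class distribution on an
    unlabeled domain and the empirical class prior of the labeled data.
    unlabeled_logits: (N, C) per-detection class logits; labeled_class_prior: (C,)."""
    probs = F.softmax(unlabeled_logits, dim=1)
    predicted_prior = probs.mean(dim=0)                # empirical class histogram
    return (predicted_prior *
            ((predicted_prior + eps).log() - (labeled_class_prior + eps).log())).sum()
```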
Citations: 3