
Latest publications: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Scene Essence
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00822
Jiayan Qiu, Yiding Yang, Xinchao Wang, D. Tao
What scene elements, if any, are indispensable for recognizing a scene? We strive to answer this question through the lens of an exotic learning scheme. Our goal is to identify a collection of such pivotal elements, which we term Scene Essence, to be those that would alter scene recognition if taken out from the scene. To this end, we devise a novel approach that learns to partition the scene objects into two groups, essential ones and minor ones, under the supervision that if only the essential ones are kept while the minor ones are erased in the input image, a scene recognizer would preserve its original prediction. Specifically, we introduce a learnable graph neural network (GNN) for labelling scene objects, based on which the minor ones are wiped off by an off-the-shelf image inpainter. The features of the inpainted image derived in this way, together with those learned from the GNN with the minor-object nodes pruned, are expected to fool the scene discriminator. Both subjective and objective evaluations on Places365, SUN397, and MIT67 datasets demonstrate that the learned Scene Essence yields a visually plausible image that convincingly retains the original scene category.
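A minimal NumPy sketch of the selection principle, with a random linear head standing in for the trained scene recognizer and a greedy sweep standing in for the learned GNN partitioner (all names and the toy recognizer are illustrative, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
D, C, N = 16, 5, 6                       # feature dim, scene classes, detected objects
W = rng.standard_normal((D, C))          # stand-in linear scene recognizer
objects = rng.standard_normal((N, D))    # mock per-object features

def predict(keep_mask):
    # scene prediction from the kept objects only (mean-pooled features)
    kept = objects[keep_mask]
    return int(np.argmax(kept.mean(axis=0) @ W))

full_pred = predict(np.ones(N, dtype=bool))

# greedily erase objects whose removal leaves the prediction intact;
# the survivors play the role of the Scene Essence
keep = np.ones(N, dtype=bool)
for i in range(N):
    trial = keep.copy()
    trial[i] = False
    if trial.any() and predict(trial) == full_pred:
        keep = trial

print("scene class:", full_pred, "essential objects:", np.flatnonzero(keep))
```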
Citations: 11
3D Video Stabilization with Depth Estimation by CNN-based Optimization
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01048
Yao Lee, Kuan-Wei Tseng, Yu-Ta Chen, Chien-Cheng Chen, Chu-Song Chen, Y. Hung
Video stabilization is an essential component of visual quality enhancement. Early methods rely on feature tracking to recover either 2D or 3D frame motion, which suffer from the limited robustness of local feature extraction and tracking in shaky videos. Recently, learning-based methods seek to find frame transformations with high-level information via deep neural networks to overcome the robustness issue of feature tracking. Nevertheless, to our best knowledge, no learning-based methods leverage 3D cues for the transformation inference yet; hence they would lead to artifacts on complex scene-depth scenarios. In this paper, we propose Deep3D Stabilizer, a novel 3D depth-based learning method for video stabilization. We take advantage of the recent self-supervised framework on jointly learning depth and camera ego-motion estimation on raw videos. Our approach requires no data for pre-training but stabilizes the input video via 3D reconstruction directly. The rectification stage incorporates the 3D scene depth and camera motion to smooth the camera trajectory and synthesize the stabilized video. Unlike most one-size-fits-all learning-based methods, our smoothing algorithm allows users to manipulate the stability of a video efficiently. Experimental results on challenging benchmarks show that the proposed solution consistently outperforms the state-of-the-art methods on almost all motion categories.
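The rectification idea can be sketched independently of the depth network: below, a Gaussian low-pass over a mock camera trajectory produces the smoothed target path, and the window width acts as the user-facing stability knob (a simplified stand-in for the paper's smoothing algorithm):

```python
import numpy as np

def smooth_trajectory(path, window=9, sigma=2.0):
    # Gaussian low-pass over each coordinate of the recovered camera path;
    # the gap between the raw and smoothed paths is the per-frame warp to apply
    t = np.arange(window) - window // 2
    kernel = np.exp(-(t / sigma) ** 2)
    kernel /= kernel.sum()
    pad = window // 2
    padded = np.pad(path, ((pad, pad), (0, 0)), mode="edge")
    smoothed = np.stack(
        [np.convolve(padded[:, d], kernel, mode="valid") for d in range(path.shape[1])],
        axis=1)
    return smoothed, smoothed - path

# shaky x/y/z camera positions over 100 frames
raw = np.cumsum(np.random.default_rng(1).normal(0, 0.05, (100, 3)), axis=0)
target, correction = smooth_trajectory(raw, window=15)   # wider window = steadier video
```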
Citations: 16
Deep Perceptual Preprocessing for Video Coding
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01461
A. Chadha, Y. Andreopoulos
We introduce the concept of rate-aware deep perceptual preprocessing (DPP) for video encoding. DPP makes a single pass over each input frame in order to enhance its visual quality when the video is to be compressed with any codec at any bitrate. The resulting bitstreams can be decoded and displayed at the client side without any post-processing component. DPP comprises a convolutional neural network that is trained via a composite set of loss functions that incorporates: (i) a perceptual loss based on a trained no-reference image quality assessment model, (ii) a reference-based fidelity loss expressing L1 and structural similarity aspects, (iii) a motion-based rate loss via block-based transform, quantization and entropy estimates that converts the essential components of standard hybrid video encoder designs into a trainable framework. Extensive testing using multiple quality metrics and AVC, AV1 and VVC encoders shows that DPP+encoder reduces, on average, the bitrate of the corresponding encoder by 11%. This marks the first time a server-side neural processing component achieves such savings over the state-of-the-art in video coding.
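A hedged sketch of the composite training loss, under stated assumptions: `iqa_net` is a placeholder for the trained no-reference IQA model, the SSIM part of the fidelity term is omitted, and total variation stands in for the paper's block-transform/quantization/entropy rate model:

```python
import torch
import torch.nn.functional as F

def rate_proxy(x):
    # stand-in for the transform/quantization/entropy rate estimate:
    # total variation, since smoother frames tend to cost fewer bits
    return ((x[..., :, 1:] - x[..., :, :-1]).abs().mean()
            + (x[..., 1:, :] - x[..., :-1, :]).abs().mean())

def dpp_loss(pre, src, iqa_net, w_perc=1.0, w_fid=1.0, w_rate=0.05):
    perceptual = -iqa_net(pre).mean()   # (i) no-reference quality, maximized
    fidelity = F.l1_loss(pre, src)      # (ii) L1 fidelity (SSIM term omitted here)
    return w_perc * perceptual + w_fid * fidelity + w_rate * rate_proxy(pre)

iqa_net = lambda x: x.mean(dim=(1, 2, 3))   # placeholder for a trained IQA model
src = torch.rand(2, 3, 64, 64)
pre = src.clone().requires_grad_()          # output of the preprocessing CNN
dpp_loss(pre, src, iqa_net).backward()
```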
Citations: 18
Prototype-Guided Saliency Feature Learning for Person Search
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00483
H. Kim, Sunghun Joung, Ig-Jae Kim, K. Sohn
Existing person search methods integrate person detection and re-identification (re-ID) modules into a unified system. Though promising results have been achieved, the misalignment problem, which commonly occurs in person search, limits the discriminative feature representation for re-ID. To overcome this limitation, we introduce a novel framework to learn the discriminative representation by utilizing a prototype in the OIM loss. Unlike conventional methods that use the prototype as a representation of person identity, we utilize it as guidance to allow the attention network to consistently highlight multiple instances across different poses. Moreover, we propose a new prototype update scheme with adaptive momentum to increase the discriminative ability across different instances. Extensive ablation experiments demonstrate that our method can significantly enhance the feature discriminative power, outperforming the state-of-the-art results on two person search benchmarks including CUHK-SYSU and PRW.
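The prototype update with adaptive momentum might look as follows; only the OIM-style moving-average form is taken from the text, while the exact momentum schedule is an assumption (here, dissimilar instances move the prototype less):

```python
import numpy as np

def update_prototype(proto, feat, base=0.5):
    # adaptive momentum (assumed schedule): a dissimilar, likely-misaligned
    # instance barely moves the prototype; a close match updates it strongly
    p = proto / np.linalg.norm(proto)
    f = feat / np.linalg.norm(feat)
    sim = max(float(p @ f), 0.0)                 # clipped cosine similarity
    momentum = 1.0 - (1.0 - base) * sim          # in [base, 1]
    new = momentum * proto + (1.0 - momentum) * feat
    return new / np.linalg.norm(new)

proto = np.ones(8) / np.sqrt(8)
proto = update_prototype(proto, np.random.default_rng(2).standard_normal(8))
```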
Citations: 36
Virtual Fully-Connected Layer: Training a Large-Scale Face Recognition Dataset with Limited Computational Resources
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01311
Pengyu Li, Biao Wang, Lei Zhang
Recently, deep face recognition has achieved significant progress because of Convolutional Neural Networks (CNNs) and large-scale datasets. However, training CNNs on a large-scale face recognition dataset with limited computational resources is still a challenge. This is because the classification paradigm needs to train a fully-connected layer as the category classifier, and its parameters will be in the hundreds of millions if the training dataset contains millions of identities. This requires many computational resources, such as GPU memory. The metric learning paradigm is an economical computation method, but its performance is greatly inferior to that of the classification paradigm. To address this challenge, we propose a simple but effective CNN layer called the Virtual fully-connected (Virtual FC) layer to reduce the computational consumption of the classification paradigm. Without bells and whistles, the proposed Virtual FC reduces the parameters by more than 100 times with respect to the fully-connected layer and achieves competitive performance on mainstream face recognition evaluation datasets. Moreover, the performance of our Virtual FC layer on the evaluation datasets is superior to that of the metric learning paradigm by a significant margin. Our code will be released in hopes of disseminating our idea to other domains.
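A simplified reading of the idea, not the published layer: a small bank of shared anchor vectors replaces the per-identity weight matrix, so classifier parameters scale with the number of anchor slots rather than identities (the slot-assignment logic is omitted here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VirtualFC(nn.Module):
    # M << N shared anchor vectors replace the N-identity weight matrix;
    # during training each identity is associated with an anchor slot
    def __init__(self, feat_dim, num_anchors):
        super().__init__()
        self.anchors = nn.Parameter(torch.randn(num_anchors, feat_dim) * 0.01)

    def forward(self, feats):
        feats = F.normalize(feats, dim=1)
        anchors = F.normalize(self.anchors, dim=1)
        return feats @ anchors.t()               # cosine logits over anchor slots

# e.g. 1,000,000 identities served by 10,000 slots: ~100x fewer parameters
# than a plain nn.Linear(512, 1_000_000) classifier head
layer = VirtualFC(feat_dim=512, num_anchors=10_000)
logits = layer(torch.randn(8, 512))
```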
Citations: 15
Gradient-based Algorithms for Machine Teaching
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00144
Pei Wang, Kabir Nagrecha, N. Vasconcelos
The problem of machine teaching is considered. A new formulation is proposed under the assumption of an optimal student, where optimality is defined in the usual machine learning sense of empirical risk minimization. This is a sensible assumption for machine learning students and for human students in crowdsourcing platforms, who tend to perform at least as well as machine learning systems. It is shown that, if allowed unbounded effort, the optimal student always learns the optimal predictor for a classification task. Hence, the role of the optimal teacher is to select the teaching set that minimizes student effort. This is formulated as a problem of functional optimization where, at each teaching iteration, the teacher seeks to align the steepest descent directions of the risk of (1) the teaching set and (2) the entire example population. The optimal teacher, denoted MaxGrad, is then shown to maximize the gradient of the risk on the set of new examples selected per iteration. MaxGrad teaching algorithms are finally provided for both binary and multiclass tasks, and shown to have some similarities with boosting algorithms. Experimental evaluations demonstrate the effectiveness of MaxGrad, which outperforms previous algorithms on the classification task, for both machine learning and human students from MTurk, by a substantial margin.
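A minimal reading of the selection rule for a linear-logistic student: score each candidate example by the magnitude of its per-example risk gradient and teach the largest (the names and toy setup below are illustrative):

```python
import numpy as np

def maxgrad_select(X, y, w, k):
    # per-example gradient of the logistic loss w.r.t. w is (p - y) * x,
    # so its magnitude is |p - y| * ||x||; teach the k largest
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    grad_norm = np.abs(p - y) * np.linalg.norm(X, axis=1)
    return np.argsort(grad_norm)[-k:]

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (X[:, 0] > 0).astype(float)
teaching_set = maxgrad_select(X, y, w=np.zeros(5), k=16)
```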
Citations: 19
Bilinear Parameterization for Non-Separable Singular Value Penalties
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00389
Marcus Valtonen Örnhag, J. Iglesias, Carl Olsson
Low rank inducing penalties have been proven to successfully uncover fundamental structures considered in computer vision and machine learning; however, such methods generally lead to non-convex optimization problems. Since the resulting objective is non-convex, one often resorts to using standard splitting schemes such as Alternating Direction Methods of Multipliers (ADMM), or other subgradient methods, which exhibit slow convergence in the neighbourhood of a local minimum. We propose a method using second order methods, in particular the variable projection method (VarPro), by replacing the nonconvex penalties with a surrogate capable of converting the original objectives to differentiable equivalents. In this way we benefit from faster convergence. The bilinear framework is compatible with a large family of regularizers, and we demonstrate the benefits of our approach on real datasets for rigid and non-rigid structure from motion. The qualitative differences in reconstructions show that many popular non-convex objectives enjoy an advantage in transitioning to the proposed framework.
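In generic form the substitution reads as follows; the surrogate g and the measurement operator \mathcal{A} in this LaTeX sketch are placeholders for the paper's specific choices:

```latex
% X is parameterized bilinearly as X = B C^T with B in R^{m x r}, C in R^{n x r};
% the non-convex singular-value penalty f(\sigma(X)) is traded for a smooth
% surrogate g(B, C), after which VarPro minimizes over C in closed form and
% second-order steps are taken in B alone.
\begin{align}
  \min_{X} \; f(\sigma(X)) + \|\mathcal{A}(X) - b\|^2
  &\;\longrightarrow\;
  \min_{B,C} \; g(B, C) + \|\mathcal{A}(BC^{\top}) - b\|^2, \\
  C^{*}(B) &= \arg\min_{C} \; g(B, C) + \|\mathcal{A}(BC^{\top}) - b\|^2 .
\end{align}
```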
Citations: 2
End-to-end High Dynamic Range Camera Pipeline Optimization
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00623
N. Robidoux, L. E. G. Capel, Dongmin Seo, Avinash Sharma, Federico Ariza, Felix Heide
The real world is a 280 dB High Dynamic Range (HDR) world which imaging sensors cannot record in a single shot. HDR cameras acquire multiple measurements with different exposures, gains and photodiodes, from which an Image Signal Processor (ISP) reconstructs an HDR image. Dynamic scene HDR image recovery is an open challenge because of motion and because stitched captures have different noise characteristics, resulting in artifacts that ISPs must resolve in real time at double-digit megapixel resolutions. Traditionally, ISP settings used by downstream vision modules are chosen by domain experts; such frozen camera designs are then used for training data acquisition and supervised learning of downstream vision modules. We depart from this paradigm and formulate HDR ISP hyperparameter search as an end-to-end optimization problem, proposing a mixed 0th and 1st-order block coordinate descent optimizer that jointly learns sensor, ISP and detector network weights using RAW image data augmented with emulated SNR transition region artifacts. We assess the proposed method for human vision and image understanding. For automotive object detection, the method improves mAP and mAR by 33% over expert-tuning and 22% over state-of-the-art optimization methods, outperforming expert-tuned HDR imaging and vision pipelines in all HDR laboratory rig and field experiments.
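The mixed-order block coordinate descent can be illustrated on a toy objective: the ISP hyperparameter block is updated by zeroth-order random search (no gradients through the sensor/ISP), while the network-weight block takes ordinary gradient steps. The loss below is a stand-in, not the actual detection pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def task_loss(isp_knobs, det_weights):
    # stand-in for "run sensor+ISP with these knobs, then score the detector"
    return np.sum((isp_knobs - 0.3) ** 2) + np.sum((det_weights - 1.0) ** 2)

isp_knobs = rng.uniform(0, 1, 4)     # 0th-order block: no gradients available
det_weights = np.zeros(8)            # 1st-order block: gradients available

for _ in range(200):
    # zeroth-order step: keep a random perturbation of the ISP knobs if it helps
    cand = np.clip(isp_knobs + rng.normal(0, 0.05, 4), 0, 1)
    if task_loss(cand, det_weights) < task_loss(isp_knobs, det_weights):
        isp_knobs = cand
    # first-order step: gradient descent on the network weights
    det_weights -= 0.1 * 2 * (det_weights - 1.0)   # analytic grad of the stand-in
```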
Citations: 13
Representative Batch Normalization with Feature Calibration
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00856
Shangqi Gao, Qi Han, Duo Li, Ming-Ming Cheng, Pai Peng
Batch Normalization (BatchNorm) has become the default component in modern neural networks to stabilize training. In BatchNorm, centering and scaling operations, along with mean and variance statistics, are utilized for feature standardization over the batch dimension. The batch dependency of BatchNorm enables stable training and better representation of the network, while inevitably ignoring the representation differences among instances. We propose to add a simple yet effective feature calibration scheme into the centering and scaling operations of BatchNorm, enhancing the instance-specific representations at negligible computational cost. The centering calibration strengthens informative features and reduces noisy features. The scaling calibration restricts the feature intensity to form a more stable feature distribution. Our proposed variant of BatchNorm, namely Representative BatchNorm, can be plugged into existing methods to boost the performance of various tasks such as classification, detection, and segmentation. The source code is available at http://mmcheng.net/rbn.
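A sketch of the layer's structure with assumed calibration functions; only their placement (an instance-aware term before centering, an intensity gate after scaling) follows the description above:

```python
import torch
import torch.nn as nn

class RepresentativeBN2d(nn.Module):
    # BatchNorm plus instance-level calibration; the exact calibration
    # functions are assumptions, not the published formulas
    def __init__(self, channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)
        self.w_c = nn.Parameter(torch.zeros(1, channels, 1, 1))   # centering
        self.w_s = nn.Parameter(torch.ones(1, channels, 1, 1))    # scaling
        self.b_s = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x):
        inst = x.mean(dim=(2, 3), keepdim=True)        # per-instance statistics
        x = x + self.w_c * inst                        # centering calibration
        x = self.bn(x)                                 # batch standardization
        gate = torch.sigmoid(self.w_s * x.mean(dim=(2, 3), keepdim=True) + self.b_s)
        return x * gate                                # scaling calibration

out = RepresentativeBN2d(16)(torch.randn(4, 16, 8, 8))
```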
Citations: 41
Learning Graphs for Knowledge Transfer with Limited Labels
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01100
P. Ghosh, Nirat Saini, L. Davis, Abhinav Shrivastava
Fixed input graphs are a mainstay in approaches that utilize Graph Convolution Networks (GCNs) for knowledge transfer. The standard paradigm is to utilize relationships in the input graph to transfer information using GCNs from training to testing nodes in the graph; for example, the semi-supervised, zero-shot, and few-shot learning setups. We propose a generalized framework for learning and improving the input graph as part of the standard GCN-based learning setup. Moreover, we use additional constraints between similar and dissimilar neighbors for each node in the graph by applying triplet loss on the intermediate layer output. We present results of semi-supervised learning on Citeseer, Cora, and Pubmed benchmarking datasets, and zero/few-shot action recognition on UCF101 and HMDB51 datasets, significantly outperforming current approaches. We also present qualitative results visualizing the graph connections that our approach learns to update.
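The triplet constraint on intermediate node embeddings might be implemented as below; the positive/negative index choices here are placeholders for the learned graph's similar and dissimilar neighbors:

```python
import torch
import torch.nn.functional as F

def node_triplet_loss(h, pos_idx, neg_idx, margin=0.5):
    # pull each node toward a designated similar neighbour and push it away
    # from a dissimilar one, applied to an intermediate GCN layer's output h
    z = F.normalize(h, dim=1)
    d_pos = (z - z[pos_idx]).pow(2).sum(dim=1)
    d_neg = (z - z[neg_idx]).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()

h = torch.randn(10, 32, requires_grad=True)      # intermediate node embeddings
idx = torch.arange(10)
loss = node_triplet_loss(h, pos_idx=idx.roll(1), neg_idx=idx.roll(5))
loss.backward()
```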
Citations: 6