
Latest Publications: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Deep Optimization Prior for THz Model Parameter Estimation
Pub Date: 2022-01-01 · DOI: 10.1109/WACV51458.2022.00410
Tak Ming Wong, Hartmut Bauermeister, M. Kahl, P. Bolívar, Michael Möller, A. Kolb
In this paper, we propose a deep optimization prior approach with application to the estimation of material-related model parameters from terahertz (THz) data that is acquired using a Frequency Modulated Continuous Wave (FMCW) THz scanning system. A stable estimation of the THz model parameters for low SNR and shot noise configurations is essential to achieve the acquisition times required for applications in, e.g., quality control. Conceptually, our deep optimization prior approach estimates the desired THz model parameters by optimizing the weights of a neural network. While such a technique was shown to improve the reconstruction quality for convex objectives in the seminal work of Ulyanov et al., our paper demonstrates that deep priors also make it possible to find better local optima in the non-convex energy landscape of the nonlinear inverse problem arising from THz imaging. We verify this claim numerically on various THz parameter estimation problems for synthetic and real data under low SNR and shot noise conditions. While the low SNR scenario does not even require regularization, the impact of shot noise is significantly reduced by total variation (TV) regularization. We compare our approach with existing optimization techniques that require sophisticated, physically motivated initialization, and with a 1D single-pixel reparametrization method.
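The core idea, as in the deep image prior, is to re-parametrize the unknown parameter maps as the output of an untrained CNN and to optimize the network weights against the measurement model. Below is a minimal sketch of this idea with a TV term for the shot-noise case; the shapes are hypothetical and `forward_model` is a placeholder, not the paper's actual FMCW THz response model.

```python
import torch
import torch.nn as nn

H, W, C = 64, 64, 4                      # hypothetical image size and per-pixel parameter count

def forward_model(params):
    # placeholder for the differentiable FMCW THz response model (NOT the real physics)
    return params.sum(dim=1, keepdim=True)

net = nn.Sequential(                     # the network weights are the actual optimization variables
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, C, 3, padding=1),
)
z = torch.randn(1, 1, H, W)              # fixed random input, as in the deep image prior
measured = torch.randn(1, 1, H, W)       # stand-in for the acquired THz data

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(200):
    opt.zero_grad()
    p = net(z)                           # parameter maps, re-parametrized by network weights
    tv = (p[..., 1:, :] - p[..., :-1, :]).abs().mean() \
       + (p[..., :, 1:] - p[..., :, :-1]).abs().mean()  # TV term, useful in the shot-noise case
    loss = ((forward_model(p) - measured) ** 2).mean() + 0.1 * tv
    loss.backward()
    opt.step()
print(loss.item())
```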
Citations: 2
Joint Classification and Trajectory Regression of Online Handwriting using a Multi-Task Learning Approach
Pub Date: 2022-01-01 · DOI: 10.1109/WACV51458.2022.00131
Felix Ott, David Rügamer, Lucas Heublein, Bernd Bischl, Christopher Mutschler
Multivariate Time Series (MTS) classification is important in various applications such as signature verification, person identification, and motion recognition. In deep learning, these classification tasks are usually learned using the cross-entropy loss. A related yet different task is predicting trajectories observed as MTS. Important use cases include handwriting reconstruction, shape analysis, and human pose estimation. The goal is to align an arbitrary-dimensional time series with its ground truth as accurately as possible, reducing the error in the prediction with a distance loss and the variance with a similarity loss. Although learning both losses with Multi-Task Learning (MTL) helps to improve trajectory alignment, learning often remains difficult because the two tasks are contradictory. We propose a novel neural network architecture for MTL that notably improves the MTS classification and trajectory regression performance in online handwriting (OnHW) recognition. We achieve this by jointly learning the cross-entropy loss in combination with distance and similarity losses. On an OnHW task of handwritten characters with multivariate inertial and visual data inputs, we achieve crucial improvements (lower error with less variance) in trajectory prediction while still improving the character classification accuracy in comparison to models trained on the individual tasks.
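A minimal sketch of how the three losses named above might be combined into one MTL objective follows; the loss weights, tensor shapes, and the cosine-based choice of similarity loss are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def mtl_loss(class_logits, labels, pred_traj, gt_traj,
             w_ce=1.0, w_dist=1.0, w_sim=0.1):
    ce = F.cross_entropy(class_logits, labels)               # classification task
    dist = F.mse_loss(pred_traj, gt_traj)                    # distance loss on trajectories
    sim = 1.0 - F.cosine_similarity(                         # similarity loss (one plausible choice)
        pred_traj.flatten(1), gt_traj.flatten(1)).mean()
    return w_ce * ce + w_dist * dist + w_sim * sim

# usage with dummy tensors (shapes are illustrative only)
logits = torch.randn(8, 52)              # e.g. 52 character classes
labels = torch.randint(0, 52, (8,))
pred = torch.randn(8, 100, 3)            # predicted pen trajectories, 100 steps x 3 dims
gt = torch.randn(8, 100, 3)
print(mtl_loss(logits, labels, pred, gt))
```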
Citations: 12
Extractive Knowledge Distillation
Pub Date: 2022-01-01 · DOI: 10.1109/WACV51458.2022.00142
Takumi Kobayashi
Knowledge distillation (KD) transfers knowledge from a teacher model to improve the performance of a student model, which is usually of lower capacity. In the KD framework, however, it is unclear what kind of knowledge is effective and how it is transferred. This paper analyzes the KD process to explore its key factors. In the KD formulation, the softmax temperature entangles the student probabilities, the teacher probabilities, and the KD weight, making it hard to analyze the contribution of each factor separately. We disentangle these components so as to analyze the temperature in particular and to improve each component individually. Based on the analysis of the temperature and the uniformity of the teacher probability, we propose a method, called extractive distillation, for extracting effective knowledge from the teacher model. Extractive KD touches only teacher knowledge and is thus applicable to various KD methods. In experiments on image classification tasks using the CIFAR-100 and TinyImageNet datasets, we demonstrate that the proposed method outperforms other KD methods, and we analyze the feature representations to show its effectiveness in the framework of transfer learning.
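For context, here is a minimal sketch of a KD loss in which the teacher temperature, the student temperature, and the KD weight are exposed as separate knobs; this illustrates the kind of disentanglement described above, not the paper's exact extractive formulation, and the usual T² gradient rescaling is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels,
            t_teacher=4.0, t_student=4.0, kd_weight=0.5):
    p_teacher = F.softmax(teacher_logits / t_teacher, dim=1)          # teacher component
    log_p_student = F.log_softmax(student_logits / t_student, dim=1)  # student component
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    ce = F.cross_entropy(student_logits, labels)
    return kd_weight * kd + (1.0 - kd_weight) * ce                    # KD weight component

s = torch.randn(16, 100)   # e.g. CIFAR-100 logits
t = torch.randn(16, 100)
y = torch.randint(0, 100, (16,))
print(kd_loss(s, t, y))
```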
Citations: 3
Hyperspectral Image Super-Resolution with RGB Image Super-Resolution as an Auxiliary Task
Pub Date: 2022-01-01 · DOI: 10.1109/WACV51458.2022.00409
Ke Li, Dengxin Dai, L. Gool
This work studies hyperspectral image (HSI) super-resolution (SR). HSI SR is characterized by high-dimensional data and a limited number of training examples, which raises challenges for training deep neural networks that are known to be data-hungry. This work addresses the issue with two contributions. First, we observe that HSI SR and RGB image SR are correlated and develop a novel multi-tasking network to train them jointly, so that the auxiliary RGB image SR task can provide additional supervision and regulate the network training. Second, we extend the network to a semi-supervised setting so that it can learn from datasets containing only low-resolution HSIs. With these contributions, our method is able to learn hyperspectral image super-resolution from heterogeneous datasets and lifts the requirement of having a large number of high-resolution (HR) HSI training samples. Extensive experiments on three standard datasets show that our method outperforms existing methods significantly and underpin the relevance of our contributions. Our code can be found at https://github.com/kli8996/HSISR.git.
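One plausible arrangement of such a multi-tasking network is a shared feature body with separate SR heads for the HSI and RGB outputs, as in the minimal sketch below; the layer sizes, the 2x upscaling factor, and the shared-input simplification are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiTaskSR(nn.Module):
    def __init__(self, hsi_bands=31, scale=2):
        super().__init__()
        self.body = nn.Sequential(                     # shared feature extractor
            nn.Conv2d(hsi_bands, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.hsi_head = nn.Sequential(                 # main task: HSI SR
            nn.Conv2d(64, hsi_bands * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )
        self.rgb_head = nn.Sequential(                 # auxiliary task: RGB SR
            nn.Conv2d(64, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr_hsi):
        feat = self.body(lr_hsi)
        return self.hsi_head(feat), self.rgb_head(feat)

model = MultiTaskSR()
hr_hsi, hr_rgb = model(torch.randn(1, 31, 32, 32))
print(hr_hsi.shape, hr_rgb.shape)        # (1, 31, 64, 64) and (1, 3, 64, 64)
```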
Citations: 7
REFICS: A Step Towards Linking Vision with Hardware Assurance
Pub Date: 2022-01-01 · DOI: 10.1109/WACV51458.2022.00352
Ronald Wilson, Hangwei Lu, Mengdi Zhu, Domenic Forte, D. Woodard
Hardware assurance is a key process in ensuring the integrity, security, and functionality of a hardware device. Its heavy reliance on images, especially Scanning Electron Microscopy images, makes it an excellent candidate problem for the vision community. The goal of this paper is to provide a pathway for inter-community collaboration by introducing the existing challenges of hardware assurance on integrated circuits in the context of computer vision, and to support further development with a large-scale dataset of 800,000 images. A detailed benchmark of existing vision approaches to hardware assurance on the dataset is also included to give quantitative insights into the problem.
Citations: 7
Multi-motion and Appearance Self-Supervised Moving Object Detection
Pub Date: 2022-01-01 · DOI: 10.1109/WACV51458.2022.00216
Fan Yang, S. Karanam, Meng Zheng, Terrence Chen, Haibin Ling, Ziyan Wu
In this work, we consider the problem of self-supervised Moving Object Detection (MOD) in video, where no ground truth is involved in either the training or the inference phase. Recently, an adversarial learning framework was proposed [32] to leverage inherent temporal information for MOD. While it shows promising results, it uses single-scale temporal information and may encounter problems when dealing with a deformable object whose parts undergo motion at different scales. Additional challenges can arise from a moving camera, which breaks the motion-independence hypothesis and produces locally independent background motion. To deal with these problems, we propose a Multi-motion and Appearance Self-supervised Network (MASNet) that introduces multi-scale motion information and appearance information of the scene for MOD. In particular, a moving object, especially a deformable one, usually consists of regions moving at various temporal scales, and introducing multi-scale motion can aggregate these regions into a more complete detection. Appearance information can serve as another cue for MOD when motion independence is not reliable, and can remove false detections in the background caused by locally independent background motion. To encode multi-scale motion and appearance, in MASNet we design a multi-branch flow encoding module and an image inpainter module, respectively. The proposed modules and MASNet are extensively evaluated on the DAVIS dataset, demonstrating their effectiveness and superiority over state-of-the-art self-supervised methods.
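To illustrate the multi-branch idea, here is a minimal sketch in which optical-flow inputs at different frame offsets are encoded by parallel branches and aggregated; all shapes, the set of offsets, and the summation-based aggregation are hypothetical, not MASNet's actual design.

```python
import torch
import torch.nn as nn

class MultiScaleMotionEncoder(nn.Module):
    def __init__(self, offsets=(1, 2, 4)):
        super().__init__()
        self.offsets = offsets
        self.branches = nn.ModuleList([                # one branch per temporal scale
            nn.Sequential(nn.Conv2d(2, 32, 3, padding=1), nn.ReLU())
            for _ in offsets
        ])

    def forward(self, flows):
        # flows: dict mapping frame offset -> (B, 2, H, W) optical flow
        feats = [b(flows[o]) for b, o in zip(self.branches, self.offsets)]
        return torch.stack(feats).sum(dim=0)           # aggregate motion across scales

enc = MultiScaleMotionEncoder()
flows = {o: torch.randn(1, 2, 64, 64) for o in (1, 2, 4)}
print(enc(flows).shape)                  # (1, 32, 64, 64)
```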
Citations: 2
On the Maximum Radius of Polynomial Lens Distortion
Pub Date: 2022-01-01 · DOI: 10.1109/WACV51458.2022.00243
Matthew J. Leotta, David Russell, Andrew Matrai
Polynomial radial lens distortion models are widely used in image processing and computer vision applications to compensate for the fact that straight lines in the world can appear curved in an image. While polynomial models are used pervasively in software ranging from PhotoShop to OpenCV to Blender, they have an often overlooked behavior: polynomial models can fold back onto themselves. This property often goes unnoticed when simply warping to undistort an image. However, in applications such as augmented reality, where 3D scene geometry is projected and distorted to overlay an image, this folding can produce surprising behavior: points well outside the field of view can project into the middle of the image. A radial distortion model is only valid up to some (possibly infinite) maximum radius at which this folding occurs. This paper derives a closed-form expression for the maximum valid radius and demonstrates how this value can be used to filter invalid projections or validate the range of an estimated lens model. Experiments on the popular Lensfun database demonstrate that this folding problem exists in 30% of lens models used in the wild.
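Folding sets in where the distorted radius stops increasing with the undistorted radius, i.e., where the derivative of r_d(r) vanishes. The sketch below finds the maximum valid radius numerically for the common model r_d(r) = r(1 + k1·r² + k2·r⁴ + k3·r⁶); it illustrates the concept rather than reproducing the paper's closed-form expression, and the coefficient values are hypothetical.

```python
import numpy as np

def max_valid_radius(k1, k2=0.0, k3=0.0):
    # r_d(r) = r + k1*r^3 + k2*r^5 + k3*r^7, coefficients in decreasing powers of r
    poly = np.array([k3, 0, k2, 0, k1, 0, 1, 0], dtype=float)
    deriv = np.polyder(poly)                   # folding starts where d(r_d)/dr = 0
    roots = np.roots(deriv)
    real = roots[np.abs(roots.imag) < 1e-9].real
    positive = real[real > 0]
    return positive.min() if positive.size else np.inf  # no fold: model is monotone

print(max_valid_radius(-0.3, 0.02))        # hypothetical barrel-distortion coefficients
```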
Citations: 0
Multi-level Attentive Adversarial Learning with Temporal Dilation for Unsupervised Video Domain Adaptation
Pub Date: 2022-01-01 · DOI: 10.1109/WACV51458.2022.00085
Peipeng Chen, Yuan Gao, A. J. Ma
Most existing works on unsupervised video domain adaptation attempt to mitigate the distribution gap across domains at the frame and video levels. Such a two-level distribution alignment approach may suffer from insufficient alignment for complex video data and from misalignment along the temporal dimension. To address these issues, we develop a novel framework of Multi-level Attentive Adversarial Learning with Temporal Dilation (MA2L-TD). Given frame-level features as input, multi-level temporal features are generated, and a domain discriminator is trained for each level individually by adversarial learning. For better distribution alignment, level-wise attention weights are calculated from the degree of domain confusion at each level. To mitigate the negative effect of misalignment, features are aggregated with an attention mechanism determined by the individual domain discriminators. Moreover, temporal dilation is designed for sequential non-repeatability, balancing computational efficiency against the possible number of levels. Extensive experimental results show that our proposed method outperforms the state of the art on four benchmark datasets.
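One plausible reading of the attention computation is to weight each level by how confused its domain discriminator is, e.g., by the binary entropy of the discriminator's outputs, as in the hedged sketch below; the exact weighting used in the paper may differ.

```python
import torch

def level_attention(disc_probs):
    # disc_probs: list of per-level domain probabilities in (0, 1), one tensor per level
    entropies = torch.stack([
        -(p * p.log() + (1 - p) * (1 - p).log()).mean()   # binary entropy = confusion
        for p in disc_probs
    ])
    return torch.softmax(entropies, dim=0)   # more confused levels get larger weights

probs = [torch.rand(8).clamp(0.05, 0.95) for _ in range(3)]   # 3 temporal levels
print(level_attention(probs))
```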
Citations: 9
FT-DeepNets: Fault-Tolerant Convolutional Neural Networks with Kernel-based Duplication
Pub Date: 2022-01-01 · DOI: 10.1109/WACV51458.2022.00194
Iljoo Baek, Wei Chen, Zhihao Zhu, Soheil Samii, R. Rajkumar
Deep neural network (deepnet) applications play a crucial role in safety-critical systems such as autonomous vehicles (AVs). An AV must drive safely towards its destination, avoid obstacles, and respond quickly when the vehicle must stop. Any transient errors in software calculations or hardware memory in these deepnet applications can potentially lead to dramatically incorrect results. Therefore, assessing and mitigating transient errors and providing robust results are important for safety-critical systems. Previous research on this subject focused on detecting errors and then recovering from them by re-running the network. Other approaches were based on full network duplication, such as ensemble learning-based approaches that boost system fault tolerance by leveraging each model's advantages. However, it is hard to detect errors in a deep neural network, and the computational overhead of full redundancy can be substantial. We first study the impact of error types and locations in deepnets. We next focus on selecting which parts should be duplicated, using multiple ranking methods to measure the order of importance among neurons. We find that the duplication overhead in computation and memory is a trade-off between algorithmic performance and robustness. To achieve higher robustness with less system overhead, we present two error protection mechanisms that duplicate only the parts of the network derived from critical neurons. Finally, we substantiate the practical feasibility of our approach and evaluate the improvement in the accuracy of a deepnet in the presence of errors. We demonstrate these results using a case study with real-world applications on an Nvidia GeForce RTX 2070Ti GPU and an Nvidia Xavier embedded platform used by automotive OEMs.
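A minimal sketch of the selection step follows: output channels of a convolution are ranked by a simple importance proxy (here the L1 norm of each kernel, one of several possible rankings and not necessarily the paper's) and the top fraction is marked for duplication.

```python
import torch
import torch.nn as nn

def critical_channels(conv: nn.Conv2d, fraction=0.25):
    # importance of each output channel as the L1 norm of its kernel weights
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    k = max(1, int(fraction * conv.out_channels))
    return torch.topk(importance, k).indices   # channels whose kernels would be replicated

conv = nn.Conv2d(16, 32, 3)
print(critical_channels(conv))
```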
Citations: 2
Forgery Detection by Internal Positional Learning of Demosaicing Traces
Pub Date: 2022-01-01 · DOI: 10.1109/WACV51458.2022.00109
Quentin Bammey, R. G. V. Gioi, J. Morel
We propose 4Point (Forensics with Positional Internal Training), an unsupervised neural network trained to assess the consistency of the image colour mosaic in order to find forgeries. Positional learning trains the model to predict the modulo-2 position of pixels, leveraging the translation invariance of CNNs to replicate the underlying mosaic and its potential inconsistencies. Internal learning on the single, potentially forged image improves adaptation and robustness to varied post-processing and counter-forensics measures. This solution beats existing mosaic detection methods, is more robust to various post-processing and counter-forensic artefacts such as JPEG compression, and can exploit traces to which state-of-the-art generic neural networks are blind. Check qbammey.github.io/4point for the code.
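The positional target is concrete enough to sketch: each pixel is labelled with its modulo-2 grid position, giving a four-class per-pixel classification problem, as below; the tiny network and the training details are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

H, W = 64, 64
ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
target = (2 * (ys % 2) + (xs % 2)).unsqueeze(0)    # class in {0, 1, 2, 3} for each pixel

net = nn.Sequential(                               # tiny fully convolutional classifier
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 4, 3, padding=1),
)
img = torch.rand(1, 3, H, W)                       # stand-in for a demosaiced image
logits = net(img)
loss = F.cross_entropy(logits, target)             # where predictions disagree with the
loss.backward()                                    # known grid, the mosaic is inconsistent
print(loss.item())
```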
Citations: 5