CrackFormer: Transformer Network for Fine-Grained Crack Detection
Pub Date: 2021-10-01 | DOI: 10.1109/ICCV48922.2021.00376
Huajun Liu, Xiangyu Miao, C. Mertz, Chengzhong Xu, Hui Kong
Cracks are irregular line structures that are of interest in many computer vision applications. Crack detection (e.g., from pavement images) is a challenging task due to intensity inhomogeneity, topological complexity, low contrast, and noisy backgrounds. Overall crack detection accuracy can be significantly affected by the detection performance on fine-grained cracks. In this work, we propose a Crack Transformer network (CrackFormer) for fine-grained crack detection. The CrackFormer is composed of novel attention modules in a SegNet-like encoder-decoder architecture. Specifically, it consists of novel self-attention modules with 1x1 convolutional kernels for efficient contextual information extraction across feature channels, and efficient positional embedding to capture large-receptive-field context for long-range interactions. It also introduces new scaling-attention modules that combine outputs from the corresponding encoder and decoder blocks to suppress non-semantic features and sharpen semantic ones. The CrackFormer is trained and evaluated on three classical crack datasets. The experimental results show that the CrackFormer achieves Optimal Dataset Scale (ODS) values of 0.871, 0.877, and 0.881 on the three datasets, respectively, and outperforms state-of-the-art methods.
2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3763-3772.
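To make the scaling-attention idea concrete, here is a minimal PyTorch sketch of a gate that fuses corresponding encoder and decoder features with 1x1 convolutions and rescales the decoder output. The module name, layer layout, and residual term are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a scaling-attention-style gate: encoder and decoder features
# jointly produce a sigmoid mask that suppresses non-semantic decoder activations.
import torch
import torch.nn as nn

class ScalingAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions mix information across feature channels only.
        self.enc_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.dec_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.gate = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, enc_feat: torch.Tensor, dec_feat: torch.Tensor) -> torch.Tensor:
        # A gate derived from both streams rescales the decoder features,
        # amplifying crack-like responses and damping background noise.
        attn = torch.sigmoid(self.gate(torch.relu(self.enc_proj(enc_feat) + self.dec_proj(dec_feat))))
        return dec_feat * attn + dec_feat  # residual term keeps the original signal

if __name__ == "__main__":
    enc = torch.randn(1, 64, 128, 128)
    dec = torch.randn(1, 64, 128, 128)
    print(ScalingAttention(64)(enc, dec).shape)  # torch.Size([1, 64, 128, 128])
```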
Fast and Efficient DNN Deployment via Deep Gaussian Transfer Learning
Pub Date: 2021-10-01 | DOI: 10.1109/ICCV48922.2021.00533
Qi Sun, Chen Bai, Tinghuan Chen, Hao Geng, Xinyun Zhang, Yang Bai, Bei Yu
Deep neural networks (DNNs) are widely used, yet optimizing their hardware deployments is very time-consuming and historical deployment knowledge is not utilized efficiently. In this paper, to accelerate the optimization process and find better deployment configurations, we propose a novel transfer learning method based on deep Gaussian processes (DGPs). First, a deep Gaussian process (DGP) model is built on the historical data to learn empirical knowledge. Second, to transfer knowledge to a new task, a tuning set is sampled for the new task under the guidance of the DGP model. The DGP is then tuned on this set via maximum-a-posteriori (MAP) estimation to accommodate the new task, and finally used to guide the deployments of the task. The experiments show that our method achieves the best inference latencies for convolutions while accelerating the optimization process significantly compared with prior art.
2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 5360-5370.
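As a rough illustration of the transfer workflow (fit a probabilistic surrogate on historical deployment data, sample a small tuning set for the new task, refit on its measurements, then pick the best configuration), here is a sketch using a single-layer Gaussian process from scikit-learn in place of the paper's deep GP and MAP tuning. The synthetic data, the measure_latency stand-in, and the selection heuristic are all assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def measure_latency(cfg):
    # Stand-in for compiling and profiling a convolution with this configuration
    # on the new hardware task (not a real measurement).
    return (cfg[:, 0] - 0.3) ** 2 + 0.5 * (cfg[:, 1] - 0.7) ** 2 + 0.01 * rng.standard_normal(len(cfg))

# 1) Historical task: deployment configurations and their measured latencies.
X_hist = rng.uniform(size=(200, 2))
y_hist = (X_hist[:, 0] - 0.5) ** 2 + 0.5 * (X_hist[:, 1] - 0.5) ** 2
gp = GaussianProcessRegressor(kernel=RBF(0.2) + WhiteKernel(1e-3), normalize_y=True).fit(X_hist, y_hist)

# 2) New task: use the surrogate's mean and uncertainty to pick a small tuning set.
candidates = rng.uniform(size=(500, 2))
mu, std = gp.predict(candidates, return_std=True)
tuning_idx = np.argsort(mu - std)[:16]          # optimistic (low predicted latency) picks
X_tune, y_tune = candidates[tuning_idx], measure_latency(candidates[tuning_idx])

# 3) "Tune" the surrogate on the new task's measurements (refitting stands in for MAP updating).
gp_new = GaussianProcessRegressor(kernel=RBF(0.2) + WhiteKernel(1e-3), normalize_y=True).fit(X_tune, y_tune)

# 4) Deploy the configuration the tuned surrogate predicts to be fastest.
best = candidates[np.argmin(gp_new.predict(candidates))]
print("chosen configuration:", best)
```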
Inference of Black Hole Fluid-Dynamics from Sparse Interferometric Measurements
Pub Date: 2021-10-01 | DOI: 10.1109/iccv48922.2021.00234
Aviad Levis, Daeyoung Lee, J. Tropp, C. Gammie, K. Bouman
We develop an approach to recover the underlying properties of fluid-dynamical processes from sparse measurements. We are motivated by the task of imaging the stochastically evolving environment surrounding black holes, and demonstrate how flow parameters can be estimated from the sparse interferometric measurements used in radio astronomical imaging. To model the stochastic flow, we use spatio-temporal Gaussian random fields (GRFs). The high dimensionality of the underlying source video makes direct representation via a GRF's full covariance matrix intractable. In contrast, stochastic partial differential equations are able to capture correlations at multiple scales by specifying only local interaction coefficients. Our approach estimates the coefficients of a space-time diffusion equation that dictates the stationary statistics of the dynamical process. We analyze our approach on realistic simulations of black hole evolution and demonstrate its advantage over state-of-the-art dynamic black hole imaging techniques.
2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 2320-2329.
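The key modeling idea, that each spatial Fourier mode of a field governed by a stochastic diffusion equation evolves as an Ornstein-Uhlenbeck process whose decay rate is the diffusion coefficient times |k|^2, so the coefficient can be recovered from temporal correlations of sparse Fourier-domain (interferometric-like) samples, can be shown with a toy NumPy sketch. The grid size, measurement pattern, and lag-1 regression estimator below are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, dt, kappa, sigma = 32, 4000, 0.05, 0.8, 1.0   # grid, frames, time step, true diffusion coeff., noise scale

# Squared radian frequencies |k|^2 on a periodic N x N grid.
w = 2 * np.pi * np.fft.fftfreq(N)
k2 = w[:, None] ** 2 + w[None, :] ** 2

# Sparse "interferometric" samples: a handful of non-zero (u, v) Fourier modes.
idx = rng.choice(np.flatnonzero(k2.ravel() > 0), size=40, replace=False)
k2_m = k2.ravel()[idx]

# Simulate each measured mode of the stochastic heat equation as an exact OU process:
# u_{t+1} = a * u_t + noise, with a = exp(-kappa * |k|^2 * dt).
a = np.exp(-kappa * k2_m * dt)
s = sigma * np.sqrt((1 - a ** 2) / (2 * kappa * k2_m))
u = np.zeros((T, len(idx)), dtype=complex)
for t in range(1, T):
    noise = rng.standard_normal(len(idx)) + 1j * rng.standard_normal(len(idx))
    u[t] = a * u[t - 1] + s * noise / np.sqrt(2)

# Estimate kappa from the lag-1 autocorrelation of each measured mode.
rho = np.real(np.sum(u[1:] * np.conj(u[:-1]), axis=0)) / np.sum(np.abs(u[:-1]) ** 2, axis=0)
kappa_hat = np.mean(-np.log(np.clip(rho, 1e-6, None)) / (k2_m * dt))
print(f"true kappa = {kappa:.2f}, estimated kappa = {kappa_hat:.2f}")
```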
Graspness Discovery in Clutters for Fast and Accurate Grasp Detection
Pub Date: 2021-10-01 | DOI: 10.1109/ICCV48922.2021.01566
Chenxi Wang, Haoshu Fang, Minghao Gou, Hongjie Fang, Jin Gao, Cewu Lu, S. Tong
Efficient and robust grasp pose detection is vital for robotic manipulation. For general 6-DoF grasping, conventional methods treat all points in a scene equally and usually adopt uniform sampling to select grasp candidates. However, we find that ignoring where to grasp greatly harms the speed and accuracy of current grasp pose detection methods. In this paper, we propose "graspness", a quality measure based on geometric cues that distinguishes graspable areas in cluttered scenes. We propose a look-ahead searching method for measuring graspness, and statistical results justify the rationality of this measure. To detect graspness quickly in practice, we develop a neural network, the graspness model, to approximate the searching process. Extensive experiments verify the stability, generality and effectiveness of our graspness model, allowing it to be used as a plug-and-play module for different methods. Various previous methods achieve a large accuracy improvement after being equipped with our graspness model. Moreover, we develop GSNet, an end-to-end network that incorporates the graspness model for early filtering of low-quality predictions. Experiments on the large-scale benchmark GraspNet-1Billion show that our method outperforms prior art by a large margin (30+ AP) and achieves high inference speed.
2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 15944-15953.
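A minimal sketch of the "early filtering" use of graspness: a small per-point network scores graspness from point features, and only the top-scoring points are passed to a much more expensive downstream grasp-pose head. The feature dimensions, MLP, and top-k ratio are illustrative assumptions, not the GSNet architecture.

```python
import torch
import torch.nn as nn

class GraspnessHead(nn.Module):
    """Per-point graspness scorer (a stand-in for the paper's graspness model)."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, point_feats: torch.Tensor) -> torch.Tensor:
        # point_feats: (B, N, C) -> graspness score per point in (0, 1).
        return torch.sigmoid(self.mlp(point_feats)).squeeze(-1)

def filter_by_graspness(points, point_feats, head, keep_ratio=0.1):
    """Keep only the most graspable points before running the expensive grasp head."""
    scores = head(point_feats)                          # (B, N)
    k = max(1, int(points.shape[1] * keep_ratio))
    topk = torch.topk(scores, k, dim=1).indices         # (B, k)
    batch_idx = torch.arange(points.shape[0]).unsqueeze(-1)
    return points[batch_idx, topk], point_feats[batch_idx, topk]

if __name__ == "__main__":
    pts, feats = torch.randn(2, 2048, 3), torch.randn(2, 2048, 64)
    sel_pts, sel_feats = filter_by_graspness(pts, feats, GraspnessHead())
    print(sel_pts.shape, sel_feats.shape)   # torch.Size([2, 204, 3]) torch.Size([2, 204, 64])
```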
FFT-OT: A Fast Algorithm for Optimal Transportation
Pub Date: 2021-10-01 | DOI: 10.1109/ICCV48922.2021.00622
Na Lei, X. Gu
An optimal transportation map finds the most economical way to transport one probability measure to another. It has been applied in a broad range of applications in vision, deep learning and medical imaging. By Brenier's theory, computing the optimal transport map is equivalent to solving a Monge-Ampère equation. Due to its highly non-linear nature, computing optimal transportation maps at large scale is very challenging. This work proposes a simple but powerful method, the FFT-OT algorithm, to tackle this difficulty based on three key ideas. First, solving the Monge-Ampère equation is converted to a fixed-point problem. Second, the obliqueness property of optimal transportation maps is reformulated as Neumann boundary conditions on rectangular domains. Third, FFT is applied in each iteration to solve a Poisson equation, improving efficiency. Experiments on surfaces captured by 3D scanning and reconstructed from medical imaging are conducted and compared with other existing methods. Our experimental results show that the proposed FFT-OT algorithm is simple, general and scalable, with high efficiency and accuracy.
2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6260-6269.
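The abstract's third idea, an FFT-based Poisson solve with Neumann boundary conditions on a rectangle inside each fixed-point iteration, can be sketched with a standard DCT-based solver. The discretization, the use of SciPy's DCT, and the mean-zero convention below are assumptions about one common way to do this, not the authors' code.

```python
import numpy as np
from scipy.fft import dctn, idctn

def poisson_neumann_fft(f, h=1.0):
    """Solve -Lap(u) = f on a rectangle with homogeneous Neumann BCs via DCT-II.

    f must have zero mean for the Neumann problem to be solvable; the returned
    solution is the mean-zero one.
    """
    n, m = f.shape
    fhat = dctn(f, type=2, norm="ortho")
    i = np.arange(n)[:, None]
    j = np.arange(m)[None, :]
    lam = (2 - 2 * np.cos(np.pi * i / n)) / h ** 2 + (2 - 2 * np.cos(np.pi * j / m)) / h ** 2
    lam[0, 0] = 1.0                      # avoid division by zero for the constant mode
    uhat = fhat / lam
    uhat[0, 0] = 0.0                     # fix the additive constant (mean-zero solution)
    return idctn(uhat, type=2, norm="ortho")

if __name__ == "__main__":
    n = 128
    x = (np.arange(n) + 0.5) / n
    f = np.cos(np.pi * x)[:, None] * np.cos(2 * np.pi * x)[None, :]   # mean-zero right-hand side
    u = poisson_neumann_fft(f, h=1.0 / n)
    up = np.pad(u, 1, mode="edge")                                     # Neumann (mirror) boundary
    lap = (up[2:, 1:-1] + up[:-2, 1:-1] + up[1:-1, 2:] + up[1:-1, :-2] - 4 * u) * n ** 2
    print("max residual:", np.abs(-lap - f).max())                     # close to machine precision
```

Per the abstract, each fixed-point iteration of FFT-OT performs one such solve, which is what the method credits for its efficiency and scalability.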
Inferring high-resolution traffic accident risk maps based on satellite imagery and GPS trajectories
Pub Date: 2021-10-01 | DOI: 10.1109/ICCV48922.2021.01176
Songtao He, M. Sadeghi, S. Chawla, Mohammad Alizadeh, Harinarayanan Balakrishnan, Sam Madden
Traffic accidents cost about 3% of the world's GDP and are the leading cause of death in children and young adults. Accident risk maps are useful tools to monitor and mitigate accident risk. We present a technique to generate high-resolution (5 meters) accident risk maps. At this resolution, accidents are sparse and risk estimation is limited by the bias-variance trade-off. Prior accident risk maps either estimate low-resolution maps of low utility (high bias) or use frequency-based estimation techniques that inaccurately predict where accidents actually happen (high variance). To improve this trade-off, we use an end-to-end deep architecture that takes as input satellite imagery, GPS trajectories, road maps and the history of accidents. Our evaluation on four metropolitan areas in the US, with a total area of 7,488 km², shows that our technique outperforms prior work in terms of resolution and accuracy.
2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 11957-11965.
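The fusion architecture described above (separate input streams for satellite imagery, rasterized GPS trajectories, road maps, and historical accidents, merged into a dense risk map) might look roughly like the following PyTorch sketch; the channel counts, depths, and single-logit-per-pixel output are assumptions for illustration.

```python
import torch
import torch.nn as nn

def stem(in_ch: int) -> nn.Sequential:
    # One lightweight convolutional stem per input modality.
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    )

class RiskMapNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.sat = stem(3)      # RGB satellite tile
        self.gps = stem(1)      # rasterized GPS trajectory density
        self.road = stem(1)     # rasterized road map
        self.hist = stem(1)     # rasterized historical accident counts
        self.head = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),             # per-pixel accident risk logit
        )

    def forward(self, sat, gps, road, hist):
        fused = torch.cat([self.sat(sat), self.gps(gps), self.road(road), self.hist(hist)], dim=1)
        return self.head(fused)

if __name__ == "__main__":
    net = RiskMapNet()
    out = net(torch.randn(1, 3, 256, 256), torch.randn(1, 1, 256, 256),
              torch.randn(1, 1, 256, 256), torch.randn(1, 1, 256, 256))
    print(out.shape)  # torch.Size([1, 1, 256, 256])
```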
Ultra-High-Definition Image HDR Reconstruction via Collaborative Bilateral Learning
Pub Date: 2021-10-01 | DOI: 10.1109/ICCV48922.2021.00441
Zhuo Zheng, Wenqi Ren, Xiaochun Cao, Tao Wang, Xiuyi Jia
Existing single-image high dynamic range (HDR) reconstruction methods attempt to expand the range of illuminance. They are not effective at generating plausible textures and colors in the reconstructed results, especially for the high pixel density of ultra-high-definition (UHD) images. To address these problems, we propose a new HDR reconstruction network for UHD images that collaboratively learns color and texture details. First, we propose a dual-path network to extract content and chromatic features at a reduced resolution of the low dynamic range (LDR) input. These two types of features are used to fit bilateral-space affine models for real-time HDR reconstruction. To extract the main data structure of the LDR input, we use 3D Tucker decomposition and reconstruction to prevent pseudo-edges and noise amplification in the learned bilateral grid. As a result, high-quality content and chromatic features can be reconstructed by capitalizing on guided bilateral upsampling. Finally, we fuse these two full-resolution feature maps into the HDR reconstruction result. Our method achieves real-time processing of UHD images (about 160 fps). Experimental results demonstrate that the proposed algorithm performs favorably against state-of-the-art HDR reconstruction approaches on public benchmarks and real-world UHD images.
2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 4429-4438.
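The "bilateral-space affine model" step can be made concrete with an HDRNet-style slicing operation: a low-resolution grid of per-pixel affine color transforms is sampled at full resolution using a guide map and applied to the LDR input. The grid shape, guide definition, and use of grid_sample are assumptions about one standard way to implement this, not the paper's exact network.

```python
import torch
import torch.nn.functional as F

def apply_bilateral_grid(grid: torch.Tensor, guide: torch.Tensor, ldr: torch.Tensor) -> torch.Tensor:
    """Slice per-pixel 3x4 affine transforms from a bilateral grid and apply them.

    grid:  (B, 12, D, Hg, Wg) affine coefficients predicted at low resolution
    guide: (B, H, W) values in [0, 1] (e.g., luminance of the LDR input)
    ldr:   (B, 3, H, W) full-resolution LDR image
    """
    B, _, H, W = ldr.shape
    ys = torch.linspace(-1, 1, H, device=ldr.device)
    xs = torch.linspace(-1, 1, W, device=ldr.device)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    coords = torch.stack(
        [gx.expand(B, H, W), gy.expand(B, H, W), guide * 2 - 1], dim=-1
    ).unsqueeze(1)                                                      # (B, 1, H, W, 3): x, y, guide bin
    sliced = F.grid_sample(grid, coords, align_corners=True).squeeze(2) # (B, 12, H, W)
    A = sliced.view(B, 3, 4, H, W)
    # Per-pixel affine transform: out_c = sum_j A[c, j] * ldr_j + A[c, 3]
    return (A[:, :, :3] * ldr.unsqueeze(1)).sum(dim=2) + A[:, :, 3]

if __name__ == "__main__":
    grid = torch.randn(1, 12, 8, 16, 16)                    # coarse bilateral grid of coefficients
    ldr = torch.rand(1, 3, 512, 512)
    guide = ldr.mean(dim=1)                                  # simple luminance guide
    print(apply_bilateral_grid(grid, guide, ldr).shape)      # torch.Size([1, 3, 512, 512])
```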
Learning High-Fidelity Face Texture Completion without Complete Face Texture
Pub Date: 2021-10-01 | DOI: 10.1109/ICCV48922.2021.01373
Jongyoo Kim, Jiaolong Yang, Xin Tong
For face texture completion, previous methods typically use complete textures captured by multi-view imaging systems or 3D scanners for supervised learning. This paper addresses a new and challenging problem: learning to complete the invisible texture in a single face image without using any complete texture. We simply leverage a large corpus of face images of different subjects (e.g., FFHQ) to train a texture completion model in an unsupervised manner. To achieve this, we propose DSD-GAN, a novel deep-neural-network-based method that applies two discriminators, one in UV-map space and one in image space. These two discriminators work in a complementary manner to learn both facial structure and texture details. We show that their combination is essential for obtaining high-fidelity results. Although the network never sees any complete facial appearance, it is able to generate compelling full textures from single images.
2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 13970-13979.
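The dual-discriminator idea, one discriminator judging the completed texture in UV-map space and another judging it after it is mapped back into image space, can be sketched as a combined hinge adversarial loss. The tiny discriminators, the hinge formulation, and the uv_to_image warp standing in for the actual rendering step are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def disc(in_ch: int = 3) -> nn.Sequential:
    # A deliberately tiny PatchGAN-style discriminator for illustration.
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 1, 3, padding=1),
    )

d_uv, d_img = disc(), disc()

def uv_to_image(uv_texture: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    # Stand-in for rendering the UV texture onto the face image; here just a warp
    # with a fixed sampling grid ("flow"), NOT a real 3D renderer.
    return F.grid_sample(uv_texture, flow, align_corners=True)

def generator_adv_loss(completed_uv, flow):
    # Generator is pushed to fool both the UV-space and the image-space critic.
    rendered = uv_to_image(completed_uv, flow)
    return -(d_uv(completed_uv).mean() + d_img(rendered).mean())

def discriminator_loss(d, real, fake):
    # Hinge loss: push real scores above +1 and fake scores below -1.
    return F.relu(1 - d(real)).mean() + F.relu(1 + d(fake.detach())).mean()

if __name__ == "__main__":
    uv_fake, uv_real = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
    flow = torch.rand(2, 64, 64, 2) * 2 - 1            # fixed sampling grid in [-1, 1]
    img_real = torch.rand(2, 3, 64, 64)
    print(generator_adv_loss(uv_fake, flow).item())
    print(discriminator_loss(d_uv, uv_real, uv_fake).item(),
          discriminator_loss(d_img, img_real, uv_to_image(uv_fake, flow)).item())
```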
Neural Image Compression via Attentional Multi-scale Back Projection and Frequency Decomposition
Pub Date: 2021-10-01 | DOI: 10.1109/ICCV48922.2021.01441
Ge Gao, P. You, Rong Pan, Shunyuan Han, Yuanyuan Zhang, Yuchao Dai, Ho-Jun Lee
In recent years, neural image compression has emerged as a rapidly developing topic in computer vision, and state-of-the-art approaches now exhibit superior compression performance compared to their conventional counterparts. Despite this progress, current methods still have limitations in preserving fine spatial details for optimal reconstruction, especially at low compression rates. We make three contributions in tackling this issue. First, we develop a novel back-projection method with attentional and multi-scale feature fusion for augmented representation power. Our back-projection method recalibrates the current estimate by establishing feedback connections between high-level and low-level attributes in an attentional and discriminative manner. Second, we propose to decompose the input image and separately process the distinct frequency components, whose derived latents are recombined using a novel dual attention module, so that details inside regions of interest can be explicitly manipulated. Third, we propose a novel training scheme for reducing the latent rounding residual. Experimental results show that, when measured in PSNR, our model reduces BD-rate by 9.88% and 10.32% over the state-of-the-art method, and by 4.12% and 4.32% over the latest coding standard, Versatile Video Coding (VVC), on the Kodak and CLIC2020 Professional Validation datasets, respectively. Our approach also produces more visually pleasing images when optimized for MS-SSIM. The significant improvement over existing methods shows the effectiveness of our approach in preserving and recovering spatial information for enhanced compression quality.
2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 14657-14666.
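The second contribution (splitting the input into frequency components that are processed separately and recombined with a dual attention module) can be illustrated with a simple Gaussian-blur split plus a channel-and-spatial attention fusion. The blur kernel, attention design, and channel widths are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def frequency_split(x: torch.Tensor, ksize: int = 9, sigma: float = 2.0):
    """Split an image into a low-frequency (blurred) and high-frequency (residual) part."""
    coords = torch.arange(ksize, dtype=torch.float32) - ksize // 2
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = (g / g.sum()).to(x.device)
    kernel = (g[:, None] * g[None, :]).expand(x.shape[1], 1, ksize, ksize).contiguous()
    low = F.conv2d(x, kernel, padding=ksize // 2, groups=x.shape[1])
    return low, x - low

class DualAttentionFusion(nn.Module):
    """Recombine low- and high-frequency branches with channel and spatial attention."""
    def __init__(self, ch: int):
        super().__init__()
        self.channel = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(2 * ch, 2 * ch, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(2 * ch, 1, 7, padding=3), nn.Sigmoid())
        self.merge = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, low_feat, high_feat):
        x = torch.cat([low_feat, high_feat], dim=1)
        x = x * self.channel(x)          # reweight channels
        x = x * self.spatial(x)          # reweight spatial positions
        return self.merge(x)

if __name__ == "__main__":
    img = torch.rand(1, 3, 128, 128)
    low, high = frequency_split(img)
    fuse = DualAttentionFusion(ch=3)
    print(fuse(low, high).shape)         # torch.Size([1, 3, 128, 128])
```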
A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction
Pub Date: 2021-10-01 | DOI: 10.1109/ICCV48922.2021.00961
Moitreya Chatterjee, N. Ahuja, A. Cherian
Predicting the future frames of a video is a challenging task, in part due to underlying stochastic real-world phenomena. Prior approaches to this task typically estimate a latent prior characterizing the stochasticity, but do not account for the predictive uncertainty of the (deep learning) model. Such approaches often derive the training signal from the mean-squared error (MSE) between the generated frame and the ground truth, which can lead to sub-optimal training, especially when the predictive uncertainty is high. To this end, we introduce the Neural Uncertainty Quantifier (NUQ), a stochastic quantification of the model's predictive uncertainty, and use it to weight the MSE loss. We propose a hierarchical, variational framework to derive NUQ in a principled manner using a deep Bayesian graphical model. Our experiments on three benchmark stochastic video prediction datasets show that our proposed framework trains more effectively than state-of-the-art models (especially when the training sets are small), while demonstrating better video generation quality and diversity on several evaluation metrics.
2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9731-9741.
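The core training-signal idea, down-weighting the MSE where the model is uncertain while penalizing blanket over-uncertainty, is commonly written as a heteroscedastic (uncertainty-weighted) loss. The sketch below uses that simple formulation with a per-pixel predicted log-variance as a stand-in for the paper's hierarchical variational NUQ; the names and shapes are assumptions.

```python
import torch

def uncertainty_weighted_mse(pred_frame: torch.Tensor,
                             target_frame: torch.Tensor,
                             log_var: torch.Tensor) -> torch.Tensor:
    """Heteroscedastic reconstruction loss.

    pred_frame, target_frame: (B, C, H, W) predicted and ground-truth frames
    log_var:                  (B, 1, H, W) predicted log-variance (the uncertainty estimate)
    """
    se = (pred_frame - target_frame) ** 2
    # High predicted variance shrinks the squared-error term, while the log-variance
    # term keeps the model from declaring everything uncertain.
    return (0.5 * torch.exp(-log_var) * se + 0.5 * log_var).mean()

if __name__ == "__main__":
    pred = torch.rand(4, 3, 64, 64, requires_grad=True)
    target = torch.rand(4, 3, 64, 64)
    log_var = torch.zeros(4, 1, 64, 64, requires_grad=True)
    loss = uncertainty_weighted_mse(pred, target, log_var)
    loss.backward()
    print(loss.item())
```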