
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR): Latest Publications

Distinguishing Unseen from Seen for Generalized Zero-shot Learning
Pub Date: 2022-06-01 | DOI: 10.1109/CVPR52688.2022.00773
Hongzu Su, Jingjing Li, Zhi Chen, Lei Zhu, Ke Lu
Generalized zero-shot learning (GZSL) aims to recognize samples whose categories may not have been seen during training. Recognizing unseen classes as seen ones, or vice versa, often leads to poor performance in GZSL. Distinguishing seen from unseen domains is therefore a natural, effective, yet challenging route to GZSL. In this paper, we present a novel method which leverages both visual and semantic modalities to distinguish seen and unseen categories. Specifically, our method deploys two variational autoencoders to generate latent representations for the visual and semantic modalities in a shared latent space, in which we align the latent representations of the two modalities by Wasserstein distance and reconstruct each modality from the representation of the other. In order to learn a clearer boundary between seen and unseen classes, we propose a two-stage training strategy which takes advantage of seen and unseen semantic descriptions and searches for a threshold to separate seen and unseen visual samples. Finally, a seen expert and an unseen expert are used for the final classification. Extensive experiments on five widely used benchmarks verify that the proposed method significantly improves GZSL results. For instance, our method correctly recognizes more than 99% of samples when separating domains and improves the final classification accuracy from 72.6% to 82.9% on AWA1.
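As a rough illustration of the threshold-based domain separation described above, the sketch below grid-searches a scalar gate threshold that splits seen-domain scores from unseen-domain scores and then routes each sample to a seen or an unseen expert. It is a minimal Python sketch under the assumption that some gating model already produces a scalar "seen-ness" score per sample; the function names and the toy Gaussian scores are hypothetical, not the authors' implementation.

```python
import numpy as np

def search_threshold(scores_seen, scores_unseen, num_steps=1000):
    """Grid-search a scalar threshold that best separates seen-domain scores
    from unseen-domain scores (higher score = more likely 'seen')."""
    candidates = np.linspace(min(scores_seen.min(), scores_unseen.min()),
                             max(scores_seen.max(), scores_unseen.max()),
                             num_steps)
    best_t, best_acc = candidates[0], 0.0
    for t in candidates:
        # Balanced accuracy of the binary seen/unseen split at threshold t.
        acc = 0.5 * (np.mean(scores_seen >= t) + np.mean(scores_unseen < t))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

def gated_prediction(score, threshold, seen_expert, unseen_expert, x):
    """Route a test sample to the seen or unseen expert based on its gate score."""
    return seen_expert(x) if score >= threshold else unseen_expert(x)

# Toy usage with random scores standing in for the gating model's outputs.
rng = np.random.default_rng(0)
t, acc = search_threshold(rng.normal(1.0, 0.5, 500), rng.normal(-1.0, 0.5, 500))
print(f"threshold={t:.3f}, balanced separation accuracy={acc:.3f}")
print(gated_prediction(1.2, t, lambda x: "seen head", lambda x: "unseen head", x=None))
```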
Citations: 10
Uniform Subdivision of Omnidirectional Camera Space for Efficient Spherical Stereo Matching
Pub Date: 2022-06-01 | DOI: 10.1109/CVPR52688.2022.01263
D. Kang, Hyeonjoong Jang, Jungeon Lee, C. Kyung, Min H. Kim
Omnidirectional cameras are widely used to better understand surrounding environments. They are often configured as a stereo pair to estimate depth. However, due to the optics of the fisheye lens, conventional epipolar geometry cannot be applied directly to omnidirectional camera images. Intermediate formats of omnidirectional images, such as equirectangular images, have been used, but stereo matching performance on these formats has been lower than conventional stereo owing to severe image distortion near the pole regions. In this paper, to address the distortion problem of omnidirectional images, we devise a novel subdivision scheme for a spherical geodesic grid. This enables more isotropic patch sampling of spherical image information in the omnidirectional camera space. By extending the existing equal-arc scheme, our spherical geodesic grid is tessellated with an equal-epiline subdivision scheme, making the cell sizes and in-between distances as uniform as possible, i.e., the arc length of the spherical grid cells' edges is well regularized. Also, our uniformly tessellated coordinates in a 2D image can be transformed into spherical coordinates via a one-to-one mapping, allowing for analytical forward/backward transformation. Our uniform tessellation scheme achieves higher stereo matching accuracy than the traditional cylindrical and cubemap-based approaches, while reducing the memory footprint required for stereo matching by 20%.
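The equal-epiline tessellation itself is specific to this paper, but the general recipe for a spherical geodesic grid, i.e. recursively subdividing an icosahedron and re-projecting new vertices onto the unit sphere, is standard and gives a feel for the near-uniform cells the abstract refers to. The Python sketch below builds such a generic grid; it is illustrative only, not the authors' equal-arc/equal-epiline scheme.

```python
import numpy as np

def icosahedron():
    """Vertices and faces of a unit icosahedron (a common geodesic-grid seed)."""
    phi = (1.0 + 5.0 ** 0.5) / 2.0
    v = np.array([[-1, phi, 0], [1, phi, 0], [-1, -phi, 0], [1, -phi, 0],
                  [0, -1, phi], [0, 1, phi], [0, -1, -phi], [0, 1, -phi],
                  [phi, 0, -1], [phi, 0, 1], [-phi, 0, -1], [-phi, 0, 1]], float)
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    f = np.array([[0, 11, 5], [0, 5, 1], [0, 1, 7], [0, 7, 10], [0, 10, 11],
                  [1, 5, 9], [5, 11, 4], [11, 10, 2], [10, 7, 6], [7, 1, 8],
                  [3, 9, 4], [3, 4, 2], [3, 2, 6], [3, 6, 8], [3, 8, 9],
                  [4, 9, 5], [2, 4, 11], [6, 2, 10], [8, 6, 7], [9, 8, 1]])
    return v, f

def subdivide(verts, faces):
    """One 4-to-1 subdivision step: split each triangle and push midpoints onto the sphere."""
    cache, verts = {}, list(map(tuple, verts))
    def midpoint(i, j):
        key = tuple(sorted((i, j)))
        if key not in cache:
            m = (np.array(verts[i]) + np.array(verts[j])) / 2.0
            m /= np.linalg.norm(m)          # re-project onto the unit sphere
            verts.append(tuple(m))
            cache[key] = len(verts) - 1
        return cache[key]
    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [[a, ab, ca], [b, bc, ab], [c, ca, bc], [ab, bc, ca]]
    return np.array(verts), np.array(new_faces)

v, f = icosahedron()
for _ in range(3):              # three levels: 20 * 4**3 = 1280 nearly uniform cells
    v, f = subdivide(v, f)
print(v.shape, f.shape)         # (642, 3) vertices, (1280, 3) triangles
```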
Citations: 0
EvUnroll: Neuromorphic Events based Rolling Shutter Image Correction
Pub Date: 2022-06-01 | DOI: 10.1109/CVPR52688.2022.01725
Xinyu Zhou, Peiqi Duan, Yi Ma, Boxin Shi
This paper proposes to use neuromorphic events for correcting rolling shutter (RS) images into consecutive global shutter (GS) frames. The RS effect, caused by the row-wise readout of CMOS sensors, introduces edge distortion and region occlusion into images. We introduce a novel computational imaging setup consisting of an RS sensor and an event sensor, and propose a neural network called EvUnroll that solves this problem by exploiting the high-temporal-resolution property of events. We use events to bridge a spatio-temporal connection between RS and GS, establish a flow estimation module to correct edge distortions, and design a synthesis-based restoration module to restore occluded regions. The results of the two branches are fused through a refining module to generate corrected GS images. We further propose datasets captured by a high-speed camera and an RS-Event hybrid camera system for training and testing our network. Experimental results on both public and proposed datasets show a systematic performance improvement over state-of-the-art methods.
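The network itself is beyond the scope of a listing, but the rolling-shutter geometry it must undo is simple to state: each image row is read out slightly later than the previous one, so events can be grouped per row around that row's readout timestamp. The sketch below assumes a linear row-readout model with illustrative numbers; the helper names are hypothetical.

```python
import numpy as np

def rolling_shutter_row_times(num_rows, frame_start_t, readout_time):
    """Timestamp at which each row of a rolling-shutter frame is read out,
    assuming a constant per-row readout delay (a standard linear RS model)."""
    return frame_start_t + np.arange(num_rows) * (readout_time / num_rows)

def events_for_row(event_ts, row_t, half_window):
    """Indices of events within a small temporal window around one row's readout,
    i.e. the events a correction network could use to warp that row to a common GS time."""
    return np.nonzero(np.abs(event_ts - row_t) <= half_window)[0]

# Toy usage: a 480-row frame read out over 10 ms, events at random timestamps.
row_times = rolling_shutter_row_times(480, frame_start_t=0.0, readout_time=0.010)
events = np.sort(np.random.default_rng(0).uniform(0.0, 0.010, size=2000))
print(len(events_for_row(events, row_times[240], half_window=0.0005)))
```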
Citations: 12
Computing Wasserstein-$p$ Distance Between Images with Linear Cost
Pub Date: 2022-06-01 | DOI: 10.1109/CVPR52688.2022.00060
Yidong Chen, Chen Li, Z. Lu
When images are formulated as discrete measures, computing the Wasserstein-p distance between them is challenging due to the complexity of solving the corresponding Kantorovich problem. In this paper, we propose a novel algorithm to compute the Wasserstein-p distance between discrete measures by restricting the optimal transport (OT) problem to a subset. First, we define the restricted OT problem and prove that its solution converges to Kantorovich's OT solution. Second, we propose the SparseSinkhorn algorithm for the restricted problem and provide a multi-scale algorithm to estimate the subset. Finally, we implement the proposed algorithm on CUDA and demonstrate linear computational cost in terms of both time and memory requirements. We compute the Wasserstein-p distance, estimate the transport mapping, and transfer color between color images with sizes ranging from $64\times 64$ to $1920\times 1200$. (Our code is available at https://github.com/ucascnic/CudaOT)
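The SparseSinkhorn algorithm restricts the transport plan to an estimated subset; as a hedged point of reference, the dense entropic Sinkhorn iteration it builds on is sketched below in NumPy. This baseline has quadratic (not linear) cost and is purely illustrative of the underlying OT computation, not the paper's method.

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.05, n_iters=500):
    """Standard entropic-regularized OT between histograms mu, nu with cost matrix C.
    Returns the transport plan P; the regularized transport cost is <P, C>."""
    K = np.exp(-C / eps)                    # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iters):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    return u[:, None] * K * v[None, :]      # P = diag(u) K diag(v)

# Toy usage: approximate W_2^2 between two 1-D histograms with |x - y|^2 ground cost.
x = np.linspace(0, 1, 64)
mu = np.exp(-(x - 0.3) ** 2 / 0.01); mu /= mu.sum()
nu = np.exp(-(x - 0.7) ** 2 / 0.01); nu /= nu.sum()
C = np.abs(x[:, None] - x[None, :]) ** 2
P = sinkhorn(mu, nu, C)
print("approx W_2^2:", float((P * C).sum()))
```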
Citations: 5
Day-to-Night Image Synthesis for Training Nighttime Neural ISPs
Pub Date: 2022-06-01 | DOI: 10.1109/CVPR52688.2022.01050
Abhijith Punnappurath, Abdullah Abuolaim, A. Abdelhamed, Alex Levinshtein, M. S. Brown
Many flagship smartphone cameras now use a dedicated neural image signal processor (ISP) to render noisy raw sensor images into the final processed output. Training night-mode ISP networks relies on large-scale datasets of image pairs with: (1) a noisy raw image captured with a short exposure and a high ISO gain; and (2) a ground-truth low-noise raw image captured with a long exposure and low ISO that has been rendered through the ISP. Capturing such image pairs is tedious and time-consuming, requiring careful setup to ensure alignment between the image pairs. In addition, ground-truth images are often prone to motion blur due to the long exposure. To address this problem, we propose a method that synthesizes nighttime images from daytime images. Daytime images are easy to capture, exhibit low noise (even on smartphone cameras), and rarely suffer from motion blur. We outline a processing framework to convert daytime raw images to have the appearance of realistic nighttime raw images with different levels of noise. Our procedure allows us to easily produce aligned noisy and clean nighttime image pairs. We show the effectiveness of our synthesis framework by training neural ISPs for night-mode rendering. Furthermore, we demonstrate that using our synthetic nighttime images together with small amounts of real data (e.g., 5% to 10%) yields performance almost on par with training exclusively on real nighttime images. Our dataset and code are available at https://github.com/SamsungLabs/day-to-night.
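A minimal sketch of the general idea, darkening a clean daytime raw image and injecting signal-dependent shot noise plus read noise to mimic a high-ISO night capture, is given below. The noise model and parameter values are common raw-domain assumptions for illustration only, not the paper's exact synthesis pipeline.

```python
import numpy as np

def day_to_night_raw(raw_day, darkening=0.02, shot_gain=0.012, read_std=0.003, rng=None):
    """Turn a normalized daytime raw image ([0, 1]) into a pseudo night capture:
    scale down the illumination, then add heteroscedastic shot + read noise
    (a common raw-domain noise model; all parameters here are illustrative)."""
    rng = rng or np.random.default_rng()
    clean_night = raw_day * darkening                           # low-light ground truth
    shot = rng.normal(0.0, np.sqrt(np.maximum(clean_night * shot_gain, 0.0)))
    read = rng.normal(0.0, read_std, size=raw_day.shape)
    noisy_night = np.clip(clean_night + shot + read, 0.0, 1.0)
    return clean_night, noisy_night                             # aligned training pair

# Toy usage on a random "raw" image standing in for a real capture.
day = np.random.default_rng(1).uniform(0.2, 0.9, size=(256, 256))
clean, noisy = day_to_night_raw(day)
print(clean.mean(), noisy.std())
```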
Citations: 8
ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Photo
Pub Date: 2022-06-01 | DOI: 10.1109/CVPR52688.2022.00215
Biwen Lei, Xiefan Guo, Hongyu Yang, Miaomiao Cui, Xuansong Xie, Dihe Huang
Photo retouching has applications in many fields. However, most existing methods are designed for global retouching and pay little attention to local regions, even though local retouching is much more tedious and time-consuming in photography pipelines. In this paper, we propose a novel adaptive blend pyramid network that aims to achieve fast local retouching on ultra high-resolution photos. The network is mainly composed of two components: a context-aware local retouching layer (LRL) and an adaptive blend pyramid layer (BPL). The LRL is designed to perform local retouching on low-resolution images, giving full consideration to the global context and local texture information; the BPL then progressively expands the low-resolution results to higher resolutions with the help of the proposed adaptive blend module and refining module. Our method outperforms existing methods by a large margin on two local photo retouching tasks and exhibits excellent running speed, achieving real-time inference on 4K images with a single NVIDIA Tesla P100 GPU. Moreover, we introduce the first high-definition cloth retouching dataset, CRHD-3K, to promote research on local photo retouching. The dataset is available at https://github.com/youngLbw/crhd-3K.
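The idea of retouching at low resolution and propagating the result to full resolution can be sketched without any learning: apply the edit on a downsampled copy and add the upsampled residual back to the full-resolution input. The Python sketch below illustrates only this pyramid-style propagation; it is not the learned LRL/BPL modules, and the box/nearest resampling is a deliberate simplification.

```python
import numpy as np

def downsample2(img):
    """Naive 2x box downsample (H and W must be even)."""
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample2(img):
    """Naive 2x nearest-neighbour upsample."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def pyramid_edit(full_res, retouch_fn, levels=2):
    """Run an expensive retouch only at low resolution, then propagate the
    residual back up the pyramid -- the rough idea behind blend-pyramid retouching."""
    low = full_res
    for _ in range(levels):
        low = downsample2(low)
    residual = retouch_fn(low) - low            # what the retouch changed, at low res
    for _ in range(levels):
        residual = upsample2(residual)
    return np.clip(full_res + residual, 0.0, 1.0)

# Toy usage: a gamma-style brightening stands in for the "retouch".
img = np.random.default_rng(2).uniform(0.0, 1.0, size=(512, 512))
out = pyramid_edit(img, lambda x: np.power(x, 0.8))
print(out.shape, float(out.mean() - img.mean()))
```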
Citations: 7
SMPL-A: Modeling Person-Specific Deformable Anatomy
Pub Date: 2022-06-01 | DOI: 10.1109/CVPR52688.2022.02015
Hengtao Guo, Benjamin Planche, Meng Zheng, S. Karanam, Terrence Chen, Ziyan Wu
A variety of diagnostic and therapeutic protocols rely on locating in vivo target anatomical structures, which can be obtained from medical scans. However, organs move and deform as the patient changes his/her pose. To obtain accurate target location information, clinicians have to either conduct frequent intraoperative scans, resulting in higher exposure of patients to radiation, or adopt proxy procedures (e.g., creating and using custom molds to keep patients in the exact same pose during both preoperative organ scanning and subsequent treatment). Such custom proxy methods are typically sub-optimal, constraining the clinicians and costing patients precious time and money. To the best of our knowledge, this work is the first to present a learning-based approach to estimate a patient's internal organ deformation for arbitrary human poses in order to assist with radiotherapy and similar medical protocols. The underlying method first leverages medical scans to learn a patient-specific representation that potentially encodes the organ's shape and elastic properties. During inference, given the patient's current body pose information and the organ's representation extracted from previous medical scans, our method can estimate the current organ deformation to offer guidance to clinicians. We conduct experiments on a well-sized dataset that is augmented from real clinical data using finite element modeling. Our results suggest that pose-dependent organ deformation can be learned through a point cloud autoencoder conditioned on the parametric pose input. We hope that this work can be a starting point for future research towards closing the loop between human mesh recovery and anatomical reconstruction, with applications beyond the medical domain.
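As a hedged sketch of the modeling ingredient highlighted in the last sentences, a point cloud autoencoder conditioned on a parametric pose vector, the PyTorch module below encodes a point set into a permutation-invariant code and decodes it jointly with the pose. All layer sizes and names are illustrative assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class PoseConditionedPointAE(nn.Module):
    """Minimal point-cloud autoencoder whose decoder is conditioned on a pose vector.
    Illustrative sizes only; not the paper's architecture."""
    def __init__(self, num_points=1024, latent_dim=128, pose_dim=72):
        super().__init__()
        self.encoder = nn.Sequential(               # per-point MLP, max-pooled over points
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + pose_dim, 256), nn.ReLU(),
            nn.Linear(256, num_points * 3))
        self.num_points = num_points

    def forward(self, points, pose):
        # points: (B, N, 3), pose: (B, pose_dim)
        z = self.encoder(points).max(dim=1).values  # permutation-invariant shape code
        out = self.decoder(torch.cat([z, pose], dim=1))
        return out.view(-1, self.num_points, 3)

# Toy usage: reconstruct a random organ-like point set under a random pose.
model = PoseConditionedPointAE()
pts, pose = torch.randn(2, 1024, 3), torch.randn(2, 72)
recon = model(pts, pose)
print(recon.shape, float(((recon - pts) ** 2).mean()))   # simple reconstruction loss
```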
Citations: 1
Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation
Pub Date: 2022-06-01 | DOI: 10.1109/CVPR52688.2022.00491
Zihan Ding, Tianrui Hui, Junshi Huang, Xiaoming Wei, Jizhong Han, Si Liu
Referring video object segmentation aims to predict foreground labels for objects referred to by natural language expressions in videos. Previous methods either depend on 3D ConvNets or incorporate additional 2D ConvNets as encoders to extract mixed spatial-temporal features. However, these methods suffer from spatial misalignment or false distractors due to the delayed and implicit spatial-temporal interaction occurring in the decoding phase. To tackle these limitations, we propose a Language-Bridged Duplex Transfer (LBDT) module which utilizes language as an intermediary bridge to accomplish explicit and adaptive spatial-temporal interaction earlier, in the encoding phase. Concretely, cross-modal attention is performed among the temporal encoder, referring words, and the spatial encoder to aggregate and transfer language-relevant motion and appearance information. In addition, we propose a Bilateral Channel Activation (BCA) module in the decoding phase for further denoising and highlighting spatial-temporal consistent features via channel-wise activation. Extensive experiments show our method achieves new state-of-the-art performance on four popular benchmarks, with 6.8% and 6.9% absolute AP gains on A2D Sentences and J-HMDB Sentences respectively, while consuming around 7× less computational overhead. Code: https://github.com/dzh19990407/LBDT.
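Cross-modal attention between word features and visual features is a standard building block; the PyTorch sketch below shows one generic form in which visual tokens attend to word embeddings. It is illustrative only and is not the LBDT or BCA module described in the abstract.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Generic cross-attention: visual tokens attend to word embeddings so that
    language-relevant information is injected into the visual stream."""
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_tokens, word_tokens):
        # visual_tokens: (B, H*W, C) queries; word_tokens: (B, L, C) keys/values.
        attended, _ = self.attn(visual_tokens, word_tokens, word_tokens)
        return self.norm(visual_tokens + attended)      # residual connection + norm

# Toy usage: a 14x14 spatial grid attending to a 10-word referring expression.
block = CrossModalAttention()
vis, words = torch.randn(2, 14 * 14, 256), torch.randn(2, 10, 256)
print(block(vis, words).shape)    # torch.Size([2, 196, 256])
```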
Citations: 25
Partial Class Activation Attention for Semantic Segmentation
Pub Date: 2022-06-01 | DOI: 10.1109/CVPR52688.2022.01633
Sun'ao Liu, Hongtao Xie, Hai Xu, Yongdong Zhang, Qi Tian
Current attention-based methods for semantic segmentation mainly model pixel relations through pairwise affinity and coarse segmentation. For the first time, this paper explores modeling pixel relations via the Class Activation Map (CAM). Going beyond the previous CAM generated from image-level classification, we present Partial CAM, which subdivides the task into region-level prediction and achieves better localization performance. In order to eliminate the intra-class inconsistency caused by variances of local context, we further propose Partial Class Activation Attention (PCAA), which simultaneously utilizes local and global class-level representations for attention calculation. Once the partial CAM is obtained, PCAA collects local class centers and computes the pixel-to-class relation locally. Applying local-specific representations ensures reliable results under different local contexts. To guarantee global consistency, we gather global representations from all local class centers and conduct feature aggregation. Experimental results confirm that Partial CAM outperforms the previous two strategies as a pixel relation. Notably, our method achieves state-of-the-art performance on several challenging benchmarks including Cityscapes, Pascal Context, and ADE20K. Code is available at https://github.com/lsa1997/PCAA.
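The core computation, using a class activation map as soft spatial weights to pool per-class centers and then relating every pixel to those centers, can be sketched compactly. The PyTorch code below is a generic illustration with assumed shapes and normalization, not the exact PCAA formulation.

```python
import torch
import torch.nn.functional as F

def class_centers_and_relation(features, cam):
    """Pool one center per class using a CAM as soft spatial weights, then compute
    a pixel-to-class affinity and aggregate the centers back to every pixel.
    Shapes and normalization are illustrative, not the paper's exact design."""
    B, C, H, W = features.shape
    feats = features.flatten(2)                              # (B, C, HW)
    weights = F.softmax(cam.flatten(2), dim=-1)              # (B, K, HW), sums to 1 per class
    centers = torch.einsum('bkn,bcn->bkc', weights, feats)   # (B, K, C) class centers
    relation = torch.einsum('bkc,bcn->bkn', centers, feats)  # pixel-to-class scores
    attention = F.softmax(relation / C ** 0.5, dim=1)        # distribute each pixel over classes
    refined = torch.einsum('bkn,bkc->bcn', attention, centers)
    return centers, attention, refined.reshape(B, C, H, W)

# Toy usage with random features and a random 21-class CAM.
feats = torch.randn(2, 256, 33, 33)
cam = torch.randn(2, 21, 33, 33)
centers, attn, refined = class_centers_and_relation(feats, cam)
print(centers.shape, attn.shape, refined.shape)
```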
Citations: 16
Optimal LED Spectral Multiplexing for NIR2RGB Translation
Pub Date: 2022-06-01 | DOI: 10.1109/CVPR52688.2022.01232
Lei Liu, Yuze Chen, Junchi Yan, Yinqiang Zheng
The industry practice for night video surveillance is to use auxiliary near-infrared (NIR) LEDs, usually centered at 850nm or 940nm, for scene illumination. NIR LEDs are used to reduce power consumption while keeping the illuminated surveillance area invisible to the naked eye. The captured images are almost monochromatic, and visual color and texture tend to disappear, which hinders both human and machine perception. A few existing studies have tried to convert such NIR images to RGB images through deep learning, but they can neither provide satisfying results nor generalize well beyond the training dataset. In this paper, we aim to break the fundamental restrictions on reliable NIR-to-RGB (NIR2RGB) translation by examining the imaging mechanism of single-chip silicon-based RGB cameras under NIR illumination, and propose to retrieve the optimal LED multiplexing via deep learning. Experimental results show that this translation task can be significantly improved by properly multiplexing NIR LEDs close to the visible spectral range rather than using 850nm and 940nm LEDs.
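Why LED multiplexing matters can be seen from the linear image-formation model: each camera channel's response is the spectral sensitivity integrated against the scene reflectance times a weighted sum of LED emission spectra, so the multiplexing weights directly shape what the sensor records. The NumPy sketch below uses synthetic Gaussian curves as placeholders for real LED and sensor data; all numeric values are assumptions for illustration.

```python
import numpy as np

wavelengths = np.arange(400, 1001, 10)        # 400-1000 nm sampling grid

def gaussian_spectrum(center, width):
    """Synthetic placeholder for an LED emission or sensor sensitivity curve."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

def camera_response(led_weights, led_spectra, sensitivities, reflectance):
    """Linear image-formation model: per-channel response = sum over wavelength of
    sensitivity * reflectance * (weighted sum of LED spectra). Real curves would come
    from LED/sensor datasheets; everything here is illustrative."""
    illumination = led_spectra.T @ led_weights              # (num_wavelengths,)
    return sensitivities @ (illumination * reflectance)     # (3,) pseudo-RGB response

# Three candidate LEDs near the visible/NIR edge and rough RGB sensitivities.
leds = np.stack([gaussian_spectrum(c, 20) for c in (700, 760, 850)])   # (3, W)
sens = np.stack([gaussian_spectrum(c, 60) for c in (600, 540, 460)])   # R, G, B
reflectance = np.random.default_rng(3).uniform(0.2, 0.9, size=wavelengths.shape)

print(camera_response(np.array([0.5, 0.3, 0.2]), leds, sens, reflectance))
```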
Citations: 3