
Latest Publications: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Adaptive Hierarchical Representation Learning for Long-Tailed Object Detection
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00235
Banghuai Li
General object detectors are typically evaluated on hand-designed datasets, e.g., MS COCO and Pascal VOC, which tend to maintain a balanced data distribution over classes. However, this setting is at odds with practical real-world applications, which suffer from a heavy class imbalance problem known as long-tailed object detection. In this paper, we propose a novel method, named Adaptive Hierarchical Representation Learning (AHRL), that addresses long-tailed object detection from a metric learning perspective. We visualize each learned class representation in the feature space and observe that some classes, especially under-represented scarce classes, are prone to cluster with analogous ones due to the lack of discriminative representations. Inspired by this, we propose to split the whole feature space into a hierarchical structure and eliminate the problem in a coarse-to-fine way. AHRL adopts a two-stage training paradigm. First, we train a normal baseline model and construct the hierarchical structure with an unsupervised clustering method. Then, we design an AHR loss that consists of two optimization objectives. On the one hand, the AHR loss retains the hierarchical structure and keeps representation clusters away from each other. On the other hand, the AHR loss adopts adaptive margins for specific class pairs in the same cluster to further optimize locally. We conduct extensive experiments on the challenging LVIS dataset, and AHRL outperforms all existing state-of-the-art methods, with 29.1% segmentation AP and 29.3% box AP on LVIS v0.5 and 27.6% segmentation AP and 28.7% box AP on LVIS v1.0 with ResNet-101. We hope our simple yet effective approach will serve as a solid baseline to help stimulate future research in long-tailed object detection. Code will be released soon.
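
The abstract only describes the AHR loss at a high level, so the following is a minimal PyTorch sketch of how an objective with an inter-cluster separation term and adaptive intra-cluster margins might look; the function name `ahr_loss`, its arguments, and the specific hinge formulation are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def ahr_loss(features, labels, class_centers, cluster_ids, pair_margins, inter_margin=0.5):
    """Illustrative adaptive hierarchical-margin loss (not the official AHRL code).

    features:      (N, D) L2-normalized embeddings of proposals
    labels:        (N,)   class index of each proposal
    class_centers: (C, D) learnable per-class representations
    cluster_ids:   (C,)   cluster assignment of each class from offline clustering
    pair_margins:  (C, C) adaptive margin for each class pair in the same cluster
    """
    centers = F.normalize(class_centers, dim=1)

    # Inter-cluster term: keep centers of different clusters away from each other.
    sim = centers @ centers.t()                                  # cosine similarity
    diff_cluster = cluster_ids.unsqueeze(0) != cluster_ids.unsqueeze(1)
    inter = F.relu(sim - (1.0 - inter_margin))[diff_cluster].mean()

    # Intra-cluster term: pull each sample to its own center and push it away from
    # same-cluster centers by a class-pair-specific margin.
    pos = (features * centers[labels]).sum(dim=1)                # similarity to own class
    same_cluster = cluster_ids[labels].unsqueeze(1) == cluster_ids.unsqueeze(0)
    idx = torch.arange(features.size(0), device=features.device)
    same_cluster[idx, labels] = False                            # exclude own class
    neg = features @ centers.t()                                 # similarity to all classes
    margins = pair_margins[labels]                               # (N, C)
    intra = F.relu(margins + neg - pos.unsqueeze(1))[same_cluster].mean()

    return inter + intra
```
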
Citations: 2
VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00209
Zeyuan Chen, Yinbo Chen, Jingwen Liu, Xingqian Xu, Vidit Goel, Zhangyang Wang, Humphrey Shi, Xiaolong Wang
Videos typically record streaming, continuous visual data as discrete consecutive frames. Since storage is expensive for high-fidelity videos, most of them are stored at a relatively low resolution and frame rate. Recent works on Space-Time Video Super-Resolution (STVSR) incorporate temporal interpolation and spatial super-resolution in a unified framework. However, most of them only support a fixed up-sampling scale, which limits their flexibility and applications. In this work, instead of following discrete representations, we propose Video Implicit Neural Representation (VideoINR) and show its application to STVSR. The learned implicit neural representation can be decoded to videos of arbitrary spatial resolution and frame rate. We show that VideoINR achieves competitive performance with state-of-the-art STVSR methods on common up-sampling scales and significantly outperforms prior works on continuous and out-of-training-distribution scales. Our project page is available online and code is at https://github.com/Picsart-AI-Research/VideoINR-Continuous-Space-Time-Super-Resolution.
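
As a rough illustration of the "decode at arbitrary spatial resolution and frame rate" idea, here is a toy coordinate-conditioned decoder in PyTorch; the architecture, feature dimensions, and the clip-level feature input are assumptions and do not reflect the actual VideoINR model.

```python
import torch
import torch.nn as nn

class CoordinateDecoder(nn.Module):
    """Toy continuous space-time decoder in the spirit of implicit neural representations.

    Given an encoder feature vector for a clip and a continuous (x, y, t) query in [0, 1]^3,
    it predicts an RGB value, so any spatial resolution / frame rate can be sampled.
    """
    def __init__(self, feat_dim=64, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, feat, coords):
        # feat:   (B, feat_dim)  clip-level feature from some encoder (assumed given)
        # coords: (B, Q, 3)      continuous (x, y, t) query coordinates
        feat = feat.unsqueeze(1).expand(-1, coords.shape[1], -1)
        return self.mlp(torch.cat([feat, coords], dim=-1))       # (B, Q, 3) RGB values

# Querying a 128x128 spatial grid at an intermediate time step t = 0.25:
decoder = CoordinateDecoder()
feat = torch.randn(1, 64)
ys, xs = torch.meshgrid(torch.linspace(0, 1, 128), torch.linspace(0, 1, 128), indexing="ij")
coords = torch.stack([xs, ys, torch.full_like(xs, 0.25)], dim=-1).reshape(1, -1, 3)
frame = decoder(feat, coords).reshape(128, 128, 3)
```
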
Citations: 39
Global-Aware Registration of Less-Overlap RGB-D Scans
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00625
Che Sun, Yunde Jia, Yimin Guo, Yuwei Wu
We propose a novel method of registering less-overlap RGB-D scans. Our method learns global information of a scene to construct a panorama and aligns RGB-D scans to the panorama to perform registration. Different from existing methods that use local feature points to register less-overlap RGB-D scans and suffer from heavy mismatching, we use global information to guide the registration, thereby alleviating the mismatching problem by preserving the global consistency of alignments. To this end, we build a scene inference network to construct the panorama representing global information. We introduce a reinforcement learning strategy to iteratively align RGB-D scans with the panorama and refine the panorama representation, which reduces the noise of global information and preserves the global consistency of both geometric and photometric alignments. Experimental results on benchmark datasets including SUNCG, Matterport, and ScanNet show the superiority of our method.
Citations: 3
Learn from Others and Be Yourself in Heterogeneous Federated Learning
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00990
Wenke Huang, Mang Ye, Bo Du
Federated learning has emerged as an important distributed learning paradigm, which normally involves collaborative updating with others and local updating on private data. However, the heterogeneity problem and catastrophic forgetting bring distinctive challenges. First, due to non-i.i.d. (identically and independently distributed) data and heterogeneous architectures, models suffer performance degradation on other domains and communication barriers with the other participants' models. Second, in local updating, the model is separately optimized on private data, so it is prone to overfit the current data distribution and forget previously acquired knowledge, resulting in catastrophic forgetting. In this work, we propose FCCL (Federated Cross-Correlation and Continual Learning). For the heterogeneity problem, FCCL leverages unlabeled public data for communication and constructs a cross-correlation matrix to learn a generalizable representation under domain shift. Meanwhile, for catastrophic forgetting, FCCL utilizes knowledge distillation in local updating, providing inter- and intra-domain information without leaking privacy. Empirical results on various image classification tasks demonstrate the effectiveness of our method and the efficiency of its modules.
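
For the cross-correlation term mentioned above, a Barlow-Twins-style formulation is one plausible reading; the sketch below is an assumption-laden illustration (the names, normalization, and weight `lam` are not from the paper).

```python
import torch

def cross_correlation_loss(z_local, z_global, lam=0.005):
    """Sketch of a cross-correlation alignment term on unlabeled public data
    (illustrative only; not FCCL's exact formulation).

    z_local:  (N, D) local model outputs on unlabeled public data
    z_global: (N, D) aggregated outputs of the other participants on the same data
    """
    n = z_local.shape[0]
    z1 = (z_local - z_local.mean(0)) / (z_local.std(0) + 1e-6)     # standardize per dimension
    z2 = (z_global - z_global.mean(0)) / (z_global.std(0) + 1e-6)
    c = (z1.t() @ z2) / n                                          # (D, D) cross-correlation matrix

    on_diag = (torch.diagonal(c) - 1).pow(2).sum()                 # pull diagonal toward 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()    # push off-diagonal toward 0
    return on_diag + lam * off_diag
```
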
Citations: 62
Compressive Single-Photon 3D Cameras
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.01733
Felipe Gutierrez-Barragan, A. Ingle, T. Seets, Mohit Gupta, A. Velten
Single-photon avalanche diodes (SPADs) are an emerging pixel technology for time-of-flight (ToF) 3D cameras that can capture the time-of-arrival of individual photons at picosecond resolution. To estimate depths, current SPAD-based 3D cameras measure the round-trip time of a laser pulse by building a per-pixel histogram of photon timestamps. As the spatial and timestamp resolution of SPAD-based cameras increase, their output data rates far exceed the capacity of existing data transfer technologies. One major reason for SPAD's bandwidth-intensive operation is the tight coupling that exists between depth resolution and histogram resolution. To weaken this coupling, we propose compressive single-photon histograms (CSPH). CSPHs are a per-pixel compressive representation of the high-resolution histogram, that is built on-the-fly, as each photon is detected. They are based on a family of linear coding schemes that can be expressed as a simple matrix operation. We design different CSPH coding schemes for 3D imaging and evaluate them under different signal and background levels, laser waveforms, and illumination setups. Our results show that a well-designed CSPH can consistently reduce data rates by 1–2 orders of magnitude without compromising depth precision.
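
The abstract describes CSPHs as per-pixel linear codes built on the fly per detected photon and "expressed as a simple matrix operation"; the NumPy snippet below illustrates that idea with a random coding matrix, which is purely a placeholder for the paper's designed coding schemes.

```python
import numpy as np

# Illustrative compressive single-photon histogram accumulation.
# A full B-bin histogram h is never stored; each detected photon with time bin t
# adds one column of a K x B coding matrix C to a K-dimensional code, so the
# result equals C @ h. C here is random purely for demonstration.

B, K = 1024, 32                                   # time bins vs. compressive code size
rng = np.random.default_rng(0)
C = rng.standard_normal((K, B))

photon_bins = rng.integers(0, B, size=5000)       # simulated photon timestamp bins

code = np.zeros(K)
for t in photon_bins:                             # on-the-fly, per detected photon
    code += C[:, t]

h = np.bincount(photon_bins, minlength=B)         # the full histogram (for checking only)
assert np.allclose(code, C @ h)                   # the code is a linear projection of h
```
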
Citations: 7
Learning 3D Object Shape and Layout without 3D Supervision
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00174
Georgia Gkioxari, Nikhila Ravi, Justin Johnson
A 3D scene consists of a set of objects, each with a shape and a layout giving their position in space. Understanding 3D scenes from 2D images is an important goal, with applications in robotics and graphics. While there have been recent advances in predicting 3D shape and layout from a single image, most approaches rely on 3D ground truth for training which is expensive to collect at scale. We overcome these limitations and propose a method that learns to predict 3D shape and layout for objects without any ground truth shape or layout information: instead we rely on multi-view images with 2D supervision which can more easily be collected at scale. Through extensive experiments on ShapeNet, Hypersim, and ScanNet we demonstrate that our approach scales to large datasets of realistic images, and compares favorably to methods relying on 3D ground truth. On Hypersim and ScanNet where reliable 3D ground truth is not available, our approach outperforms supervised approaches trained on smaller and less diverse datasets. Project page: https://gkioxari.github.io/usl/
Citations: 9
Learnable Lookup Table for Neural Network Quantization
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.01210
Longguang Wang, Xiaoyu Dong, Yingqian Wang, Li Liu, Wei An, Y. Guo
Neural network quantization aims at reducing bit-widths of weights and activations for memory and computational efficiency. Since a linear quantizer (i.e., round(·) function) cannot well fit the bell-shaped distributions of weights and activations, many existing methods use predefined functions (e.g., exponential function) with learnable parameters to build the quantizer for joint optimization. However, these complicated quantizers introduce considerable computational overhead during inference since activation quantization should be conducted online. In this paper, we formulate the quantization process as a simple lookup operation and propose to learn lookup tables as quantizers. Specifically, we develop differentiable lookup tables and introduce several training strategies for optimization. Our lookup tables can be trained with the network in an end-to-end manner to fit the distributions in different layers and have very small additional computational cost. Comparisons with previous methods show that quantized networks using our lookup tables achieve state-of-the-art performance on image classification, image super-resolution, and point cloud classification tasks.
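
To make the "quantization as a lookup operation" idea concrete, here is a minimal differentiable lookup-table quantizer with a straight-through estimator; the table size, initialization, and STE formulation are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class LookupQuantizer(nn.Module):
    """Minimal differentiable lookup-table quantizer (a sketch of the general idea).

    A learnable table maps uniformly indexed input bins to quantization levels; a
    straight-through estimator passes gradients to the input, while the table entries
    that were looked up receive gradients directly.
    """
    def __init__(self, num_entries=256, num_levels=16):
        super().__init__()
        # Initialize table entries to a uniform staircase over [0, 1].
        init = torch.floor(torch.linspace(0, 1, num_entries) * (num_levels - 1)) / (num_levels - 1)
        self.table = nn.Parameter(init)
        self.num_entries = num_entries

    def forward(self, x):
        x = x.clamp(0, 1)
        idx = (x * (self.num_entries - 1)).round().long()   # lookup index (non-differentiable)
        q = self.table[idx]                                  # table lookup
        return q + x - x.detach()                            # value = q, identity gradient w.r.t. x

quant = LookupQuantizer()
act = torch.rand(4, 8, requires_grad=True)
quant(act).sum().backward()                                  # gradients reach both act and the table
```
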
Citations: 10
Speech Driven Tongue Animation
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.01976
Salvador Medina Maza, Denis Tomè, Carsten Stoll, M. Tiede, K. Munhall, A. Hauptmann, Iain Matthews
Advances in speech driven animation techniques allow the creation of convincing animations for virtual characters solely from audio data. Many existing approaches focus on facial and lip motion and they often do not provide realistic animation of the inner mouth. This paper addresses the problem of speech-driven inner mouth animation. Obtaining performance capture data of the tongue and jaw from video alone is difficult because the inner mouth is only partially observable during speech. In this work, we introduce a large-scale speech and mocap dataset that focuses on capturing tongue, jaw, and lip motion. This dataset enables research using data-driven techniques to generate realistic inner mouth animation from speech. We then propose a deep-learning based method for accurate and generalizable speech to tongue and jaw animation, and evaluate several encoder-decoder network architectures and audio feature encoders. We find that recent self-supervised deep learning based audio feature encoders are robust, generalize well to unseen speakers and content, and work best for our task. To demonstrate the practical application of our approach, we show animations on high-quality parametric 3D face models driven by the landmarks generated from our speech-to-tongue animation method.
Citations: 7
Cross-patch Dense Contrastive Learning for Semi-supervised Segmentation of Cellular Nuclei in Histopathologic Images
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.01137
Huisi Wu, Zhaoze Wang, Youyi Song, L. Yang, Jing Qin
We study the semi-supervised learning problem of segmenting cellular nuclei in histopathologic images, using a few labeled data and a large amount of unlabeled data to train the network, by developing a cross-patch dense contrastive learning framework. This task is motivated by the expensive burden of collecting labeled data for histopathologic image segmentation. The key idea of our method is to align features of teacher and student networks, sampled across images at both patch and pixel levels, to enforce the intra-class compactness and inter-class separability of features, which, as we show, helps extract valuable knowledge from unlabeled data. We also design a novel optimization framework that combines consistency regularization and entropy minimization techniques and shows good behavior in avoiding gradient vanishing. We assess the proposed method on two publicly available datasets and obtain positive results in extensive experiments, outperforming state-of-the-art methods. Code is available at https://github.com/zzw-szu/CDCL.
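
The combination of consistency regularization and entropy minimization mentioned above could, for instance, take the following generic form on unlabeled images; the weights and the KL-based consistency term are illustrative assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def unlabeled_loss(student_logits, teacher_logits, w_cons=1.0, w_ent=0.1):
    """Sketch of a consistency + entropy-minimization objective on unlabeled images
    (illustrative; the weights and loss forms are assumptions).

    student_logits, teacher_logits: (N, C, H, W) per-pixel class logits produced by the
    student and teacher networks for the same unlabeled image.
    """
    log_p_student = F.log_softmax(student_logits, dim=1)
    p_teacher = F.softmax(teacher_logits.detach(), dim=1)        # teacher gives soft targets

    # Consistency: student predictions should match the teacher's.
    consistency = F.kl_div(log_p_student, p_teacher, reduction="batchmean")

    # Entropy minimization: sharpen the student's per-pixel predictions.
    entropy = -(log_p_student.exp() * log_p_student).sum(dim=1).mean()

    return w_cons * consistency + w_ent * entropy
```
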
Citations: 29
Convolution of Convolution: Let Kernels Spatially Collaborate
Pub Date : 2022-06-01 DOI: 10.1109/CVPR52688.2022.00073
Rongzhen Zhao, Jian Li, Zhenzhi Wu
In the biological visual pathway, especially the retina, neurons are tiled along spatial dimensions with electrical coupling as their local association, whereas in a convolution layer, kernels are placed singly along the channel dimension. We propose convolution of convolution, which associates kernels in a layer and lets them collaborate spatially. With this method, a layer can provide feature maps with extra transformations and learn its kernels jointly instead of in isolation. The mechanism is only used during training, bringing negligible extra cost; it can then be re-parameterized to a common convolution before testing, boosting performance at no inference cost in tasks like classification, detection, and segmentation. Our method works even better when larger receptive fields are demanded. The code is available at https://github.com/Genera1Z/ConvolutionOfConvolution.
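
The re-parameterization claim (train with collaborating kernels, fold into a plain convolution for testing) can be illustrated with a generic weight-re-parameterized layer like the one below; the "collaboration" operator shown here is an assumption for demonstration and is not the paper's exact construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CollabConv2d(nn.Module):
    """Illustration of training-time weight re-parameterization: each 3x3 kernel is
    spatially convolved with a small learnable "collaboration" kernel during training,
    and the composed kernel can be precomputed for inference so the layer becomes a
    plain convolution with zero extra test-time cost.
    """
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.1)
        self.collab = nn.Parameter(torch.zeros(1, 1, 3, 3))
        with torch.no_grad():
            self.collab[0, 0, 1, 1] = 1.0           # start as the identity kernel

    def effective_weight(self):
        w = self.weight.reshape(-1, 1, 3, 3)        # treat each kernel as a tiny image
        w = F.conv2d(w, self.collab, padding=1)     # let neighbouring taps collaborate
        return w.reshape_as(self.weight)

    def forward(self, x):
        return F.conv2d(x, self.effective_weight(), padding=1)

# After training, fold once and run as an ordinary convolution:
layer = CollabConv2d(8, 16)
fused = nn.Conv2d(8, 16, 3, padding=1, bias=False)
fused.weight.data.copy_(layer.effective_weight().detach())
x = torch.randn(1, 8, 32, 32)
assert torch.allclose(layer(x), fused(x), atol=1e-5)
```
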
Citations: 0