
Latest publications: 2021 IEEE/CVF International Conference on Computer Vision (ICCV)

Weakly Supervised Segmentation of Small Buildings with Point Labels
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00731
Jae-Hun Lee, ChanYoung Kim, S. Sull
Most supervised image segmentation methods require delicate and time-consuming pixel-level labeling of buildings or objects, especially for small objects. In this paper, we present a weakly supervised segmentation network for aerial/satellite images that considers small and large objects separately. First, we propose a simple point labeling method for small objects, while large objects are fully labeled. Then, we present a segmentation network trained with a small-object mask that separates small and large objects in the loss function. During training, we employ a memory bank to cope with the limited number of point labels. Experimental results on three public datasets demonstrate the feasibility of our approach.
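The mixed supervision described above can be illustrated with a minimal PyTorch sketch: dense cross-entropy on large-object regions plus sparse cross-entropy on the labeled points of small objects. The function name, tensor layout, and ignore value are illustrative assumptions, not the authors' implementation, and the memory bank is omitted.

import torch
import torch.nn.functional as F

def weak_point_loss(logits, large_mask, point_labels, small_region):
    # logits: (B, C, H, W) class scores.
    # large_mask: (B, H, W) dense labels, valid where small_region == 0.
    # point_labels: (B, H, W) sparse labels for small objects, 255 = unlabeled.
    # small_region: (B, H, W) float mask, 1 where a pixel belongs to a small-object region.
    dense = F.cross_entropy(logits, large_mask, reduction="none")          # per-pixel loss
    dense = (dense * (1.0 - small_region)).sum() / (1.0 - small_region).sum().clamp(min=1.0)
    sparse = F.cross_entropy(logits, point_labels, ignore_index=255)       # only labeled points count
    return dense + sparse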
Citations: 8
Just a Few Points are All You Need for Multi-view Stereo: A Novel Semi-supervised Learning Method for Multi-view Stereo
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00612
Taekyung Kim, Jaehoon Choi, Seokeon Choi, Dongki Jung, Changick Kim
While learning-based multi-view stereo (MVS) methods have recently shown strong performance in quality and efficiency, limited MVS data hampers generalization to unseen environments. A simple solution is to generate various large-scale MVS datasets, but generating dense ground truth for 3D structure requires a huge amount of time and resources. On the other hand, if the reliance on dense ground truth is relaxed, MVS systems will generalize more smoothly to new environments. To this end, we first introduce a novel semi-supervised multi-view stereo framework called the Sparse Ground truth-based MVS Network (SGT-MVSNet) that can reliably reconstruct 3D structures even with only a few ground truth 3D points. Our strategy is to divide the accurate and erroneous regions and conquer them individually, based on our observation that a probability map can separate these regions. We propose a self-supervision loss called the 3D Point Consistency Loss to enhance 3D reconstruction performance, which forces the 3D points back-projected from the corresponding pixels by the predicted depth values to meet at the same 3D coordinates. Finally, we propagate these improved depth predictions toward edges and occlusions with the Coarse-to-fine Reliable Depth Propagation module. We generate sparse ground truth for the DTU dataset for evaluation, and extensive experiments verify that our SGT-MVSNet outperforms state-of-the-art MVS methods in the sparse ground truth setting. Moreover, our method shows reconstruction results comparable to supervised MVS methods even though we only use tens to hundreds of ground truth 3D points.
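A hedged sketch of a 3D-point-consistency-style loss follows: pixels with a known ground-truth 3D point are back-projected with the predicted depth and pulled toward that point. Function and tensor names are assumptions for illustration, not the SGT-MVSNet code.

import torch

def point_consistency_loss(pred_depth, pixels, gt_points, K, cam_to_world):
    # pred_depth: (N,) predicted depth at each labeled pixel.
    # pixels: (N, 2) pixel coordinates (u, v).
    # gt_points: (N, 3) sparse ground-truth 3D points in world coordinates.
    # K: (3, 3) camera intrinsics; cam_to_world: (4, 4) camera-to-world transform.
    ones = torch.ones(pixels.shape[0], 1)
    homo = torch.cat([pixels, ones], dim=1)                          # (N, 3) homogeneous pixels
    cam_pts = (torch.inverse(K) @ homo.T).T * pred_depth[:, None]    # rays scaled by predicted depth
    cam_h = torch.cat([cam_pts, ones], dim=1)                        # (N, 4)
    world_pts = (cam_to_world @ cam_h.T).T[:, :3]                    # back-projected 3D points
    return (world_pts - gt_points).norm(dim=1).mean()                # mean distance to GT points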
Citations: 3
Robust Small-scale Pedestrian Detection with Cued Recall via Memory Learning
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00304
Jung Uk Kim, Sungjune Park, Yong Man Ro
Although the visual appearance of small-scale objects is not well observed, humans can recognize them by associating the visual cues of small objects with their memorized appearance. This is called cued recall. In this paper, motivated by the human memory process, we introduce a novel pedestrian detection framework that imitates cued recall when detecting small-scale pedestrians. We propose large-scale embedding learning with a large-scale pedestrian recalling memory (LPR Memory). The purpose of the proposed large-scale embedding learning is to memorize and recall large-scale pedestrian appearance via the LPR Memory. To this end, we employ a large-scale pedestrian exemplar set so that the LPR Memory can recall the information of large-scale pedestrians from small-scale pedestrians. Comprehensive quantitative and qualitative experimental results validate the effectiveness of the proposed framework with the LPR Memory.
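One plausible way to realize such a recalling memory is an attention read over a learned bank of large-scale prototypes, as in the rough sketch below; the module name, slot count, and additive fusion are our own assumptions, not the paper's exact LPR Memory.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RecallMemory(nn.Module):
    def __init__(self, num_slots=256, dim=256):
        super().__init__()
        # learned slots that are meant to memorize large-scale pedestrian appearance
        self.slots = nn.Parameter(torch.randn(num_slots, dim))

    def forward(self, small_feat):
        # small_feat: (B, dim) features of small-scale pedestrian proposals
        attn = F.softmax(small_feat @ self.slots.T, dim=-1)   # (B, num_slots) addressing weights
        recalled = attn @ self.slots                          # (B, dim) recalled large-scale cues
        return small_feat + recalled                          # enrich the small-scale features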
Citations: 29
Learning Causal Representation for Training Cross-Domain Pose Estimator via Generative Interventions
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01108
Xiheng Zhang, Yongkang Wong, Xiaofei Wu, Juwei Lu, Mohan S. Kankanhalli, Xiangdong Li, Wei-dong Geng
3D pose estimation has attracted increasing attention with the availability of high-quality benchmark datasets. However, prior works show that deep learning models tend to learn spurious correlations, which fail to generalize beyond the specific dataset they are trained on. In this work, we take a step towards training robust models for the cross-domain pose estimation task, bringing together ideas from causal representation learning and generative adversarial networks. Specifically, this paper introduces a novel framework for causal representation learning which explicitly exploits the causal structure of the task. We treat changing the domain as an intervention on images under the data-generation process and steer the generative model to produce counterfactual features. This helps the model learn transferable and causal relations across different domains. Our framework is able to learn with various types of unlabeled datasets. We demonstrate the efficacy of our proposed method on both human and hand pose estimation tasks. The experimental results show that the proposed approach achieves state-of-the-art performance on most datasets for both domain adaptation and domain generalization settings.
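A very loose sketch of the generative-intervention idea under our own assumptions: a generator perturbs domain-specific factors (the intervention), and the pose head is encouraged to predict consistently on original and counterfactual features. All names (encoder, generator, pose_head, noise_dim) are hypothetical, not the authors' architecture.

import torch
import torch.nn.functional as F

def intervention_consistency(encoder, generator, pose_head, images):
    feats = encoder(images)                                    # shared representation
    z = torch.randn(images.size(0), generator.noise_dim)       # random intervention code (assumed attribute)
    cf_feats = generator(feats, z)                             # counterfactual features after intervention
    pose, cf_pose = pose_head(feats), pose_head(cf_feats)
    return F.mse_loss(cf_pose, pose.detach())                  # invariance of pose to the intervention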
Citations: 19
Synthesized Feature based Few-Shot Class-Incremental Learning on a Mixture of Subspaces
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00854
A. Cheraghian, Shafin Rahman, Sameera Ramasinghe, Pengfei Fang, Christian Simon, L. Petersson, Mehrtash Harandi
Few-shot class incremental learning (FSCIL) aims to incrementally add sets of novel classes to a well-trained base model across multiple training sessions, with the restriction that only a few novel instances are available per class. While learning novel classes, FSCIL methods gradually forget the base (old) class training and overfit to the few novel class samples. Existing approaches have addressed this problem by computing class prototypes from the visual or semantic word vector domain. In this paper, we propose to address this problem using a mixture of subspaces. Subspaces define the cluster structure of the visual domain and help describe the visual and semantic domains considering the overall distribution of the data. Additionally, we propose to employ a variational autoencoder (VAE) to generate synthesized visual samples for augmenting pseudo-features while learning novel classes incrementally. The combined effect of the mixture of subspaces and the synthesized features reduces the forgetting and overfitting problems of FSCIL. Extensive experiments on three image classification datasets show that our proposed method achieves competitive results compared to state-of-the-art methods.
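As an illustration of the feature-synthesis step, a conditional VAE decoder could map a latent code plus a class word vector to a pseudo visual feature, which is mixed with the few real samples during incremental training. The class/network names, dimensions, and architecture below are assumptions for the sketch, not the paper's code.

import torch
import torch.nn as nn

class FeatureDecoder(nn.Module):
    def __init__(self, latent_dim=64, sem_dim=300, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + sem_dim, 512), nn.ReLU(),
            nn.Linear(512, feat_dim),
        )

    def forward(self, z, semantics):
        return self.net(torch.cat([z, semantics], dim=1))

def synthesize_features(decoder, class_semantics, num_samples=20, latent_dim=64):
    # class_semantics: (1, sem_dim) word vector of the novel class
    z = torch.randn(num_samples, latent_dim)
    sem = class_semantics.expand(num_samples, -1)   # repeat the class word vector
    return decoder(z, sem)                          # (num_samples, feat_dim) pseudo-features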
Citations: 42
Deep Halftoning with Reversible Binary Pattern
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01374
Menghan Xia, Wenbo Hu, Xueting Liu, T. Wong
Existing halftoning algorithms usually drop colors and fine details when dithering color images with binary dot patterns, which makes it extremely difficult to recover the original information. To dispense with this recovery trouble in the future, we propose a novel halftoning technique that converts a color image into a binary halftone that is fully restorable to the original version. The key idea is to implicitly embed the previously dropped information into the halftone patterns. The halftone pattern thus not only reproduces the image tone and maintains blue-noise randomness, but also encodes the color information and fine details. To this end, we exploit two collaborative convolutional neural networks (CNNs) to learn the dithering scheme, under a nontrivial self-supervision formulation. To tackle the flatness degradation issue of CNNs, we propose a novel noise incentive block (NIB) that can serve as a generic CNN plug-in for performance promotion. Finally, we tailor a guiding-aware training scheme that keeps the convergence direction as regulated. We evaluate the invertible halftones from multiple aspects, and the results evidence the effectiveness of our method.
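A noise-incentive-style block might look like the sketch below, under our own assumptions: a random noise map is concatenated to the (often flat) input so the convolution has a signal to break ties. The class name and single-noise-channel design are illustrative, not the authors' exact NIB.

import torch
import torch.nn as nn

class NoiseIncentiveBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 1, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        # one extra channel of Gaussian noise acts as the "incentive" on flat regions
        noise = torch.randn(x.size(0), 1, x.size(2), x.size(3), device=x.device)
        return self.conv(torch.cat([x, noise], dim=1))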
Citations: 12
Separable Flow: Learning Motion Cost Volumes for Optical Flow Estimation
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01063
Feihu Zhang, Oliver J. Woodford, V. Prisacariu, Philip H. S. Torr
Full-motion cost volumes play a central role in current state-of-the-art optical flow methods. However, constructed using simple feature correlations, they lack the ability to encapsulate prior, or even non-local, knowledge. This creates artifacts in poorly constrained, ambiguous regions, such as occluded and textureless areas. We propose a separable cost volume module, a drop-in replacement for correlation cost volumes, that uses non-local aggregation layers to exploit global context cues and prior knowledge in order to disambiguate motions in these regions. Our method leads both the now-standard Sintel and KITTI optical flow benchmarks in terms of accuracy, and is also shown to generalize better from synthetic to real data.
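A simplified sketch of the separable idea, under assumed shapes: the full 4D correlation volume is reduced to two compact 3D volumes by aggregating over one displacement axis at a time, which non-local layers could then refine. The max-pooling choice and tensor layout are our assumptions, not the paper's exact decomposition.

import torch

def separate_cost_volume(corr):
    # corr: (B, U, V, H, W) full-motion correlation volume over displacements (u, v)
    corr_u = corr.max(dim=2).values   # (B, U, H, W) horizontal-displacement volume
    corr_v = corr.max(dim=1).values   # (B, V, H, W) vertical-displacement volume
    return corr_u, corr_v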
Citations: 63
Task Switching Network for Multi-task Learning
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00818
Guolei Sun, Thomas Probst, D. Paudel, Nikola Popovic, Menelaos Kanakis, Jagruti R. Patel, Dengxin Dai, L. Gool
We introduce Task Switching Networks (TSNs), a task-conditioned architecture with a single unified encoder/decoder for efficient multi-task learning. Multiple tasks are performed by switching between them, performing one task at a time. TSNs have a constant number of parameters irrespective of the number of tasks. This scalable yet conceptually simple approach circumvents the overhead and intricacy of task-specific network components in existing works. In fact, we demonstrate for the first time that multi-tasking can be performed with a single task-conditioned decoder. We achieve this by learning task-specific conditioning parameters through a jointly trained task embedding network, encouraging constructive interaction between tasks. Experiments validate the effectiveness of our approach, achieving state-of-the-art results on two challenging multi-task benchmarks, PASCAL-Context and NYUD. Our analysis of the learned task embeddings further indicates a connection to task relationships studied in the recent literature.
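One way to picture task-conditioned switching is the FiLM-style sketch below: a learned task embedding is mapped to per-channel scale and shift parameters that modulate a shared block, so the same decoder serves one task at a time. This is an illustrative assumption; the paper's conditioning mechanism may differ.

import torch
import torch.nn as nn

class TaskConditionedBlock(nn.Module):
    def __init__(self, num_tasks, channels, emb_dim=64):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, emb_dim)
        self.to_scale_shift = nn.Linear(emb_dim, 2 * channels)
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x, task_id):
        # task_id: (B,) LongTensor selecting the active task for each sample
        emb = self.task_emb(task_id)                              # (B, emb_dim)
        scale, shift = self.to_scale_shift(emb).chunk(2, dim=1)   # per-channel conditioning
        x = self.conv(x)
        return x * (1 + scale[:, :, None, None]) + shift[:, :, None, None]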
Citations: 30
Improving Low-Precision Network Quantization via Bin Regularization
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00521
Tiantian Han, Dong Li, Ji Liu, Lu Tian, Yi Shan
Model quantization is an important mechanism for energy-efficient deployment of deep neural networks on resource-constrained devices, as it reduces the bit precision of weights and activations. However, it remains challenging to maintain high accuracy as bit precision decreases, especially for low-precision networks (e.g., 2-bit MobileNetV2). Existing methods address this problem by minimizing the quantization error or mimicking the data distribution of full-precision networks. In this work, we propose a novel weight regularization algorithm for improving low-precision network quantization. Instead of constraining the overall data distribution, we separately optimize all elements in each quantization bin to be as close to the target quantized value as possible. Such a bin regularization (BR) mechanism encourages the weight distribution of each quantization bin to be sharp and, ideally, to approximate a Dirac delta distribution. Experiments demonstrate that our method achieves consistent improvements over state-of-the-art quantization-aware training methods for different low-precision networks. In particular, our bin regularization improves LSQ for 2-bit MobileNetV2 and MobileNetV3-Small by 3.9% and 4.9% top-1 accuracy on ImageNet, respectively.
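A minimal sketch of a bin-regularization-style penalty follows: every weight is pulled toward the quantized value of the bin it falls into, sharpening each bin's distribution. Uniform symmetric quantization with a max-based scale is assumed here for illustration and is not necessarily the paper's quantizer.

import torch

def bin_regularization(weights, num_bits=2):
    qmax = 2 ** (num_bits - 1) - 1                                    # e.g. 1 for signed 2-bit
    scale = weights.abs().max().clamp(min=1e-8) / qmax                # assumed uniform scale
    q = torch.clamp(torch.round(weights / scale), -qmax - 1, qmax)    # bin index per weight
    target = q * scale                                                # quantized value of that bin
    return ((weights - target) ** 2).mean()                           # pull weights onto bin centers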
Citations: 26
Detecting Persuasive Atypicality by Modeling Contextual Compatibility
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00101
M. Guo, R. Hwa, Adriana Kovashka
We propose a new approach to detect atypicality in persuasive imagery. Unlike the atypicality studied in prior work, persuasive atypicality serves the particular purpose of conveying meaning, and detecting it relies on understanding the common-sense spatial relations of objects. We propose a self-supervised, attention-based technique which captures contextual compatibility and models spatial relations in a precise manner. We further experiment with capturing common sense through the semantics of co-occurring object classes. We verify our approach on a dataset of atypicality in visual advertisements, as well as a second dataset capturing atypicality that has no persuasive intent.
Citations: 6