Latest Publications: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Monocular Depth Estimation with Adaptive Geometric Attention
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00069
Taher Naderi, Amir Sadovnik, J. Hayward, Hairong Qi
Single image depth estimation is an ill-posed problem. That is, it is not mathematically possible to uniquely estimate the 3rd dimension (or depth) from a single 2D image. Hence, additional constraints need to be incorporated in order to regulate the solution space. In this paper, we explore the idea of constraining the model by taking advantage of the similarity between the RGB image and the corresponding depth map at the geometric edges of the 3D scene for more accurate depth estimation. We propose a general light-weight adaptive geometric attention module that uses the cross-correlation between the encoder and the decoder as a measure of this similarity. More precisely, we use the cosine similarity between the local embedded features in the encoder and the decoder at each spatial point. The proposed module along with the encoder-decoder network is trained in an end-to-end fashion and achieves superior and competitive performance in comparison with other state-of-the-art methods. In addition, adding our module to the base encoder-decoder model adds only an additional 0.03% (or 0.0003) parameters. Therefore, this module can be added to any base encoder-decoder network without changing its structure to address any task at hand.
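A minimal PyTorch sketch of the core operation described here: per-pixel cosine similarity between encoder and decoder embeddings, used to gate the skip connection. The module name, the 1x1 projection size, and the sigmoid gating are assumptions for illustration, not the authors' exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeometricAttention(nn.Module):
    """Gate encoder skip features by their per-pixel cosine similarity
    to the decoder features (a sketch of the idea in the abstract)."""
    def __init__(self, enc_ch, dec_ch, embed_ch=16):
        super().__init__()
        # Tiny 1x1 projections keep the added parameter count negligible.
        self.proj_enc = nn.Conv2d(enc_ch, embed_ch, kernel_size=1)
        self.proj_dec = nn.Conv2d(dec_ch, embed_ch, kernel_size=1)

    def forward(self, enc_feat, dec_feat):
        e = F.normalize(self.proj_enc(enc_feat), dim=1)  # (B, E, H, W)
        d = F.normalize(self.proj_dec(dec_feat), dim=1)
        sim = (e * d).sum(dim=1, keepdim=True)           # cosine similarity per pixel
        attn = torch.sigmoid(sim)                        # map to (0, 1)
        return enc_feat * attn                           # gated skip connection

enc = torch.randn(2, 64, 32, 32)
dec = torch.randn(2, 128, 32, 32)
gated = GeometricAttention(64, 128)(enc, dec)            # -> (2, 64, 32, 32)
```

Because the only added parameters are two 1x1 convolutions, the overhead stays tiny, which is consistent with the ~0.03% parameter increase the abstract reports.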
Citations: 1
In-Field Phenotyping Based on Crop Leaf and Plant Instance Segmentation
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00302
J. Weyler, Federico Magistri, Peter Seitz, J. Behley, C. Stachniss
A detailed analysis of a plant’s phenotype in real field conditions is critical for plant scientists and breeders to understand plant function. In contrast to traditional phenotyping performed manually, vision-based systems have the potential for an objective and automated assessment with high spatial and temporal resolution. One objective of such systems is to detect and segment individual leaves of each plant, since this information correlates with the growth stage and provides phenotypic traits, such as leaf count, coverage, and size. In this paper, we propose a vision-based approach that performs instance segmentation of individual crop leaves and associates each with its corresponding crop plant in real fields. This enables us to compute relevant basic phenotypic traits on a per-plant level. We employ a convolutional neural network and operate directly on drone imagery. The network generates two different representations of the input image that we utilize to cluster individual crop leaf and plant instances. We propose a novel method to compute clustering regions based on our network’s predictions that achieves high accuracy. Furthermore, we compare to other state-of-the-art approaches and show that our system achieves superior performance. The source code of our approach is available.
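The abstract does not spell out the clustering step; as a generic stand-in, the sketch below groups foreground pixels by their embedding vectors with DBSCAN. The paper proposes its own region-computation method, so the clustering algorithm and the eps/min_samples values here are placeholders:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_instances(embeddings, fg_mask, eps=0.5, min_samples=50):
    """Group foreground pixels into instances by clustering their
    embedding vectors (generic stand-in for the paper's method).

    embeddings: (H, W, D) per-pixel embedding map
    fg_mask:    (H, W) boolean foreground mask (e.g., leaf pixels)
    returns:    (H, W) int32 instance-label map, -1 = noise/background
    """
    labels = np.full(fg_mask.shape, -1, dtype=np.int32)
    pts = embeddings[fg_mask]                      # (N, D)
    if len(pts) == 0:
        return labels
    assignment = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)
    labels[fg_mask] = assignment
    return labels
```

Running this once on a leaf-embedding map and once on a plant-embedding map would yield the two instance layers (leaves, plants) that can then be associated per plant.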
Citations: 15
Natural Language Video Moment Localization Through Query-Controlled Temporal Convolution
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00258
Lingyu Zhang, R. Radke
The goal of natural language video moment localization is to locate a short segment of a long, untrimmed video that corresponds to a description presented as natural text. The description may contain several pieces of key information, including subjects/objects, sequential actions, and locations. Here, we propose a novel video moment localization framework based on the convolutional response between multimodal signals, i.e., the video sequence, the text query, and subtitles for the video if they are available. We emphasize the role of the language sequence as a query about the video content by converting the query sentence into a boundary detector with a given filter kernel size and stride. We convolve the video sequence with the query detector to locate the start and end boundaries of the target video segment. When subtitles are available, we blend the boundary heatmaps from the visual and subtitle branches together using an LSTM to capture asynchronous dependencies across the two modalities in the video. We perform extensive experiments on the TVR, Charades-STA, and TACoS benchmark datasets, demonstrating that our model achieves state-of-the-art results on all three.
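The key mechanism, turning the query sentence into a temporal filter that is slid over the video features, can be sketched as a dynamic 1D convolution. The kernel size, the pooled query embedding, and the single output channel below are assumptions of this sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QueryControlledConv(nn.Module):
    """Turn a sentence embedding into a 1D temporal filter and slide it
    over the video feature sequence to produce boundary scores
    (a sketch; head layout and kernel size are assumptions)."""
    def __init__(self, query_dim, video_dim, kernel_size=5):
        super().__init__()
        self.kernel_size = kernel_size
        # Predict the filter weights from the pooled query embedding.
        self.to_kernel = nn.Linear(query_dim, video_dim * kernel_size)

    def forward(self, video_feats, query_emb):
        # video_feats: (B, C, T), query_emb: (B, Q)
        B, C, T = video_feats.shape
        w = self.to_kernel(query_emb).view(B, 1, C, self.kernel_size)
        scores = []
        for b in range(B):  # per-sample dynamic convolution
            s = F.conv1d(video_feats[b:b + 1], w[b],
                         padding=self.kernel_size // 2)
            scores.append(s)
        return torch.cat(scores, dim=0)  # (B, 1, T) boundary heatmap

heat = QueryControlledConv(256, 512)(torch.randn(2, 512, 100),
                                     torch.randn(2, 256))
```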
Citations: 4
Uncertainty Learning towards Unsupervised Deformable Medical Image Registration
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00162
Xuan Gong, Luckyson Khaidem, Wentao Zhu, Baochang Zhang, D. Doermann
Uncertainty estimation in medical image registration enables surgeons to evaluate the operative risk based on the trustworthiness of the registered image data and is thus of paramount importance for practical clinical applications. Despite the recent promising results obtained with deep unsupervised learning-based registration methods, reasoning about the uncertainty of unsupervised registration models remains largely unexplored. In this work, we propose a predictive module that learns the registration and the uncertainty in correspondence simultaneously. Our framework introduces empirical randomness and registration-error-based uncertainty prediction. We systematically assess performance on two MRI datasets with different ensemble paradigms. Experimental results highlight that our proposed framework significantly improves registration accuracy and uncertainty estimation compared with the baseline.
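One ensemble paradigm of the kind the evaluation refers to can be sketched as follows: run several registration networks (or stochastic forward passes) and read uncertainty off the spread of their predicted deformation fields. The function signature and the variance-as-uncertainty choice are assumptions of this sketch, not the paper's learned predictive module:

```python
import numpy as np

def ensemble_uncertainty(models, moving, fixed):
    """Estimate registration uncertainty as the per-voxel variance of
    deformation fields predicted by an ensemble (one generic paradigm).

    models: iterable of registration networks, each mapping
            (moving, fixed) -> displacement field of shape (3, D, H, W)
    """
    fields = np.stack([m(moving, fixed) for m in models])  # (N, 3, D, H, W)
    mean_field = fields.mean(axis=0)
    # Variance across ensemble members, summed over displacement axes.
    uncertainty = fields.var(axis=0).sum(axis=0)            # (D, H, W)
    return mean_field, uncertainty
```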
Citations: 8
Global Assists Local: Effective Aerial Representations for Field of View Constrained Image Geo-Localization
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00275
Royston Rodrigues, Masahiro Tani
When we humans recognize places from images, we not only infer the objects that are present but also think about landmarks that might surround them. Current place recognition approaches lack the ability to go beyond objects that are visible in the image and hence miss out on understanding the scene completely. In this paper, we take a step towards holistic scene understanding. We address the problem of image geo-localization by retrieving corresponding aerial views from a large database of geotagged aerial imagery. One of the main challenges in tackling this problem is the limited Field of View (FoV) of query images, which need to be matched to aerial views containing 360° FoV details. The state-of-the-art method DSM-Net [19] tackles this challenge by matching aerial images locally within fixed FoV sectors. We show that local matching limits complete scene understanding and is inadequate when partial buildings are visible in query images or when local sectors of aerial images are covered by dense trees. Our approach considers both local and global properties of aerial images and hence is robust to such conditions. Experiments on standard benchmarks demonstrate that the proposed approach improves the top-1% image recall rate from 57.08% to 77.19% on the CVACT [9] dataset and from 61.20% to 75.21% on the CVUSA [28] dataset for 70° FoV. We also achieve state-of-the-art results for 90° FoV on both the CVACT [9] and CVUSA [28] datasets, demonstrating the effectiveness of our proposed method.
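A toy illustration of FoV-sector matching, the local strategy the paper builds on: slide a limited-FoV ground descriptor around the aerial view's 360° azimuth axis and keep the best-scoring offset. Real systems such as DSM-Net do this on learned polar-transformed feature maps; the raw-descriptor correlation here is a deliberate simplification:

```python
import numpy as np

def best_azimuth_alignment(ground_desc, aerial_desc):
    """Find the aerial azimuth offset that best matches a limited-FoV
    ground descriptor (simplified sketch of FoV-sector matching).

    ground_desc: (W_g, D) azimuth-ordered ground features, W_g < W_a
    aerial_desc: (W_a, D) azimuth-ordered aerial features over 360°
    """
    W_g = ground_desc.shape[0]
    W_a = aerial_desc.shape[0]
    g = ground_desc / (np.linalg.norm(ground_desc) + 1e-8)
    scores = []
    for off in range(W_a):  # wrap around the azimuth axis
        sector = np.take(aerial_desc, range(off, off + W_g),
                         axis=0, mode='wrap')
        s = sector / (np.linalg.norm(sector) + 1e-8)
        scores.append(float((g * s).sum()))
    best = int(np.argmax(scores))
    return best, scores[best]
```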
Citations: 8
Mixed-dual-head Meets Box Priors: A Robust Framework for Semi-supervised Segmentation
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00265
Chenshu Chen, Tangyou Liu, Wenming Tan, Shiliang Pu
As it is costly to densely annotate large-scale datasets for supervised semantic segmentation, extensive semi-supervised methods have been proposed. However, the accuracy, stability, and flexibility of existing methods are still far from satisfactory. In this paper, we propose an effective and flexible framework for semi-supervised semantic segmentation that uses a small set of fully labeled images and a set of weakly labeled images with bounding box labels. In our framework, position and class priors are designed to guide the annotation network to predict accurate pseudo masks for weakly labeled images, which are used to train the segmentation network. We also propose a mixed-dual-head training method that reduces the interference of label noise while making the training process more stable. Experiments on PASCAL VOC 2012 show that our method achieves state-of-the-art performance and can achieve competitive results even with very few fully labeled images. Furthermore, the performance can be further boosted with extra weakly labeled images from the COCO dataset.
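A box label constrains where a class can appear, which is the intuition behind the position prior. The sketch below rasterizes boxes into a per-class prior map; the paper's actual priors and how they are injected into the annotation network are more elaborate, so treat this as an assumed minimal form:

```python
import numpy as np

def box_position_prior(boxes, classes, shape, num_classes):
    """Rasterize bounding boxes into a per-class position prior: pixels
    inside a class's boxes may belong to that class, pixels outside all
    boxes cannot (a sketch of how box labels constrain pseudo masks).

    boxes:   list of (x1, y1, x2, y2) ints
    classes: list of class indices, one per box
    shape:   (H, W) of the image
    """
    H, W = shape
    prior = np.zeros((num_classes, H, W), dtype=np.float32)
    for (x1, y1, x2, y2), c in zip(boxes, classes):
        prior[c, y1:y2, x1:x2] = 1.0
    return prior  # e.g., used to mask the annotation net's predictions
```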
Citations: 0
Shape-coded ArUco: Fiducial Marker for Bridging 2D and 3D Modalities
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00237
Lilika Makabe, Hiroaki Santo, Fumio Okura, Y. Matsushita
We introduce a fiducial marker for the registration of two-dimensional (2D) images and untextured three-dimensional (3D) shapes recorded by commodity laser scanners. Specifically, we design a 3D version of the ArUco marker that retains exactly the same appearance as its 2D counterpart from any viewpoint above the marker but contains shape information. The shape-coded ArUco naturally works with off-the-shelf ArUco marker detectors in the 2D image domain. For the 3D domain, we develop a method for detecting the marker in an untextured 3D point cloud. Experiments demonstrate accurate 2D-3D registration using our shape-coded ArUco markers in comparison to baseline methods.
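Since the marker looks identical to a standard ArUco from above, 2D detection can use OpenCV's off-the-shelf aruco module, as the abstract notes. The snippet below assumes the OpenCV >= 4.7 API, a DICT_4X4_50 dictionary, and the file name "scene.png", all placeholders:

```python
import cv2

# Detect ArUco markers in the 2D image domain with OpenCV's aruco module.
img = cv2.imread("scene.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary,
                                   cv2.aruco.DetectorParameters())
corners, ids, _ = detector.detectMarkers(gray)
print("detected marker ids:", ids)
```

The 3D-domain detection in the untextured point cloud is the paper's own contribution and is not reproduced here.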
Citations: 0
VCSeg: Virtual Camera Adaptation for Road Segmentation
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00203
Gong Cheng, J. Elder
Domain shift limits generalization in many problem domains. For road segmentation, one of the principal causes of domain shift is variation in the geometric camera parameters, which results in misregistration of scene structure between images. To address this issue, we decompose the shift into two components: between-camera shift and within-camera shift. To handle between-camera shift, we assume that average camera parameters are known or can be estimated and use this knowledge to rectify both source and target domain images to a standard virtual camera model. To handle within-camera shift, we use estimates of road vanishing points to correct for shifts in camera pan and tilt. While this approach improves alignment, it produces gaps in the virtual image that complicate network training. To solve this problem, we introduce a novel projective image completion method that fills these gaps in a plausible way. Using five diverse and challenging road segmentation datasets, we demonstrate that our virtual camera method dramatically improves road segmentation performance when generalizing across cameras, and propose that this be integrated as a standard component of road segmentation systems to improve generalization.
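Rectifying to a standard virtual camera under known intrinsics and an estimated pan/tilt amounts to a rotation-only homography, H = K_virt R K_src^{-1}. The sketch below applies that warp; the paper's exact parameterization and its gap-filling completion step are not reproduced:

```python
import cv2
import numpy as np

def rectify_to_virtual_camera(img, K_src, K_virt, pan_deg, tilt_deg):
    """Warp an image to a standard virtual camera with a rotation-only
    homography (a sketch: assumes the shift is dominated by intrinsics
    and pan/tilt, the setting the paper describes)."""
    pan, tilt = np.deg2rad([pan_deg, tilt_deg])
    R_tilt = np.array([[1, 0, 0],
                       [0, np.cos(tilt), -np.sin(tilt)],
                       [0, np.sin(tilt),  np.cos(tilt)]])
    R_pan = np.array([[np.cos(pan), 0, np.sin(pan)],
                      [0, 1, 0],
                      [-np.sin(pan), 0, np.cos(pan)]])
    H = K_virt @ R_pan @ R_tilt @ np.linalg.inv(K_src)
    h, w = img.shape[:2]
    return cv2.warpPerspective(img, H, (w, h))
```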
Citations: 1
Multi-branch Neural Networks for Video Anomaly Detection in Adverse Lighting and Weather Conditions
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00308
Sam Leroux, Bo Li, P. Simoens
Automated anomaly detection in surveillance videos has attracted much interest as it provides a scalable alternative to manual monitoring. Most existing approaches achieve good performance on clean benchmark datasets recorded in well-controlled environments. However, detecting anomalies is much more challenging in the real world. Adverse weather conditions like rain or changing brightness levels cause a significant shift in the input data distribution, which in turn can lead to the detector model incorrectly reporting high anomaly scores. Additionally, surveillance cameras are usually deployed in evolving environments, such as a city street whose appearance changes over time because of seasonal changes or roadworks. The anomaly detection model will need to be updated periodically to deal with these issues. In this paper, we introduce a multi-branch model that is equipped with a trainable preprocessing step and multiple identical branches for detecting anomalies during day and night as well as in sunny and rainy conditions. We experimentally validate our approach on a distorted version of the Avenue dataset and provide qualitative results on real-world surveillance camera data. Experimental results show that our method outperforms the existing methods in terms of detection accuracy while being faster and more robust on scenes with varying visibility.
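The architecture described, one trainable preprocessing stem feeding several identical condition-specific branches, can be outlined in a few lines of PyTorch. How the branch is chosen at inference time is not specified in the abstract, so the explicit condition index below is an assumption:

```python
import torch
import torch.nn as nn

class MultiBranchDetector(nn.Module):
    """Shared trainable preprocessing followed by one identical branch
    per capture condition (day/night x sunny/rainy); routing via an
    externally supplied condition index is an assumption of this sketch."""
    def __init__(self, make_branch, num_branches=4):
        super().__init__()
        self.preprocess = nn.Sequential(        # trainable preprocessing
            nn.Conv2d(3, 3, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.branches = nn.ModuleList(
            [make_branch() for _ in range(num_branches)])

    def forward(self, x, condition_idx):
        x = self.preprocess(x)
        return self.branches[condition_idx](x)  # per-frame anomaly scores
```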
Citations: 5
Learning from the CNN-based Compressed Domain
Pub Date : 2022-01-01 DOI: 10.1109/WACV51458.2022.00405
Zhenzhen Wang, Minghai Qin, Yen-Kuang Chen
Images are transmitted or stored in their compressed form, and most AI tasks are performed on the reconstructed domain. Convolutional neural network (CNN)-based image compression and reconstruction is growing rapidly, and it achieves or surpasses state-of-the-art heuristic image compression methods such as JPEG or BPG. A major limitation of applying CNN-based image compression is the computational complexity of compression and reconstruction. Therefore, learning from the compressed domain is desirable to avoid the computation and latency caused by reconstruction. In this paper, we show that learning from the compressed domain can achieve comparable or even better accuracy than learning from the reconstructed domain. At a high compression rate of 0.098 bpp, for example, the proposed compression-learning system achieves an absolute accuracy boost of over 3% compared with the traditional compression-reconstruction-learning flow. The improvement is achieved by optimizing the compression-learning system for original-sized instead of standardized (e.g., 224x224) images, which is crucial in practice since real-world images fed into the system have different sizes. We also propose an efficient model-free entropy estimation method and a criterion to learn from a selected subset of features in the compressed domain, further reducing transmission and computation cost without accuracy degradation.
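Learning from the compressed domain means attaching the task head to the codec's latent tensor instead of decoded pixels. A minimal sketch, assuming a learned codec whose encoder emits a (B, 192, H/16, W/16) latent (the channel count is codec-dependent); the adaptive pooling reflects the paper's point about handling original-sized images:

```python
import torch
import torch.nn as nn

class CompressedDomainClassifier(nn.Module):
    """Classify directly from the (quantized) latent tensor produced by
    a learned image codec's encoder, skipping pixel reconstruction."""
    def __init__(self, latent_ch=192, num_classes=1000):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(latent_ch, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # handles original-sized inputs
            nn.Flatten(),
            nn.Linear(256, num_classes),
        )

    def forward(self, latent):  # latent: (B, latent_ch, H/16, W/16)
        return self.head(latent)
```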
Citations: 8