We demonstrate the use of semantic object detections as robust features for Visual Teach and Repeat (VTR). Recent CNN-based object detectors can reliably detect objects from tens or hundreds of categories in video at frame rate. We show that such detections are repeatable enough to serve as landmarks for VTR, without any low-level image features. Since object detections are highly invariant to lighting and surface appearance changes, our VTR can cope with global lighting changes and local movements of the landmark objects. In the teaching phase we build extremely compact scene descriptors: a list of detected object labels and their image-plane locations. In the repeating phase, we use SeqSLAM-like relocalization to identify the most similar learned scene, then use a motion control algorithm based on the funnel lane theory to navigate the robot along the previously piloted trajectory. We evaluate the method on a commodity UAV, examining the robustness of the algorithm to new viewpoints, lighting conditions, and movements of landmark objects. The results suggest that semantic object features are useful because of their invariance to superficial appearance changes, compared to low-level image features.
{"title":"Robust UAV Visual Teach and Repeat Using Only Sparse Semantic Object Features","authors":"A. Toudeshki, Faraz Shamshirdar, R. Vaughan","doi":"10.1109/CRV.2018.00034","DOIUrl":"https://doi.org/10.1109/CRV.2018.00034","url":null,"abstract":"We demonstrate the use of semantic object detections as robust features for Visual Teach and Repeat (VTR). Recent CNN-based object detectors are able to reliably detect objects of tens or hundreds of categories in video at frame rates. We show that such detections are repeatable enough to use as landmarks for VTR, without any low-level image features. Since object detections are highly invariant to lighting and surface appearance changes, our VTR can cope with global lighting changes and local movements of the landmark objects. In the teaching phase we build extremely compact scene descriptors: a list of detected object labels and their image-plane locations. In the repeating phase, we use Seq-SLAM-like relocalization to identify the most similar learned scene, then use a motion control algorithm based on the funnel lane theory to navigate the robot along the previously piloted trajectory. We evaluate the method on a commodity UAV, examining the robustness of the algorithm to new viewpoints, lighting conditions, and movements of landmark objects. The results suggest that semantic object features could be useful due to their invariance to superficial appearance changes compared to low-level image features.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132816884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this work we investigate urban reconstruction and propose a complete, automatic framework for reconstructing urban areas from remote sensing data. First, we address the complex problem of semantic labeling and propose a novel network architecture named SegNeXT. It combines the strengths of deep autoencoders with feed-forward links, which produce smooth predictions and reduce the number of learning parameters, with the effectiveness that cardinality-enabled residual building blocks have shown in improving prediction accuracy while outperforming deeper and wider architectures with fewer learning parameters. The network is trained on benchmark datasets, and the reported results show that it provides classification at least comparable to, and in some cases better than, the state of the art. Second, we address the problem of urban reconstruction and propose a complete pipeline for automatically converting semantic labels into virtual representations of the urban areas. Agglomerative clustering is performed on the points according to their classification, yielding a set of contiguous, disjoint clusters. Finally, each cluster is processed according to the class it belongs to: tree clusters are replaced with procedural models, cars are replaced with simplified CAD models, building boundaries are extruded to form 3D models, and road, low vegetation, and clutter clusters are triangulated and simplified. The result is a complete virtual representation of the urban area. The proposed framework has been extensively tested on large-scale benchmark datasets, and the semantic labeling and reconstruction results are reported.
{"title":"Deep Autoencoders with Aggregated Residual Transformations for Urban Reconstruction from Remote Sensing Data","authors":"T. Forbes, Charalambos (Charis) Poullis","doi":"10.1109/CRV.2018.00014","DOIUrl":"https://doi.org/10.1109/CRV.2018.00014","url":null,"abstract":"In this work we investigate urban reconstruction and propose a complete and automatic framework for reconstructing urban areas from remote sensing data. Firstly, we address the complex problem of semantic labeling and propose a novel network architecture named SegNeXT which combines the strengths of deep-autoencoders with feed-forward links in generating smooth predictions and reducing the number of learning parameters, with the effectiveness which cardinality-enabled residual-based building blocks have shown in improving prediction accuracy and outperforming deeper/wider network architectures with a smaller number of learning parameters. The network is trained with benchmark datasets and the reported results show that it can provide at least similar and in some cases better classification than state-of-the-art. Secondly, we address the problem of urban reconstruction and propose a complete pipeline for automatically converting semantic labels into virtual representations of the urban areas. An agglomerative clustering is performed on the points according to their classification and results in a set of contiguous and disjoint clusters. Finally, each cluster is processed according to the class it belongs: tree clusters are substituted with procedural models, cars are replaced with simplified CAD models, buildings' boundaries are extruded to form 3D models, and road, low vegetation, and clutter clusters are triangulated and simplified. The result is a complete virtual representation of the urban area. The proposed framework has been extensively tested on large-scale benchmark datasets and the semantic labeling and reconstruction results are reported.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132765477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present a comparative study of two state-of-the-art object detection architectures: an end-to-end CNN-based framework called SSD [1] and an LSTM-based framework [2] which we refer to as the LSTM-decoder. To this end, we study the two architectures in the context of people-head detection on several benchmark datasets containing small to moderately large numbers of head instances appearing at varying scales and occlusion levels. To better capture the pros and cons of the two architectures, we applied them with several deep feature extractors (e.g., Inception-V2, Inception-ResNet-V2, and MobileNet-V1) and report the accuracy, speed, and generalization ability of the approaches. Our experimental results show that while the LSTM-decoder can be more accurate at detecting smaller head instances, especially in the presence of occlusion, the sheer detection speed and superior ability to generalize over multiple scales make SSD an ideal choice for real-time people detection.
{"title":"Deep People Detection: A Comparative Study of SSD and LSTM-decoder","authors":"Md.Atiqur Rahman, Prince Kapoor, R. Laganière, Daniel Laroche, Changyun Zhu, Xiaoyin Xu, A. Ors","doi":"10.1109/CRV.2018.00050","DOIUrl":"https://doi.org/10.1109/CRV.2018.00050","url":null,"abstract":"In this paper, we present a comparative study of two state-of-the-art object detection architectures - an end-to-end CNN-based framework called SSD [1] and an LSTM-based framework [2] which we refer to as LSTM-decoder. To this end, we study the two architectures in the context of people head detection on few benchmark datasets having small to moderately large number of head instances appearing in varying scales and occlusion levels. In order to better capture the pros and cons of the two architectures, we applied them with several deep feature extractors (e.g., Inception-V2, Inception-ResNet-V2 and MobileNet-V1) and report accuracy, speed and generalization ability of the approaches. Our experimental results show that while the LSTM-decoder can be more accurate in realizing smaller head instances especially in the presence of occlusions, the sheer detection speed and superior ability to generalize over multiple scales make SSD an ideal choice for real-time people detection.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115689949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider the problem of action detection in untrimmed videos. We argue that the contextual information in a video is important for this task. Based on this intuition, we design a network using a bidirectional Long Short-Term Memory (Bi-LSTM) model that captures the contextual information in videos. Our model includes a modified loss function that enforces the network to learn action progression, and a backpropagation scheme in which gradients are weighted according to their origin on the temporal scale. LSTMs are good at capturing long temporal dependencies, but not as good at modeling local temporal features. In our model, we therefore use a 3D Convolutional Neural Network (3D ConvNet) to capture the local spatio-temporal features of the videos. We perform a comprehensive analysis of the importance of learning the context of the video. Finally, we evaluate our work on two action detection datasets, ActivityNet and THUMOS'14. Our method achieves competitive results compared with existing approaches on both datasets.
{"title":"Context-Aware Action Detection in Untrimmed Videos Using Bidirectional LSTM","authors":"Jaideep Singh Chauhan, Yang Wang","doi":"10.1109/CRV.2018.00039","DOIUrl":"https://doi.org/10.1109/CRV.2018.00039","url":null,"abstract":"We consider the problem of action detection in untrimmed videos. We argue that the contextual information in a video is important for this task. Based on this intuition, we design a network using a bidirectional Long Short Term Memory (Bi-LSTM) model that captures the contextual information in videos. Our model includes a modified loss function which enforces the network to learn action progression, and a backpropagation in which gradients are weighted on the basis of their origin on the temporal scale. LSTMs are good at capturing the long temporal dependencies, but not so good at modeling local temporal features. In our model, we use a 3-D Convolutional Neural Network (3-D ConvNet) for capturing the local spatio-temporal features of the videos. We perform a comprehensive analysis on the importance of learning the context of the video. Finally, we evaluate our work on two action detection datasets, namely ActivityNet and THUMOS'14. Our method achieves competitive results compared with the existing approaches on both datasets.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125506757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matching an occluded contour against all the full contours in a database is an NP-hard problem. In this paper we present a suboptimal solution to this problem. We demonstrate the efficacy of our algorithm by matching partially occluded leaves against a database of full leaves. We smooth the leaf contours using a beta-spline and then use the Discrete Curve Evolution (DCE) algorithm to extract feature points. We then perform subgraph matching, using the DCE points as graph nodes. This algorithm decomposes each closed contour into many open contours. We compute a number of similarity parameters for each open contour and the occluded contour, and apply an inverse similarity transform to the occluded contour. This allows the occluded contour and any open contour to be overlaid. We then compute the quality of matching for each such pair of open contours using the Fréchet distance metric and select the best η matching contours. Since the Fréchet distance is computationally cheap but not guaranteed to produce the best answer, we then use an energy functional that always finds the best match among the best η matches but is considerably more expensive to compute. The functional uses local and global curvature, String Context descriptors, and String Cut features. We minimize this energy functional using the well-known GNCCP algorithm over the η open contours, yielding the best match. Experiments on a publicly available leaf image database show that our method is both effective and efficient, significantly outperforming other current state-of-the-art leaf matching methods when faced with leaf occlusion.
{"title":"Occluded Leaf Matching with Full Leaf Databases Using Explicit Occlusion Modelling","authors":"Ayan Chaudhury, J. Barron","doi":"10.1109/CRV.2018.00012","DOIUrl":"https://doi.org/10.1109/CRV.2018.00012","url":null,"abstract":"Matching an occluded contour with all the full contours in a database is an NP-hard problem. We present a suboptimal solution for this problem in this paper. We demonstrate the efficacy of our algorithm by matching partially occluded leaves with a database of full leaves. We smooth the leaf contours using a beta spline and then use the Discrete Contour Evaluation (DCE) algorithm to extract feature points. We then use subgraph matching, using the DCE points as graph nodes. This algorithm decomposes each closed contour into many open contours. We compute a number of similarity parameters for each open contour and the occluded contour. We perform an inverse similarity transform on the occluded contour. This allows the occluded contour and any open contour to be overlaid\". We that compute the quality of matching for each such pair of open contours using the Fréchet distance metric. We select the best eta matched contours. Since the Fréchet distance metric is computationally cheap to compute but not always guaranteed to produce the best answer we then use an energy functional that always find best match among the best eta matches but is considerably more expensive to compute. The functional uses local and global curvature String Context descriptors and String Cut features. We minimize this energy functional using the well known GNCCP algorithm for the eta open contours yielding the best match. Experiments on a publicly available leaf image database shows that our method is both effective and efficient significantly outperforming other current state-of-the-art leaf matching methods when faced with leaf occlusion.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"236 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122818771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Human pose estimation in ice hockey is one of the biggest challenges in computer vision-driven sports analytics, with difficulties such as bulky hockey wear, color similarity between the ice rink and player jerseys, and the presence of additional sports equipment such as hockey sticks. As a result, deep neural network architectures typically used for sports such as baseball, soccer, and track and field perform poorly when applied to hockey. Inspired by the idea that the position of the hockey stick can not only improve hockey player pose estimation but also be used to assess a player's performance, a novel HyperStackNet architecture has been designed and implemented for joint player and stick pose estimation. In addition to improving player pose estimation, the HyperStackNet architecture enables improved transfer learning from pre-trained stacked hourglass networks trained on a different domain. Experimental results demonstrate that when HyperStackNet is trained to detect 18 different joint positions on a hockey player (including the hockey stick), it achieves 98.8% accuracy on the test dataset, demonstrating its efficacy for complex joint player and stick pose estimation from video.
{"title":"HyperStackNet: A Hyper Stacked Hourglass Deep Convolutional Neural Network Architecture for Joint Player and Stick Pose Estimation in Hockey","authors":"H. Neher, Kanav Vats, A. Wong, David A Clausi","doi":"10.1109/CRV.2018.00051","DOIUrl":"https://doi.org/10.1109/CRV.2018.00051","url":null,"abstract":"Human pose estimation in ice hockey is one of the biggest challenges in computer vision-driven sports analytics, with a variety of difficulties such as bulky hockey wear, color similarity between ice rink and player jersey and the presence of additional sports equipment used by the players such as hockey sticks. As such, deep neural network architectures typically used for sports including baseball, soccer, and track and field perform poorly when applied to hockey. Inspired by the idea that the position of the hockey sticks can not only be useful for improving hockey player pose estimation but also can be used for assessing a player's performance, a novel HyperStackNet architecture has been designed and implemented for joint player and stick pose estimation. In addition to improving player pose estimation, the HyperStackNet architecture enables improved transfer learning from pre-trained stacked hourglass networks trained on a different domain. Experimental results demonstrate that when the HyperStackNet is trained to detect 18 different joint positions on a hockey player (including the hockey stick) the accuracy is 98.8% on the test dataset, thus demonstrating its efficacy for handling complex joint player and stick pose estimation from video.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115447911","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a real-time 3D shape fusion system that faithfully integrates very high resolution 3D scans with the goal of maximizing detail preservation. The system fully maps complex shapes while allowing free movement, similarly to dense SLAM systems in robotics where sensor fusion techniques map large environments. We propose a novel framework that integrates shapes into a volume while preserving the fine details of the reconstructed shape, an important aspect in many applications, especially industrial inspection. The truncated signed distance function is generalized with a global variational scheme that controls edge preservation and leads to cumulative update rules suited to GPU implementation. The framework also embeds a map deformation method to deform the shape online and correct the system's trajectory drift with an accuracy of a few microns. Results from the integrated system are presented on two mechanical objects and illustrate the benefits of the proposed approach.
{"title":"Real-Time Large-Scale Fusion of High Resolution 3D Scans with Details Preservation","authors":"H. Sekkati, Jonathan Boisvert, G. Godin, L. Borgeat","doi":"10.1109/CRV.2018.00019","DOIUrl":"https://doi.org/10.1109/CRV.2018.00019","url":null,"abstract":"This paper presents a real-time 3D shape fusion system that faithfully integrates very high resolution 3D scans with the goal of maximizing details preservation. The system fully maps complex shapes while allowing free movement similarly to dense SLAM systems in robotics where sensor fusion techniques map large environments. We propose a novel framework to integrate shapes into a volume with fine details preservation of the reconstructed shape which is an important aspect in many applications, especially for industrial inspection. The truncated signed distance function is generalized with a global variational scheme that controls edge preservation and leads to updating cumulative rules adapted for GPU implementation. The framework also embeds a map deformation method to online deform the shape and correct the system trajectory drift at few microns accuracy. Results are presented from the integrated system on two mechanical objects which illustrate the benefits of the proposed approach.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134037556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Visual homing enables an autonomous robot to move to a target (home) position using only visual information. While 2D visual homing has been widely studied, homing in 3D space still requires much attention. This paper presents a novel 3D visual homing method that can be applied to commodity Unmanned Aerial Vehicles (UAVs). First, relative camera poses are estimated from feature correspondences between current views and the reference home image. Homing vectors are then computed and used to guide the UAV toward the 3D home location. All computations can be performed in real time on mobile devices through a mobile app. To validate our approach, we conducted quantitative evaluations on popular image sequence datasets and performed real-world experiments on a quadcopter (a DJI Mavic Pro). Experimental results demonstrate the effectiveness of the proposed method.
{"title":"3D Visual Homing for Commodity UAVs","authors":"Hao Cai, Sipan Ye, A. Vardy, Minglun Gong","doi":"10.1109/CRV.2018.00045","DOIUrl":"https://doi.org/10.1109/CRV.2018.00045","url":null,"abstract":"Visual homing enables an autonomous robot to move to a target (home) position using only visual information. While 2D visual homing has been widely studied, homing in 3D space still requires much attention. This paper presents a novel 3D visual homing method which can be applied to commodity Unmanned Aerial Vehicles (UAVs). Firstly, relative camera poses are estimated through feature correspondences between current views and the reference home image. Then homing vectors are computed and utilized to guide the UAV toward the 3D home location. All computations can be performed in real-time on mobile devices through a mobile app. To validate our approach, we conducted quantitative evaluations on the most popular image sequence datasets and performed real experiments on a quadcopter (i.e., DJI Mavic Pro). Experimental results demonstrate the effectiveness of the proposed method.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116051406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a new method for learning filters for the 2D discrete wavelet transform, extending our previous work on the 1D wavelet transform to images. We show that the 2D wavelet transform can be represented as a modified convolutional neural network (CNN), which allows us to learn wavelet filters from data by gradient descent. Our learned wavelets are similar to traditional wavelets, which are typically derived using Fourier methods. For filter comparison, we use a cosine measure under all filter rotations. The learned wavelets capture the structure of the training data, and we can generate images from our model in order to evaluate the filters. The main finding of this work is that wavelet functions can arise naturally from data, without the need for Fourier methods. Our model requires relatively few parameters compared to traditional CNNs and is easily incorporated into neural network frameworks.
{"title":"Learning Filters for the 2D Wavelet Transform","authors":"D. Recoskie, Richard Mann","doi":"10.1109/CRV.2018.00036","DOIUrl":"https://doi.org/10.1109/CRV.2018.00036","url":null,"abstract":"We propose a new method for learning filters for the 2D discrete wavelet transform. We extend our previous work on the 1D wavelet transform in order to process images. We show that the 2D wavelet transform can be represented as a modified convolutional neural network (CNN). Doing so allows us to learn wavelet filters from data by gradient descent. Our learned wavelets are similar to traditional wavelets which are typically derived using Fourier methods. For filter comparison, we make use of a cosine measure under all filter rotations. The learned wavelets are able to capture the structure of the training data. Furthermore, we can generate images from our model in order to evaluate the filters. The main findings of this work is that wavelet functions can arise naturally from data, without the need for Fourier methods. Our model requires relatively few parameters compared to traditional CNNs, and is easily incorporated into neural network frameworks.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"309 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114808345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a new algorithm for single-camera 3D reconstruction, or 3D input for human-computer interfaces, based on precise tracking of an elongated object, such as a pen, marked with a pattern of colored bands. To configure the system, the user provides no more than one labelled image of a handmade pointer, measurements of its colored bands, and the camera's pinhole projection matrix. Other systems are of much higher cost and complexity, requiring combinations of multiple cameras, stereo cameras, and pointers with sensors and lights. Instead of relying on information from multiple devices, we examine our single view more closely, integrating geometric and appearance constraints to robustly track the pointer in the presence of occlusion and distractor objects. By probing objects of known geometry with the pointer, we demonstrate acceptable accuracy of 3D localization.
{"title":"Do-It-Yourself Single Camera 3D Pointer Input Device","authors":"Bernard Llanos, Herbert Yang","doi":"10.1109/CRV.2018.00038","DOIUrl":"https://doi.org/10.1109/CRV.2018.00038","url":null,"abstract":"We present a new algorithm for single camera 3D reconstruction, or 3D input for human-computer interfaces, based on precise tracking of an elongated object, such as a pen, having a pattern of colored bands. To configure the system, the user provides no more than one labelled image of a handmade pointer, measurements of its colored bands, and the camera's pinhole projection matrix. Other systems are of much higher cost and complexity, requiring combinations of multiple cameras, stereocameras, and pointers with sensors and lights. Instead of relying on information from multiple devices, we examine our single view more closely, integrating geometric and appearance constraints to robustly track the pointer in the presence of occlusion and distractor objects. By probing objects of known geometry with the pointer, we demonstrate acceptable accuracy of 3D localization.","PeriodicalId":281779,"journal":{"name":"2018 15th Conference on Computer and Robot Vision (CRV)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125024051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}