Robust UAV Visual Teach and Repeat Using Only Sparse Semantic Object Features

2018 15th Conference on Computer and Robot Vision (CRV)
Publication date: 2018-05-01
DOI: 10.1109/CRV.2018.00034

A. Toudeshki, Faraz Shamshirdar, R. Vaughan

Citations: 12

Abstract

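We demonstrate the use of semantic object detections as robust features for Visual Teach and Repeat (VTR). Recent CNN-based object detectors are able to reliably detect objects of tens or hundreds of categories in video at frame rate. We show that such detections are repeatable enough to use as landmarks for VTR, without any low-level image features. Since object detections are highly invariant to lighting and surface appearance changes, our VTR can cope with global lighting changes and local movements of the landmark objects. In the teaching phase we build extremely compact scene descriptors: a list of detected object labels and their image-plane locations. In the repeating phase, we use Seq-SLAM-like relocalization to identify the most similar learned scene, then use a motion control algorithm based on the funnel lane theory to navigate the robot along the previously piloted trajectory. We evaluate the method on a commodity UAV, examining the robustness of the algorithm to new viewpoints, lighting conditions, and movements of landmark objects. The results suggest that semantic object features could be useful due to their invariance to superficial appearance changes compared to low-level image features.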
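The abstract does not specify how two scene descriptors are compared or how the sequence search is scored. As a rough illustration only, the Python sketch below encodes the compact descriptor (object labels plus image-plane positions) and a SeqSLAM-style search over a short window of recent frames. The names `Detection`, `scene_distance` and `relocalize`, the greedy same-label pairing, the `miss_penalty` cost and the candidate `speeds` are all hypothetical assumptions, not the authors' implementation; the funnel-lane motion controller is not covered.

```python
from dataclasses import dataclass
from typing import List, Sequence
import math


@dataclass(frozen=True)
class Detection:
    """One detected object: class label plus normalized image-plane centre."""
    label: str
    x: float  # box-centre x, normalized to [0, 1] (assumed convention)
    y: float  # box-centre y, normalized to [0, 1] (assumed convention)


# Hypothetical scene descriptor: the list of detections in one keyframe.
Scene = List[Detection]


def scene_distance(a: Scene, b: Scene, miss_penalty: float = 1.0) -> float:
    """Hypothetical dissimilarity between two scenes (0.0 = identical).

    Each detection in `a` is greedily paired with the nearest unpaired
    detection of the same label in `b`; a pair costs its image-plane
    distance, capped at `miss_penalty`. Detections left unpaired on either
    side cost `miss_penalty` each. The total is normalized by the number of
    detections in both scenes.
    """
    unpaired_b = list(b)
    cost = 0.0
    for da in a:
        same_label = [db for db in unpaired_b if db.label == da.label]
        if not same_label:
            cost += miss_penalty
            continue
        nearest = min(same_label, key=lambda db: math.hypot(da.x - db.x, da.y - db.y))
        cost += min(math.hypot(da.x - nearest.x, da.y - nearest.y), miss_penalty)
        unpaired_b.remove(nearest)
    cost += miss_penalty * len(unpaired_b)
    return cost / max(len(a) + len(b), 1)


def relocalize(taught: Sequence[Scene], recent: Sequence[Scene],
               speeds: Sequence[float] = (0.5, 1.0, 1.5)) -> int:
    """SeqSLAM-style sequence matching (a sketch, not the paper's method).

    Builds a difference matrix diff[i][k] between taught scene i and recent
    frame k, then scores constant-speed straight-line paths ending at every
    taught index. Returns the end index of the path with the lowest mean
    difference, i.e. the estimated taught match for the newest frame.
    Falls back to index 0 if no candidate path fits inside the taught route.
    """
    if not taught or not recent:
        raise ValueError("need at least one taught scene and one recent frame")
    n = len(recent)
    diff = [[scene_distance(q, t) for q in recent] for t in taught]
    best_idx, best_score = 0, math.inf
    for end in range(len(taught)):
        for v in speeds:
            # Taught index visited at recent frame k, assuming constant speed v.
            path = [end - round(v * (n - 1 - k)) for k in range(n)]
            if path[0] < 0:
                continue  # path would start before the taught route begins
            score = sum(diff[i][k] for k, i in enumerate(path)) / n
            if score < best_score:
                best_idx, best_score = end, score
    return best_idx
```

For instance, if `recent` is an exact copy of taught keyframes 1 to 3 of a route whose keyframes are all distinct, `relocalize` returns 3. Scoring a short sequence instead of a single frame is the point of the SeqSLAM-style approach: one frame with a missed or spurious detection does not by itself pull the match to the wrong keyframe.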
