Latest publications from the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV)

Location-Velocity Attention for Pedestrian Trajectory Prediction
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00221
Hao Xue, D. Huynh, Mark Reynolds
Pedestrian path forecasting is crucial in applications such as smart video surveillance. It is a challenging task because of the complex crowd movement patterns in the scenes. Most existing state-of-the-art LSTM-based prediction methods require rich context such as labelled static obstacles, labelled entrance/exit regions and even the background scene. Furthermore, incorporating contextual information into trajectory prediction increases the computational overhead and decreases the generalization of the prediction models across different scenes. In this paper, we propose a joint Location-Velocity Attention LSTM based method to predict trajectories. Specifically, a module is designed to tweak the LSTM network, and an attention mechanism is trained to learn to optimally combine the location and the velocity information of pedestrians in the prediction process. We have evaluated our approach against other baselines and state-of-the-art methods on several publicly available datasets. The results show that it not only outperforms other prediction methods but also generalizes well.
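As a rough illustration of the idea described above, the sketch below gates two recurrent streams (location and velocity) with learned softmax weights at each prediction step. The module names, hidden sizes and the exact gating form are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LocationVelocityAttention(nn.Module):
    """Illustrative gate that blends location and velocity hidden states."""
    def __init__(self, hidden_size):
        super().__init__()
        # Produces one score per cue from the concatenated hidden states.
        self.score = nn.Linear(2 * hidden_size, 2)

    def forward(self, h_loc, h_vel):
        # h_loc, h_vel: (batch, hidden_size) states of the two LSTM streams.
        w = torch.softmax(self.score(torch.cat([h_loc, h_vel], dim=-1)), dim=-1)
        return w[:, 0:1] * h_loc + w[:, 1:2] * h_vel  # attended combination

# Toy single-step usage: separate LSTM cells encode location and velocity.
loc_rnn, vel_rnn = nn.LSTMCell(2, 64), nn.LSTMCell(2, 64)
attn, head = LocationVelocityAttention(64), nn.Linear(64, 2)
loc, vel = torch.randn(8, 2), torch.randn(8, 2)       # batch of 8 pedestrians
h_l, _ = loc_rnn(loc)
h_v, _ = vel_rnn(vel)
next_offset = head(attn(h_l, h_v))                    # predicted displacement
```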
Citations: 25
Bringing Vision to the Blind: From Coarse to Fine, One Dollar at a Time
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00057
T. Huynh, J. Pillai, Eunyoung Kim, Kristen Aw, Jack Sim, Ken Goldman, Rui Min
While deep learning has achieved great success in building vision applications for mainstream users, relatively little work has gone into giving the blind and visually impaired a personal, on-device visual assistant for their daily life. Unlike mainstream applications, vision systems for the blind must be robust, reliable and safe to use. In this paper, we propose a fine-grained currency recognizer based on CONGAS, which surpasses other popular local features by a large margin. In addition, we introduce an effective and lightweight coarse classifier that gates the fine-grained recognizer on resource-constrained mobile devices. The coarse-to-fine approach is orchestrated to provide an extensible mobile-vision architecture that demonstrates how coordinating deep learning and local-feature-based methods can help resolve a challenging problem for the blind and visually impaired. The proposed system runs in real time with ~150ms latency on a Pixel device, and achieved 98% precision and 97% recall on a challenging evaluation set.
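The coarse-to-fine gating described above boils down to a cheap check that decides whether the expensive fine-grained recognizer runs at all. Below is a minimal sketch of that control flow; the function names and confidence threshold are placeholders, not the paper's actual components.

```python
def recognize_banknote(frame, coarse_model, fine_model, threshold=0.8):
    """Run a lightweight coarse gate on every frame; invoke the fine-grained
    (CONGAS-style) recognizer only when currency is likely present."""
    currency_prob = coarse_model(frame)      # cheap classifier, runs continuously
    if currency_prob < threshold:
        return None                          # skip the expensive stage
    return fine_model(frame)                 # returns the denomination, e.g. "$20"
```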
Citations: 3
A Comparative Analysis of Visual-Inertial SLAM for Assisted Wayfinding of the Visually Impaired
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00028
He Zhang, Lingqiu Jin, H. Zhang, C. Ye
This paper compares the performance of three state-of-the-art visual-inertial simultaneous localization and mapping (SLAM) methods in the context of assisted wayfinding of the visually impaired. Specifically, we analyze their strengths and weaknesses for assisted wayfinding with a robotic navigation aid (RNA). Based on the analysis, we select the best visual-inertial SLAM method for the RNA application and extend it with a method capable of detecting loops caused by the RNA's unique motion pattern. By incorporating the loop closures into the graph optimization process, the extended visual-inertial SLAM method reduces the pose estimation error. The experimental results with our own datasets and the TUM VI benchmark datasets confirm the advantage of the selected method over the other two and validate the efficacy of the extended method.
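The benefit of folding loop closures into the optimization can be seen even in a toy one-dimensional pose graph: odometry edges accumulate drift, and a single loop-closure edge pulls the trajectory back toward consistency. The measurements below are invented purely to illustrate the least-squares formulation; they are not from the paper.

```python
import numpy as np

# Poses x0..x4 along a corridor, walked out and back. Odometry edges say
# x_{i+1} - x_i = d_i (noisy); a loop closure says x4 - x0 = 0 (same place).
odometry = [1.02, 0.98, -1.05, -0.97]        # drifting step estimates
rows, rhs = [], []
for i, d in enumerate(odometry):
    row = np.zeros(5)
    row[i + 1], row[i] = 1.0, -1.0
    rows.append(row); rhs.append(d)
loop = np.zeros(5); loop[4], loop[0] = 1.0, -1.0
rows.append(loop); rhs.append(0.0)           # loop-closure constraint
rows.append(np.eye(5)[0]); rhs.append(0.0)   # anchor the first pose at 0
poses, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
print(poses)  # drift is spread over the edges instead of piling up at x4
```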
Citations: 5
Iris Recognition: Comparing Visible-Light Lateral and Frontal Illumination to NIR Frontal Illumination
Pub Date : 2019-01-01 DOI: 10.1109/WACV.2019.00097
Daniel P. Benalcazar, C. Pérez, Diego Bastias, K. Bowyer
In most iris recognition systems, the texture of the iris image is the result of the interaction either between the iris and near-infrared (NIR) light or between the iris pigmentation and visible light. The iris, however, is a three-dimensional organ, and the information contained in its relief is not being exploited completely. In this article, we present an image acquisition method that enhances viewing of the structural information of the iris. Our method adds lateral illumination to the visible-light frontal illumination to capture the structural information of the muscle fibers of the iris in the resulting image. The resulting images contain highly textured patterns of the iris. To test our method, we collected a database of 1,920 iris images using both a conventional NIR device and a custom-made device that illuminates the eye at lateral and frontal angles with visible light (LFVL). Then, we compared the iris recognition performance of both devices by means of a Hamming distance distribution analysis of the corresponding binary iris codes. The ROC curves show that our method produced more separable distributions than those of the NIR device, and much better distributions than using frontal visible light alone. After eliminating errors produced by images captured with different iris dilation (13 cases), the NIR device produced inter-class and intra-class distributions that are completely separable, as in the case of LFVL. This acquisition method could also be useful for 3D iris scanning.
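For reference, the distribution analysis above is built from fractional Hamming distances between binary iris codes. A minimal NumPy sketch of the usual masked form of that distance follows; the masking convention is the standard one, not necessarily the exact protocol used in the paper.

```python
import numpy as np

def fractional_hamming(code_a, code_b, mask_a, mask_b):
    """Fraction of disagreeing bits over bits that are valid in both codes."""
    valid = mask_a & mask_b                   # bits unoccluded in both images
    disagreements = (code_a ^ code_b) & valid
    return disagreements.sum() / max(int(valid.sum()), 1)

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 2048, dtype=np.uint8)
b = a.copy(); b[:200] ^= 1                    # flip 200 bits to simulate noise
m = np.ones(2048, dtype=np.uint8)
print(fractional_hamming(a, b, m, m))         # 200/2048 ≈ 0.098
```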
Citations: 9
Multi-Component Image Translation for Deep Domain Generalization
Pub Date : 2018-12-21 DOI: 10.1109/WACV.2019.00067
Mohammad Mahfujur Rahman, C. Fookes, Mahsa Baktash, S. Sridharan
Domain adaptation (DA) and domain generalization (DG) are two closely related methods, both concerned with the task of assigning labels to an unlabeled data set. The only difference between these approaches is that DA can access the target data during the training phase, while in DG the target data is entirely unseen during training. The task of DG is challenging because we have no prior knowledge of the target samples. If DA methods are applied directly to DG by simply excluding the target data from training, poor performance results for a given task. In this paper, we tackle the domain generalization challenge in two ways. In our first approach, we propose a novel deep domain generalization architecture utilizing synthetic data generated by a Generative Adversarial Network (GAN). The discrepancy between the generated images and synthetic images is minimized using existing domain discrepancy metrics such as maximum mean discrepancy or correlation alignment. In our second approach, we introduce a protocol for applying DA methods to a DG scenario by excluding the target data from the training phase, splitting the source data into training and validation parts, and treating the validation data as target data for DA. We conduct extensive experiments on four cross-domain benchmark datasets. Experimental results show that our proposed model outperforms the current state-of-the-art methods for DG.
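The first approach above minimizes a domain discrepancy such as maximum mean discrepancy (MMD) between feature batches from the two image sets. The sketch below is a generic biased RBF-kernel MMD estimator, shown only to illustrate the metric; the bandwidth and feature dimensions are arbitrary assumptions.

```python
import torch

def rbf_mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD with a Gaussian kernel."""
    def k(a, b):
        d2 = torch.cdist(a, b).pow(2)          # pairwise squared distances
        return torch.exp(-d2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

source_feats = torch.randn(64, 128)            # e.g. pooled CNN features
target_feats = torch.randn(64, 128) + 0.5      # shifted "domain"
print(rbf_mmd2(source_feats, target_feats).item())  # shrinks as domains align
```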
Citations: 54
DAC: Data-Free Automatic Acceleration of Convolutional Networks
Pub Date : 2018-12-20 DOI: 10.1109/WACV.2019.00175
Xin Li, Shuai Zhang, Bolan Jiang, Y. Qi, M. Chuah, N. Bi
Deploying a deep learning model on mobile/IoT devices is a challenging task. The difficulty lies in the trade-off between computation speed and accuracy. A complex deep learning model with high accuracy runs slowly on resource-limited devices, while a lightweight model that runs much faster loses accuracy. In this paper, we propose a novel decomposition method, namely DAC, that is capable of factorizing an ordinary convolutional layer into two layers with far fewer parameters. DAC computes the corresponding weights for the newly generated layers directly from the weights of the original convolutional layer. Thus, no training (or fine-tuning) or any data is needed. The experimental results show that DAC greatly reduces the number of floating-point operations (FLOPs) while maintaining the high accuracy of a pre-trained model. If a 2% accuracy drop is acceptable, DAC saves 53% of the FLOPs of the VGG16 image classification model on the ImageNet dataset, 29% of the FLOPs of the SSD300 object detection model on the PASCAL VOC2007 dataset, and 46% of the FLOPs of a multi-person pose estimation model on the Microsoft COCO dataset. Compared to other existing decomposition methods, DAC achieves better performance.
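The abstract does not spell out the DAC construction, but the general flavor of a data-free, weights-only factorization can be conveyed with a plain truncated-SVD split of a 1x1 convolution into two thinner 1x1 convolutions. The rank and layer shapes below are illustrative assumptions, not the DAC scheme itself.

```python
import torch
import torch.nn as nn

def factorize_1x1_conv(conv, rank):
    """Split a 1x1 conv into two 1x1 convs of the given rank, using only the
    trained weights (no data, no fine-tuning)."""
    w = conv.weight.data.squeeze(-1).squeeze(-1)              # (out_c, in_c)
    u, s, vh = torch.linalg.svd(w, full_matrices=False)
    first = nn.Conv2d(w.shape[1], rank, 1, bias=False)
    second = nn.Conv2d(rank, w.shape[0], 1, bias=conv.bias is not None)
    first.weight.data.copy_(vh[:rank].unsqueeze(-1).unsqueeze(-1))
    second.weight.data.copy_((u[:, :rank] * s[:rank]).unsqueeze(-1).unsqueeze(-1))
    if conv.bias is not None:
        second.bias.data.copy_(conv.bias.data)
    return nn.Sequential(first, second)

conv = nn.Conv2d(256, 256, 1)
approx = factorize_1x1_conv(conv, rank=64)       # ~4x fewer multiply-adds here
x = torch.randn(1, 256, 8, 8)
print((conv(x) - approx(x)).abs().mean().item()) # reconstruction error
```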
Citations: 6
SfMLearner++: Learning Monocular Depth & Ego-Motion Using Meaningful Geometric Constraints
Pub Date : 2018-12-20 DOI: 10.1109/WACV.2019.00226
V. Prasad, B. Bhowmick
Most geometric approaches to monocular Visual Odometry (VO) provide robust pose estimates, but only sparse or semi-dense depth estimates. Of late, deep methods have shown good performance in generating dense depths and VO from monocular images by optimizing the photometric consistency between images. Despite being intuitive, a naive photometric loss does not ensure proper pixel correspondences between two views, which is the key factor for accurate depth and relative pose estimation. It is well known that simply minimizing such an error is prone to failure. We propose a method using epipolar constraints to make the learning more geometrically sound. We use the essential matrix, obtained using Nistér's Five Point Algorithm, to enforce meaningful geometric constraints on the loss, rather than using it as labels for training. Our method, although simple, is more geometrically meaningful and uses fewer parameters to give performance comparable to state-of-the-art methods that use complex losses and large networks, showing the effectiveness of using epipolar constraints. Such a geometrically constrained learning method performs successfully even in cases where simply minimizing the photometric error would fail.
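For concreteness, the epipolar constraint that the loss builds on is the algebraic residual x2^T E x1 = 0 for corresponding points in calibrated coordinates. The NumPy sketch below evaluates that residual for a toy essential matrix; it only illustrates the constraint, not the paper's training loss.

```python
import numpy as np

def epipolar_residuals(E, pts1, pts2):
    """Algebraic epipolar error x2^T E x1 for (N, 2) arrays of normalized
    (calibrated) image coordinates."""
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])   # homogeneous points
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    return np.einsum('ni,ij,nj->n', x2, E, x1)        # one residual per match

# Toy case: pure translation along x with identity rotation, so E = [t]_x.
t = np.array([1.0, 0.0, 0.0])
E = np.array([[0.0, -t[2], t[1]],
              [t[2], 0.0, -t[0]],
              [-t[1], t[0], 0.0]])
pts1 = np.array([[0.1, 0.2], [0.3, -0.1]])
pts2 = pts1.copy()                  # these matches lie on their epipolar lines
print(epipolar_residuals(E, pts1, pts2))   # ~[0, 0]
```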
Citations: 20
Learning On-Road Visual Control for Self-Driving Vehicles With Auxiliary Tasks
Pub Date : 2018-12-19 DOI: 10.1109/WACV.2019.00041
Yilun Chen, Praveen Palanisamy, P. Mudalige, Katharina Muelling, J. Dolan
A safe and robust on-road navigation system is a crucial component of achieving fully automated vehicles. NVIDIA recently proposed an end-to-end algorithm that can directly learn steering commands from the raw pixels of a front camera using a single convolutional neural network. In this paper, we leverage auxiliary information beyond raw images and design a novel network structure, called Auxiliary Task Network (ATN), to help boost the driving performance while maintaining the advantage of minimal training data and an end-to-end training method. In this network, we introduce human prior knowledge into vehicle navigation by transferring features from image recognition tasks. Image semantic segmentation is applied as an auxiliary task for navigation. We consider temporal information by introducing an LSTM module and optical flow to the network. Finally, we combine vehicle kinematics with a sensor fusion step. We discuss the benefits of our method over state-of-the-art visual navigation methods both in the Udacity simulation environment and on the real-world Comma.ai dataset.
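The auxiliary-task idea above amounts to adding a weighted segmentation loss to the main steering objective during training and dropping the auxiliary head at deployment. A hedged sketch of such a joint loss follows; the loss weighting, head shapes and class count are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def auxiliary_task_loss(steer_pred, steer_gt, seg_logits, seg_labels, seg_weight=0.1):
    """Main steering regression loss plus a weighted auxiliary segmentation loss."""
    steering_loss = F.mse_loss(steer_pred, steer_gt)
    segmentation_loss = F.cross_entropy(seg_logits, seg_labels)
    return steering_loss + seg_weight * segmentation_loss

steer_pred, steer_gt = torch.randn(4, 1), torch.randn(4, 1)
seg_logits = torch.randn(4, 19, 64, 64)            # 19 classes, Cityscapes-style
seg_labels = torch.randint(0, 19, (4, 64, 64))
print(auxiliary_task_loss(steer_pred, steer_gt, seg_logits, seg_labels).item())
```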
Citations: 16
Model-Free Tracking With Deep Appearance and Motion Features Integration
Pub Date : 2018-12-16 DOI: 10.1109/WACV.2019.00018
Xiaolong Jiang, Peizhao Li, Xiantong Zhen, Xianbin Cao
Able to track an anonymous object, a model-free tracker is broadly applicable regardless of the target type. However, designing such a generalized framework is challenged by the lack of object-oriented prior information. As one solution, a real-time model-free object tracking approach is designed in this work relying on Convolutional Neural Networks (CNNs). To overcome the scarcity of object-centric information, appearance and motion features are deeply integrated by the proposed AMNet, an end-to-end, offline-trained two-stream network. Of the two parallel streams, the ANet investigates appearance features with a multi-scale Siamese atrous CNN, enabling a tracking-by-matching strategy. The MNet achieves deep motion detection to localize anonymous moving objects by processing generic motion features. The final tracking result at each frame is generated by fusing the output response maps from both sub-networks. The proposed AMNet achieves leading performance on both the OTB and VOT benchmark datasets with favorable real-time processing speed.
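The final fusion step above combines two response maps into one localization. Here is a toy sketch of a simple weighted late fusion with arg-max localization; the blending rule and weight are assumptions made only to illustrate the idea.

```python
import numpy as np

def fuse_and_localize(appearance_map, motion_map, alpha=0.6):
    """Blend the two response maps and return the peak location (row, col)."""
    fused = alpha * appearance_map + (1.0 - alpha) * motion_map
    return np.unravel_index(np.argmax(fused), fused.shape)

rng = np.random.default_rng(1)
app = rng.random((64, 64)); app[30, 40] = 2.0    # strong appearance response
mot = rng.random((64, 64)); mot[31, 41] = 1.5    # motion response nearby
print(fuse_and_localize(app, mot))               # (30, 40)
```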
Citations: 10
Action Quality Assessment Across Multiple Actions
Pub Date : 2018-12-15 DOI: 10.1109/WACV.2019.00161
Paritosh Parmar, B. Morris
Can learning to measure the quality of an action help in measuring the quality of other actions? If so, can consolidated samples from multiple actions help improve the performance of current approaches? In this paper, we carry out experiments to see if knowledge transfer is possible in the action quality assessment (AQA) setting. Experiments are carried out on our newly released AQA dataset (http://rtis.oit.unlv.edu/datasets.html) consisting of 1106 action samples from seven actions with quality as measured by expert human judges. Our experimental results show that there is utility in learning a single model across multiple actions.
Citations: 78