
2018 15th Conference on Computer and Robot Vision (CRV): Latest Publications

Spatiotemporal KSVD Dictionary Learning for Online Multi-target Tracking
Pub Date : 2018-05-01 DOI: 10.1109/CRV.2018.00030
H. Manh, G. Alaghband
In this paper, we present a new spatiotemporal discriminative KSVD dictionary algorithm (STKSVD) for learning target appearance in an online multi-target tracking system. Unlike other classification/recognition tasks (e.g., face or image recognition), learning a target's appearance in online multi-target tracking is affected by factors such as posture/articulation changes, partial occlusion by the background scene or other targets, and background changes (the human-detection bounding box covers both parts of the person and part of the scene). However, we observe that these variations occur gradually in both space and time. We characterize the spatial and temporal information between a target's samples through a new STKSVD appearance learning algorithm to better discriminate targets. Our STKSVD method learns discriminative sparse codes and linear classifier parameters while minimizing reconstruction error in a single optimization problem. Our appearance learning algorithm and tracking framework employ two different methods for calculating the appearance similarity score in each stage of a two-stage association: a linear classifier in the first stage, and minimum residual errors in the second stage. Results on the 2DMOT2015 dataset, using its public Aggregated Channel Features (ACF) human detections for all comparisons, show that our method outperforms existing related learning methods.
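As an illustration of the two-stage appearance scoring described above, here is a minimal Python sketch, assuming a learned dictionary D (atoms x feature dims), classifier weights W, and an atom-to-target assignment are already available; the STKSVD training objective itself is not reproduced, and sparse coding simply uses off-the-shelf OMP.

```python
# Hedged sketch: two-stage appearance similarity scoring over a learned
# discriminative dictionary. D, W, atom_to_target and the feature extractor
# are assumptions; STKSVD training is not reproduced here.
import numpy as np
from sklearn.decomposition import SparseCoder

def sparse_code(D, feats, k=5):
    """Sparse-code appearance features over dictionary D (n_atoms x n_dims)."""
    coder = SparseCoder(dictionary=D, transform_algorithm="omp",
                        transform_n_nonzero_coefs=k)
    return coder.transform(feats)                 # (n_samples, n_atoms)

def stage1_score(W, codes):
    """Stage 1: linear classifier on sparse codes -> per-target scores."""
    return codes @ W.T                            # (n_samples, n_targets)

def stage2_score(D, codes, feats, atom_to_target):
    """Stage 2: negative reconstruction residual using only the atoms
    associated with each target (smaller residual -> higher similarity)."""
    n_targets = atom_to_target.max() + 1
    scores = np.empty((feats.shape[0], n_targets))
    for t in range(n_targets):
        mask = (atom_to_target == t)
        recon = codes[:, mask] @ D[mask]          # reconstruct with target-t atoms
        scores[:, t] = -np.linalg.norm(feats - recon, axis=1)
    return scores
```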
Citations: 4
Fast Unsynchronized Unstructured Light
Pub Date : 2018-05-01 DOI: 10.1109/CRV.2018.00046
Chaima El Asmi, S. Roy
This paper proposes a new approach to structured light correspondence that alleviates the camera-projector synchronization problem. Until now, great care was required to ensure that each camera image corresponded exactly to the correct pattern in the sequence. This was difficult to achieve with low-cost hardware or large installations. In our method, the projector sends a constant video loop of a selected number of unstructured light patterns at a high frame rate (30 to 60 fps on common hardware), which is captured by a camera without any form of synchronization. The only constraint is that the camera and projector frame rates are known. The matching process not only recovers the correct pattern sequence, but is also impervious to partial exposures of consecutive patterns as well as rolling-shutter effects.
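A minimal sketch of the frame-rate-based alignment idea: with the projector looping n_patterns at a known proj_fps and the camera running at a known cam_fps, only the loop offset is unknown. The brute-force offset search and the per-frame score_fn below are illustrative assumptions, not the paper's unstructured-light matching.

```python
# Hedged sketch: recover which looped pattern each unsynchronized camera
# frame saw, given only the two frame rates. score_fn(frame, pattern) is an
# assumed similarity measure supplied by the caller.
import numpy as np

def pattern_phase(frame_idx, cam_fps, proj_fps, n_patterns, offset):
    """Fractional index of the looped pattern visible in camera frame frame_idx."""
    return (offset + frame_idx * proj_fps / cam_fps) % n_patterns

def recover_offset(frames, patterns, cam_fps, proj_fps, score_fn, steps=200):
    """Brute-force the unknown loop offset: keep the offset whose implied
    pattern-to-frame assignment best explains the captured frames."""
    n = len(patterns)
    best_offset, best_score = 0.0, -np.inf
    for offset in np.linspace(0.0, n, steps, endpoint=False):
        idx = [int(pattern_phase(i, cam_fps, proj_fps, n, offset)) % n
               for i in range(len(frames))]
        score = sum(score_fn(frame, patterns[j]) for frame, j in zip(frames, idx))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset
```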
Citations: 5
Semantic Scene Models for Visual Localization under Large Viewpoint Changes
Pub Date : 2018-05-01 DOI: 10.1109/CRV.2018.00033
J. Li, Zhaoqi Xu, D. Meger, G. Dudek
We propose an approach for camera pose estimation under large viewpoint changes using only 2D RGB images. This enables a mobile robot to relocalize itself with respect to a previously visited scene when seeing it again from a completely new vantage point. To overcome large appearance changes, we integrate a variety of cues, including object detections, vanishing points, structure from motion, and object-to-object context, to constrain the camera geometry while simultaneously estimating the 3D pose of covisible objects represented as bounding cuboids. We propose an efficient sampling-based approach that quickly cuts down the high-dimensional search space, and a robust correspondence algorithm that matches covisible objects via inter-object spatial relationships. We validate our approach on the publicly available Sun3D dataset, where we demonstrate the ability to handle camera translations of up to 5.9 meters and camera rotations of up to 110 degrees.
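A hedged sketch of how a sampled pose hypothesis might be scored against 2D evidence: reproject estimated 3D object landmarks (e.g., cuboid centres) with a pinhole model and compare them to detection centres. The squared-pixel cost below is an assumption; the paper additionally uses vanishing points, structure from motion, and inter-object context.

```python
# Hedged sketch: scoring sampled camera-pose hypotheses by reprojection error
# of object landmarks against 2D detection centres. K, R, t follow a standard
# pinhole model; the full pipeline in the paper uses richer cues.
import numpy as np

def reprojection_score(K, R, t, landmarks_3d, detection_centres):
    """Lower is better: summed squared pixel error over covisible objects."""
    cam = R @ landmarks_3d.T + t[:, None]          # 3xN points in the camera frame
    proj = K @ cam                                 # pinhole projection
    px = (proj[:2] / proj[2]).T                    # (N, 2) pixel coordinates
    return float(np.sum((px - detection_centres) ** 2))

def best_pose(hypotheses, K, landmarks_3d, detection_centres):
    """Pick the sampled (R, t) hypothesis with the smallest reprojection error."""
    return min(hypotheses,
               key=lambda rt: reprojection_score(K, rt[0], rt[1],
                                                 landmarks_3d, detection_centres))
```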
Citations: 10
Manifold Geometry with Fast Automatic Derivatives and Coordinate Frame Semantics Checking in C++
Pub Date : 2018-05-01 DOI: 10.1109/CRV.2018.00027
Leonid Koppel, Steven L. Waslander
Computer vision and robotics problems often require representation and estimation of poses on the SE(3) manifold. Developers of algorithms that must run in real time face several time-consuming programming tasks, including deriving and computing analytic derivatives and avoiding mathematical errors when handling poses in multiple coordinate frames. To support rapid and error-free development, we present wave_geometry, a C++ manifold geometry library with two key contributions: expression template-based automatic differentiation and compile-time enforcement of coordinate frame semantics. We contrast the library with existing open source packages and show that it can evaluate Jacobians in forward and reverse mode with little to no runtime overhead compared to hand-coded derivatives. The library is available at https://github.com/wavelab/wave_geometry.
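To illustrate what coordinate-frame semantics checking buys, here is a runtime analogue in Python of the checks the library enforces at compile time in C++; this is not wave_geometry's API, just a sketch of the frame-composition rule it guards.

```python
# Hedged sketch: runtime coordinate-frame semantics checking. A transform
# T_AB maps frame B into frame A; composition is only legal when the inner
# frames match. wave_geometry enforces the same rule at compile time in C++.
import numpy as np

class FramedTransform:
    def __init__(self, to_frame, from_frame, matrix):
        self.to_frame, self.from_frame = to_frame, from_frame
        self.matrix = np.asarray(matrix)           # 4x4 homogeneous transform

    def __matmul__(self, other):
        if self.from_frame != other.to_frame:
            raise ValueError(
                f"frame mismatch: {self.from_frame} vs {other.to_frame}")
        return FramedTransform(self.to_frame, other.from_frame,
                               self.matrix @ other.matrix)

# T_world_body = T_world_camera @ T_camera_body composes cleanly; swapping
# the operands raises immediately instead of silently producing a wrong pose.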
Citations: 5
Disparity Filtering with 3D Convolutional Neural Networks
Pub Date : 2018-05-01 DOI: 10.1109/CRV.2018.00042
W. Mao, Minglun Gong
Stereo matching is an ill-posed problem, and the disparity maps it generates are therefore often inaccurate and noisy. To alleviate this problem, a number of approaches have been proposed that output accurate disparity values only for selected pixels. Instead of designing yet another disparity optimization method for sparse disparity matching, we present a novel disparity filtering step that detects and removes inaccurate matches. Based on 3D convolutional neural networks, our detector is trained directly on 3D matching cost volumes and hence works with different matching-cost generation approaches. The experimental results show that it can effectively filter out mismatches while preserving accurate ones. As a result, combining our approach with the simplest Winner-Take-All optimization leads to better performance than most existing sparse stereo matching algorithms on the Middlebury Stereo Evaluation site.
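A minimal sketch of the kind of 3D CNN that operates directly on a matching-cost volume and outputs a per-pixel reliability map; the layer sizes and the max-pooling collapse over the disparity axis are assumptions, not the paper's architecture.

```python
# Hedged sketch: a small 3D CNN over a stereo matching-cost volume that
# predicts, per pixel, whether the best match is reliable. Architecture and
# training protocol are illustrative assumptions.
import torch
import torch.nn as nn

class CostVolumeFilter(nn.Module):
    def __init__(self, channels=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, cost_volume):
        # cost_volume: (batch, 1, disparities, height, width)
        x = self.net(cost_volume)
        # collapse the disparity axis into a per-pixel confidence map
        return torch.sigmoid(x.max(dim=2).values)    # (batch, 1, H, W)
```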
Citations: 7
Data-Driven Multispectral Image Registration
Pub Date : 2018-05-01 DOI: 10.1109/CRV.2018.00040
Rahat Yasir, M. Eramian, I. Stavness, S. Shirtliffe, H. Duddu
Multispectral imaging is widely used in remote sensing applications from UAVs and ground-based platforms. Multispectral cameras often use a physically separate camera for each wavelength, causing misalignment between the images of different imaging bands. This misalignment must be corrected before concurrent multi-band image analysis. The traditional approach to multispectral image registration is to select a target channel and register all other image channels to it. There is no objective, evidence-based method for selecting the target. The possibility of registering through an intermediate channel on the way to the target is not usually considered, but it could be beneficial when there is no target channel for which direct registration performs well for every other channel. In this paper, we propose an automatic data-driven multispectral image registration framework that determines a target channel, and possible intermediate registration steps, based on the assumptions that 1) some reasonable minimum number of control-point correspondences between two channels is needed to ensure a low-error registration; and 2) a greater number of such correspondences generally results in lower registration error. Our prototype is tested on three multispectral datasets captured with UAV-mounted multispectral cameras. In all of our experiments, the resulting registration schemes had more control-point correspondences on average than the traditional register-all-to-one-target-channel approach. For most channels in our three datasets, our registration schemes produced lower back-projection error than the direct-to-target-channel registration approach.
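A hedged sketch of data-driven target and path selection from pairwise correspondence counts: channels become graph nodes, edges are weighted by the number of matched control points, and each channel is routed to the chosen target along well-supported hops. The maximum-spanning-tree routing and the target-selection rule below are illustrative assumptions, not necessarily the paper's exact criteria.

```python
# Hedged sketch: choose a registration target channel and intermediate hops
# from pairwise control-point correspondence counts. The MST-based routing
# is an assumption used for illustration.
import itertools
import networkx as nx

def registration_plan(corr_counts, channels):
    """corr_counts[frozenset((a, b))] = number of matched control points between a and b."""
    g = nx.Graph()
    for a, b in itertools.combinations(channels, 2):
        g.add_edge(a, b, weight=corr_counts[frozenset((a, b))])
    tree = nx.maximum_spanning_tree(g)
    # choose as target the channel whose tree connections are strongest overall
    target = max(channels,
                 key=lambda c: sum(d["weight"]
                                   for _, _, d in tree.edges(c, data=True)))
    # every other channel registers to the target along its (unique) tree path
    return target, {c: nx.shortest_path(tree, c, target)
                    for c in channels if c != target}
```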
Citations: 7
Surface-Based GICP
Pub Date : 2018-05-01 DOI: 10.1109/CRV.2018.00044
M. Vlaminck, H. Luong, W. Philips
In this paper we present an extension of the Generalized ICP algorithm for the registration of point clouds for use in lidar-based SLAM applications. As opposed to the plane-to-plane cost function, which assumes that each point set is locally planar, we propose to incorporate additional information on the underlying surface into the GICP process. Doing so, we are able to deal better with the artefacts that are typically present in lidar point clouds, including an inhomogeneous and sparse point density, noise and missing data. Experiments on lidar sequences of the KITTI benchmark demonstrate that we are able to substantially reduce the positional error compared to the original GICP algorithm.
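For context, the per-correspondence Generalized-ICP cost that the paper builds on can be written compactly; the surface-based construction of the per-point covariances, which is the paper's contribution, is not reproduced here (covariances are taken as given).

```python
# Hedged sketch: the standard per-correspondence Generalized-ICP cost.
# C_src and C_tgt are per-point covariances; how they are derived from the
# underlying surface is the paper's contribution and is not shown here.
import numpy as np

def gicp_residual(p_src, p_tgt, C_src, C_tgt, R, t):
    """Mahalanobis distance of one correspondence under the pose (R, t)."""
    d = p_tgt - (R @ p_src + t)
    M = C_tgt + R @ C_src @ R.T          # combined covariance in the target frame
    return float(d @ np.linalg.solve(M, d))
```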
Citations: 6
An Evaluation of Deep CNN Baselines for Scene-Independent Person Re-identification
Pub Date : 2018-05-01 DOI: 10.1109/CRV.2018.00049
P. Marchwica, Michael Jamieson, P. Siva
In recent years, a variety of proposed methods based on deep convolutional neural networks (CNNs) have improved the state of the art for large-scale person re-identification (ReID). While a large number of optimizations and network improvements have been proposed, there has been relatively little evaluation of the influence of training data and baseline network architecture. In particular, it is usually assumed either that networks are trained on labeled data from the deployment location (scene-dependent), or else adapted with unlabeled data, both of which complicate system deployment. In this paper, we investigate the feasibility of achieving scene-independent person ReID by forming a large composite dataset for training. We present an in-depth comparison of several CNN baseline architectures for both scene-dependent and scene-independent ReID, across a range of training dataset sizes. We show that scene-independent ReID can produce leading-edge results, competitive with unsupervised domain adaptation techniques. Finally, we introduce a new dataset for comparing within-camera and across-camera person ReID.
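A minimal sketch of the standard cross-camera ranking evaluation such baseline comparisons rely on (rank-1 accuracy from embedding distances); protocol details such as same-camera filtering and mAP are omitted, and the function names are illustrative.

```python
# Hedged sketch: rank-1 accuracy for ReID given query/gallery embeddings.
# This is the generic evaluation idea, not the paper's exact protocol.
import numpy as np

def rank1_accuracy(query_emb, query_ids, gallery_emb, gallery_ids):
    """Fraction of queries whose nearest gallery embedding shares their identity."""
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    # pairwise Euclidean distances between query and gallery embeddings
    dists = np.linalg.norm(query_emb[:, None, :] - gallery_emb[None, :, :], axis=2)
    nearest = gallery_ids[np.argmin(dists, axis=1)]
    return float(np.mean(nearest == query_ids))
```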
Citations: 6
Walking on Thin Air: Environment-Free Physics-Based Markerless Motion Capture
Pub Date : 2018-05-01 DOI: 10.1109/CRV.2018.00031
M. Livne, L. Sigal, Marcus A. Brubaker, David J. Fleet
We propose a generative approach to physics-based motion capture. Unlike prior attempts to incorporate physics into tracking that assume the subject and scene geometry are calibrated and known a priori, our approach is automatic and online. This distinction is important since calibration of the environment is often difficult, especially for motions with props, uneven surfaces, or outdoor scenes. The use of physics in this context provides a natural framework to reason about contact and the plausibility of recovered motions. We propose a fast data-driven parametric body model, based on linear-blend skinning, which decouples deformations due to pose, anthropometrics and body shape. Pose (and shape) parameters are estimated using robust ICP optimization with physics-based dynamic priors that incorporate contact. Contact is estimated from torque trajectories and predictions of which contact points were active. To our knowledge, this is the first approach to take physics into account without explicit a priori knowledge of the environment or body dimensions. We demonstrate effective tracking from a noisy single depth camera, improving on state-of-the-art results quantitatively and producing better qualitative results, reducing visual artifacts like foot-skate and jitter.
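The fast parametric body model is based on linear-blend skinning; a minimal sketch of that skinning step is below, assuming per-bone transforms and skinning weights are given (the paper's model additionally handles anthropometrics, body shape, and physics-based priors).

```python
# Hedged sketch: the linear-blend-skinning step underlying a fast parametric
# body model. Bone transforms and skinning weights are assumed given.
import numpy as np

def linear_blend_skinning(rest_verts, weights, bone_transforms):
    """rest_verts: (V, 3); weights: (V, B); bone_transforms: (B, 4, 4)."""
    hom = np.hstack([rest_verts, np.ones((rest_verts.shape[0], 1))])    # (V, 4)
    per_bone = np.einsum("bij,vj->vbi", bone_transforms, hom)           # (V, B, 4)
    return np.einsum("vb,vbi->vi", weights, per_bone)[:, :3]            # (V, 3)
```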
Citations: 5
A Pyramid CNN for Dense-Leaves Segmentation
Pub Date : 2018-04-05 DOI: 10.1109/CRV.2018.00041
Daniel Morris
Automatic detection and segmentation of overlapping leaves in dense foliage can be a difficult task, particularly for leaves with strong textures and high occlusions. We present Dense-Leaves, an image dataset with ground truth segmentation labels that can be used to train and quantify algorithms for leaf segmentation in the wild. We also propose a pyramid convolutional neural network with multi-scale predictions that detects and discriminates leaf boundaries from interior textures. Using these detected boundaries, closed-contour boundaries around individual leaves are estimated with a watershed-based algorithm. The result is an instance segmenter for dense leaves. Promising segmentation results for leaves in dense foliage are obtained.
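A hedged sketch of the boundary-to-instances step: seed a watershed from low-boundary-probability interior regions and flood over the predicted boundary map. The threshold and marker construction are assumptions; the pyramid CNN that produces the boundary map is not reproduced here.

```python
# Hedged sketch: turn a predicted leaf-boundary probability map into per-leaf
# instance labels with a marker-based watershed. interior_thresh is an
# illustrative assumption.
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

def leaves_from_boundaries(boundary_prob, interior_thresh=0.2):
    """boundary_prob: (H, W) in [0, 1], high where the CNN predicts an edge."""
    interior = boundary_prob < interior_thresh     # likely leaf interiors
    markers, _ = ndi.label(interior)               # one seed per interior blob
    # flood from the seeds over the boundary map; ridges become leaf borders
    return watershed(boundary_prob, markers)
```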
Citations: 27