Iñigo Alonso, Ana B. Cambra, A. Muñoz, T. Treibitz, A. C. Murillo
Biological datasets, such as our case study of coral segmentation, often present scarce and sparsely annotated image labels. Transfer learning techniques allow us to adapt existing deep learning models to new domains, even with small amounts of training data. One of the main challenges in training dense segmentation models is therefore obtaining the required densely labeled training data. This work presents a novel pipeline to address this pitfall and demonstrates the advantages of applying it to coral imagery segmentation. We fine-tune state-of-the-art encoder-decoder CNN models for semantic segmentation using a newly proposed augmented labeling strategy. Our experiments, run on a recent coral dataset [4], show that this augmented ground truth allows us to effectively learn coral segmentation, as well as to provide a relevant score of the segmentation quality based on it. Our approach provides a segmentation of comparable or better quality than the baseline presented with the dataset and a more flexible end-to-end pipeline.
{"title":"Coral-Segmentation: Training Dense Labeling Models with Sparse Ground Truth","authors":"Iñigo Alonso, Ana B. Cambra, A. Muñoz, T. Treibitz, A. C. Murillo","doi":"10.1109/ICCVW.2017.339","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.339","url":null,"abstract":"Biological datasets, such as our case of study, coral segmentation, often present scarce and sparse annotated image labels. Transfer learning techniques allow us to adapt existing deep learning models to new domains, even with small amounts of training data. Therefore, one of the main challenges to train dense segmentation models is to obtain the required dense labeled training data. This work presents a novel pipeline to address this pitfall and demonstrates the advantages of applying it to coral imagery segmentation. We fine tune state-of-the-art encoder-decoder CNN models for semantic segmentation thanks to a new proposed augmented labeling strategy. Our experiments run on a recent coral dataset [4], proving that this augmented ground truth allows us to effectively learn coral segmentation, as well as provide a relevant score of the segmentation quality based on it. Our approach provides a segmentation of comparable or better quality than the baseline presented with the dataset and a more flexible end-to-end pipeline.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122250447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Giorgos Kordopatis-Zilos, S. Papadopoulos, I. Patras, Y. Kompatsiaris
This work addresses the problem of Near-Duplicate Video Retrieval (NDVR). We propose an effective video-level NDVR scheme based on deep metric learning. It leverages Convolutional Neural Network (CNN) features from intermediate layers to generate discriminative global video representations, in tandem with a Deep Metric Learning (DML) framework with two fusion variations, trained to approximate an embedding function for accurate distance calculation between two near-duplicate videos. In contrast to most state-of-the-art methods, which exploit information derived from the same source of data for both development and evaluation (which usually results in dataset-specific solutions), the proposed model is fed during training with sampled triplets generated from an independent dataset and is thoroughly tested on the widely used CC_WEB_VIDEO dataset, using two popular deep CNN architectures (AlexNet, GoogleNet). We demonstrate that the proposed approach achieves outstanding performance against the state of the art, either with or without access to the evaluation dataset.
{"title":"Near-Duplicate Video Retrieval with Deep Metric Learning","authors":"Giorgos Kordopatis-Zilos, S. Papadopoulos, I. Patras, Y. Kompatsiaris","doi":"10.1109/ICCVW.2017.49","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.49","url":null,"abstract":"This work addresses the problem of Near-Duplicate Video Retrieval (NDVR). We propose an effective video-level NDVR scheme based on deep metric learning that leverages Convolutional Neural Network (CNN) features from intermediate layers to generate discriminative global video representations in tandem with a Deep Metric Learning (DML) framework with two fusion variations, trained to approximate an embedding function for accurate distance calculation between two near-duplicate videos. In contrast to most state-of-the-art methods, which exploit information deriving from the same source of data for both development and evaluation (which usually results to dataset-specific solutions), the proposed model is fed during training with sampled triplets generated from an independent dataset and is thoroughly tested on the widely used CC_WEB_VIDEO dataset, using two popular deep CNN architectures (AlexNet, GoogleNet). We demonstrate that the proposed approach achieves outstanding performance against the state-of-the-art, either with or without access to the evaluation dataset.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"2002 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129572568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Organs, cells, and intracellular microstructures dealt with in biomedical image analysis are volumetric data. From the viewpoint of object-oriented data analysis, we are required to process and analyse these data as volumetric data, without embedding them into a higher-dimensional vector space. Sampled values of volumetric data are expressed as three-way array data. Therefore, principal component analysis of multi-way data is an essential technique for subspace-based pattern recognition, data retrieval, and data compression of volumetric data. For the one-way array (vector form) problem, the discrete cosine transform matrix is a good relaxed solution to the eigenmatrix of principal component analysis. This algebraic property of principal component analysis leads to an approximate fast algorithm for PCA of three-way data arrays.
{"title":"Fast Approximate Karhunen-Loève Transform for Three-Way Array Data","authors":"Hayato Itoh, A. Imiya, T. Sakai","doi":"10.1109/ICCVW.2017.216","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.216","url":null,"abstract":"Organs, cells and microstructures in cells dealt with in biomedical image analysis are volumetric data. We are required to process and analyse these data as volumetric data without embedding into higher-dimensional vector space from the viewpoints of object oriented data analysis. Sampled values of volumetric data are expressed as three-way array data. Therefore, principal component analysis of multi-way data is an essential technique for subspace-based pattern recognition, data retrievals and data compression of volumetric data. For one-way array (the vector form) problem the discrete cosine transform matrix is a good relaxed solution of the eigenmatrix for principal component analysis. This algebraic property of principal component analysis, derives an approximate fast algorithm for PCA of three-way data arrays.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128240831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Menghan Xia, Jian Yao, Renping Xie, Mi Zhang, Jinsheng Xiao
Color consistency correction is a challenging problem in image stitching, because several factors, including tone, contrast, and fidelity, must be balanced to achieve a natural appearance. In this paper, we propose an effective color correction method that optimizes color consistency across images while also guaranteeing the imaging quality of each individual image. Our method first applies well-directed alteration detection algorithms to find coherent-content regions in inter-image overlaps, from which reliable color correspondences are extracted. Then, we parameterize the color remapping curve as the transform model and express the constraints of color consistency, contrast, and gradient in a uniform energy function. The optimization can be formulated as a convex quadratic programming problem, which yields the globally optimal solution efficiently. Our method achieves good color consistency and suffers from neither pixel saturation nor tonal dimming. Experimental results on representative datasets demonstrate the superiority of our method over state-of-the-art algorithms.
{"title":"Color Consistency Correction Based on Remapping Optimization for Image Stitching","authors":"Menghan Xia, Jian Yao, Renping Xie, Mi Zhang, Jinsheng Xiao","doi":"10.1109/ICCVW.2017.351","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.351","url":null,"abstract":"Color consistency correction is a challenging problem in image stitching, because it matters several factors, including tone, contrast and fidelity, to present a natural appearance. In this paper, we propose an effective color correction method which is feasible to optimize the color consistency across images and guarantee the imaging quality of individual image meanwhile. Our method first apply well-directed alteration detection algorithms to find coherent-content regions in inter-image overlaps where reliable color correspondences are extracted. Then, we parameterize the color remapping curve as transform model, and express the constraints of color consistency, contrast and gradient in an uniform energy function. It can be formulated as a convex quadratic programming problem which provides the global optimal solution efficiently. Our method has a good performance in color consistency and suffers no pixel saturation or tonal dimming. Experimental results of representative datasets demonstrate the superiority of our method over state-of-the-art algorithms.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129180135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a novel mid-level representation for action/activity recognition on RGB videos. We model the evolution of improved dense trajectory features not only for the entire video sequence, but also on subparts of the video. Subparts are obtained using a spectral divisive clustering that yields an unordered binary tree decomposing the entire cloud of trajectories of a sequence. We then compute videodarwin on video subparts, exploiting more fine-grained temporal information and reducing the sensitivity of the standard time-varying mean strategy of videodarwin. After decomposition, we model the evolution of features both through the frames of subparts and along descending/ascending paths in tree branches. We refer to these mid-level representations as node-darwintree and branch-darwintree, respectively. For the final classification, we construct a kernel representation for both the mid-level and the holistic videodarwin representations. Our approach achieves better performance than standard videodarwin and defines the current state of the art on the UCF-Sports and Highfive action recognition datasets.
{"title":"Darwintrees for Action Recognition","authors":"Albert Clapés, T. Tuytelaars, Sergio Escalera","doi":"10.1109/ICCVW.2017.375","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.375","url":null,"abstract":"We propose a novel mid-level representation for action/activity recognition on RGB videos. We model the evolution of improved dense trajectory features not only for the entire video sequence, but also on subparts of the video. Subparts are obtained using a spectral divisive clustering that yields an unordered binary tree decomposing the entire cloud of trajectories of a sequence. We then compute video-darwin on video subparts, exploiting more finegrained temporal information and reducing the sensitivity of the standard time varying mean strategy of videodarwin. After decomposition, we model the evolution of features through both frames of subparts and descending/ascending paths in tree branches. We refer to these mid-level representations as node-darwintree and branch-darwintree respectively. For the final classification, we construct a kernel representation for both mid-level and holistic videodarwin representations. Our approach achieves better performance than standard videodarwin and defines the current state-of-the-art on UCF-Sports and Highfive action recognition datasets.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130664906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Particle tracking is of fundamental importance in diverse quantitative analyses of dynamic intracellular processes using time-lapse microscopy. Because tracking particles manually is often impractical, a number of fully automated algorithms have been developed over the past decades, carrying out the tracking task in two successive phases: (1) particle detection and (2) particle linking. An objective benchmark for assessing the performance of such algorithms was recently established by the Particle Tracking Challenge. Because its performance evaluation protocol finds correspondences between a reference and an algorithm-generated tracking result at the level of individual tracks, the performance assessment strongly depends on the algorithm's linking capabilities. In this paper, we propose a novel performance evaluation protocol based on a simplified version of the tracking accuracy measure employed in the Cell Tracking Challenge, which establishes the correspondences at the level of individual particle detections, thus allowing one to evaluate the performance of each of the two phases in an isolated, unbiased manner. By analyzing the tracking results of all 14 algorithms competing in the Particle Tracking Challenge using the proposed evaluation protocol, we reveal substantial changes in their detection and linking performance, yielding rankings different from those reported previously.
{"title":"Particle Tracking Accuracy Measurement Based on Comparison of Linear Oriented Forests","authors":"M. Maška, P. Matula","doi":"10.1109/ICCVW.2017.8","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.8","url":null,"abstract":"Particle tracking is of fundamental importance in diverse quantitative analyses of dynamic intracellular processes using time-lapse microscopy. Due to frequent impracticability of tracking particles manually, a number of fully automated algorithms have been developed over past decades, carrying out the tracking task in two subsequent phases: (1) particle detection and (2) particle linking. An objective benchmark for assessing the performance of such algorithms was recently established by the Particle Tracking Challenge. Because its performance evaluation protocol finds correspondences between a reference and algorithm-generated tracking result at the level of individual tracks, the performance assessment strongly depends on the algorithm linking capabilities. In this paper, we propose a novel performance evaluation protocol based on a simplified version of the tracking accuracy measure employed in the Cell Tracking Challenge, which establishes the correspondences at the level of individual particle detections, thus allowing one to evaluate the performance of each of the two phases in an isolated, unbiased manner By analyzing the tracking results of all 14 algorithms competing in the Particle Tracking Challenge using the proposed evaluation protocol, we reveal substantial changes in their detection and linking performance, yielding rankings different from those reported previously.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"204 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124214110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a unified deep learning framework to recover hyperspectral images from spectrally undersampled projections. Specifically, we investigate two kinds of representative projections: RGB and compressive sensing (CS) measurements. These measurements are first upsampled in the spectral dimension through simple interpolation or CS reconstruction, and the proposed method learns an end-to-end mapping from a large number of upsampled/ground-truth hyperspectral image pairs. The mapping is represented as a deep convolutional neural network (CNN) that takes the spectrally upsampled image as input and outputs the enhanced hyperspectral one. We explore different network configurations to achieve high reconstruction fidelity. Experimental results on a variety of test images demonstrate significantly improved performance of the proposed method over the state of the art.
{"title":"HSCNN: CNN-Based Hyperspectral Image Recovery from Spectrally Undersampled Projections","authors":"Zhiwei Xiong, Zhan Shi, Huiqun Li, Lizhi Wang, Dong Liu, Feng Wu","doi":"10.1109/ICCVW.2017.68","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.68","url":null,"abstract":"This paper presents a unified deep learning framework to recover hyperspectral images from spectrally undersampled projections. Specifically, we investigate two kinds of representative projections, RGB and compressive sensing (CS) measurements. These measurements are first upsampled in the spectral dimension through simple interpolation or CS reconstruction, and the proposed method learns an end-to-end mapping from a large number of up-sampled/groundtruth hyperspectral image pairs. The mapping is represented as a deep convolutional neural network (CNN) that takes the spectrally upsampled image as input and outputs the enhanced hyperspetral one. We explore different network configurations to achieve high reconstruction fidelity. Experimental results on a variety of test images demonstrate significantly improved performance of the proposed method over the state-of-the-arts.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121209780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marcelo Cicconet, David Grant Colburn Hildebrand, H. Elliott
We demonstrate that the problem of fitting a plane of mirror symmetry to data in any Euclidean space can be reduced to the problem of registering two datasets, and that the exactness of the solution depends entirely on the registration accuracy. This new Mirror Symmetry via Registration (MSR) framework involves (1) reflecting the data with respect to an arbitrary plane, (2) registering the original and reflected datasets, and (3) computing the eigenvector with eigenvalue -1 of the transformation matrix representing the composed reflection and registration mappings. To support MSR, we also introduce a novel 2D registration method based on random sample consensus of an ensemble of normalized cross-correlation matches. We further demonstrate the generality of MSR by testing it on a database of 3D shapes with an iterative closest point registration back-end.
{"title":"Finding Mirror Symmetry via Registration and Optimal Symmetric Pairwise Assignment of Curves: Algorithm and Results","authors":"Marcelo Cicconet, David Grant Colburn Hildebrand, H. Elliott","doi":"10.1109/ICCVW.2017.207","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.207","url":null,"abstract":"We demonstrate that the problem of fitting a plane of mirror symmetry to data in any Euclidian space can be reduced to the problem of registering two datasets, and that the exactness of the solution depends entirely on the registration accuracy. This new Mirror Symmetry via Registration (MSR) framework involves (1) data reflection with respect to an arbitrary plane, (2) registration of original and reflected datasets, and (3) calculation of the eigenvector of eigenvalue -1 for the transformation matrix representing the reflection and registration mappings. To support MSR, we also introduce a novel 2D registration method based on random sample consensus of an ensemble of normalized cross-correlation matches. We further demonstrate the generality of MSR by testing it on a database of 3D shapes with an iterative closest point registration back-end.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114637471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Harada, Kuniaki Saito, Yusuke Mukuta, Y. Ushiku
In this work, we propose a novel method for shared representation learning that learns a mapping to a common space in which different modalities carry the same information. Our goal is to correctly classify the target modality with a classifier trained, in the common representation space, on source modality samples and their labels. We call these representations modality-invariant representations. Our proposed method has the major advantage of not needing any labels for the target samples in order to learn the representations. For example, we obtain modality-invariant representations from pairs of images and texts and then train a text classifier in the modality-invariant space. Although we do not give any explicit relationship between images and labels, we can expect the images to be classified correctly in that space. Our method draws upon the theory of domain adaptation, and we propose to learn modality-invariant representations by utilizing adversarial training. We call our method the Deep Modality Invariant Adversarial Network (DeMIAN). We demonstrate the effectiveness of our method in experiments.
{"title":"Deep Modality Invariant Adversarial Network for Shared Representation Learning","authors":"T. Harada, Kuniaki Saito, Yusuke Mukuta, Y. Ushiku","doi":"10.1109/ICCVW.2017.311","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.311","url":null,"abstract":"In this work, we propose a novel method to learn the mapping to the common space wherein different modalities have the same information for shared representation learning. Our goal is to correctly classify the target modality with a classifier trained on source modality samples and their labels in common representations. We call these representations modality-invariant representations. Our proposed method has the major advantage of not needing any labels for the target samples in order to learn representations. For example, we obtain modality-invariant representations from pairs of images and texts. Then, we train the text classifier on the modality-invariant space. Although we do not give any explicit relationship between images and labels, we can expect that images can be classified correctly in that space. Our method draws upon the theory of domain adaptation and we propose to learn modality-invariant representations by utilizing adversarial training. We call our method the Deep Modality Invariant Adversarial Network (DeMIAN). We demonstrate the effectiveness of our method in experiments.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121705137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniele De Gregorio, Tommaso Cavallari, L. D. Stefano
We introduce SkiMap++, an extension of the recently proposed SkiMap mapping framework for robot navigation [1]. The extension enriches the map with semantic information concerning the presence in the environment of certain objects that may be usefully recognized by the robot, e.g. for the sake of grasping them. More precisely, the map can accommodate information about the spatial locations of certain 3D object features, as determined by matching the visual features extracted from the incoming frames through a random forest learned off-line from a set of object models. Thereby, evidence about the presence of object features is gathered from multiple vantage points alongside the standard geometric mapping task, so as to enable recognizing the objects and estimating their 6-DOF poses. As a result, SkiMap++ can reconstruct the geometry of large-scale environments as well as localize relevant objects therein (Fig. 1) in real time on a CPU. As an additional contribution, we present an RGB-D dataset featuring ground-truth camera and object poses, which may be used by researchers interested in pursuing SLAM alongside object recognition, a topic often referred to as Semantic SLAM.
{"title":"SkiMap++: Real-Time Mapping and Object Recognition for Robotics","authors":"Daniele De Gregorio, Tommaso Cavallari, L. D. Stefano","doi":"10.1109/ICCVW.2017.84","DOIUrl":"https://doi.org/10.1109/ICCVW.2017.84","url":null,"abstract":"We introduce SkiMap++, an extension to the recently proposed SkiMap mapping framework for robot navigation [1]. The extension deals with enriching the map with semantic information concerning the presence in the environment of certain objects that may be usefully recognized by the robot, e.g. for the sake of grasping them. More precisely, the map can accommodate information about the spatial locations of certain 3D object features, as determined by matching the visual features extracted from the incoming frames through a random forest learned off-line from a set of object models. Thereby, evidence about the presence of object features is gathered from multiple vantage points alongside with the standard geometric mapping task, so to enable recognizing the objects and estimating their 6 DOF poses. As a result, SkiMap++ can reconstruct the geometry of large scale environments as well as localize some relevant objects therein (Fig.1) in real-time on CPU. As an additional contribution, we present an RGB-D dataset featuring ground-truth camera and object poses, which may be deployed by researchers interested in pursuing SLAM alongside with object recognition, a topic often referred to as Semantic SLAM. 1","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114766056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}