{"title":"Learning to Fuse: A Deep Learning Approach to Visual-Inertial Camera Pose Estimation","authors":"J. Rambach, Aditya Tewari, A. Pagani, D. Stricker","doi":"10.1109/ISMAR.2016.19","DOIUrl":null,"url":null,"abstract":"Camera pose estimation is the cornerstone of Augmented Reality applications. Pose tracking based on camera images exclusively has been shown to be sensitive to motion blur, occlusions, and illumination changes. Thus, a lot of work has been conducted over the last years on visual-inertial pose tracking using acceleration and angular velocity measurements from inertial sensors in order to improve the visual tracking. Most proposed systems use statistical filtering techniques to approach the sensor fusion problem, that require complex system modelling and calibrations in order to perform adequately. In this work we present a novel approach to sensor fusion using a deep learning method to learn the relation between camera poses and inertial sensor measurements. A long short-term memory model (LSTM) is trained to provide an estimate of the current pose based on previous poses and inertial measurements. This estimates then appropriately combined with the output of a visual tracking system using a linear Kalman Filter to provide a robust final pose estimate. Our experimental results confirm the applicability and tracking performance improvement gained from the proposed sensor fusion system.","PeriodicalId":146808,"journal":{"name":"2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)","volume":"2013 16","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"49","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISMAR.2016.19","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 49
Abstract
Camera pose estimation is the cornerstone of Augmented Reality applications. Pose tracking based exclusively on camera images has been shown to be sensitive to motion blur, occlusions, and illumination changes. Thus, considerable work has been conducted in recent years on visual-inertial pose tracking, which uses acceleration and angular velocity measurements from inertial sensors to improve the visual tracking. Most proposed systems approach the sensor fusion problem with statistical filtering techniques, which require complex system modelling and calibration in order to perform adequately. In this work we present a novel approach to sensor fusion, using a deep learning method to learn the relation between camera poses and inertial sensor measurements. A long short-term memory (LSTM) model is trained to provide an estimate of the current pose based on previous poses and inertial measurements. This estimate is then appropriately combined with the output of a visual tracking system using a linear Kalman filter to provide a robust final pose estimate. Our experimental results confirm the applicability of the proposed sensor fusion system and the tracking performance improvement it provides.
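The abstract describes two components: an LSTM that regresses the current pose from a window of past poses and IMU samples, and a linear Kalman filter that fuses this inertial prediction with the visual tracker's pose. The following is a minimal sketch of such a pipeline, not the authors' implementation: the 7-D pose parameterization (3-D translation plus unit quaternion), the 6-D IMU input, the layer sizes, and the identity measurement model in the Kalman update are all assumptions made for illustration.

```python
# Hypothetical sketch of LSTM-based inertial pose prediction plus a linear
# Kalman-filter fusion step; dimensions and names are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

POSE_DIM = 7   # assumed pose: tx, ty, tz, qw, qx, qy, qz
IMU_DIM = 6    # 3-axis accelerometer + 3-axis gyroscope

class InertialPoseLSTM(nn.Module):
    """Predicts the current pose from a window of past poses and IMU samples."""
    def __init__(self, hidden_size: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=POSE_DIM + IMU_DIM,
                            hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, POSE_DIM)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, window, POSE_DIM + IMU_DIM)
        out, _ = self.lstm(seq)
        pose = self.head(out[:, -1])  # regress pose from the last time step
        # Renormalize the quaternion part so it stays a valid rotation.
        q = pose[:, 3:] / pose[:, 3:].norm(dim=1, keepdim=True)
        return torch.cat([pose[:, :3], q], dim=1)

def kalman_fuse(x_pred, P_pred, z_vis, R_vis):
    """One linear Kalman update fusing the LSTM prediction (x_pred, P_pred)
    with the visual tracker's pose z_vis (covariance R_vis).
    Assumes both live in the same pose space, i.e. H = I."""
    S = P_pred + R_vis                        # innovation covariance
    K = P_pred @ np.linalg.inv(S)             # Kalman gain
    x = x_pred + K @ (z_vis - x_pred)         # fused state
    P = (np.eye(len(x_pred)) - K) @ P_pred    # fused covariance
    return x, P

# Usage sketch: predict from a 10-frame window, then fuse with a visual pose.
model = InertialPoseLSTM()
window = torch.zeros(1, 10, POSE_DIM + IMU_DIM)       # placeholder input
x_pred = model(window).detach().numpy().squeeze()
x, P = kalman_fuse(x_pred, np.eye(POSE_DIM) * 0.1,
                   z_vis=np.zeros(POSE_DIM), R_vis=np.eye(POSE_DIM) * 0.01)
```

The appeal of this design, as the abstract argues, is that the learned model absorbs the relation between inertial readings and pose changes that filtering approaches must capture through explicit system modelling and calibration; the remaining fusion step is then a simple linear Kalman update rather than a full nonlinear visual-inertial filter.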