The Influence of using Augmented Reality on Textbook Support for Learners of Different Learning Styles
Jia Zhang, A. Ogan, Tzu-Chien Liu, Y. Sung, Kuo-En Chang. ISMAR 2016. DOI: 10.1109/ISMAR.2016.26
Numerous studies have shown that applying Augmented Reality (AR) to teaching and learning is beneficial, but determining the reasons behind its effectiveness, and in particular the characteristics of students for whom AR is best suited, can open up new opportunities to integrate adaptive instruction with AR in the future. Using a quasi-experimental research design, our study recruited 66 participants for an 8-week AR-assisted learning activity, and lag sequential analysis was used to analyze participants' behavior in the AR learning environment. We found that AR was more effective in enhancing elementary school science learning gains for learners who prefer a kinesthetic approach to learning. We hypothesize that these effects are due to the increased opportunity for hands-on activities, which effectively increases learners' concentration and passion for learning.
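The behavioral analysis named above is lag sequential analysis. As a rough, hedged illustration (the behavior codes, the toy sequence, and the significance convention below are assumptions, not the authors' coding scheme), a lag-1 analysis counts transitions between coded behaviors and tests each against chance with adjusted residuals:

```python
# Illustrative lag-1 sequential analysis: count behavior transitions and
# compute adjusted residuals (z-scores) against the independence baseline.
# Behavior codes and the example sequence are invented for illustration.
import numpy as np

def lag_sequential_z(sequence, codes):
    idx = {c: i for i, c in enumerate(codes)}
    k = len(codes)
    obs = np.zeros((k, k))
    for a, b in zip(sequence, sequence[1:]):      # lag-1 transition counts
        obs[idx[a], idx[b]] += 1
    n = obs.sum()
    row = obs.sum(axis=1, keepdims=True)
    col = obs.sum(axis=0, keepdims=True)
    expected = row @ col / n                      # expected counts if transitions were random
    return (obs - expected) / np.sqrt(expected * (1 - row / n) * (1 - col / n))

codes = ["ReadText", "ManipulateAR", "Discuss", "TakeNotes"]
seq = ["ReadText", "ManipulateAR", "ManipulateAR", "Discuss", "ReadText",
       "ManipulateAR", "TakeNotes", "Discuss", "ManipulateAR", "ReadText"]
print(np.round(lag_sequential_z(seq, codes), 2))  # |z| > 1.96: transition occurs above chance
```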
{"title":"The Influence of using Augmented Reality on Textbook Support for Learners of Different Learning Styles","authors":"Jia Zhang, A. Ogan, Tzu-Chien Liu, Y. Sung, Kuo-En Chang","doi":"10.1109/ISMAR.2016.26","DOIUrl":"https://doi.org/10.1109/ISMAR.2016.26","url":null,"abstract":"It has been shown in numerous studies that the application of Augmented Reality (AR) to teaching and learning is beneficial, but determining the reasons behind its effectiveness, and in particular the characteristics of students for whom an AR is best suited, can bring forth new opportunities to integrate adaptive instruction and AR in the future. Through a quasi-experimental research design, our study recruited 66 participants in an 8-week long AR-assisted learning activity, and lag sequential analysis was used to analyze participants' behavior in an AR learning environment. We found that AR was more effective in enhancing the learning gains in elementary school science of learners who prefer a Kinesthetic approach to learning. We hypothesize that these effects are due to the increase in opportunity for hands-on activities, effectively increasing learners' concentration and passion for learning.","PeriodicalId":146808,"journal":{"name":"2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132692804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Practical and Precise Projector-Camera Calibration
Liming Yang, Jean-Marie Normand, G. Moreau. ISMAR 2016. DOI: 10.1109/ISMAR.2016.22
Projectors are important display devices for large-scale augmented reality applications. However, precisely calibrating projectors with large focus distances involves a trade-off between practicality and accuracy: one either needs a huge calibration board or a precise 3D model [12]. In this paper, we present a practical projector-camera calibration method to solve this problem. The user only needs a small calibration board to calibrate the system, regardless of the projector's focus distance. Results show that the root-mean-squared re-projection error (RMSE) for a 450 cm projection distance is only about 4 mm, even though the system is calibrated using a small B4 (250 × 353 mm) calibration board.
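For reference, the re-projection RMSE quoted above is conventionally the root-mean-squared distance between observed calibration points and their re-projections under the estimated model. A minimal sketch with OpenCV follows, using placeholder data; the paper reports the error in millimetres on the projection surface, and its actual calibration pipeline is not reproduced here.

```python
# Illustrative re-projection RMSE computation; inputs are placeholders and
# the authors' calibration procedure itself is not shown.
import cv2
import numpy as np

def rms_reprojection_error(obj_pts, img_pts, rvec, tvec, K, dist):
    """RMS distance (pixels) between observed points and their re-projections."""
    proj, _ = cv2.projectPoints(obj_pts, rvec, tvec, K, dist)
    err = np.linalg.norm(proj.reshape(-1, 2) - img_pts.reshape(-1, 2), axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

# Toy usage: a fronto-parallel board seen by an ideal camera at 4.5 m distance.
obj = np.array([[x, y, 0.0] for x in range(5) for y in range(4)], dtype=np.float32) * 0.05
K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])
rvec, tvec = np.zeros(3), np.array([0.0, 0.0, 4.5])
img, _ = cv2.projectPoints(obj, rvec, tvec, K, np.zeros(5))
noisy = img + np.random.normal(scale=0.3, size=img.shape)   # simulated detection noise
print(rms_reprojection_error(obj, noisy, rvec, tvec, K, np.zeros(5)))
```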
{"title":"Practical and Precise Projector-Camera Calibration","authors":"Liming Yang, Jean-Marie Normand, G. Moreau","doi":"10.1109/ISMAR.2016.22","DOIUrl":"https://doi.org/10.1109/ISMAR.2016.22","url":null,"abstract":"Projectors are important display devices for large scale augmented reality applications. However, precisely calibrating projectors with large focus distances implies a trade-off between practicality and accuracy. People either need a huge calibration board or a precise 3D model [12]. In this paper, we present a practical projector-camera calibration method to solve this problem. The user only needs a small calibration board to calibrate the system regardless of the focus distance of the projector. Results show that the root-mean-squared re-projection error (RMSE) for a 450cm projection distance is only about 4mm, even though it is calibrated using a small B4 (250×353mm) calibration board.","PeriodicalId":146808,"journal":{"name":"2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130322460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reality Skins: Creating Immersive and Tactile Virtual Environments
Lior Shapira, D. Freedman. ISMAR 2016. DOI: 10.1109/ISMAR.2016.23
Reality Skins enables mobile and large-scale virtual reality experiences, dynamically generated based on the user's environment. A head-mounted display (HMD) coupled with a depth camera is used to scan the user's surroundings: reconstruct geometry, infer floor plans, and detect objects and obstacles. From these elements we generate a Reality Skin, a 3D environment which replaces office or apartment walls with the corridors of a spaceship or underground tunnels, and replaces chairs and desks, sofas and beds with crates and computer consoles, fungi and crumbling ancient statues. The placement of walls, furniture and objects in the Reality Skin attempts to approximate reality, such that the user can move around and touch virtual objects with tactile feedback from real objects. Each possible Reality Skins world consists of objects, materials and custom scripts. Taking cues from the user's surroundings, we create a unique environment combining these building blocks, attempting to preserve the geometry and semantics of the real world. We tackle 3D environment generation as a constraint satisfaction problem and break it into two parts: First, we use Markov chain Monte Carlo optimization over a simple 2D polygonal model to infer the layout of the environment (the structure of the virtual world). Then, we populate the world with various objects and characters, attempting to satisfy geometric (virtual objects should align with objects in the environment), semantic (a virtual chair aligns with a real one), physical (avoid collisions, maintain stability) and other constraints. We find a discrete set of transformations for each object satisfying unary constraints, incorporate pairwise and higher-order constraints, and optimize globally using a recent technique based on semidefinite relaxation.
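The layout-inference step is described as a Markov chain Monte Carlo optimization over a simple 2D polygonal model. The sketch below shows only the generic Metropolis acceptance loop such an optimizer could be built on; the cost function, proposal moves, and the later semidefinite-relaxation stage are placeholders or omitted, not the paper's actual formulation.

```python
# Generic Metropolis-Hastings loop over a layout parameter vector; the cost
# function and Gaussian proposal are toy stand-ins for the paper's energy.
import numpy as np

def metropolis_layout(cost, x0, n_iter=5000, step=0.1, temp=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    x, c = x0.copy(), cost(x0)
    best_x, best_c = x.copy(), c
    for _ in range(n_iter):
        prop = x + rng.normal(scale=step, size=x.shape)   # perturb the current layout
        c_prop = cost(prop)
        # Accept better layouts always, worse ones with Boltzmann probability.
        if c_prop < c or rng.random() < np.exp((c - c_prop) / temp):
            x, c = prop, c_prop
            if c < best_c:
                best_x, best_c = x.copy(), c
    return best_x, best_c

# Toy example: pull two virtual anchor points toward their real-world counterparts.
real_anchors = np.array([[0.0, 0.0], [2.0, 1.0]])
cost = lambda x: float(np.sum((x.reshape(-1, 2) - real_anchors) ** 2))
best, best_cost = metropolis_layout(cost, np.ones(4))
print(best.reshape(-1, 2), best_cost)
```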
{"title":"Reality Skins: Creating Immersive and Tactile Virtual Environments","authors":"Lior Shapira, D. Freedman","doi":"10.1109/ISMAR.2016.23","DOIUrl":"https://doi.org/10.1109/ISMAR.2016.23","url":null,"abstract":"Reality Skins enables mobile and large-scale virtual reality experiences, dynamically generated based on the user's environment. A head-mounted display (HMD) coupled with a depth camera is used to scan the user's surroundings: reconstruct geometry, infer floor plans, and detect objects and obstacles. From these elements we generate a Reality Skin, a 3D environment which replaces office or apartment walls with the corridors of a spaceship or underground tunnels, replacing chairs and desks, sofas and beds with crates and computer consoles, fungi and crumbling ancient statues. The placement of walls, furniture and objects in the Reality Skin attempts to approximate reality, such that the user can move around, and touch virtual objects with tactile feedback from real objects. Each possible reality skins world consists of objects, materials and custom scripts. Taking cues from the user's surroundings, we create a unique environment combining these building blocks, attempting to preserve the geometry and semantics of the real world.We tackle 3D environment generation as a constraint satisfaction problem, and break it into two parts: First, we use a Markov Chain Monte-Carlo optimization, over a simple 2D polygonal model, to infer the layout of the environment (the structure of the virtual world). Then, we populate the world with various objects and characters, attempting to satisfy geometric (virtual objects should align with objects in the environment), semantic (a virtual chair aligns with a real one), physical (avoid collisions, maintain stability) and other constraints. We find a discrete set of transformations for each object satisfying unary constraints, incorporate pairwise and higher-order constraints, and optimize globally using a very recent technique based on semidefinite relaxation.","PeriodicalId":146808,"journal":{"name":"2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)","volume":"125 24","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132845714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Edge Snapping-Based Depth Enhancement for Dynamic Occlusion Handling in Augmented Reality
Chao Du, Yen-Lin Chen, Mao Ye, Liu Ren. ISMAR 2016. DOI: 10.1109/ISMAR.2016.17
Dynamic occlusion handling is critical for correct depth perception in Augmented Reality (AR) applications. Consequently, it is a key component for ensuring realistic and immersive AR experiences. Existing solutions to this challenge typically suffer from various limitations, e.g., the assumption of a static scene or high computational complexity. In this work, we propose a depth map enhancement algorithm for dynamic occlusion handling in AR applications. The key to our algorithm is an edge snapping approach, formulated as a discrete optimization, that improves the consistency of object boundaries between RGB and depth data. The optimization problem is solved efficiently via dynamic programming, and our system runs in near real time on a tablet platform. Experimental evaluations demonstrate that our approach largely improves the raw sensor data and compares favorably to several related approaches in terms of both speed and quality. Furthermore, we demonstrate visually pleasing dynamic occlusion effects for multiple AR use cases based on our edge snapping results.
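The abstract frames edge snapping as a discrete optimization solved with dynamic programming. Below is a minimal Viterbi-style sketch of that general idea: each boundary point chooses among candidate labels (e.g., offsets toward a strong RGB edge), trading a data term against smoothness along the boundary. The cost terms are placeholders, not the authors' energy.

```python
# Viterbi-style dynamic program: each boundary point picks one of k candidate
# labels, minimizing a data term plus a smoothness term between neighbours.
import numpy as np

def snap_boundary(data_cost, smooth_weight=1.0):
    """data_cost: (n_points, n_labels) array; returns the minimum-cost label path."""
    n, k = data_cost.shape
    labels = np.arange(k)
    pair = smooth_weight * np.abs(labels[:, None] - labels[None, :])  # |l_prev - l_cur|
    dp = data_cost[0].copy()
    back = np.zeros((n, k), dtype=int)
    for i in range(1, n):
        total = dp[:, None] + pair               # cost of every previous->current label pair
        back[i] = np.argmin(total, axis=0)
        dp = total[back[i], labels] + data_cost[i]
    path = np.empty(n, dtype=int)
    path[-1] = int(np.argmin(dp))
    for i in range(n - 1, 0, -1):                # backtrack the optimal labels
        path[i - 1] = back[i, path[i]]
    return path

print(snap_boundary(np.random.rand(6, 5)))       # 6 boundary points, 5 candidate offsets
```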
{"title":"Edge Snapping-Based Depth Enhancement for Dynamic Occlusion Handling in Augmented Reality","authors":"Chao Du, Yen-Lin Chen, Mao Ye, Liu Ren","doi":"10.1109/ISMAR.2016.17","DOIUrl":"https://doi.org/10.1109/ISMAR.2016.17","url":null,"abstract":"Dynamic occlusion handling is critical for correct depth perception in Augmented Reality (AR) applications. Consequently it is a key component to ensure realistic and immersive AR experiences. Existing solutions to tackle this challenge typically suffer from various limitations, e.g. assumption of a static scene or high computational complexity. In this work, we propose an algorithm for depth map enhancement for dynamic occlusion handling in AR applications. The key of our algorithm is an edge snapping approach, formulated as discrete optimization, that improves the consistency of object boundaries between RGB and depth data. The optimization problem is solved efficiently via dynamic programming and our system runs in near real-time on the tablet platform. Experimental evaluations demonstrate that our approach largely improves the raw sensor data and is particularly suitable compared to several related approaches in terms of both speed and quality. Furthermore, we demonstrate visually pleasing dynamic occlusion effects for multiple AR use cases based on our edge snapping results.","PeriodicalId":146808,"journal":{"name":"2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123101994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PPV: Pixel-Point-Volume Segmentation for Object Referencing in Collaborative Augmented Reality
Kuo-Chin Lien, B. Nuernberger, Tobias Höllerer, M. Turk. ISMAR 2016. DOI: 10.1109/ISMAR.2016.21
We present a method for collaborative augmented reality (AR) that enables users from different viewpoints to interpret object references specified via 2D on-screen circling gestures. Based on a user's 2D drawing annotation, the method segments out the user-selected object using an incomplete or imperfect scene model and the color image from the drawing viewpoint. Specifically, we propose a novel segmentation algorithm that utilizes both 2D and 3D scene cues, structured into a three-layer graph of pixels, 3D points, and volumes (supervoxels), which is solved via standard graph cut algorithms. This segmentation enables an appropriate rendering of the user's 2D annotation from other viewpoints in 3D augmented reality. Results demonstrate the superiority of the proposed method over existing methods.
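The segmentation is solved with standard graph cut algorithms over a three-layer pixel/point/volume graph. The toy sketch below shows only the underlying s-t min-cut machinery on a few hand-labelled nodes (node names, capacities, and the two-layer simplification are invented); the paper's actual graph construction and energy are not reproduced.

```python
# Toy s-t min-cut "segmentation": source/sink edges encode unary foreground/
# background affinities, symmetric edges encode pairwise consistency between
# a few "pixel" and "3D point" nodes. All names and capacities are invented.
import networkx as nx

G = nx.DiGraph()
S, T = "src", "sink"
unary = {"px0": (4.0, 1.0), "px1": (3.5, 1.0), "px2": (1.0, 4.0), "pt0": (2.0, 2.0)}
for node, (fg, bg) in unary.items():
    G.add_edge(S, node, capacity=fg)     # affinity to the foreground label
    G.add_edge(node, T, capacity=bg)     # affinity to the background label
for a, b, w in [("px0", "px1", 2.0), ("px1", "px2", 0.5), ("px1", "pt0", 1.5)]:
    G.add_edge(a, b, capacity=w)         # pairwise consistency (both directions)
    G.add_edge(b, a, capacity=w)

cut_value, (src_side, sink_side) = nx.minimum_cut(G, S, T)
print(cut_value, sorted(src_side - {S}))  # nodes kept on the foreground side
```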
{"title":"PPV: Pixel-Point-Volume Segmentation for Object Referencing in Collaborative Augmented Reality","authors":"Kuo-Chin Lien, B. Nuernberger, Tobias Höllerer, M. Turk","doi":"10.1109/ISMAR.2016.21","DOIUrl":"https://doi.org/10.1109/ISMAR.2016.21","url":null,"abstract":"We present a method for collaborative augmented reality (AR) that enables users from different viewpoints to interpret object references specified via 2D on-screen circling gestures. Based on a user's 2D drawing annotation, the method segments out the userselected object using an incomplete or imperfect scene model and the color image from the drawing viewpoint. Specifically, we propose a novel segmentation algorithm that utilizes both 2D and 3D scene cues, structured into a three-layer graph of pixels, 3D points, and volumes (supervoxels), solved via standard graph cut algorithms. This segmentation enables an appropriate rendering of the user's 2D annotation from other viewpoints in 3D augmented reality. Results demonstrate the superiority of the proposed method over existing methods.","PeriodicalId":146808,"journal":{"name":"2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130673550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Augmented Reality 3D Discrepancy Check in Industrial Applications
Oliver Wasenmüller, Marcel Meyer, D. Stricker. ISMAR 2016. DOI: 10.1109/ISMAR.2016.15
Discrepancy check is a well-known task in industrial Augmented Reality (AR). In this paper we present a new approach consisting of three main contributions: First, we propose a new two-step depth mapping algorithm for RGB-D cameras, which fuses depth images with given camera poses in real time into a consistent 3D model. In a rigorous evaluation on two public benchmarks, we show that our mapping outperforms the state of the art in accuracy. Second, we propose a semi-automatic alignment algorithm, which rapidly aligns a reference model to the reconstruction. Third, we propose an algorithm for 3D discrepancy check based on pre-computed distances. In a systematic evaluation we show the superior performance of our approach compared to state-of-the-art 3D discrepancy checks.
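As a rough illustration of the final step, a discrepancy check can be reduced to nearest-neighbour distances between the (already aligned) reconstruction and the reference model, flagging points beyond a tolerance. The threshold and point sets below are placeholders; the paper's pre-computed distance scheme is not reproduced here.

```python
# Illustrative point-to-reference distance check via a KD-tree; the alignment
# step is assumed to have been done already, and the 5 mm threshold is made up.
import numpy as np
from scipy.spatial import cKDTree

def discrepancy_mask(recon_pts, reference_pts, threshold=0.005):
    """Flag reconstructed points farther than `threshold` (metres) from the reference."""
    dist, _ = cKDTree(reference_pts).query(recon_pts, k=1)
    return dist > threshold

reference = np.random.rand(1000, 3)
recon = reference + np.random.normal(scale=0.001, size=reference.shape)
recon[:20] += 0.05                       # simulate a local discrepancy
print(int(discrepancy_mask(recon, reference).sum()), "points flagged")
```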
{"title":"Augmented Reality 3D Discrepancy Check in Industrial Applications","authors":"Oliver Wasenmüller, Marcel Meyer, D. Stricker","doi":"10.1109/ISMAR.2016.15","DOIUrl":"https://doi.org/10.1109/ISMAR.2016.15","url":null,"abstract":"Discrepancy check is a well-known task in industrial Augmented Reality (AR). In this paper we present a new approach consisting of three main contributions: First, we propose a new two-step depth mapping algorithm for RGB-D cameras, which fuses depth images with given camera pose in real-time into a consistent 3D model. In a rigorous evaluation with two public benchmarks we show that our mapping outperforms the state-of-the-art in accuracy. Second, we propose a semi-automatic alignment algorithm, which rapidly aligns a reference model to the reconstruction. Third, we propose an algorithm for 3D discrepancy check based on pre-computed distances. In a systematic evaluation we show the superior performance of our approach compared to state-of-the-art 3D discrepancy checks.","PeriodicalId":146808,"journal":{"name":"2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114170594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated Spatial Calibration of HMD Systems with Unconstrained Eye-cameras
Alexander Plopski, J. Orlosky, Yuta Itoh, Christian Nitschke, K. Kiyokawa, G. Klinker. ISMAR 2016. DOI: 10.1109/ISMAR.2016.16
Properly calibrating an optical see-through head-mounted display (OST-HMD) and maintaining a consistent calibration over time can be a very challenging task. Automated methods need an accurate model of both the OST-HMD screen and the user's constantly changing eye position to correctly project virtual information. While some automated methods exist, they often have restrictions, including fixed eye-cameras that cannot be adjusted for different users. To address this problem, we have developed a method that automatically determines the position of an adjustable eye-tracking camera whose placement relative to the display is unconstrained. Unlike methods that require a fixed pose between the HMD and the eye camera, our framework allows for automatic calibration even after the camera is adjusted to a particular individual's eye and even after the HMD moves on the user's face. Using two sets of IR LEDs rigidly attached to the camera and the OST-HMD frame, we can calculate the correct projection for different eye positions in real time and compensate for changes in HMD position within several frames. To verify the accuracy of our method, we conducted two experiments with a commercial HMD, calibrating a number of different eye and camera positions. Ground truth was measured through markers on both the camera and HMD screens, and we achieve a viewing accuracy of 1.66 degrees for the eyes of 5 different experiment participants.
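The reported accuracy is an angular viewing error. A minimal sketch of how such an error can be computed from an estimated and a ground-truth viewing direction (the vectors are placeholders; the marker-based ground-truth setup is not reproduced):

```python
# Angular error (degrees) between an estimated and a ground-truth viewing
# direction; the example vectors are invented.
import numpy as np

def angular_error_deg(v_est, v_true):
    v_est = v_est / np.linalg.norm(v_est)
    v_true = v_true / np.linalg.norm(v_true)
    return np.degrees(np.arccos(np.clip(np.dot(v_est, v_true), -1.0, 1.0)))

print(angular_error_deg(np.array([0.02, 0.01, 1.0]), np.array([0.0, 0.0, 1.0])))
```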
{"title":"Automated Spatial Calibration of HMD Systems with Unconstrained Eye-cameras","authors":"Alexander Plopski, J. Orlosky, Yuta Itoh, Christian Nitschke, K. Kiyokawa, G. Klinker","doi":"10.1109/ISMAR.2016.16","DOIUrl":"https://doi.org/10.1109/ISMAR.2016.16","url":null,"abstract":"Properly calibrating an optical see-through head-mounted display (OST-HMD) and maintaining a consistent calibration over time can be a very challenging task. Automated methods need an accurate model of both the OST-HMD screen and the user's constantly changing eye-position to correctly project virtual information. While some automated methods exist, they often have restrictions, including fixed eye-cameras that cannot be adjusted for different users.To address this problem, we have developed a method that automatically determines the position of an adjustable eye-tracking camera and its unconstrained position relative to the display. Unlike methods that require a fixed pose between the HMD and eye camera, our framework allows for automatic calibration even after adjustments of the camera to a particular individual's eye and even after the HMD moves on the user's face. Using two sets of IR-LEDs rigidly attached to the camera and OST-HMD frame, we can calculate the correct projection for different eye positions in real time and changes in HMD position within several frames. To verify the accuracy of our method, we conducted two experiments with a commercial HMD by calibrating a number of different eye and camera positions. Ground truth was measured through markers on both the camera and HMD screens, and we achieve a viewing accuracy of 1.66 degrees for the eyes of 5 different experiment participants.","PeriodicalId":146808,"journal":{"name":"2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128967501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis of Medium Wrap Freehand Virtual Object Grasping in Exocentric Mixed Reality
Maadh Al Kalbani, Ian Williams, Maite Frutos Pascual. ISMAR 2016. DOI: 10.1109/ISMAR.2016.14
This article presents an analysis of the accuracy and problems of freehand grasping in exocentric Mixed Reality (MR). We report on two experiments (1710 grasps) which quantify the influence that virtual object shape, size, and position have on the most common physical grasp, the medium wrap. We propose two methods for grasp measurement, namely Grasp Aperture (GAp) and Grasp Displacement (GDisp). Controlled laboratory conditions are used in which 30 right-handed participants attempt to recreate a medium wrap grasp. We present a comprehensive statistical analysis of the results, giving pairwise comparisons of all conditions under test. The results illustrate that Grasp Aperture varies less than expected in comparison to the variation in virtual object size, with common aperture sizes found. Regarding the position of the virtual object, depth estimation is often mismatched due to underestimation of the z position, while x and y displacements show common patterns. Results from this work can be applied to aid the development of freehand grasping and, as the first study into the accuracy of freehand grasping in MR, provide a starting point for future interaction design.
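As a hedged illustration of the two measures named above, one plausible formulation computes Grasp Aperture from the fingertip separation and Grasp Displacement as the offset of the grasp midpoint from the virtual object's centre; the authors' exact definitions are not given in the abstract, so treat these as stand-ins.

```python
# Illustrative GAp / GDisp computations; the definitions are assumed, not the
# authors' exact formulations, and the coordinates below are made up (metres).
import numpy as np

def grasp_aperture(thumb_tip, index_tip):
    """Distance between thumb and index fingertips (one common aperture proxy)."""
    return float(np.linalg.norm(np.asarray(thumb_tip) - np.asarray(index_tip)))

def grasp_displacement(thumb_tip, index_tip, object_centre):
    """Offset of the grasp midpoint from the virtual object's centre (x, y, z)."""
    midpoint = (np.asarray(thumb_tip) + np.asarray(index_tip)) / 2.0
    return midpoint - np.asarray(object_centre)

print(grasp_aperture([0.02, 0.01, 0.40], [0.09, 0.02, 0.41]))
print(grasp_displacement([0.02, 0.01, 0.40], [0.09, 0.02, 0.41], [0.05, 0.00, 0.45]))
```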
{"title":"Analysis of Medium Wrap Freehand Virtual Object Grasping in Exocentric Mixed Reality","authors":"Maadh Al Kalbani, Ian Williams, Maite Frutos Pascual","doi":"10.1109/ISMAR.2016.14","DOIUrl":"https://doi.org/10.1109/ISMAR.2016.14","url":null,"abstract":"This article presents an analysis into the accuracy and problems of freehand grasping in exocentric Mixed Reality (MR). We report on two experiments (1710 grasps) which quantify the influence different virtual object shape, size and position has on the most common physical grasp, a medium wrap. We propose two methods for grasp measurement, namely, the Grasp Aperture (GAp) and Grasp Displacement (GDisp). Controlled laboratory conditions are used where 30 right-handed participants attempt to recreate a medium wrap grasp. We present a comprehensive statistical analysis of the results giving pairwise comparisons of all conditions under test. The results illustrate that user Grasp Aperture varies less than expected in comparison to the variation of virtual object size, with common aperture sizes found. Regarding the position of the virtual object, depth estimation is often mismatched due to under judgement of the z position and x, y displacement has common patterns. Results from this work can be applied to aid in the development of freehand grasping and considered as the first study into accuracy of freehand grasping in MR, provide a starting point for future interaction design.","PeriodicalId":146808,"journal":{"name":"2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121370802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
σ-DVO: Sensor Noise Model Meets Dense Visual Odometry
B. W. Babu, Soohwan Kim, Zhixin Yan, Liu Ren. ISMAR 2016. DOI: 10.1109/ISMAR.2016.11
In this paper we propose a novel method called σ-DVO for dense visual odometry using a probabilistic sensor noise model. In contrast to sparse visual odometry, where camera poses are estimated from matched visual features, we apply dense visual odometry, which makes full use of all pixel information from an RGB-D camera. Previously, a t-distribution was used to model photometric and geometric errors in order to reduce the impact of outliers on the optimization. However, this approach has the limitation that it only uses the error value to determine outliers, without considering the underlying physical process. Therefore, we propose to apply a probabilistic sensor noise model that weights each pixel by propagating linearized uncertainty. Furthermore, we find that the geometric errors are well represented by the sensor noise model, while the photometric errors are not. Finally, we propose a hybrid approach which combines a t-distribution for photometric errors with a probabilistic sensor noise model for geometric errors. We extend the dense visual odometry and develop a visual SLAM system that incorporates keyframe generation, loop constraint detection and graph optimization. Experimental results on standard benchmark datasets show that our algorithm outperforms previous methods, achieving about a 25% reduction in absolute trajectory error.
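A compact sketch of the hybrid weighting idea follows: a Student-t robust weight for photometric residuals and an inverse-variance weight derived from a depth-dependent sensor noise model for geometric residuals. The degrees of freedom and the quadratic Kinect-style noise constants are common illustrative values, not necessarily those used in the paper.

```python
# Illustrative per-pixel weights for a hybrid robust/probabilistic scheme.
import numpy as np

def t_weight(residual, sigma, nu=5.0):
    """Student-t robust weight: shrinks toward zero for large (outlier) residuals."""
    return (nu + 1.0) / (nu + (residual / sigma) ** 2)

def sensor_noise_weight(depth, sigma0=0.0012, k=0.0019, z0=0.4):
    """Inverse-variance weight from a quadratic depth-noise model (Kinect-style)."""
    sigma_z = sigma0 + k * (depth - z0) ** 2
    return 1.0 / sigma_z ** 2

photo_w = t_weight(np.array([0.02, 0.3]), sigma=0.05)   # photometric residuals
geo_w = sensor_noise_weight(np.array([0.8, 3.0]))       # depths in metres
print(photo_w, geo_w)
```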
{"title":"σ-DVO: Sensor Noise Model Meets Dense Visual Odometry","authors":"B. W. Babu, Soohwan Kim, Zhixin Yan, Liu Ren","doi":"10.1109/ISMAR.2016.11","DOIUrl":"https://doi.org/10.1109/ISMAR.2016.11","url":null,"abstract":"In this paper we propose a novel method called s-DVO for dense visual odometry using a probabilistic sensor noise model. In contrast to sparse visual odometry, where camera poses are estimated based on matched visual features, we apply dense visual odometry which makes full use of all pixel information from an RGB-D camera. Previously, t-distribution was used to model photometric and geometric errors in order to reduce the impacts of outliers in the optimization. However, this approach has the limitation that it only uses the error value to determine outliers without considering the physical process. Therefore, we propose to apply a probabilistic sensor noise model to weigh each pixel by propagating linearized uncertainty. Furthermore, we find that the geometric errors are well represented with the sensor noise model, while the photometric errors are not. Finally we propose a hybrid approach which combines t-distribution for photometric errors and a probabilistic sensor noise model for geometric errors. We extend the dense visual odometry and develop a visual SLAM system that incorporates keyframe generation, loop constraint detection and graph optimization. Experimental results with standard benchmark datasets show that our algorithm outperforms previous methods by about a 25% reduction in the absolute trajectory error.","PeriodicalId":146808,"journal":{"name":"2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)","volume":"2002 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128309484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning to Fuse: A Deep Learning Approach to Visual-Inertial Camera Pose Estimation
J. Rambach, Aditya Tewari, A. Pagani, D. Stricker. ISMAR 2016. DOI: 10.1109/ISMAR.2016.19
Camera pose estimation is the cornerstone of Augmented Reality applications. Pose tracking based exclusively on camera images has been shown to be sensitive to motion blur, occlusions, and illumination changes. Thus, a lot of work has been conducted over the last years on visual-inertial pose tracking, using acceleration and angular velocity measurements from inertial sensors to improve the visual tracking. Most proposed systems use statistical filtering techniques to approach the sensor fusion problem, which require complex system modelling and calibration in order to perform adequately. In this work we present a novel approach to sensor fusion that uses a deep learning method to learn the relation between camera poses and inertial sensor measurements. A long short-term memory (LSTM) model is trained to provide an estimate of the current pose based on previous poses and inertial measurements. This estimate is then appropriately combined with the output of a visual tracking system using a linear Kalman filter to provide a robust final pose estimate. Our experimental results confirm the applicability of the approach and the tracking performance improvement gained from the proposed sensor fusion system.
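Below is a minimal PyTorch sketch of the kind of LSTM regressor described above, mapping a window of inertial measurements (concatenated with previous poses) to a pose estimate. The layer sizes, the 7-D pose parameterization (translation plus quaternion), and the input layout are assumptions, not the authors' architecture; the Kalman-filter fusion with the visual tracker is omitted.

```python
# Illustrative LSTM pose regressor; dimensions and pose encoding are assumed.
import torch
import torch.nn as nn

class InertialPoseLSTM(nn.Module):
    def __init__(self, input_dim=13, hidden_dim=128, pose_dim=7):
        super().__init__()
        # input: 6-D IMU sample (accel + gyro) concatenated with the previous 7-D pose
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_dim, pose_dim)

    def forward(self, seq):                  # seq: (batch, time, input_dim)
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])         # pose predicted from the last time step

model = InertialPoseLSTM()
dummy = torch.randn(4, 20, 13)               # 4 sequences of 20 time steps
print(model(dummy).shape)                    # torch.Size([4, 7])
```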
{"title":"Learning to Fuse: A Deep Learning Approach to Visual-Inertial Camera Pose Estimation","authors":"J. Rambach, Aditya Tewari, A. Pagani, D. Stricker","doi":"10.1109/ISMAR.2016.19","DOIUrl":"https://doi.org/10.1109/ISMAR.2016.19","url":null,"abstract":"Camera pose estimation is the cornerstone of Augmented Reality applications. Pose tracking based on camera images exclusively has been shown to be sensitive to motion blur, occlusions, and illumination changes. Thus, a lot of work has been conducted over the last years on visual-inertial pose tracking using acceleration and angular velocity measurements from inertial sensors in order to improve the visual tracking. Most proposed systems use statistical filtering techniques to approach the sensor fusion problem, that require complex system modelling and calibrations in order to perform adequately. In this work we present a novel approach to sensor fusion using a deep learning method to learn the relation between camera poses and inertial sensor measurements. A long short-term memory model (LSTM) is trained to provide an estimate of the current pose based on previous poses and inertial measurements. This estimates then appropriately combined with the output of a visual tracking system using a linear Kalman Filter to provide a robust final pose estimate. Our experimental results confirm the applicability and tracking performance improvement gained from the proposed sensor fusion system.","PeriodicalId":146808,"journal":{"name":"2016 IEEE International Symposium on Mixed and Augmented Reality (ISMAR)","volume":"2013 16","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120848879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}