Title: Metadata-Weighted Score Fusion for Multimedia Event Detection
Authors: Scott McCloskey, Jingchen Liu
DOI: 10.1109/CRV.2014.47
Venue: 2014 Canadian Conference on Computer and Robot Vision, 2014-05-06

Abstract: We address the problem of multimedia event detection in videos captured 'in the wild,' in particular the fusion of cues from multiple aspects of a video's content: detected objects, observed motion, audio signatures, etc. We employ score fusion, also known as late fusion, and propose a method that learns local weightings of the various base classifier scores which respect the performance differences arising from video quality. Classifiers working with visual texture features, for instance, are given reduced weight when applied to subsets of the video corpus with high compression, and the weights associated with the other classifiers are adjusted to reflect this lack of confidence. We present a method to automatically partition the video corpus into relevant subsets and to learn local weightings that optimally fuse scores on a particular subset. Improvements in event detection performance are demonstrated on the TRECVid Multimedia Event Detection (MED) test dataset, and comparisons are provided against several other score fusion methods.
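The local-weighting idea can be sketched as a per-partition weighted sum of base classifier scores. This is a minimal illustration only: the partition labels and weight values below are hypothetical, and the paper learns both the partitioning and the weights rather than fixing them by hand.

```python
# Sketch of metadata-conditioned late fusion. The partitions and
# weights here are made-up placeholders; the paper learns them.

def fuse_scores(scores, weights):
    """Weighted sum of base classifier scores for one video."""
    assert len(scores) == len(weights)
    return sum(s * w for s, w in zip(scores, weights))

# Example: the 'high compression' partition down-weights the
# texture-based classifier and shifts weight to the others.
WEIGHTS_BY_PARTITION = {
    "low_compression":  {"texture": 0.4, "motion": 0.3, "audio": 0.3},
    "high_compression": {"texture": 0.1, "motion": 0.5, "audio": 0.4},
}

def detect(video_scores, partition):
    """Fuse one video's base scores using its partition's weights."""
    w = WEIGHTS_BY_PARTITION[partition]
    keys = sorted(w)
    return fuse_scores([video_scores[k] for k in keys],
                       [w[k] for k in keys])

fused = detect({"texture": 0.9, "motion": 0.2, "audio": 0.4},
               "high_compression")
```

A confident texture score thus contributes little on heavily compressed videos, which is the behavior the abstract describes.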
Title: Segmenting Objects in Weakly Labeled Videos
Authors: Mrigank Rochan, Shafin Rahman, Neil D. B. Bruce, Yang Wang
DOI: 10.1109/CRV.2014.24
Venue: 2014 Canadian Conference on Computer and Robot Vision, 2014-05-06

Abstract: We consider the problem of segmenting objects in weakly labeled video. A video is weakly labeled if it is associated with a tag (e.g., YouTube videos with tags) describing the main object present in the video. It is weakly labeled because the tag only indicates the presence or absence of the object, not its detailed spatial or temporal location in the video. Given a weakly labeled video, our method automatically localizes the object in each frame and segments it from the background. Our method is fully automatic and requires no user input. In principle, it can be applied to a video of any object class. We evaluate the proposed method on a dataset of more than 100 video shots. Our experimental results show that our method outperforms other baseline approaches.
Title: Interactive Teleoperation Interface for Semi-autonomous Control of Robot Arms
Authors: C. P. Quintero, R. T. Fomena, A. Shademan, Oscar A. Ramirez, Martin Jägersand
DOI: 10.1109/CRV.2014.55
Venue: 2014 Canadian Conference on Computer and Robot Vision, 2014-05-06

Abstract: We propose and develop an interactive, semi-autonomous control system for robot arms. Our system supports two interaction modes: (1) a user can naturally control the robot arm through a direct linkage between the arm's motion and the user's tracked skeleton, and (2) an autonomous image-based visual servoing routine can be triggered for precise positioning. Coarse motions are executed by human teleoperation and fine motions by image-based visual servoing. A successful application of the proposed interaction is presented for a WAM arm equipped with an eye-in-hand camera.
Title: The Range Beacon Placement Problem for Robot Navigation
Authors: River Allen, Neil MacMillan, D. Marinakis, R. Nishat, Rayhan Rahman, S. Whitesides
DOI: 10.1109/CRV.2014.28
Venue: 2014 Canadian Conference on Computer and Robot Vision, 2014-05-06

Abstract: Instrumenting an environment with sensors can provide an effective and scalable localization solution for robots. Where GPS is not available, beacons that provide position estimates to a robot must be placed effectively in order to maximize the robot's navigation accuracy and robustness. Sonar range-based beacons are reasonable candidates for low-cost position-estimate sensors. In this paper we explore heuristics derived from computational geometry to estimate the effectiveness of sonar beacon deployments given a predefined mobile robot path. Results from numerical simulations and experiments demonstrate the effectiveness and scalability of our approach.
Title: Scale-Space Decomposition and Nearest Linear Combination Based Approach for Face Recognition
Authors: F. A. Hoque, Liang Chen
DOI: 10.1109/CRV.2014.37
Venue: 2014 Canadian Conference on Computer and Robot Vision, 2014-05-06

Abstract: Among the many illumination-robust approaches, scale-space decomposition based methods play an important role in reducing lighting effects in face images. However, most existing scale-space decomposition methods perform recognition based on the illumination-invariant small-scale features only. We propose a scale-space decomposition based face recognition approach that extracts features at different scales through the TV+L1 model and the wavelet transform. The approach represents a subject's face image via a subspace spanned by linear combinations of the features at different scales. To decide the identity of a probe, the nearest neighbor (NN) approach is used to measure the similarities between the probe face image and the subspace representations of the gallery face images. Experiments on various benchmarks demonstrate that the system outperforms many recognition methods in the same category.
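The nearest-linear-combination step can be sketched as follows: each gallery subject is represented by a subspace spanned by its multi-scale feature vectors, and a probe is assigned to the subject whose subspace reconstructs it with the smallest least-squares residual. The feature vectors below are random stand-ins, not actual TV+L1/wavelet decompositions, and the subject names are hypothetical.

```python
import numpy as np

def subspace_distance(basis, probe):
    """Distance from probe to span(basis columns) via least squares."""
    coef, *_ = np.linalg.lstsq(basis, probe, rcond=None)
    return np.linalg.norm(basis @ coef - probe)

def classify(gallery, probe):
    """Return the gallery id whose subspace is nearest to the probe."""
    return min(gallery, key=lambda sid: subspace_distance(gallery[sid], probe))

# Toy gallery: 64-dim features, 3 scales per subject (random stand-ins).
rng = np.random.default_rng(0)
gallery = {sid: rng.normal(size=(64, 3)) for sid in ("subj_a", "subj_b")}

# A probe built as a linear combination of subj_b's features lies
# exactly in subj_b's subspace, so its residual there is ~0.
probe = gallery["subj_b"] @ np.array([0.5, 0.3, 0.2])
label = classify(gallery, probe)
```

The residual-to-subspace measure is one common way to realize the "nearest linear combination" similarity the abstract describes.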
Title: Toward a Unified Framework for EMG Signals Processing and Controlling an Exoskeleton
Authors: G. Durandau, W. Suleiman
DOI: 10.1109/CRV.2014.46
Venue: 2014 Canadian Conference on Computer and Robot Vision, 2014-05-06

Abstract: In this paper, we present a control method for robotic systems using electromyography (EMG) signals collected by surface EMG electrodes. The EMG signals are analyzed using a neuromusculoskeletal (NMS) model that simultaneously represents the muscles and the skeleton of the body. It has the advantage of allowing external forces to be added to the model without changing the initial parameters, which is particularly useful for the control of exoskeletons. The algorithm has been validated through experiments consisting of moving only the elbow joint, either freely or while handling a barbell with various loads. The results of our algorithm are then compared to the motions obtained by a motion capture system during the same session. The comparison demonstrates the efficiency of our algorithm in predicting and estimating arm motion using only EMG signals.
Title: Using Gradient Orientation to Improve Least Squares Line Fitting
Authors: T. Petković, S. Lončarić
DOI: 10.1109/CRV.2014.38
Venue: 2014 Canadian Conference on Computer and Robot Vision, 2014-05-06

Abstract: Straight line fitting is an important problem in computer and robot vision. We propose a novel method for least squares line fitting that uses both the point coordinates and the local gradient orientation to fit an optimal line by minimizing the proposed algebraic distance. The inclusion of gradient orientation offers several advantages: (a) one data point is sufficient for the line fit, (b) for the same number of points the fit is more precise, and (c) outliers can be rejected based on the gradient orientation or the distance to the line.
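Advantage (a) follows from a simple geometric fact: the image gradient at an edge point is normal to the edge, so a single point plus its gradient orientation already determines a line. This sketch illustrates only that single-point case; the paper's actual contribution is a least-squares fit over many points that minimizes an algebraic distance combining both cues.

```python
import math

def line_from_point_and_gradient(x0, y0, theta):
    """Line through (x0, y0) with unit normal (cos(theta), sin(theta)),
    returned as coefficients (a, b, c) of a*x + b*y = c."""
    a, b = math.cos(theta), math.sin(theta)
    return a, b, a * x0 + b * y0

def point_line_distance(a, b, c, x, y):
    """Perpendicular distance from (x, y) to the line a*x + b*y = c."""
    return abs(a * x + b * y - c) / math.hypot(a, b)

# A horizontal edge: the gradient points straight up (theta = pi/2),
# so one point (2, 3) fixes the line y = 3.
a, b, c = line_from_point_and_gradient(2.0, 3.0, math.pi / 2)

# Another point on that horizontal edge is at distance ~0; a point
# off the edge has a large residual, usable for outlier rejection.
d_on = point_line_distance(a, b, c, 7.0, 3.0)
d_off = point_line_distance(a, b, c, 7.0, 5.0)
```

The same normal-direction parameterization also explains advantage (c): a candidate point whose gradient orientation disagrees with the line normal can be rejected before it ever enters the fit.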
Title: Towards Full Omnidirectional Depth Sensing Using Active Vision for Small Unmanned Aerial Vehicles
Authors: A. Harmat, I. Sharf
DOI: 10.1109/CRV.2014.12
Venue: 2014 Canadian Conference on Computer and Robot Vision, 2014-05-06

Abstract: Collision avoidance for small unmanned aerial vehicles operating in a variety of environments is limited by the types of available depth sensors. Currently, no sensor is simultaneously lightweight, functional outdoors in sunlight, and able to cover enough of the field of view to be useful in complex environments, although many sensors excel in one or two of these areas. We present a new depth estimation method, based on concepts from multi-view stereo and structured light, that uses only lightweight miniature cameras and a small laser dot-matrix projector to produce measurements in the range of 1-12 meters. The field of view of the system is limited only by the number and type of cameras and projectors used, and can be fully omnidirectional if desired. The sensitivity of the system to design and calibration parameters is tested in simulation, and results from a functional prototype are presented.
Title: Optimizing Camera Perspective for Stereo Visual Odometry
Authors: Valentin Peretroukhin, Jonathan Kelly, T. Barfoot
DOI: 10.1109/CRV.2014.9
Venue: 2014 Canadian Conference on Computer and Robot Vision, 2014-05-06

Abstract: Visual odometry (VO) is an integral part of many navigation techniques in mobile robotics. In this work, we investigate how the orientation of the camera affects the error in the position estimates recovered from stereo VO. Through simulations and experimental work, we demonstrate that this error can be significantly reduced by changing the perspective of the stereo camera relative to the moving platform. Specifically, we show that orienting the camera at an oblique angle to the direction of travel can reduce VO error by up to 82% in simulation and up to 59% on experimental data. A variety of parameters, including image capture frequency and camera resolution, are investigated for their effects on this trend.
Title: Trajectory Inference Using a Motion Sensing Network
Authors: Doug Cox, Darren Fairall, Neil MacMillan, D. Marinakis, D. Meger, Saamaan Pourtavakoli, Kyle Weston
DOI: 10.1109/CRV.2014.29
Venue: 2014 Canadian Conference on Computer and Robot Vision, 2014-05-06

Abstract: This paper addresses the problem of inferring human trajectories through an environment using low-frequency, low-fidelity data from a sensor network. We present a novel "recombine" proposal for Markov chain construction and use it to devise a probabilistic trajectory inference algorithm that generates likely trajectories from raw sensor data. We also propose a novel low-power, long-range, 900 MHz IEEE 802.15.4 compliant sensor network that makes outdoor deployment viable. Finally, we present experimental results from a deployment in a retail environment.