Person following behavior is an important task for social robots. To enable robots to follow a person, we have to track the target in real time without critical failures. There are many situations in which the robot can lose track of the target in a dynamic environment, e.g., occlusion, illumination changes, and pose changes. Robustness is often improved by using a complex tracking algorithm, but the trade-off is that such approaches may not be able to run in real time on mobile robots. In this paper, we present the Selected Online Ada-Boosting (SOAB) technique, a modified Online Ada-Boosting (OAB) tracking algorithm that integrates scene depth information obtained from a stereo camera and runs in real time on a mobile robot. We build a new stereo dataset for the task of person following and share results on the performance of our technique on it. The dataset covers challenging situations such as squatting, partial and complete occlusion of the target, people wearing similar clothes, appearance changes, walking with the person facing toward or away from the robot, and normal walking.
{"title":"Person Following Robot Using Selected Online Ada-Boosting with Stereo Camera","authors":"B. Chen, Raghavender Sahdev, John K. Tsotsos","doi":"10.1109/CRV.2017.55","DOIUrl":"https://doi.org/10.1109/CRV.2017.55","url":null,"abstract":"Person following behavior is an important task for social robots. To enable robots to follow a person, we have to track the target in real-time without critical failures. There are many situations where the robot will potentially loose tracking in a dynamic environment, e.g., occlusion, illumination, pose-changes, etc. Often, people use a complex tracking algorithm to improve robustness. However, the trade-off is that their approaches may not able to run in real-time on mobile robots. In this paper, we present Selected Online Ada-Boosting (SOAB) technique, a modified Online Ada-Boosting (OAB) tracking algorithm with integrated scene depth information obtained from a stereo camera which runs in real-time on a mobile robot. We build and share our results on the performance of our technique on a new stereo dataset for the task of person following. The dataset covers different challenging situations like squatting, partial and complete occlusion of the target being tracked, people wearing similar clothes, appearance changes, walking facing the front and back side of the person to the robot, and normal walking.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116854494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we introduce an edge-based segmentation algorithm designed for web pages. We treat each web page as an image and perform segmentation as the initial stage of a planned parsing system that will also include region classification. The motivation for our work is to enable improved online experiences for users with assistive needs, serving as the back-end process for front-end tasks such as zooming and decluttering the image presented to users with visual or cognitive challenges, or producing less unwieldy output from screen readers. Our focus is therefore on the interpretation of a class of man-made images; web pages form one particular subset of these images, with important constraints that assist the processing. After clarifying some comparisons with an earlier model of ours, we present a validation of our method. Following this, we briefly discuss the contribution to the field of computer vision, offering a contrast with current work in segmentation focused on the processing of natural images.
{"title":"Towards an Improved Vision-Based Web Page Segmentation Algorithm","authors":"M. Cormier, R. Mann, Karyn Moffatt, R. Cohen","doi":"10.1109/CRV.2017.38","DOIUrl":"https://doi.org/10.1109/CRV.2017.38","url":null,"abstract":"In this paper we introduce an edge-based segmentation algorithm designed for web pages. We consider each web page as an image and perform segmentation as the initial stage of a planned parsing system that will also include region classification. The motivation for our work is to enable improved online experiences for users with assistive needs (serving as the back-end process for such front-end tasks as zooming and decluttering the image being presented to those with visual or cognitive challenges, or producing less unwieldy output from screenreaders). Our focus is therefore on the interpretation of a class of man-made images (where web pages consist of one particular set of these images which have important constraints that assist in performing the processing). After clarifying some comparisons with an earlier model of ours, we show validation for our method. Following this, we briefly discuss the contribution for the field of computer vision, offering a contrast with current work in segmentation focused on the processing of natural images.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132119729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we show how a swarm of differential-drive robots can self-organize into multiple nested layers of a given shape. A key component of our work is the reliance on inter-robot collisions to provide information on how the formation should grow. We describe a simple controller and experimentally evaluate how its performance scales as the number of robots in the swarm increases from tens to several hundred. The average quality of the formation is shown to be a linearly decreasing function of swarm size, although the steepness of this line depends on the complexity of the formation. We also show that the time for a swarm to form a given shape grows only slowly, even as the number of robots increases substantially.
{"title":"Self-Organization of a Robot Swarm into Concentric Shapes","authors":"Geoff Nagy, R. Vaughan","doi":"10.1109/CRV.2017.58","DOIUrl":"https://doi.org/10.1109/CRV.2017.58","url":null,"abstract":"In this paper, we show how a swarm of differential-drive robots can self-organize into multiple nested layers of a given shape. A key component of our work is the reliance on inter-robot collisions to provide information on how the formation should grow. We describe a simple controller and experimentally evaluate how its performance scales as the number of robots in the swarm increases from tens to several hundred robots. The average quality of the formation is shown to be a linearly decreasing function of swarm size, although the steepness of this line depends on the complexity of the formation. We also show that the time for a swarm to form a given shape does not grow quickly even as the number of robots in the swarm increases by a large amount.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129646195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper addresses distributed data sampling in marine environments using robotic devices. We present a method to strategically sample locally observable features using two classes of sensor platforms. Our system consists of a sophisticated autonomous surface vehicle (ASV) which strategically samples based on information provided by a team of inexpensive sensor nodes. The sensor nodes effectively extend the observational capabilities of the vehicle by capturing georeferenced samples from disparate and moving points across the region. The ASV uses this information, along with its own observations, to plan a path so as to sample points which it expects to be particularly informative. We compare our approach to a traditional exhaustive survey approach and show that we are able to effectively represent a region with less energy expenditure. We validate our approach through simulations and test the system on real robots in the field.
{"title":"Collaborative Sampling Using Heterogeneous Marine Robots Driven by Visual Cues","authors":"Sandeep Manjanna, Johanna Hansen, Alberto Quattrini Li, Ioannis M. Rekleitis, G. Dudek","doi":"10.1109/CRV.2017.49","DOIUrl":"https://doi.org/10.1109/CRV.2017.49","url":null,"abstract":"This paper addresses distributed data sampling in marine environments using robotic devices. We present a method to strategically sample locally observable features using two classes of sensor platforms. Our system consists of a sophisticated autonomous surface vehicle (ASV) which strategically samples based on information provided by a team of inexpensive sensor nodes. The sensor nodes effectively extend the observational capabilities of the vehicle by capturing georeferenced samples from disparate and moving points across the region. The ASV uses this information, along with its own observations, to plan a path so as to sample points which it expects to be particularly informative. We compare our approach to a traditional exhaustive survey approach and show that we are able to effectively represent a region with less energy expenditure. We validate our approach through simulations and test the system on real robots in field.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123052958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose using convolutional neural networks (CNNs) to automatically determine the pitch and roll of a camera from a single, scene-agnostic 2D image. We compared a linear regressor, a two-layer neural network, and two CNNs. We show that the CNNs estimate the ground-truth orientations with high accuracy, which can be used in various computer vision tasks where calculating the camera orientation is necessary or useful. By utilizing accelerometer data in an existing image dataset, we were able to provide the large camera orientation ground-truth dataset needed to train such a network with approximately correct values. The trained network is then fine-tuned on smaller datasets with exact camera orientation labels. Additionally, the network is fine-tuned on a dataset with different intrinsic camera parameters to demonstrate the transferability of the network.
{"title":"Pitch and Roll Camera Orientation from a Single 2D Image Using Convolutional Neural Networks","authors":"Greg Olmschenk, Hao Tang, Zhigang Zhu","doi":"10.1109/CRV.2017.53","DOIUrl":"https://doi.org/10.1109/CRV.2017.53","url":null,"abstract":"In this paper, we propose using convolutional neural networks (CNNs) to automatically determine the pitch and roll of a camera using a single, scene agnostic, 2D image. We compared a linear regressor, a two-layer neural network, and two CNNs. We show the CNNs produce high levels of accuracy in estimating the ground truth orientations which can be used in various computer vision tasks where calculating the camera orientation is necessary or useful. By utilizing accelerometer data in an existing image dataset, we were able to provide the large camera orientation ground truth dataset needed to train such a network with approximately correct values. The trained network is then fine-tuned to smaller datasets with exact camera orientation labels. Additionally, the network is fine-tuned to a dataset with different intrinsic camera parameters to demonstrate the transferability of the network.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124556812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Various denoising algorithms exist in the literature; however, no studies have measured the impact of denoising algorithms on the quality of the video produced by a stabilization algorithm. In this paper, the impact of state-of-the-art denoising algorithms on a feature-based video stabilization algorithm is measured and evaluated. A quantitative measure is also proposed that gives more insight into the impact of the chosen denoising algorithm on stabilization. The results show that the denoising algorithm can drastically affect the quality of the stabilization results, and that choosing the latest denoising algorithm does not always guarantee the best stabilization results.
{"title":"Effect of Denoising Algorithms on Video Stabilization","authors":"Abdelrahman Ahmed, M. Shehata","doi":"10.1109/CRV.2017.34","DOIUrl":"https://doi.org/10.1109/CRV.2017.34","url":null,"abstract":"Various denoising algorithms exist in the literature, however, no studies have ever been made to measure the impact of denoising algorithms on the quality of the video produced by a stabilization algorithm. In this paper, the impact of state of the art denoising algorithms on a feature-based video stabilization is measured and evaluated. Also, a quantitative measure is proposed which can give more insight on the impact of the chosen denoising algorithm on stabilization. The results show that the denoising algorithm can drastically affect the quality of stabilization results and choosing the latest denoising algorithm does not always guarantee the best stabilization results.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"65 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132678890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It is often necessary to emphasize an object in an image. Artists, illustrators, cinematographers, and photographers have long used the principles of contrast and composition to guide visual attention. To achieve this, a novel perceptually driven approach is put forth that enhances the visual saliency of a target object without destroying the naturalness of the image content. The proposed approach computes new feature values for the intended object by maximizing the feature dissimilarity (weighted by positional proximity) with other objects. Too large a change in feature values in the target segment may destroy the naturalness of the image; this serves as the constraint in the proposed maximization problem. A genetic algorithm is used, in this context, to find the feature values that maximize the saliency of the target object. Experimental validation through objective evaluation metrics using saliency maps, as well as analysis of eye-tracking data, establishes the success of the proposed method.
{"title":"Enhancing Saliency of an Object Using Genetic Algorithm","authors":"R. Pal, Dipanjan Roy","doi":"10.1109/CRV.2017.33","DOIUrl":"https://doi.org/10.1109/CRV.2017.33","url":null,"abstract":"It is often required to emphasize an object in an image. Artists, illustrators, cinematographers and photographers have long used the principles of contrast and composition to guide visual attention. In order to achieve this, a novel perceptually-driven approach is put forth which leads to the enhancement of visual saliency of target object without destroying the naturalness of the contents of the image. The proposed approach computes new feature values for the intended object by maximizing the feature dissimilarity (which is weighted by positional proximity) with other objects. Too much change in feature values in the target segment may destroy naturality of the image. This poses as the constraint in the proposed maximization problem. Genetic algorithm has been used, in this context, to find the feature values which maximize the saliency of the target object. Experimental validation through objective evaluation metrics using saliency maps, as well as analysis of eye-tracking data, establish the success of the proposed method.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126365675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper focuses on the automatic quantitative performance analysis of bioprosthetic heart valves from video footage acquired during in vitro testing. Bioprosthetic heart valves, mimicking the shape and functionality of a human heart valve, are routinely used in valve replacement procedures to substitute defective native valves. Their reliability in both functionality and durability is crucial to the patients' well-being; as such, valve designs must be rigorously tested before deployment. A key quality metric of a heart valve design is the cyclical temporal evolution of the valve's area. This metric is typically computed manually from input video data, a time-consuming and error-prone task. We propose a novel, cost-effective approach for the automatic tracking and segmentation of valve orifices that integrates a probabilistic motion boundary model into a distance regularized level set evolution formulation. The proposed method constrains the level set evolution domain using data about characteristic motion patterns of heart valves. Experiments including comparisons with two other methods demonstrate the value of the proposed approach on three levels: an improved segmented orifice shape accuracy, a greater computational efficiency, and a better ability to identify video frames with orifice area content (i.e., an open valve).
{"title":"Fast and Accurate Tracking of Highly Deformable Heart Valves with Locally Constrained Level Sets","authors":"A. Burden, Melissa Cote, A. Albu","doi":"10.1109/CRV.2017.13","DOIUrl":"https://doi.org/10.1109/CRV.2017.13","url":null,"abstract":"This paper focuses on the automatic quantitative performance analysis of bioprosthetic heart valves from video footage acquired during in vitro testing. Bioprosthetic heart valves, mimicking the shape and functionality of a human heart valve, are routinely used in valve replacement procedures to substitute defective native valves. Their reliability in both functionality and durability is crucial to the patients' well-being, as such, valve designs must be rigorously tested before deployment. A key quality metric of a heart valve design is the cyclical temporal evolution of the valve's area. This metric is typically computed manually from input video data, a time-consuming and error-prone task. We propose a novel, cost-effective approach for the automatic tracking and segmentation of valve orifices that integrates a probabilistic motion boundary model into a distance regularized level set evolution formulation. The proposed method constrains the level set evolution domain using data about characteristic motion patterns of heart valves. Experiments including comparisons with two other methods demonstrate the value of the proposed approach on three levels: an improved segmented orifice shape accuracy, a greater computational efficiency, and a better ability to identify video frames with orifice area content (open valve).","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126655143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We apply convolutional neural networks (CNNs) to the problem of image orientation detection, in the context of determining the correct orientation (0, 90, 180, or 270 degrees) of a consumer photo. The problem is especially important for digitizing analog photographs. We substantially improve on the published state of the art in terms of performance on one of the standard datasets, and test our system on a more difficult, large dataset of consumer photos. We use Guided Backpropagation to obtain insights into how our CNN detects photo orientation and to explain its mistakes.
{"title":"Automatic Photo Orientation Detection with Convolutional Neural Networks","authors":"Ujash Joshi, Michael Guerzhoy","doi":"10.1109/CRV.2017.59","DOIUrl":"https://doi.org/10.1109/CRV.2017.59","url":null,"abstract":"We apply convolutional neural networks (CNN) to the problem of image orientation detection in the context of determining the correct orientation (from 0, 90, 180, and 270 degrees) of a consumer photo. The problem is especially important for digitazing analog photographs. We substantially improve on the published state of the art in terms of the performance on one of the standard datasets, and test our system on a more difficult large dataset of consumer photos. We use Guided Backpropagation to obtain insights into how our CNN detects photo orientation, and to explain its mistakes.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132576402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present an object tracking framework that fuses multiple unstable video-based methods and supports automatic tracker initialization and termination. To evaluate our system, we collected a large dataset of hand-annotated 5-minute traffic surveillance videos, which we are releasing to the community. To the best of our knowledge, this is the first publicly available dataset of such long videos, providing a diverse range of real-world object variation, scale change, interaction, resolutions, and illumination conditions. In our comprehensive evaluation using this dataset, we show that our automatic object tracking system often outperforms state-of-the-art trackers, even when these are provided with proper manual initialization. We also demonstrate tracking throughput improvements of 5× or more compared with the competition.
{"title":"Fully Automatic, Real-Time Vehicle Tracking for Surveillance Video","authors":"Yanzi Jin, Jakob Eriksson","doi":"10.1109/CRV.2017.43","DOIUrl":"https://doi.org/10.1109/CRV.2017.43","url":null,"abstract":"We present an object tracking framework which fuses multiple unstable video-based methods and supports automatic tracker initialization and termination. To evaluate our system, we collected a large dataset of hand-annotated 5-minute traffic surveillance videos, which we are releasing to the community. To the best of our knowledge, this is the first publicly available dataset of such long videos, providing a diverse range of real-world object variation, scale change, interaction, different resolutions and illumination conditions. In our comprehensive evaluation using this dataset, we show that our automatic object tracking system often outperforms state-of-the-art trackers, even when these are provided with proper manual initialization. We also demonstrate tracking throughput improvements of 5× or more vs. the competition.","PeriodicalId":308760,"journal":{"name":"2017 14th Conference on Computer and Robot Vision (CRV)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134062051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}