Closed-form solutions to multiple-view homography estimation
P. Schroeder, A. Bartoli, P. Georgel, Nassir Navab
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711566
The quality of a mosaic depends on the projective alignment of the images involved. After point correspondences between the images have been established, bundle adjustment finds an alignment considered optimal under certain hypotheses. This procedure minimizes a nonlinear cost and must be initialized with care. It is very common to compose inter-frame homographies computed with standard methods in order to obtain an initial global alignment. This technique is suboptimal in the presence of noise or missing homographies, as it typically uses only a small part of the available data. We propose four new closed-form solutions. They all provide non-heuristic initial alignments using all the known inter-frame homographies. Our methods are tested on synthetic and real data and compared to the standard method. These experiments reveal that our methods are more accurate, taking advantage of the redundant information available in the set of inter-frame homographies.
TranslatAR: A mobile augmented reality translator
Victor Fragoso, Steffen Gauglitz, S. Zamora, Jim Kleban, M. Turk
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711545
We present a mobile augmented reality (AR) translation system, using a smartphone's camera and touchscreen, that requires the user to simply tap on the word of interest once in order to produce a translation, presented as an AR overlay. The translation seamlessly replaces the original text in the live camera stream, matching background and foreground colors estimated from the source images. For this purpose, we developed an efficient algorithm for accurately detecting the location and orientation of the text in a live camera stream that is robust to perspective distortion, and we combine it with OCR and a text-to-text translation engine. Our experimental results, using the ICDAR 2003 dataset and our own set of video sequences, quantify the accuracy of our detection and analyze the sources of failure among the system's components. With the OCR and translation running in a background thread, the system runs at 26 fps on a current generation smartphone (Nokia N900) and offers a particularly easy-to-use and simple method for translation, especially in situations in which typing or correct pronunciation (for systems with speech input) is cumbersome or impossible.
Cell image analysis: Algorithms, system and applications
T. Kanade, Zhaozheng Yin, Ryoma Bise, Seungil Huh, Sungeun Eom, Michael F. Sandbothe, Mei Chen
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711528
We present several algorithms for cell image analysis, including microscopy image restoration, cell event detection, and cell tracking in a large population. The algorithms are integrated into an automated system capable of quantifying cell proliferation metrics in vitro in real time. This offers unique opportunities for biological applications such as efficient cell behavior discovery in response to different cell culturing conditions and adaptive experiment control. We quantitatively evaluated our system's performance on 16 microscopy image sequences, with accuracy satisfactory for biologists' needs. We have also developed a public website compatible with the system's local user interface, allowing biologists to conveniently check their experiment progress online. The website will serve as a community resource that allows other research groups to upload their cell images for analysis and comparison.
Saliency retargeting: An approach to enhance image aesthetics
L. Wong, Kok-Lim Low
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711486
A photograph with visually dominant subjects generally induces stronger aesthetic interest. Inspired by this, we have developed a new approach to enhance image aesthetics through saliency retargeting. Our method alters low-level image features of the objects in the photograph such that their computed saliency measurements in the modified image become consistent with the intended order of their visual importance. The goal of our approach is to produce an image that redirects the viewers' attention to the most important objects in the image, thus making these objects the main subjects. Since many modified images can satisfy the same specified order of visual importance, we trained an aesthetics score prediction model to pick the one with the best aesthetics. Results from our user experiments support the effectiveness of our approach.
Saliency detection based on proto-objects and topic model
Zhidong Li, Jie Xu, Yang Wang, G. Geers, Jun Yang
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711493
This paper proposes a novel computational framework for saliency detection that integrates saliency map computation and proto-object detection. The proto-objects are detected from the saliency map using a latent topic model. The detected proto-objects are then utilized to improve the saliency map computation. Extensive experiments are performed on two publicly available datasets. The experimental results show that the proposed framework outperforms state-of-the-art methods.
3D Object recognition using a voting algorithm in a real-world environment
S. Tangruamsub, Keisuke Takada, O. Hasegawa
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711497
This paper presents a novel 3D object recognition method. It aims to overcome the shortcomings of appearance-based methods, which lack spatial relationships between the parts of an object, and those of other 3D-model-based methods, which require complicated computation. The proposed method is based on a voting process. Appearance estimation is introduced in this work to deal with faulty detections. We tested our method on object detection and pose estimation, and the results showed that our method improves average precision and detection time compared to other methods.
Multi-modal visual concept classification of images via Markov random walk over tags
M. Kawanabe, Alexander Binder, Christina Müller, W. Wojcikiewicz
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711531
Automatic annotation of images is a challenging task in computer vision because of the “semantic gap” between high-level visual concepts and image appearances. User tags attached to images can provide further information to bridge this gap, even though they are partially uninformative and misleading. In this work, we investigate multi-modal visual concept classification based on visual features and user tags via kernel-based classifiers. An issue here is how to construct kernels between sets of tags. We deploy Markov random walks on graphs of key tags to incorporate co-occurrence between them. This procedure acts as a smoothing of tag-based features. Our experimental results on the ImageCLEF2010 PhotoAnnotation benchmark show that our proposed method outperforms the baseline relying solely on visual information and a recently published state-of-the-art approach.
One video is sufficient? Human activity recognition using active video composition
M. Ryoo, Wonpil Yu
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711564
In this paper, we present a novel human activity recognition approach that only requires a single video example per activity. We introduce the paradigm of active video composition, which enables one-example recognition of complex activities. The idea is to automatically create a large number of semi-artificial training videos, called composed videos, by manipulating an original human activity video. A methodology to automatically compose activity videos having different backgrounds, translations, scales, actors, and movement structures is described in this paper. Furthermore, an active learning algorithm to model the temporal structure of the human activity has been designed, preventing the generation of composed training videos violating the structural constraints of the activity. The intention is to generate composed videos having correct organizations, and take advantage of them for the training of the recognition system. In contrast to previous passive recognition systems relying only on given training videos, our methodology actively composes necessary training videos that the system is expected to observe in its environment. Experimental results illustrate that a single fully labeled video per activity is sufficient for our methodology to reliably recognize human activities by utilizing composed training videos.
Car-Rec: A real time car recognition system
D. Jang, M. Turk
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711559
Recent advances in computer vision have significantly reduced the difficulty of object classification and recognition. Robust feature detector and descriptor algorithms are particularly useful, forming the basis for many recognition and classification applications. These algorithms have been used in divergent bag-of-words and structural matching approaches. This work demonstrates a recognition application, based upon the SURF feature descriptor algorithm, which fuses bag-of-words and structural verification techniques. The resulting system is applied to the domain of car recognition and achieves accurate (> 90%) and real-time performance when searching databases containing thousands of images.
Realistic stereo error models and finite optimal stereo baselines
Zhang Tao, T. Boult
Pub Date: 2011-01-05 | DOI: 10.1109/WACV.2011.5711535
Stereo reconstruction is an important research and application area, both for general 3D reconstruction and for operations like robotic navigation and remote sensing. This paper addresses the determination of parameters for a stereo system to optimize/minimize 3D reconstruction errors. Previous work on error analysis in stereo reconstruction optimized error in disparity space, which led to the erroneous conclusion that, ignoring matching errors, errors decrease as the baseline goes to infinity. In this paper, we derive the first formal error model based on the more realistic “point-of-closest-approach” ray model used in modern stereo systems. We then show that this results in a finite optimal baseline that minimizes reconstruction errors in all three world directions. We also show why the previous oversimplified error analysis results in infinite baselines. We derive the mathematical relationship between the error variances and the stereo system parameters. In our analysis, we consider the situations where errors exist in only one camera as well as errors in both cameras. We have derived the results for both parallel and verged systems, though only the simpler models are presented algebraically herein. The paper includes simulations to highlight the results and validate the approximations in the error propagation. The results should allow stereo system designers, or those using motion-stereo, to improve their system.