Pub Date : 2002-12-10 DOI: 10.1109/ICPR.2002.1048439
B. Haasdonk, Daniel Keysers
When dealing with pattern recognition problems, one encounters different types of a priori knowledge. It is important to incorporate such knowledge into the classification method at hand. A very common type of a priori knowledge is transformation invariance of the input data, e.g. invariance to geometric transformations of image data such as shifts and scaling. Distance-based classification methods can exploit this through a modified distance measure called tangent distance. We introduce a new class of kernels for support vector machines that incorporate tangent distance and are therefore applicable in cases where such transformation invariances are known. We report experimental results showing that the performance of our method is comparable to other state-of-the-art methods, while problems of existing ones are avoided.
Title: Tangent distance kernels for support vector machines
Venue: Object recognition supported by user interaction for service robots
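The idea of plugging tangent distance into a distance-based kernel can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the kernel definitions, so the RBF-style form `exp(-gamma * d^2)`, the one-sided variant of tangent distance, and all function names here are assumptions.

```python
import math

def orthonormalize(tangents):
    """Gram-Schmidt: orthonormal basis for the span of the tangent vectors."""
    basis = []
    for t in tangents:
        v = list(t)
        for q in basis:
            proj = sum(a * b for a, b in zip(v, q))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # skip (near-)linearly dependent tangents
            basis.append([a / norm for a in v])
    return basis

def one_sided_tangent_distance_sq(x, y, tangents_at_x):
    """Squared distance from y to the affine subspace x + span(tangents_at_x)."""
    diff = [b - a for a, b in zip(x, y)]
    sq = sum(d * d for d in diff)
    # Remove the component of the difference that lies in the tangent subspace.
    for q in orthonormalize(tangents_at_x):
        sq -= sum(d * b for d, b in zip(diff, q)) ** 2
    return max(sq, 0.0)  # guard against tiny negative values from rounding

def td_rbf_kernel(x, y, tangents_at_x, gamma=1.0):
    """Hypothetical RBF-style kernel with tangent distance replacing Euclidean distance."""
    return math.exp(-gamma * one_sided_tangent_distance_sq(x, y, tangents_at_x))
```

With `x = [0, 0]`, `y = [3, 4]` and a single tangent along the first axis, the horizontal offset is absorbed by the tangent and only the vertical offset contributes to the distance. Note that the one-sided distance is not symmetric in `x` and `y`, so a kernel matrix built this way is not guaranteed to be symmetric or positive semi-definite.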
Pub Date : 2002-12-10 DOI: 10.1109/icpr.2002.1048341
D. Keren
The goal of this paper is to offer a framework for image classification "by type". For example, one may want to classify an image of a certain office as man-made, as opposed to an outdoor scene, even if no image of a similar office exists in the training set. This is accomplished by using local features together with the naive Bayes classifier. The application presented here is classification of paintings: after the system is presented with a sample of paintings by various artists, it tries to determine which artist painted a given painting. The result is local - each small image block is assigned a painter, and a majority vote determines the painter. The results are roughly visually consistent with human perception of various artists' styles.
Title: Painter identification using local features and naive Bayes
Venue: Object recognition supported by user interaction for service robots
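The block-wise scheme described in the abstract (classify each block with naive Bayes, then take a majority vote) can be sketched as below. The abstract does not say which local features are used, so this sketch assumes each block is already summarized as a tuple of discrete feature values; the function names and the Laplace smoothing parameter `alpha` are likewise illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(blocks, labels, alpha=1.0):
    """blocks: tuples of discrete feature values; labels: the painter of each block."""
    class_counts = Counter(labels)
    # feature_counts[label][i][value] = number of blocks of `label` with feature i == value
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature index
    for block, label in zip(blocks, labels):
        for i, v in enumerate(block):
            feature_counts[label][i][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values, alpha

def classify_block(model, block):
    """Return the painter with the highest smoothed naive Bayes log-score for one block."""
    class_counts, feature_counts, values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # class prior
        for i, v in enumerate(block):
            count = feature_counts[label][i][v]
            score += math.log((count + alpha) / (n + alpha * len(values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def classify_painting(model, blocks):
    """Majority vote over the per-block decisions."""
    votes = Counter(classify_block(model, b) for b in blocks)
    return votes.most_common(1)[0][0]
```

Because the decision is made per block first, the per-block labels can also be inspected directly, which matches the abstract's remark that the result is local.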
Pub Date : 2002-12-10 DOI: 10.1109/ICPR.2002.1048391
Hidenori Sato, H. Matsuoka, A. Onozawa, H. Kitazawa
A new data structure for representing the color of rays from multiple-view images is presented. The structure is a hexagonal tessellation generated from a buckyball. Using the structure, the captured colors are represented as pixel values of a hexagonal image, and the image is finally saved as a compressed normal image after a simple transformation without loss of connectivity. An algorithm for generating surface light field data, based on an image-based scheme, is also presented. It assigns a color to each vertex of the reconstructed surface using the structure. The experimental results show that the algorithm yields the surface light field data in a short time. In addition, photorealistic rendered views are obtained from arbitrary viewpoints.
Title: Hexagonal image representation for 3-D photorealistic reconstruction
Venue: Object recognition supported by user interaction for service robots
Pub Date : 2002-12-10 DOI: 10.1109/ICPR.2002.1048301
Lei Cheng, Fuchao Wu, Zhanyi Hu, H. Tsui
We propose an approach to solving the Kruppa equations for camera self-calibration. Traditionally, the unknown scale factors in the Kruppa equations are eliminated first, leading to a set of nonlinear constraints. Instead, we determine the scale factors by a Levenberg-Marquardt optimization or genetic optimization technique first. Then, the camera's intrinsic parameters are derived from the resulting linear constraints. Extensive simulations as well as experiments with real images verify that the above technique is both accurate and robust.
Title: A new approach to solving Kruppa equations for camera self-calibration
Venue: Object recognition supported by user interaction for service robots
Pub Date : 2002-12-10 DOI: 10.1109/ICPR.2002.1048494
Z. Rasheed, M. Shah
We present a method to classify movies on the basis of audio-visual cues present in previews. A preview summarizes the main idea of a movie, providing a suitable amount of information to perform genre classification. In our approach, movies are initially classified into action and non-action by computing the visual disturbance feature and average shot length of every movie. Visual disturbance is defined as a measure of motion content in a clip. Next, we use color, audio and cinematic principles for further classification into comedy, horror, drama/other and movies containing explosions and gunfire. This work is a step towards automatically building and updating a video database with minimal human intervention. Other potential applications include browsing and retrieval of videos on the Internet (video-on-demand), video libraries, and rating of movies.
Title: Movie genre classification by exploiting audio-visual features of previews
Venue: Object recognition supported by user interaction for service robots
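Of the two features used for the action/non-action split, average shot length is the simpler one to illustrate. The sketch below assumes shot boundaries have already been detected; the frame rate, the threshold `max_avg_seconds`, and the rule that short shots suggest action are hypothetical choices for illustration, not values taken from the paper. The visual disturbance feature is not modelled here.

```python
def average_shot_length(shot_boundaries, total_frames):
    """Average shot length in frames.

    shot_boundaries: frame indices at which a new shot starts (excluding frame 0).
    """
    num_shots = len(shot_boundaries) + 1
    return total_frames / num_shots

def is_action_candidate(shot_boundaries, total_frames, fps=24.0, max_avg_seconds=3.0):
    """Hypothetical rule: flag a preview as action-like if its shots are short on average."""
    avg_seconds = average_shot_length(shot_boundaries, total_frames) / fps
    return avg_seconds < max_avg_seconds
```

For example, a 96-frame clip with cuts at frames 24, 48 and 72 has an average shot length of 24 frames, i.e. one second at 24 fps, and would be flagged under the assumed threshold.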
Pub Date : 2002-12-10 DOI: 10.1109/ICPR.2002.1048264
Richard I. A. Davis, B. Lovell, T. Caelli
The huge popularity of hidden Markov models (HMMs) in pattern recognition is due to the ability to "learn" model parameters from an observation sequence through Baum-Welch and other re-estimation procedures. In the case of HMM parameter estimation from an ensemble of observation sequences, rather than a single sequence, we require techniques for finding the parameters which maximize the likelihood of the estimated model given the entire set of observation sequences. The importance of this study is that HMMs with parameters estimated from multiple observations are shown to be many orders of magnitude more probable than HMMs learned from any single observation sequence - thus the effectiveness of HMM "learning" is greatly enhanced. In this paper we present techniques that usually find models significantly more likely than Rabiner's well-known method on both seen and unseen sequences.
Title: Improved estimation of hidden Markov model parameters from multiple observation sequences
Venue: Object recognition supported by user interaction for service robots
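The quantity being maximized, the likelihood of a model given an entire set of observation sequences, can be evaluated with the standard scaled forward algorithm. The sketch below is not the estimation technique of the paper; it only shows how a candidate model could be scored against an ensemble, assuming a discrete-observation HMM, non-empty sequences, and sequences that are independent given the model (so their log-likelihoods add). All names are illustrative.

```python
import math

def sequence_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return -math.inf  # the sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_lik += math.log(scale)
    return log_lik

def ensemble_log_likelihood(sequences, pi, A, B):
    """Sum of per-sequence log-likelihoods (sequences assumed independent given the model)."""
    return sum(sequence_log_likelihood(s, pi, A, B) for s in sequences)
```

Comparing `ensemble_log_likelihood` for two candidate models on the same set of sequences gives a direct way to say that one model is more probable than the other, on seen or unseen data.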
Pub Date : 2002-12-10 DOI: 10.1109/ICPR.2002.1048342
G. Nagy, Jie Zou
Computer Assisted Visual Interactive Recognition (CAVIAR) draws on sequential pattern recognition, image database, expert systems, pen computing, and digital camera technology. It is designed to recognize wildflowers and other families of similar objects more accurately than machine vision and faster than most laypersons. The novelty of the approach is that human perceptual ability is exploited through interaction with the image of the unknown object. The computer remembers the characteristics of all previously seen classes, suggests possible operator actions, and displays confidence scores based on already detected features. In one application, consisting of 80 test images of wildflowers, 10 laypersons averaged 80% recognition accuracy at 12 seconds per flower.
Title: Interactive visual pattern recognition
Venue: Object recognition supported by user interaction for service robots
Pub Date : 2002-12-10 DOI: 10.1109/ICPR.2002.1048284
D. Ridder, E. Pekalska, R. Duin
Although classifier error is usually the main concern in publications, in real applications classifier evaluation complexity may play a large role as well. In this paper, a simple economic model is proposed with which a trade-off between classifier error and calculated evaluation complexity can be formulated. This trade-off can then be used to judge the necessity of increasing sample size or number of features to decrease classification error or, conversely, of feature extraction or prototype selection to decrease evaluation complexity. The model is applied to the benchmark problem of handwritten digit recognition and is shown to lead to interesting conclusions, given certain assumptions.
Title: The economics of classification: error vs. complexity
Venue: Object recognition supported by user interaction for service robots
Pub Date : 2002-12-10 DOI: 10.1109/ICPR.2002.1048451
X. Muñoz, J. Martí, X. Cufí, J. Freixenet
An unsupervised approach to image segmentation which fuses region and boundary information is presented. The proposed approach takes advantage of the combined use of 3 different strategies: the guidance of seed placement, the control of decision criterion, and the boundary refinement. The new algorithm uses the boundary information to initialize a set of active regions which compete for the pixels in order to segment the whole image. The method is implemented on a multiresolution representation which ensures noise robustness as well as computation efficiency. The accuracy of the segmentation results has been proven through an objective comparative evaluation of the method.
Title: Unsupervised active regions for multiresolution image segmentation
Venue: Object recognition supported by user interaction for service robots
Pub Date : 2002-12-10 DOI: 10.1109/ICPR.2002.1048477
S. Guimarães, A. Araújo, M. Couprie, N. J. Leite
The video segmentation problem can be regarded as a problem of detecting the fundamental video units (shots). Because two consecutive shots can be linked in different ways, this task turns out to be difficult. In this work, we propose a method to detect one type of gradual transition, the fade, using image segmentation tools instead of dissimilarity measures or mathematical models. First, the video is transformed into a 2D image built from histogram information, called the visual rhythm by histogram. Afterwards, we apply image processing tools to detect specified patterns in this image.
Title: Video fade detection by discrete line identification
Venue: Object recognition supported by user interaction for service robots
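The transformation of a video into a 2D image via histograms can be sketched as follows. The abstract does not specify the construction, so this sketch assumes the simplest reading: each frame contributes one column holding its grayscale histogram. The bin count, the 2D-list frame representation, and the function names are assumptions made for illustration.

```python
def frame_histogram(frame, bins=8, max_value=255):
    """Grayscale histogram of one frame, given as a 2D list of ints in [0, max_value]."""
    hist = [0] * bins
    for row in frame:
        for p in row:
            hist[min(p * bins // (max_value + 1), bins - 1)] += 1
    return hist

def visual_rhythm_by_histogram(frames, bins=8):
    """Build a 2D image: one row per histogram bin, one column per frame."""
    columns = [frame_histogram(f, bins) for f in frames]
    return [[col[b] for col in columns] for b in range(bins)]
```

In the resulting image, time runs along the horizontal axis, so a gradual change in brightness over consecutive frames shifts the histogram mass from bin to bin across adjacent columns. The pattern-detection step that the abstract mentions would then operate on this image; it is not modelled here.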