Pub Date: 2002-12-10 | DOI: 10.1109/ICPR.2002.1048318
S. Marcel, Samy Bengio
The performance of face verification systems has steadily improved over the last few years, with effort focused mainly on models rather than on feature processing. State-of-the-art methods often use the gray-scale face image as input. We propose to use an additional feature of the face image: the skin color. The new feature set is tested on a benchmark database, namely XM2VTS, using a simple discriminant artificial neural network. Results show that the skin color information improves the performance.
Title: Improving face verification using skin color information
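The idea of augmenting grey-level input with a skin-colour feature can be sketched roughly as follows (a toy illustration with synthetic "faces", a normalized r-g chromaticity histogram as the colour cue, and a single-layer discriminant standing in for the paper's neural network; all names and numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

def face_features(rgb):
    """Grayscale pixels plus a skin-colour cue: a normalised (r, g)
    chromaticity histogram appended to the grey-level vector."""
    rgb = rgb.astype(float)
    gray = rgb.mean(axis=2).ravel() / 255.0
    s = rgb.sum(axis=2) + 1e-8
    r, g = rgb[..., 0] / s, rgb[..., 1] / s
    hist, _, _ = np.histogram2d(r.ravel(), g.ravel(), bins=8,
                                range=[[0, 1], [0, 1]])
    return np.concatenate([gray, hist.ravel() / hist.sum()])

def toy_face(client):
    """Synthetic stand-in: 'client' crops are brighter than impostor crops."""
    lo, hi = (120, 200) if client else (30, 120)
    return rng.integers(lo, hi, (16, 16, 3)).astype(np.uint8)

X = np.stack([face_features(toy_face(c)) for c in [True] * 40 + [False] * 40])
y = np.array([1.0] * 40 + [0.0] * 40)

# Single-layer logistic discriminant trained by gradient descent.
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(300):
    g = 1 / (1 + np.exp(-(X @ w + b))) - y
    w -= 0.5 * X.T @ g / len(y); b -= 0.5 * g.mean()

acc = ((1 / (1 + np.exp(-(X @ w + b))) > 0.5) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

On real data the colour histogram would be computed from a detected skin region rather than the whole crop.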
Pub Date: 2002-12-10 | DOI: 10.1109/ICPR.2002.1048243
Yanlai Li, Kuanquan Wang, David Zhang
This paper presents a very fast step-acceleration-based training algorithm (SATA) for multilayer feedforward neural networks. The most outstanding virtue of this algorithm is that it does not need to calculate the gradient of the target function; in each iteration step, computation concentrates only on the part that has varied. The proposed algorithm is simple, flexible, and feasible, and it converges quickly. In many simulations it was compared with other methods, including conventional backpropagation (BP), conjugate gradient, and weight-extrapolation-based BP, and its superiority in convergence speed and required computation time was confirmed.
Title: Step acceleration based training algorithm for feedforward neural networks
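The gradient-free, one-weight-at-a-time flavour of a step-acceleration scheme might look like this (a speculative sketch on a toy XOR network, not the published SATA algorithm; the growth and shrink factors are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny 2-2-1 network for XOR; the loss is evaluated directly, no gradients.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([0.0, 1.0, 1.0, 0.0])

def loss(w):
    W1, b1 = w[:4].reshape(2, 2), w[4:6]
    W2, b2 = w[6:8], w[8]
    h = np.tanh(X @ W1 + b1)
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))
    return ((p - y) ** 2).mean()

w = rng.normal(0, 1.0, 9)
steps = np.full(9, 0.5)              # one adaptive step size per weight
loss0 = best = loss(w)
for _ in range(300):
    for i in range(9):               # only one weight varies at a time
        for s in (steps[i], -steps[i]):
            w[i] += s
            if loss(w) < best:
                best = loss(w)
                steps[i] *= 1.2      # accelerate after a successful step
                break
            w[i] -= s                # undo the failed trial move
        else:
            steps[i] *= 0.5          # decelerate after two failures

print(f"MSE: {loss0:.3f} -> {best:.3f}")
```

Each trial changes a single weight, so a real implementation would recompute only the activations that depend on it, which is the "computation concentrates on the varied part" idea.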
Pub Date: 2002-12-10 | DOI: 10.1109/ICPR.2002.1048260
A. Erçil, Burak Büke
When the number of objects in the training set is too small for the number of features used, most classification procedures cannot find good classification boundaries. In this paper, we introduce a new technique for the one-class classification problem: an implicit polynomial surface is fitted to the point cloud of features to model the one class that we are trying to separate from the others.
Title: One class classification using implicit polynomial surface fitting
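The implicit-polynomial idea can be illustrated with a degree-2 surface (a curve, in 2-D) fitted by least squares (a minimal sketch; the authors' fitting procedure and polynomial degree may differ):

```python
import numpy as np

rng = np.random.default_rng(2)

def monomials(pts):
    """Degree-2 monomial basis [1, x, y, x^2, xy, y^2] for 2-D feature points."""
    x, y = pts[:, 0], pts[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

def fit_implicit(pts):
    """Least-squares implicit fit: the coefficients are the right singular
    vector of the monomial matrix with the smallest singular value, so that
    f(x) is approximately 0 on the training cloud."""
    _, _, Vt = np.linalg.svd(monomials(pts), full_matrices=False)
    return Vt[-1]

# One class: noisy points on the unit circle.
t = rng.uniform(0, 2 * np.pi, 200)
inliers = np.column_stack([np.cos(t), np.sin(t)]) + rng.normal(0, 0.02, (200, 2))
coef = fit_implicit(inliers)

def score(pts):
    """|f(x)| is small near the modelled class surface."""
    return np.abs(monomials(pts) @ coef)

tau = np.percentile(score(inliers), 99)   # acceptance threshold from training data
t2 = rng.uniform(0, 2 * np.pi, 200)
outliers = 1.7 * np.column_stack([np.cos(t2), np.sin(t2)])
print(f"inliers accepted: {(score(inliers) <= tau).mean():.2f}, "
      f"outliers accepted: {(score(outliers) <= tau).mean():.2f}")
```

Points of the modelled class lie near the zero set of the polynomial, so thresholding |f(x)| gives a one-class decision without any negative training examples.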
Pub Date: 2002-12-10 | DOI: 10.1109/ICPR.2002.1048325
P. Perner, Horst Perner, Bernd Müller
We investigated the Boolean model for the classification of textures. We were interested in three issues: 1. What are the best features for classification? 2. How does the number of Boolean models created from the original image influence the accuracy of the classifier? 3. Is decision tree induction the right method for classification? We are working on a real-world application, the classification of HEp-2 cells. These cells are used in medicine for the identification of antinuclear autoantibodies. Human experts describe the characteristics of these cells by symbolic texture features. We apply the Boolean model to this problem and assume that the primary grains are regions of random size and shape. We use decision tree induction to learn the relevant classification knowledge and the structure of the classifier.
Title: Texture classification based on the Boolean model and its application to HEp-2 cells
Pub Date: 2002-12-10 | DOI: 10.1109/ICPR.2002.1048479
T. Ojala, Markus Aittola, Esa Matinmikko
This paper conducts an empirical evaluation of the MPEG-7 visual part of experimentation model (XM) color descriptors on a challenging problem: content-based retrieval of semantic image categories. The performance of the four color descriptors provided in the current XM reference implementation (Color Layout, Color Structure, Dominant Color, and Scalable Color) is compared to that of the HSV autocorrelogram, which has done well in recent empirical studies. Experimental results show that Color Structure provides the best retrieval accuracy, whereas Dominant Color, the computationally most expensive descriptor, performs worst on this problem.
Title: Empirical evaluation of MPEG-7 XM color descriptors in content-based retrieval of semantic image categories
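For reference, a colour autocorrelogram of the kind compared against can be sketched like this (assuming a pre-quantized colour-label image; the distance set, quantization, and neighbourhood handling are illustrative, not the exact descriptor used in the paper):

```python
import numpy as np

rng = np.random.default_rng(3)

def autocorrelogram(labels, K, dists=(1, 3)):
    """P(neighbour at chessboard distance d has colour c | pixel has colour c),
    estimated over 8 directions; out-of-image neighbours are excluded."""
    H, W = labels.shape
    feats = []
    for d in dists:
        pad = np.full((H + 2 * d, W + 2 * d), -1)
        pad[d:d + H, d:d + W] = labels
        offs = [(-d, 0), (d, 0), (0, -d), (0, d),
                (-d, -d), (-d, d), (d, -d), (d, d)]
        for c in range(K):
            mask = labels == c
            hits = valid = 0
            for dy, dx in offs:
                nb = pad[d + dy:d + dy + H, d + dx:d + dx + W]
                hits += np.sum(mask & (nb == c))
                valid += np.sum(mask & (nb != -1))
            feats.append(hits / max(valid, 1))
    return np.array(feats)

# A coherent blob vs. salt-and-pepper noise with the same colour histogram:
# a plain histogram cannot tell them apart, the correlogram can.
blob = np.zeros((32, 32), int); blob[8:24, 8:24] = 1
noise = rng.permutation(blob.ravel()).reshape(32, 32)
ac_blob, ac_noise = autocorrelogram(blob, 2), autocorrelogram(noise, 2)
print(f"colour 1, d=1: blob {ac_blob[1]:.2f} vs noise {ac_noise[1]:.2f}")
```

An HSV autocorrelogram would first quantize the HSV colour space into the K labels assumed here.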
Pub Date: 2002-12-10 | DOI: 10.1109/ICPR.2002.1048463
M. Naphade, S. Basu, John R. Smith, Ching-Yung Lin, Belle L. Tseng
Statistical modeling for content-based retrieval is examined in the context of the recent TREC Video benchmark exercise. The TREC Video exercise can be viewed as a test bed for evaluating and comparing a variety of algorithms on a set of high-level queries for multimedia retrieval. We report on the use of techniques adopted from statistical learning theory. Our method depends on training models on large data sets. In particular, we use statistical models such as Gaussian mixture models to build computational representations for a variety of semantic concepts, including rocket launch, outdoor greenery, and sky. Training requires a large amount of annotated (labeled) data, so we explore the use of active learning in the annotation engine to minimize the number of training samples that must be labeled for satisfactory performance.
Title: A statistical modeling approach to content based video retrieval
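Modelling each concept with a GMM and labelling by likelihood can be sketched as follows (toy colour features with invented concept means; diagonal covariances and a minimal EM loop for brevity, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(4)

def log_gauss(X, pi, mu, var):
    """Per-component log densities (diagonal covariances) plus log weights."""
    return (-0.5 * (((X[:, None] - mu) ** 2 / var).sum(-1)
                    + np.log(2 * np.pi * var).sum(-1))
            + np.log(pi + 1e-12))

def fit_gmm(X, K, iters=30):
    """Diagonal-covariance GMM fitted with a minimal EM loop."""
    n, _ = X.shape
    mu = X[rng.choice(n, K, replace=False)]
    var = np.tile(X.var(axis=0), (K, 1))
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        lp = log_gauss(X, pi, mu, var)        # E-step: responsibilities
        lp -= lp.max(1, keepdims=True)
        r = np.exp(lp); r /= r.sum(1, keepdims=True)
        nk = r.sum(0) + 1e-9                  # M-step: reweight, re-centre
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ X ** 2) / nk[:, None] - mu ** 2 + 1e-6
    return pi, mu, var

def loglik(X, model):
    lp = log_gauss(X, *model)
    m = lp.max(1, keepdims=True)
    return (m + np.log(np.exp(lp - m).sum(1, keepdims=True))).ravel()

# Hypothetical colour features for two concepts (means are invented).
sky = rng.normal([0.5, 0.6, 0.9], 0.05, (200, 3))
green = rng.normal([0.2, 0.6, 0.2], 0.05, (200, 3))
m_sky, m_green = fit_gmm(sky, 2), fit_gmm(green, 2)

test = rng.normal([0.5, 0.6, 0.9], 0.05, (50, 3))
frac = (loglik(test, m_sky) > loglik(test, m_green)).mean()
print(f"test frames labelled 'sky': {frac:.2f}")
```

Active learning would then ask annotators to label only the frames whose likelihoods under the competing concept models are closest, where a label is most informative.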
Pub Date: 2002-12-10 | DOI: 10.1109/ICPR.2002.1048456
M. Loog, B. Ginneken
We propose a general iterative contextual pixel classifier for supervised image segmentation. The iterative procedure is statistically well-founded and can be considered a variation on the iterated conditional modes (ICM) of Besag (1983). Starting from an initial segmentation, the algorithm iteratively updates it by reclassifying every pixel based on the original features and, additionally, contextual information. This contextual information consists of the class labels of pixels in the neighborhood of the pixel to be reclassified. Three essential differences with the original ICM are: (1) our update step is based merely on a classification result, hence avoiding the explicit calculation of conditional probabilities; (2) the clique formalism of the Markov random field framework is not required; (3) no assumption is made w.r.t. the conditional independence of the observed pixel values given the segmented image. The important consequence of properties 1 and 2 is that one can easily incorporate common pattern recognition tools in our segmentation algorithm. Examples are different classifiers (e.g. the Fisher linear discriminant, the nearest-neighbor classifier, or support vector machines) and dimension-reduction techniques like LDA or PCA.
We experimentally compare a specific instance of our general method to pixel classification, using simulated data and chest radiographs, and show that the former outperforms the latter.
Title: Supervised segmentation by iterated contextual pixel classification
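The iterated contextual update, with neighbour labels fed back as extra features, can be sketched like this (a toy binary segmentation in which logistic regression stands in for the classifier; image size, noise level, and features are invented):

```python
import numpy as np

rng = np.random.default_rng(5)

def neighbor_mean(lab):
    """Mean label of the 4-neighbourhood: the contextual feature."""
    p = np.pad(lab.astype(float), 1, mode='edge')
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4

def make_image():
    truth = np.zeros((32, 32), int); truth[8:24, 8:24] = 1
    return truth, truth + rng.normal(0, 0.8, truth.shape)   # noisy observation

def train_logreg(F, y, iters=500, lr=0.5):
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(iters):
        g = 1 / (1 + np.exp(-(F @ w + b))) - y
        w -= lr * F.T @ g / len(y); b -= lr * g.mean()
    return w, b

# Train on one labelled image: intensity plus context from an initial segmentation.
truth, obs = make_image()
init = (obs > 0.5).astype(int)
F = np.column_stack([obs.ravel(), neighbor_mean(init).ravel()])
w, b = train_logreg(F, truth.ravel().astype(float))

# Iterated contextual classification of a fresh noisy image.
truth2, obs2 = make_image()
lab = (obs2 > 0.5).astype(int)                   # initial segmentation
acc0 = (lab == truth2).mean()
for _ in range(10):                              # reclassify every pixel
    F2 = np.column_stack([obs2.ravel(), neighbor_mean(lab).ravel()])
    lab = ((F2 @ w + b) > 0).astype(int).reshape(lab.shape)
acc1 = (lab == truth2).mean()
print(f"pixel accuracy: {acc0:.2f} -> {acc1:.2f}")
```

Because the update is just a classification, the logistic regression here could be swapped for any of the classifiers the abstract lists, which is the point of properties 1 and 2.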
Pub Date: 2002-12-10 | DOI: 10.1109/ICPR.2002.1048377
E. Michaelsen, U. Soergel, Uwe Stilla
InSAR data are used to recognise large industrial building complexes. Such buildings often show salient regular patterns of strong scatterers on their roofs. A preceding segmentation, which uses the intensity, height and coherence information, extracts building cues. Strong scatterers are filtered by a spot detector and localised by cluster formation. They are then grouped into rows by a process that uses the contours of the building cues as context. Such buildings are labelled as industrial buildings and serve as seeds to assemble adjacent buildings into complex structured building aggregates. The structure of the grouping process is depicted by a production net.
Title: Grouping salient scatterers in InSAR data for recognition of industrial buildings
Pub Date: 2002-12-10 | DOI: 10.1109/ICPR.2002.1048337
S. Mahmoudi, M. Daoudi
In this work we introduce a new method for indexing 3D models. The method characterizes a 3D object by a set of seven characteristic views: three principal and four secondary views. The primary, secondary, and tertiary viewing directions are determined by eigenvector analysis of the covariance matrix of the 3D object, and the secondary views are deduced from the principal views. We propose an index based on the "curvature scale space", organized in a tree structure named M-Tree, which is parameterized by a distance function and considerably decreases the computation time by saving intermediate distances.
Title: 3D models retrieval by using characteristic views
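The eigenvector analysis behind the principal viewing directions can be sketched as follows (a synthetic point cloud stands in for a 3D model's vertices, and the projection step is simplified; not the authors' rendering pipeline):

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy 3-D object: an elongated ellipsoidal point cloud, arbitrarily rotated.
pts = rng.normal(0, 1, (500, 3)) * np.array([4.0, 2.0, 1.0])
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
pts = pts @ R.T

# Principal viewing directions: eigenvectors of the covariance matrix,
# ordered by decreasing eigenvalue (primary, secondary, tertiary).
C = np.cov(pts, rowvar=False)
evals, evecs = np.linalg.eigh(C)
axes = evecs[:, np.argsort(evals)[::-1]]   # columns: 1st, 2nd, 3rd direction

def view(pts, axes, drop):
    """Characteristic view along direction `drop`: project the points onto
    the plane spanned by the other two eigen-directions."""
    keep = [i for i in range(3) if i != drop]
    return pts @ axes[:, keep]

primary_view = view(pts, axes, 0)          # looking along the primary axis
print(primary_view.shape)
```

Because the eigenbasis is intrinsic to the object, the same model produces the same characteristic views regardless of its pose in the database.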
Pub Date: 2002-12-10 | DOI: 10.1109/ICPR.2002.1048431
Gu Xu, Yu-Fei Ma, HongJiang Zhang, Shiqiang Yang
Motion is an important cue for video understanding and is widely used in many semantic video analyses. We present a new motion representation scheme in which the motion in a video is represented by the responses of its frames to a set of motion filters, each designed to be most responsive to one type of dominant motion. We then employ hidden Markov models (HMMs) to characterize the motion patterns based on these features and thus classify basketball video into 16 events. In a human evaluation, 75% of the classification results were judged satisfactory, demonstrating the effectiveness of the proposed approach for recognizing semantic events in video.
Title: Motion based event recognition using HMM
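Scoring a motion-symbol sequence under per-event HMMs can be sketched with the forward algorithm (hand-built toy models; the event names, symbols, and probabilities are invented, and the paper's HMMs are trained rather than hand-set):

```python
import numpy as np

rng = np.random.default_rng(7)

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum()); alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum(); ll += np.log(s); alpha /= s
    return ll

# Two toy event HMMs over 3 motion-filter responses
# (symbols: 0 = pan, 1 = zoom, 2 = still).
pi = np.array([1.0, 0.0])
A_ev = np.array([[0.9, 0.1],            # state 0 eventually hands over
                 [0.0, 1.0]])           # to an absorbing state 1
B_fastbreak = np.array([[0.8, 0.1, 0.1],   # mostly pan, then mostly zoom
                        [0.1, 0.8, 0.1]])
B_freethrow = np.array([[0.1, 0.1, 0.8],   # mostly still throughout
                        [0.1, 0.1, 0.8]])

def sample(pi, A, B, T):
    s, obs = rng.choice(len(pi), p=pi), []
    for _ in range(T):
        obs.append(rng.choice(B.shape[1], p=B[s]))
        s = rng.choice(len(pi), p=A[s])
    return obs

obs = sample(pi, A_ev, B_fastbreak, 30)
scores = {name: forward_loglik(obs, pi, A_ev, B)
          for name, B in [("fast_break", B_fastbreak),
                          ("free_throw", B_freethrow)]}
pred = max(scores, key=scores.get)
print(pred)
```

With one trained HMM per event, a clip is assigned to whichever of the 16 models gives its motion-filter response sequence the highest likelihood.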