Pub Date: 2003-09-17 | DOI: 10.1109/ICIAP.2003.1234044
J. Kautz, H. Lensch, M. Goesele, J. Lang, H. Seidel
High-quality virtual 3D models are quickly emerging as a new multimedia data type with applications in such diverse areas as e-commerce, online encyclopaedias, and virtual museums, to name just a few. The paper presents new algorithms and techniques for the acquisition of, and real-time interaction with, complex textured 3D objects, and shows how these results can be seamlessly integrated with previous work into a single framework for the acquisition, processing, and interactive display of high-quality 3D models. In addition to pure geometry, such algorithms also have to take into account the texture of an object (which is crucial for a realistic appearance) and its reflectance behavior. The measurement of accurate material properties is an important step towards photorealistic rendering, for which both the general surface properties and the spatially varying effects of the object are needed. Recent work on the image-based reconstruction of spatially varying BRDFs (bidirectional reflectance distribution functions) enables the generation of high-quality models of real objects from a sparse set of input data. Efficient use of the capabilities of advanced PC graphics hardware allows for interactive rendering under arbitrary viewing and lighting conditions and realistically reproduces the appearance of the original object.
Title: Modeling the world: the virtualization pipeline
Published in: 12th International Conference on Image Analysis and Processing, 2003. Proceedings.
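The interactive rendering the abstract describes comes down to evaluating, per texel, a reflectance model under the current viewing and lighting directions. The following is a minimal sketch assuming a simple normalised Phong-style lobe as a stand-in for the paper's measured spatially varying BRDFs; the function name and parameters are illustrative, not the authors' actual representation.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def eval_brdf(kd, ks, shininess, normal, light, view):
    """Evaluate a normalised Phong-style BRDF lobe for one texel.

    In a spatially varying setup, kd, ks and shininess would be looked up
    per texel from the reconstructed material maps; normal, light and view
    are unit vectors.
    """
    n_dot_l = max(0.0, dot(normal, light))
    # Mirror the light direction about the surface normal.
    r = [2.0 * n_dot_l * n - l for n, l in zip(normal, light)]
    r_dot_v = max(0.0, dot(r, view))
    diffuse = kd / math.pi
    specular = ks * (shininess + 2.0) / (2.0 * math.pi) * r_dot_v ** shininess
    return diffuse + specular
```

On graphics hardware of the paper's era this evaluation would be expressed as a per-pixel shading pass; the Python form above only illustrates the arithmetic.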
Pub Date: 2003-09-17 | DOI: 10.1109/ICIAP.2003.1234048
R. Strzodka, Ivo Ihrke, M. Magnor
The generalized Hough transform constitutes a well-known approach to object recognition and pose detection. To attain reliable detection results, however, a very large number of candidate object poses and scale factors need to be considered. We employ an inexpensive, consumer-market graphics card as the "poor man's" parallel processing system, and describe the implementation of a fast, enhanced version of the generalized Hough transform on graphics hardware. Thanks to the high bandwidth of on-board texture memory, a single pose can be evaluated in less than 3 ms, independent of the number of edge pixels in the image. From known object geometry, our hardware-accelerated generalized Hough transform algorithm is capable of detecting an object's 3D pose, scale, and position in the image in less than one minute, and a good pose estimate is delivered in less than 10 seconds.
Title: A graphics hardware implementation of the generalized Hough transform for fast object recognition, scale, and 3D pose detection
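The voting scheme at the core of the generalized Hough transform can be sketched in a few lines for the translation-only case; orientation binning and the scale/pose search that the paper parallelises on the GPU are omitted here, so this is an illustrative simplification, not the authors' implementation.

```python
from collections import Counter

def generalized_hough(template_edges, image_edges, ref=(0, 0)):
    """Translation-only generalized Hough transform sketch.

    Every pairing of an image edge pixel with a template displacement casts
    one vote for a candidate reference-point location; the accumulator peak
    is the detected position.
    """
    displacements = [(x - ref[0], y - ref[1]) for x, y in template_edges]
    acc = Counter()
    for ix, iy in image_edges:
        for dx, dy in displacements:
            acc[(ix - dx, iy - dy)] += 1
    return acc.most_common(1)[0]  # ((x, y), votes) of the best candidate
```

The per-pose cost is one accumulation pass over the edge pixels, which is exactly the operation that maps well onto high-bandwidth texture memory.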
Pub Date: 2003-09-17 | DOI: 10.1109/ICIAP.2003.1234094
Ikuhisa Mitsugami, N. Ukita, M. Kidode
We propose a new wearable system that can estimate the 3D position of a gazed point by measuring multiple binocular view lines. In principle, 3D measurement is possible by the triangulation of binocular view lines. However, it is difficult to measure these lines accurately with a device for eye tracking, because of errors caused by (1) difficulty in calibrating the device and (2) the limitation that a human cannot gaze very accurately at a distant point. Concerning (1), the accuracy of calibration can be improved by considering the optical properties of a camera in the device. To solve (2), we propose a stochastic algorithm that determines a gazed 3D position by integrating information of view lines observed at multiple head positions. We validated the effectiveness of the proposed algorithm experimentally.
Title: Estimation of 3D gazed position using view lines
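The basic triangulation step the abstract starts from — intersecting two (generally skew) view lines — is the midpoint of their common perpendicular. A self-contained sketch of that geometric step (the paper's stochastic integration over multiple head positions is not shown):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_gaze(p1, d1, p2, d2):
    """Midpoint of the shortest segment joining two view lines p + t*d.

    Solves for the parameters t, s minimising |(p1 + t*d1) - (p2 + s*d2)|
    via the standard closed-form closest-point formulas.
    """
    w = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b  # zero only for parallel view lines
    t = (b * e - c * d) / denom
    s = (a * e - b * d) / denom
    q1 = [p + t * u for p, u in zip(p1, d1)]
    q2 = [p + s * u for p, u in zip(p2, d2)]
    return [(x + y) / 2.0 for x, y in zip(q1, q2)]
```

With noisy eye-tracking data the two lines rarely intersect, which is precisely why the paper integrates observations from several head positions rather than trusting a single triangulation.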
Pub Date: 2003-09-17 | DOI: 10.1109/ICIAP.2003.1234110
A. Barla, F. Odone, A. Verri
In this paper we present a statistical learning scheme for image classification based on a mixture of old-fashioned ideas and state-of-the-art learning tools. We represent input images through high-dimensional, and usually sparse, histograms which, depending on the task, are either color histograms or co-occurrence matrices. Support vector machines are trained on these sparse inputs directly, to solve problems such as indoor/outdoor classification and cityscape retrieval from image databases. The experimental results indicate that a kernel function derived from the computer vision literature leads to better recognition results than off-the-shelf kernels. According to our findings, image classification problems can be addressed without explicit feature extraction or dimensionality reduction stages. We argue that this might serve as the starting point for developing image classification systems that can be easily tuned to a number of different tasks.
Title: Old fashioned state-of-the-art image classification
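One plausible candidate for a "kernel derived from the computer vision literature" is the histogram intersection kernel; whether it is the exact kernel the paper evaluates is an assumption of this sketch. For non-negative histograms it is a valid Mercer kernel, so its Gram matrix can feed a precomputed-kernel SVM directly.

```python
def hist_intersection(h1, h2):
    """Histogram intersection kernel K(h1, h2) = sum_i min(h1_i, h2_i)."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def gram_matrix(histograms):
    """Precomputed kernel matrix for an SVM trained on sparse histograms;
    no explicit feature extraction or dimensionality reduction is needed."""
    return [[hist_intersection(u, v) for v in histograms] for u in histograms]
```

The kernel scores two images by how much their color (or co-occurrence) mass overlaps, which matches the bin-wise nature of the sparse inputs better than a generic dot product.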
Pub Date: 2003-09-17 | DOI: 10.1109/ICIAP.2003.1234098
Shih-Fu Chang
Today's mobile and wireless users access multimedia content from many different types of networks and terminals. Content analysis plays a critical role in developing effective solutions that meet the unique resource constraints and user preferences of such usage environments. Specifically, content analysis is central to the automatic discovery of syntactic-level summaries and the generation of concise semantic-level summaries. It also provides a promising direction for finding optimal adaptation methods under various resource-utility constraints. The paper presents brief overviews of these emerging, fruitful areas and promising research directions.
Title: Content-based video summarization and adaptation for ubiquitous media access
Pub Date: 2003-09-17 | DOI: 10.1109/ICIAP.2003.1234095
A. Robles-Kelly, E. Hancock
We explore how spectral methods for graph seriation can be used to develop a new shape-from-shading algorithm. We characterise the field of surface normals using a transition matrix whose elements are computed from the sectional curvature between different image locations. We use a graph seriation method to define a curvature minimising surface integration path for the purposes of height reconstruction. To smooth the reconstructed surface, we fit quadric patches to the height data. The smoothed surface normal directions are updated ensuring compliance with Lambert's law. The processes of height recovery and surface normal adjustment are interleaved and iterated until a stable surface is obtained. We provide results on synthetic and real-world imagery.
Title: An eigenvector method for shape-from-shading
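The height-reconstruction step reduces to integrating surface gradients along an ordered pixel path. A minimal sketch, assuming the path is already given (in the paper it is the curvature-minimising path produced by graph seriation, which is not reproduced here):

```python
def integrate_heights(path, gradients):
    """Integrate surface gradients (p, q) = (dz/dx, dz/dy) along an ordered
    pixel path to recover relative heights.

    path      -- list of (x, y) pixel coordinates, visited in order
    gradients -- dict mapping (x, y) to the (p, q) gradient at that pixel
    """
    heights = {path[0]: 0.0}  # heights are relative to the path's start
    for prev, cur in zip(path, path[1:]):
        dx, dy = cur[0] - prev[0], cur[1] - prev[1]
        p, q = gradients[prev]
        heights[cur] = heights[prev] + p * dx + q * dy
    return heights
```

Because the integration accumulates error along the path, the paper follows it with quadric-patch smoothing and re-enforcement of Lambert's law on the normals.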
Pub Date: 2003-09-17 | DOI: 10.1109/ICIAP.2003.1234018
E. Holden, R. Owens
The paper presents a new hand shape representation technique that characterises the finger-only topology of the hand, by adapting an existing technique from speech signal processing. From a moving hand sequence, the tracking algorithm determines the centre of the largest convex subset of the hand, using a combination of pattern matching and condensation algorithms. A hand shape feature represents the topological formation of the finger-only regions of the hand using a linear predictive coding parameter set called cepstral coefficients. Experimental results demonstrate the effectiveness of detecting the shape feature from motion sequences.
Title: Recognising moving hand shapes
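The speech-processing technique borrowed here, cepstral coefficients from a linear predictive coding (LPC) parameter set, is commonly computed with a standard recursion. A sketch of that conversion (how the paper derives the LPC coefficients from the finger-region signal is not shown):

```python
def lpc_to_cepstrum(lpc, n_ceps):
    """Convert LPC prediction coefficients to cepstral coefficients via the
    standard recursion  c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k},
    where lpc[k-1] holds the prediction coefficient a_k."""
    c = []
    for n in range(1, n_ceps + 1):
        a_n = lpc[n - 1] if n <= len(lpc) else 0.0
        c_n = a_n + sum((k / n) * c[k - 1] * lpc[n - k - 1]
                        for k in range(1, n) if n - k <= len(lpc))
        c.append(c_n)
    return c
```

For a single-pole predictor with coefficient a, the recursion reproduces the known closed form c_n = a^n / n, which makes it easy to sanity-check.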
Pub Date: 2003-09-17 | DOI: 10.1109/ICIAP.2003.1234126
W. Clocksin, P. P. Fernando
We describe an implemented method for the recognition of Syriac handwriting from historical manuscripts. The Syriac language has been a neglected area for handwriting recognition research, yet it is interesting because the preponderance of scribe-written manuscripts offers a challenging yet tractable medium for OCR research between the extremes of typewritten text and free handwriting. Like Arabic, Syriac is written in a cursive form from right to left, and letter shape depends on the position within the word. The method described does not need to find character strokes or contours. Both whole words and character shapes were used in recognition experiments. After segmentation using a novel probabilistic method, features of these shapes are found that tolerate variation in formation and image quality. Each shape is recognised individually using a discriminative support vector machine with 10-fold cross-validation. We describe experiments using a variety of segmentation methods and combinations of features on characters and words. Images from scribe-written historical manuscripts are used, and the recognition results are compared with those for images taken from clearer 19th-century typeset documents. Recognition rates vary from 61% to 100%, depending on the algorithms used and the size and source of the data set.
Title: Towards automatic transcription of Syriac handwriting
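The 10-fold cross-validation protocol used to evaluate the per-shape SVM classifiers can be sketched as a simple index splitter; the fold assignment below (round-robin by index) is one common choice, not necessarily the one the paper used.

```python
def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross-validation.

    Samples are assigned to folds round-robin; each fold serves once as the
    held-out test set while the remaining k-1 folds form the training set.
    """
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train, test
```

Averaging the recognition rate over the k held-out folds gives a less optimistic estimate than a single train/test split, which matters for the small per-character datasets typical of historical manuscripts.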
Pub Date: 2003-09-17 | DOI: 10.1109/ICIAP.2003.1234078
B. Telle, M. Aldon, N. Ramdani
The paper deals with the problem of error estimation in 3D reconstruction and shows how interval analysis can be used for this purpose in 3D vision applications. Describing an image point by an interval assumes an unknown but bounded localisation error. We present a new method, based on interval analysis tools, to propagate this bounded uncertainty. This style of computation can produce guaranteed results, since each datum is represented not by a single most probable value but by an interval that contains the true value. We validate our method by computing a guaranteed model for a projective camera, and we achieve a guaranteed 3D reconstruction.
Title: Camera calibration and 3D reconstruction using interval analysis
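The guarantee the abstract claims rests on interval arithmetic: every input is a bracket known to contain the true value, and each operation propagates the bounds. A minimal sketch of the arithmetic (the paper's camera model built on top of it is not reproduced):

```python
class Interval:
    """Closed interval [lo, hi] guaranteed to contain the true value.

    Because each operation returns bounds enclosing every possible result,
    any quantity computed from interval-valued image points is itself
    guaranteed to bracket the true value.
    """
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # All four endpoint products must be considered when signs vary.
        p = (self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi)
        return Interval(min(p), max(p))

    def __contains__(self, x):
        return self.lo <= x <= self.hi
```

The price of the guarantee is that intervals can grow through a computation, so the reconstruction pipeline must be arranged to keep the output brackets tight.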
Pub Date: 2003-09-17 | DOI: 10.1109/ICIAP.2003.1234077
P. Baldassarri, P. Puliti, A. Montesanto, G. Tascini
The paper proposes a machine learning method for self-localising a mobile agent using the images supplied by a single omni-directional camera. The images acquired by the camera may be viewed as an implicit topological representation of the environment. The environment is a priori unknown, and the topological representation is derived by an unsupervised neural network architecture. The architecture is a self-organising neural network, namely a growing neural gas, which is well known for its topology-preserving quality. Since the topology is not defined a priori, the network must discover it during learning, and the network grows as it does so. The implemented system is able to recognise the input frames correctly and to reconstruct a topological map of the environment: each node of the neural network identifies a single zone of the environment, and the connections between the nodes correspond to the real spatial connections in the environment.
Title: Visual self-localisation using automatic topology construction
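The core adaptation step of a growing neural gas can be sketched compactly; error accumulation, periodic node insertion and edge ageing — the parts that make the network actually grow — are omitted for brevity, so this is an illustrative fragment, not the paper's full algorithm. The learning rates are the commonly cited defaults, not values from the paper.

```python
def gng_step(units, edges, x, eps_b=0.2, eps_n=0.006):
    """One adaptation step of a growing neural gas.

    Finds the two units nearest the input descriptor x, connects them with
    an edge (building the topological map), and moves the winner and its
    topological neighbours toward x.
    """
    def dist2(u):
        return sum((a - b) ** 2 for a, b in zip(u, x))

    order = sorted(range(len(units)), key=lambda i: dist2(units[i]))
    s1, s2 = order[0], order[1]
    edges.add(frozenset((s1, s2)))  # link winner and runner-up
    units[s1] = [u + eps_b * (xi - u) for u, xi in zip(units[s1], x)]
    for e in edges:
        if s1 in e:
            (n,) = e - {s1}
            units[n] = [u + eps_n * (xi - u) for u, xi in zip(units[n], x)]
    return s1
```

After training, each unit corresponds to a zone of the environment and the accumulated edges approximate the real spatial connectivity, which is how the topological map emerges.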