A system for aircraft recognition in perspective aerial images
Subhodev Das, B. Bhanu, Xingzhi Wu, R. Braithwaite
DOI: 10.1109/ACV.1994.341305

Recognition of aircraft in complex, perspective aerial imagery must be accomplished in the presence of clutter, occlusion, shadow, and various forms of image degradation. This paper presents a system for aircraft recognition under real-world conditions that is based on a hierarchical database of object models. The approach involves three key processes: (a) the qualitative object recognition process performs model-based symbolic feature extraction and generic object recognition; (b) the refocused matching and evaluation process refines the extracted features for more specific classification, with input from (a); and (c) the primitive feature extraction process regulates the extracted features based on their saliency and interacts with (a) and (b). Experimental results showing the qualitative recognition of aircraft in perspective aerial images are presented.
Anatomy of a hand-filled form reader
A. K. Chhabra
DOI: 10.1109/ACV.1994.341309

We describe a prototype generic form reader (GFR) system for reading hand-filled forms. The system can read run-on or touching handprinted characters. A one-time form specification is required for each type of form the system is expected to read; it includes the geometric locations of registration marks and fields of interest, field grammars, and system parameters. The GFR begins by detecting registration marks, computing image skew, extracting deskewed fields, and computing connected components in the field images. Next, the connected components are split into segments using heuristics about good splitting points. The system splits liberally: a split segment may be part of a character or a complete character, but should not span more than one character. The segments are then adaptively regrouped into 'seg-groups' with the aid of a dynamic-programming algorithm that matches the character answers for the seg-groups against the field grammar specification. The single character recognizer (SCR) uses high-order combinations of raw geometric features derived from segments and seg-groups; the high-order combining rules are derived by statistical discriminant analysis of the raw features. The GFR system provides generic tools that can be applied to document image analysis problems beyond forms reading.
Frameless registration of MR and CT 3D volumetric data sets
Rakesh Kumar, Kristin J. Dana, P. Anandan, Neil E. Okamoto, J. Bergen, P. Hemler, T. Sumanaweera, P. Elsen, J. Adler
DOI: 10.1109/ACV.1994.341316

In this paper we present techniques for frameless registration of 3D magnetic resonance (MR) and computed tomography (CT) volumetric data of the head and spine. We describe methods for estimating a 3D affine or rigid transform that can be used to resample the CT (or MR) data to align with the MR (or CT) data. Our technique transforms the MR and CT data sets with spatial filters so they can be matched directly. The matching is done by direct optimization, using a gradient-based descent approach and a coarse-to-fine control strategy over a 4D pyramid. We present results on registering head and spine data by matching 3D edges, and on registering cranial ventricle data by matching images filtered with a Laplacian-of-Gaussian operator.
Model supported exploitation: quick look, detection and counting, and change detection
C. Huang, J. Mundy, Charlie Rothwell
DOI: 10.1109/ACV.1994.341302

Over the last several years the concept of model-supported exploitation (MSE) has evolved to the point where relatively simple computer vision algorithms can extract significant intelligence information from aerial images in a robust and reliable manner. Information extraction is enabled by detailed 3D site models, which provide an extensive context for the application of image analysis algorithms. This paper reviews the basic MSE concept and illustrates the approach using three operational concepts taken from the RADIUS project: quick-look, detection and counting, and focused change detection.
Genetic labeling and its application to depalletizing robot vision
M. Hashimoto, K. Sumi
DOI: 10.1109/ACV.1994.341307

Genetic labeling is a new labeling algorithm based on the genetic algorithm (GA). Although several applications of GAs to low-level image processing, such as line detection, have been studied, they remain computationally expensive. We apply a GA to labeling for scene interpretation. In our chromosome coding, each bit represents the existence of one object candidate. Genetic operations enable efficient labeling based on the building-block hypothesis. We have developed a vision system for a depalletizing robot using this technique: object candidates are properly labeled, and the positions of cartons are recognized. Through experiments on real images, we estimate that genetic labeling is about 100 times faster than an improved enumeration method, and we show that the reliability and speed of the system are practical.
An automated stereoscopic coal profiling system-CCLPS
Philip W. Smith, N. Nandhakumar
DOI: 10.1109/ACV.1994.341283

This paper describes the design of a binocular stereo system called CCLPS (Computerized Coal Profiling System) that provides dense, accurate disparity maps of coal as it is being transported in open rail cars. After a quantitative analysis of previously developed cepstral correspondence techniques, which highlights the shortcomings of the cepstrum's matching ability in the presence of random noise and severe foreshortening distortion, we present a modified power cepstral approach that is less sensitive to these effects, along with analytical arguments verifying its robustness. The design of the CCLPS system is then discussed in detail and its performance is verified.
Image mosaicing for tele-reality applications
R. Szeliski
DOI: 10.1109/ACV.1994.341287

This paper presents techniques for automatically deriving realistic 2-D scenes and 3-D geometric models from video sequences. These techniques can be used to build environments and 3-D models for virtual-reality applications based on recreating a real scene, i.e., tele-reality applications. The fundamental technique used in this paper is image mosaicing: the automatic alignment of multiple images into larger aggregates, which are then used to represent portions of a 3-D scene. The paper first examines the easiest problems, flat-scene and panoramic-scene mosaicing, then progresses to more complicated scenes with depth, and concludes with full 3-D models. It also discusses a number of novel applications based on tele-reality technology.
Model validation for change detection [machine vision]
M. Bejanin, A. Huertas, G. Medioni, R. Nevatia
DOI: 10.1109/ACV.1994.341304

An important application of machine vision is to monitor a scene over a period of time and report changes in its content. We have developed a validation mechanism that implements the first step towards a system for detecting changes in images of aerial scenes. By validation we mean confirming the presence of model objects in the image. Our system uses a 3-D site model of the scene as the basis for model validation, and eventually for detecting changes and updating the site model. The scenario for the present validation system consists of adding a new image to a database associated with the site. The validation process has three steps: registration of the image to the model (equivalently, determination of the position and orientation of the camera); matching of model features to image features; and validation of the objects in the model. Our system processes the new image monocularly and uses shadows as 3-D cues to help validate the model. The system has been tested using a hand-generated site model and several images of a 500:1 scale model of the site, acquired from several viewpoints.
Compilation of mosaics from separately scanned line drawings
R. D. T. Janssen, A. Vossepoel
DOI: 10.1109/ACV.1994.341286

In automatic line drawing interpretation (e.g. map interpretation), one problem encountered is the finite size of scanners: scanners of the required size are often unavailable. The interpretation process often requires one large scan rather than the several smaller ones produced by the usual scanning in parts. This paper describes a method for automatically compiling mosaics from separately scanned line drawings. A mosaic is a collection of separately obtained images that are combined to form one larger image. The method is based on vectorization of the line drawings, which is used to select control points for a geometric transformation automatically; it is not necessary to specify the overlap area between the line drawings. The resulting system is evaluated using large-scale maps, with experiments at different overlaps between the line drawings. Results are good: the algorithm succeeds in finding accurate parameters for the transformation.
Modelling issues in vision based aircraft navigation during landing
Tarun Soni, B. Sridhar
DOI: 10.1109/ACV.1994.341293

This paper investigates the feasibility of using visual and infrared imaging sensors to aid in locating the aircraft during operations such as landing in bad weather. The choice of airport model is crucial to the pattern-recognition algorithms used for position estimation, and we describe the effect that model choice has on the behaviour of such matching algorithms. Three basic models are compared: a line-segment-based model, an area-based model, and a texture-based model. A sparse line-segment-based model is not adequate to identify the runway, since it matches a number of false artifacts in the image. An enhanced line-segment-based model containing a large number of features compares favourably with the area-based model. The texture-based model requires a number of camera- and weather-dependent parameters, and its performance is not substantially better. Thus either a proper area-based model or a pseudo-area-based model (based on a very large number of line features) provides the best performance for such landmark identification and position determination algorithms.