A system for aircraft recognition in perspective aerial images
Subhodev Das, B. Bhanu, Xingzhi Wu, R. Braithwaite
Proceedings of 1994 IEEE Workshop on Applications of Computer Vision, 5 December 1994. DOI: 10.1109/ACV.1994.341305
Abstract: Recognition of aircraft in complex, perspective aerial imagery must be accomplished in the presence of clutter, occlusion, shadow, and various forms of image degradation. This paper presents a system for aircraft recognition under real-world conditions that is based on a hierarchical database of object models. The approach involves three key processes: (a) the qualitative object recognition process performs model-based symbolic feature extraction and generic object recognition; (b) the refocused matching and evaluation process refines the extracted features for more specific classification with input from (a); and (c) the primitive feature extraction process regulates the extracted features based on their saliency and interacts with (a) and (b). Experimental results showing the qualitative recognition of aircraft in perspective aerial images are presented.
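The generic-to-specific flow the abstract describes (a qualitative match against a hierarchical model database, then refocused matching toward a more specific class) can be caricatured in a few lines. Everything below — the feature names, the hierarchy, and the overlap score — is invented for illustration and is not the authors' database or matching rule.

```python
# Hypothetical hierarchical model database: a generic class lists the symbolic
# features it requires; specific models under it list fuller feature sets.
DATABASE = {
    "aircraft": {
        "required": {"wings", "fuselage"},
        "models": {
            "transport": {"wings", "fuselage", "four-engines"},
            "fighter":   {"wings", "fuselage", "delta-wing"},
        },
    },
}

def recognize(features):
    """Generic-to-specific match over the (illustrative) hierarchy."""
    for generic, node in DATABASE.items():
        if node["required"] <= features:   # qualitative, generic-level match
            # 'refocused' step: pick the specific model with best feature overlap
            best = max(node["models"],
                       key=lambda m: len(node["models"][m] & features))
            return generic, best
    return None   # no generic class accounts for the extracted features
```

A feature set containing a delta wing would thus refine "aircraft" to the "fighter" model; a lone wing feature matches nothing at the generic level.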
Anatomy of a hand-filled form reader
A. K. Chhabra
Proceedings of 1994 IEEE Workshop on Applications of Computer Vision, 5 December 1994. DOI: 10.1109/ACV.1994.341309
Abstract: We describe a prototype generic form reader (GFR) system for reading hand-filled forms. The system can read run-on or touching handprinted characters. A one-time form specification is required for each type of form that the system is expected to read. The form specification includes the geometric locations of registration marks and fields of interest, field grammars, and system parameters. The GFR begins by detecting registration marks, computing image skew, extracting deskewed fields, and computing connected components in the field images. Next, the connected components are split into segments using heuristics about good splitting points. The system is liberal in splitting: a split segment may be part of a character or a complete character, but should be no more than one character. Next, the segments are adaptively regrouped into 'seg-groups' with the aid of a dynamic programming algorithm that matches the character answers for the seg-groups against the field grammar specification. The single-character recognizer (SCR) uses high-order combinations of raw geometric features derived from segments and seg-groups. The high-order combining rules are derived by statistical discriminant analysis of the raw features. The GFR system provides generic tools that can be applied to other document image analysis problems besides forms reading.
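The adaptive regrouping step — dynamic programming that assembles liberally split segments into 'seg-groups' consistent with a field grammar — can be sketched as follows. The scoring function and the grammar (here reduced to an exact character count) are stand-ins, not the paper's SCR or grammar formalism.

```python
# Illustrative DP over split points: group n_segments consecutive segments into
# exactly n_chars characters, maximizing total recognizer confidence.
# score(i, j) is a hypothetical recognizer score that segments i..j-1 form one
# character; max_group bounds how many segments one character may span.

def regroup(n_segments, n_chars, score, max_group=3):
    """Return the best (start, end) group boundaries, or None if impossible."""
    NEG = float("-inf")
    # best[i][k] = best score grouping the first i segments into k characters
    best = [[NEG] * (n_chars + 1) for _ in range(n_segments + 1)]
    back = [[None] * (n_chars + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for i in range(1, n_segments + 1):
        for k in range(1, n_chars + 1):
            for g in range(1, min(max_group, i) + 1):
                prev = best[i - g][k - 1]
                if prev > NEG and prev + score(i - g, i) > best[i][k]:
                    best[i][k] = prev + score(i - g, i)
                    back[i][k] = i - g
    if best[n_segments][n_chars] == NEG:
        return None          # grammar (character count) cannot be satisfied
    cuts, i, k = [], n_segments, n_chars
    while k > 0:             # walk back-pointers to recover the boundaries
        j = back[i][k]
        cuts.append((j, i))
        i, k = j, k - 1
    return list(reversed(cuts))
```

With a scorer that strongly prefers two particular pairings, the DP recovers those groups; a character count the segments cannot satisfy yields no parse.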
Frameless registration of MR and CT 3D volumetric data sets
Rakesh Kumar, Kristin J. Dana, P. Anandan, Neil E. Okamoto, J. Bergen, P. Hemler, T. Sumanaweera, P. Elsen, J. Adler
Proceedings of 1994 IEEE Workshop on Applications of Computer Vision, 5 December 1994. DOI: 10.1109/ACV.1994.341316
Abstract: In this paper we present techniques for frameless registration of 3D Magnetic Resonance (MR) and Computed Tomography (CT) volumetric data of the head and spine. We present techniques for estimating a 3D affine or rigid transform which can be used to resample the CT (or MR) data to align with the MR (or CT) data. Our technique transforms the MR and CT data sets with spatial filters so they can be directly matched. The matching is done by a direct optimization technique using a gradient-based descent approach and a coarse-to-fine control strategy over a 4D pyramid. We present results on registering the head and spine data by matching 3D edges, and results on registering cranial ventricle data by matching images filtered by a Laplacian of a Gaussian.
Model supported exploitation: quick look, detection and counting, and change detection
C. Huang, J. Mundy, Charlie Rothwell
Proceedings of 1994 IEEE Workshop on Applications of Computer Vision, 5 December 1994. DOI: 10.1109/ACV.1994.341302
Abstract: Over the last several years the concept of model-supported exploitation (MSE) has evolved to a point where relatively simple computer vision algorithms can extract significant intelligence information from aerial images in a robust and reliable manner. Information extraction is enabled by the use of detailed 3D site models, which provide an extensive context for the application of image analysis algorithms. This paper reviews the basic MSE concept and illustrates the approach using three operational concepts taken from the RADIUS project: quick-look, detection and counting, and focussed change detection.
Genetic labeling and its application to depalletizing robot vision
M. Hashimoto, K. Sumi
Proceedings of 1994 IEEE Workshop on Applications of Computer Vision, 5 December 1994. DOI: 10.1109/ACV.1994.341307
Abstract: Genetic labeling is a new labeling algorithm based on a genetic algorithm (GA). Although several applications of GAs to low-level image processing, such as line detection, have been studied, they still require much computing time. We apply a GA to labeling for scene interpretation. In the proposed chromosome coding, each bit represents the existence of an object. Genetic operations enable efficient labeling based on the building-block hypothesis. We have developed a vision system for a depalletizing robot using this technique: object candidates are properly labeled, and the position of cartons is recognized. In experiments on real images, we estimated that genetic labeling is about 100 times faster than an improved enumeration method, and we found the reliability and speed of the system to be practical.
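The chromosome coding described above — one bit per object candidate, with genetic operations searching for a consistent labeling — might look like the sketch below. The fitness function, the conflict constraints, and all GA parameters are invented for illustration and are not the authors' formulation.

```python
import random

def genetic_label(n_bits, fitness, pop_size=60, generations=100,
                  p_mut=0.02, seed=1):
    """Search for the best bit-labeling: bit i = 'candidate i exists'."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        best = max(best, ranked[0], key=fitness)
        parents = ranked[:pop_size // 2]       # truncation selection
        pop = [list(best)]                     # elitism: keep the best labeling
        while len(pop) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)     # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n_bits):            # bit-flip mutation
                if rng.random() < p_mut:
                    child[i] ^= 1
            pop.append(child)
    return max(pop, key=fitness)

# Illustrative scene constraint: two pairs of mutually exclusive (overlapping)
# object hypotheses; labeling both members of a pair is heavily penalized.
conflicts = [(0, 1), (2, 3)]

def fitness(bits):
    score = sum(bits)                          # reward labeled candidates
    for i, j in conflicts:
        if bits[i] and bits[j]:
            score -= 10                        # penalty for impossible scenes
    return score
```

The building-block intuition is visible here: crossover recombines conflict-free sub-patterns from different parents instead of enumerating all labelings.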
An automated stereoscopic coal profiling system-CCLPS
Philip W. Smith, N. Nandhakumar
Proceedings of 1994 IEEE Workshop on Applications of Computer Vision, 5 December 1994. DOI: 10.1109/ACV.1994.341283
Abstract: This paper describes the design of a binocular stereo system called CCLPS (Computerized Coal Profiling System) that provides dense, accurate disparity maps of coal as it is being transported in open rail cars. After a quantitative analysis of previously developed cepstral correspondence techniques, which highlights the shortcomings of the cepstrum's matching ability in the presence of random noise and severe foreshortening distortion, we present a modified power cepstral approach that is less sensitive to these effects, along with analytical arguments verifying its robustness. The design of the CCLPS system is then discussed in detail and its performance is verified.
Image mosaicing for tele-reality applications
R. Szeliski
Proceedings of 1994 IEEE Workshop on Applications of Computer Vision, 5 December 1994. DOI: 10.1109/ACV.1994.341287
Abstract: This paper presents some techniques for automatically deriving realistic 2-D scenes and 3-D geometric models from video sequences. These techniques can be used to build environments and 3-D models for virtual reality applications based on recreating a true scene, i.e., tele-reality applications. The fundamental technique used in this paper is image mosaicing, i.e., the automatic alignment of multiple images into larger aggregates which are then used to represent portions of a 3-D scene. The paper first examines the easiest problems, those of flat scene and panoramic scene mosaicing. It then progresses to more complicated scenes with depth, and concludes with full 3-D models. The paper also discusses a number of novel applications based on tele-reality technology.
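The core mosaicing operation — aligning overlapping images and compositing them into a larger aggregate — reduces, in one dimension, to the sketch below. Real mosaicing aligns 2-D images under richer motion models; this illustrates only the flat-scene case with a pure (positive, integer) translation, and all names are illustrative.

```python
def align(left, right, max_offset):
    """Find the offset of `right` relative to `left` minimizing overlap MSE."""
    best, best_err = 0, float("inf")
    for off in range(1, max_offset + 1):
        overlap = len(left) - off
        if overlap <= 0:
            break
        err = sum((left[off + i] - right[i]) ** 2
                  for i in range(overlap)) / overlap
        if err < best_err:
            best, best_err = off, err
    return best

def composite(left, right, off):
    """Paste `right` at `off`, averaging samples where the images overlap."""
    out = list(left) + [0.0] * (off + len(right) - len(left))
    for i, v in enumerate(right):
        if off + i < len(left):
            out[off + i] = (out[off + i] + v) / 2   # blend the overlap
        else:
            out[off + i] = v                        # extend the mosaic
    return out
```

Two overlapping windows cut from the same underlying "scene" are realigned exactly, and compositing reproduces the larger scene from its fragments.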
Recursive identification of gesture inputs using hidden Markov models
J. Schlenzig, E. Hunter, R. Jain
Proceedings of 1994 IEEE Workshop on Applications of Computer Vision, 5 December 1994. DOI: 10.1109/ACV.1994.341308
Abstract: Human-machine interfaces play a role of growing importance as computer technology continues to evolve. Motivated by the desire to provide users with an intuitive gesture input system, we describe the design of a recursive filter applied to the vision-based gesture interpretation problem. The gestures are modeled as a hidden Markov model, with the state representing the gesture sequences and the observations being the current static hand pose. At each time step the recursive filter updates its estimate of what gesture is occurring based on the currently extracted pose information. The result is a robust system which provides the user with continual feedback during compound gestures.
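The recursive update the abstract describes — fold the gesture-transition model into the current belief, then reweight by the likelihood of the observed hand pose — is the standard HMM forward filter. The tiny two-state model in the usage below is invented; the paper's state space represents gesture sequences.

```python
def hmm_filter_step(belief, trans, emit, obs):
    """One forward-filter update: predict with `trans`, correct with `emit`.

    belief[i]  - current probability of HMM state i
    trans[i][j] - probability of moving from state i to state j
    emit[j][o]  - probability of observing pose o in state j
    obs         - index of the currently observed static hand pose
    """
    n = len(belief)
    predicted = [sum(belief[i] * trans[i][j] for i in range(n))
                 for j in range(n)]
    updated = [predicted[j] * emit[j][obs] for j in range(n)]
    total = sum(updated)
    return [u / total for u in updated]   # normalize to a distribution
```

Applied once per frame, this keeps a running distribution over gestures, which is exactly what allows continual feedback mid-gesture rather than a decision only at the end.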
Modelling issues in vision based aircraft navigation during landing
Tarun Soni, B. Sridhar
Proceedings of 1994 IEEE Workshop on Applications of Computer Vision, 5 December 1994. DOI: 10.1109/ACV.1994.341293
Abstract: This paper investigates the feasibility of using visual and infrared imaging sensors to aid in locating the aircraft during operations such as landing in bad weather. The choice of the airport model used is crucial to algorithms which estimate position based on pattern recognition. In this paper we describe the effects the choice of a model has on the behaviour of such matching algorithms. Three basic models are chosen: a line segment based model, an area based model, and a texture based model. It is seen that a sparse line segment based model is not adequate to identify the runway, since it matches a number of false artifacts in the image. An enhanced line segment based model containing a large number of features compares favourably with the area based model. The texture based model is seen to need a number of camera and weather dependent parameters, and its performance is not seen to be substantially better. Thus either a proper area based model or a pseudo-area based model (based on a very large number of line features) can be seen to provide the best performance for such landmark identification and position determination algorithms.
Real-time visual tracking using correlation techniques
M. W. Eklund, G. Ravichandran, M. Trivedi, S. B. Marapane
Proceedings of 1994 IEEE Workshop on Applications of Computer Vision, 5 December 1994. DOI: 10.1109/ACV.1994.341319
Abstract: A real-time correspondence-based tracking algorithm is detailed. The system uses a pipeline processor, a general purpose processor, a camera, and a display. The Minimum Noise and Correlation Energy (MINACE) filter is used in the tracking algorithm as it provides a good combination of speed, accuracy, and flexibility for the targeted hardware system. The system designed is fast, and tracking is accomplished at a rate of 15 Hz. The system is adaptive and does not rely on a previous model of the object; the training image for filter synthesis is acquired from previous image frames, and the filter is synthesized online to accommodate 3-D variations of the target being tracked. The system tracks an object consistently, as is demonstrated by the low deviation of the results in the evaluation. The correlation filter-based tracking algorithm has proved to be useful in our research in cooperative mobile robots. A visual servoing system has been implemented using this tracking algorithm for convoying of multiple mobile robots.
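The MINACE filter is synthesized in the frequency domain; as a hedged stand-in, the sketch below shows the adaptive tracking loop with plain normalized cross-correlation, re-acquiring the template from the previous frame in the spirit of the paper's online filter synthesis. The 1-D frames and all parameters are illustrative.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-length windows."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) *
                    sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

def track(frames, start, width):
    """Track a pattern across 1-D frames, updating the template each frame."""
    pos = start
    template = frames[0][pos:pos + width]
    path = [pos]
    for frame in frames[1:]:
        # correlate the template against every window of the new frame
        scores = [(ncc(template, frame[p:p + width]), p)
                  for p in range(len(frame) - width + 1)]
        pos = max(scores)[1]                 # peak of the correlation surface
        template = frame[pos:pos + width]    # adaptive: resynthesize from frame
        path.append(pos)
    return path
```

Re-acquiring the template each frame is what lets the tracker follow gradual appearance change (the paper's 3-D target variations), at the cost of possible drift that a fixed-model tracker would not suffer.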