Pub Date: 1998-01-03
DOI: 10.1109/CAIVD.1998.646030
Sven J. Dickinson, A. Pentland, S. Stevenson
Current methods for shape-based image retrieval are restricted to images containing 2-D objects. We propose a novel approach to querying images containing 3-D objects, based on a view-based encoding of a finite domain of 3-D parts used to model the 3-D objects appearing in images. To build a query, the user manually identifies the salient parts of the object in a query image. The extracted views of these parts are then used to hypothesize the 3-D identities of the parts which, in turn, are used to hypothesize other possible views of the parts. The resulting set of part views, along with their spatial relations (constraints) in the query image, forms a composite query that is passed to the image database. Images containing objects with the same parts (in any view) in similar spatial relations are returned to the user. The resulting viewpoint-invariant indexing technique does not require training the system on all possible views of each object. Rather, the system requires only knowledge of the possible views for a finite vocabulary of 3-D parts from which the objects are constructed.
Title: Viewpoint-invariant indexing for content-based image retrieval
Published in: Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database
Pub Date: 1998-01-03
DOI: 10.1109/CAIVD.1998.646033
Toshio Sato, T. Kanade, Ellen K. Hughes, Michael A. Smith
Video OCR is a technique that can greatly help to locate topics of interest in a large digital news video archive via the automatic extraction and reading of captions and annotations. News captions generally provide vital search information about the video being presented: the names of people and places, or descriptions of objects. In this paper, two difficult problems of character recognition for videos are addressed: low-resolution characters and extremely complex backgrounds. We apply an interpolation filter, multi-frame integration and a combination of four filters to solve these problems. Characters are segmented by a recognition-based segmentation method, in which intermediate character recognition results are used to improve the segmentation. The overall recognition results are good enough for use in news indexing. Performing video OCR on news video and combining its results with other video understanding techniques will improve the overall understanding of the news video content.
Title: Video OCR for digital news archive
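As a hedged illustration of the multi-frame integration idea mentioned above (the paper's exact filters are not reproduced here; this sketch only assumes bright caption pixels that stay fixed across frames while the background changes), a per-pixel minimum over consecutive frames keeps static bright text bright while pushing the varying background toward darker values, raising text/background contrast:

```python
# Minimal sketch of multi-frame integration for caption enhancement.
# Assumption: captions are high-intensity and stationary; the background
# behind them moves, so its per-pixel minimum over time tends to be dark.

def integrate_frames(frames):
    """frames: list of equally sized grayscale images (lists of rows).
    Returns one image: the per-pixel minimum across all frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[min(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]

# Three 1x2 frames: left pixel is a caption (stays ~200-210),
# right pixel is moving background (varies widely).
frames = [[[200, 50]], [[210, 120]], [[205, 10]]]
print(integrate_frames(frames))  # caption stays bright, background darkens
```

The same idea generalizes to any number of frames in which the caption-localization step reports the caption as visible.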
Pub Date: 1998-01-03
DOI: 10.1109/CAIVD.1998.646028
J. Corridoni, A. Bimbo, P. Pala
Image databases are now a subject of increasing attention in multimedia, for archiving and retrieval of images in the fields of art, history, medicine and industry, among others. From the psychological point of view, color perception is related to several factors, including color features (brightness, chromaticity and saturation), surrounding colors, color spatial organization, and the observer's memory, knowledge and experience. Paintings are an example where the message is contained more in the high-level color qualities and spatial arrangements than in the physical properties of colors. Starting from this observation, Johannes Itten (1961) introduced a formalism to analyze the use of color in art and the effects it induces on the viewer's psyche. We present a system which translates Itten's theory into a formal language for expressing the semantics associated with combinations of chromatic properties in color images. Fuzzy sets are used to represent low-level region properties. A formal language and a set of model-checking rules are implemented that allow semantic clauses to be defined and the degree of truth by which they hold over an image to be verified.
Title: Retrieval of paintings using effects induced by color features
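To make the fuzzy-set idea concrete, here is an illustrative sketch (not the paper's actual rules) of a fuzzy membership function grading how "warm" a hue is, in the spirit of encoding one of Itten's chromatic categories. The hue breakpoints below are assumptions chosen for the example, not values from the paper:

```python
# Hypothetical fuzzy membership for the Itten-style category "warm color".
# Hue is in degrees [0, 360); red ~ 0, yellow ~ 60, green ~ 120, blue ~ 240.
# Breakpoints (60 and 150 degrees) are illustrative assumptions.

def warm_membership(hue):
    """Degree in [0, 1] to which a hue reads as warm (reds/oranges/yellows)."""
    h = min(hue % 360, 360 - hue % 360)  # angular distance from red (0 deg)
    if h <= 60:                          # red through yellow: fully warm
        return 1.0
    if h >= 150:                         # greens and blues: not warm
        return 0.0
    return (150 - h) / 90                # linear falloff in between

print(warm_membership(30))   # orange: fully warm
print(warm_membership(180))  # cyan: not warm
```

A real system would combine such memberships over image regions with rules (e.g. warm/cold contrast between adjacent regions) to score semantic clauses against an image.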
Pub Date: 1998-01-03
DOI: 10.1109/CAIVD.1998.646034
Michael A. Smith, T. Kanade
Digital video is rapidly becoming important for education, entertainment and a host of multimedia applications. With the size of video collections growing to thousands of hours, technology is needed to effectively browse segments in a short time without losing the content of the video. We propose a method to extract the significant audio and video information and create a skim video which represents a very short synopsis of the original. The goal of this work is to show the utility of integrating language and image understanding techniques for video skimming by extraction of significant information, such as specific objects, audio keywords and relevant video structure. The resulting skim video is much shorter, with compaction as high as 20:1, yet it retains the essential content of the original segment. We have conducted a user study to test the content summarization and the effectiveness of the skim as a browsing tool.
Title: Video skimming and characterization through the combination of image and language understanding
Pub Date: 1998-01-03
DOI: 10.1109/CAIVD.1998.646031
Weiping Zhu, T. Syeda-Mahmood
In content-based access of image databases, there is a need for a shape formalism that allows precise description and recognition of a wider class of shape variations that evoke the same overall perceptual similarity in appearance. Such a description not only allows the images of a database to be organized into shape categories for efficient indexing, but also makes a wider class of shape-similarity queries possible. This paper presents a region-topology-based shape model, the constrained affine shape model, which captures the spatial-layout similarity between members of a class by a set of constrained affine deformations from a prototypical member. The shape model is proposed for use in organizing the images of a database into shape categories represented by prototypical members and the associated shape constraints. An efficient matching algorithm is presented for use in shape categorization and querying. The effect of global pose changes on the constraints of the shape model is analyzed to make shape matching robust to such changes. An application of the model to document retrieval based on document shape genres is presented. Finally, the effectiveness of the shape model in content-based access of such databases is evaluated.
Title: Image organization and retrieval using a flexible shape model
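The basic operation behind matching against an affine-deformation model is estimating the affine map that best aligns a prototype's part locations to an observed layout. The following is a hedged sketch of that step via plain least squares (the paper's constrained variant additionally restricts the allowed deformations, which this sketch omits):

```python
# Least-squares affine fit: find A (2x2) and t (2,) such that
# dst ~= src @ A.T + t, given n >= 3 point correspondences.
import numpy as np

def fit_affine(src, dst):
    """src, dst: (n, 2) arrays of corresponding 2-D points, n >= 3."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    ones = np.ones((len(src), 1))
    X = np.hstack([src, ones])                        # (n, 3) design matrix
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)  # (3, 2) solution
    A, t = params[:2].T, params[2]
    return A, t

# A pure translation by (2, 3) recovers A = identity, t = (2, 3).
A, t = fit_affine([[0, 0], [1, 0], [0, 1]], [[2, 3], [3, 3], [2, 4]])
print(A, t)
```

Comparing the fitted parameters (and the residual of the fit) against a category's stored constraints would then decide whether an observed layout is an admissible deformation of the prototype.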
Pub Date: 1998-01-03
DOI: 10.1109/CAIVD.1998.646037
Yanxi Liu, W. Rothfus, T. Kanade
A content-based 3D neuroradiologic image retrieval system is being developed at the Robotics Institute of CMU. The special characteristics of this system include: directly dealing with multimodal 3D images (MR/CT); image similarity based on anatomical structures of the human brain; and combining both visual and collateral information for indexing and retrieval. A testbed has been implemented for using detected salient visual features for indexing and retrieving 3D images.
Title: Content-based 3D neuroradiologic image retrieval: preliminary results
Pub Date: 1998-01-03
DOI: 10.1109/CAIVD.1998.646036
G. Sudhir, J. C. Lee, Anil K. Jain
We present our techniques and results on automatic analysis of tennis video to facilitate content-based retrieval. Our approach is based on the generation of an image model for the tennis court lines. We derive this model from knowledge of the dimensions and connectivity (form) of a tennis court and the typical camera geometry used when capturing tennis video. We use this model to develop a court-line detection algorithm and a robust player-tracking algorithm that tracks the tennis players over the image sequence. We also present a color-based algorithm to select tennis court clips from raw input tennis video footage. The automatically extracted court lines and player locations are analyzed in a high-level reasoning module and related to useful high-level tennis play events. Results on real tennis video data demonstrate the validity and performance of the approach.
Title: Automatic classification of tennis video for high-level content-based retrieval
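The court-line model the approach builds on can be written down directly from the standard court dimensions (23.77 m baseline to baseline, 10.97 m doubles width, 8.23 m singles width, service lines 6.40 m from the net). This sketch of such a world-coordinate model is illustrative; the paper's actual representation and naming are not reproduced here:

```python
# Hypothetical world-coordinate model of the tennis court lines, in metres.
# Origin at court centre; y runs along the court length, x across its width.

def court_lines():
    """Return named court lines as ((x1, y1), (x2, y2)) endpoint pairs."""
    L, Wd, Ws, S = 23.77, 10.97, 8.23, 6.40  # length, doubles/singles width,
    return {                                 # service-line distance from net
        "baseline_near":  ((-Wd / 2, -L / 2), (Wd / 2, -L / 2)),
        "baseline_far":   ((-Wd / 2,  L / 2), (Wd / 2,  L / 2)),
        "net":            ((-Wd / 2,  0.0),   (Wd / 2,  0.0)),
        "service_near":   ((-Ws / 2, -S),     (Ws / 2, -S)),
        "service_far":    ((-Ws / 2,  S),     (Ws / 2,  S)),
        "sideline_left":  ((-Wd / 2, -L / 2), (-Wd / 2, L / 2)),
        "sideline_right": (( Wd / 2, -L / 2), ( Wd / 2, L / 2)),
        "singles_left":   ((-Ws / 2, -L / 2), (-Ws / 2, L / 2)),
        "singles_right":  (( Ws / 2, -L / 2), ( Ws / 2, L / 2)),
        "centre_service": (( 0.0,    -S),     ( 0.0,    S)),
    }
```

Projecting these known line endpoints through a hypothesized camera geometry, then matching against lines detected in the frame, is the kind of model-to-image reasoning the abstract describes.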
Pub Date: 1998-01-03
DOI: 10.1109/CAIVD.1998.646035
M. Caliani, C. Colombo, A. Bimbo, P. Pala
Video information processing and retrieval is a key challenge for future multimedia technologies and applications. Commercial videos encode several planes of expression through a rich and dense use of colors, editing effects, viewpoints and rhythms, which are exploited together to attract potential purchasers. In this paper, previous research in the marketing and semiotics fields is translated into a multimedia engineering perspective, and a link is formalized between the visual features of commercials at the perceptual level and the specific information being conveyed to the audience. This link allows us to define higher-level semantic features capturing the main narrative structures of the video, and to embed them in a video retrieval system supporting access to a database of commercials based on four different semiotic categories.
Title: Commercial video retrieval by induced semantics
Pub Date: 1998-01-03
DOI: 10.1109/CAIVD.1998.646032
M. Szummer, Rosalind W. Picard
We show how high-level scene properties can be inferred from the classification of low-level image features, specifically for the indoor-outdoor scene retrieval problem. We systematically studied three kinds of features: histograms in the Ohta color space; multiresolution, simultaneous autoregressive model parameters; and coefficients of a shift-invariant DCT. We demonstrate that performance is improved by computing features on subblocks, classifying these subblocks, and then combining the results in a way reminiscent of stacking. State-of-the-art single-feature methods achieve about 75-86% accuracy, while the new method achieves 90.3% correct classification, evaluated on a diverse database of over 1300 consumer images provided by Kodak.
Title: Indoor-outdoor image classification
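The subblock idea above can be sketched as follows. The Ohta color space transforms RGB as I1 = (R+G+B)/3, I2 = R-B, I3 = (2G-R-B)/2; as a simplified illustration (not the paper's full feature set), this sketch histograms only the I1 channel over a grid of subblocks, yielding one feature vector per subblock for downstream classification:

```python
# Simplified per-subblock Ohta-I1 histograms. The grid size, bin count and
# use of a single channel are assumptions made for brevity.

def subblock_histograms(image, grid=2, bins=4, max_val=256):
    """image: 2-D list of (R, G, B) tuples. Returns a list of `grid*grid`
    histograms, scanning subblocks left-to-right, top-to-bottom."""
    h, w = len(image), len(image[0])
    bh, bw = h // grid, w // grid
    feats = []
    for by in range(grid):
        for bx in range(grid):
            hist = [0] * bins
            for y in range(by * bh, (by + 1) * bh):
                for x in range(bx * bw, (bx + 1) * bw):
                    r, g, b = image[y][x]
                    i1 = (r + g + b) / 3                     # Ohta I1 channel
                    hist[min(int(i1 * bins / max_val), bins - 1)] += 1
            feats.append(hist)
    return feats
```

Each subblock's histogram would then be classified independently, with the subblock decisions combined (in the paper, in a way reminiscent of stacking) into the whole-image indoor/outdoor label.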
Pub Date: 1998-01-03
DOI: 10.1109/CAIVD.1998.646029
A. Berman, L. Shapiro
A new class of algorithms based on the triangle inequality has been proposed for use in content-based image retrieval. These algorithms rely on comparing a set of key images to the database images and storing the computed distances. Query images are later compared to the keys, and the triangle inequality is used to quickly compute lower bounds on the distance from the query to each of the database images. This paper addresses the question of improving the performance of these algorithms through the selection of appropriate key images. Several algorithms for key selection are proposed and tested.
Title: Selecting good keys for triangle-inequality-based pruning algorithms
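The pruning scheme described above can be sketched in a few lines, assuming the distance function is a metric: for any key k, the triangle inequality gives d(q, x) >= |d(q, k) - d(x, k)|, so precomputed image-to-key distances yield a cheap lower bound on every query-to-image distance:

```python
# Triangle-inequality pruning sketch. Distances from each database image to
# a small set of key images are precomputed; at query time only the
# query-to-key distances are measured.

def lower_bound(query_to_keys, image_to_keys):
    """Best (largest) triangle-inequality lower bound over all keys."""
    return max(abs(qk - xk) for qk, xk in zip(query_to_keys, image_to_keys))

def prune(query_to_keys, db_to_keys, threshold):
    """Indices of database images whose lower bound is within `threshold`;
    only these survivors need an exact (expensive) distance computation."""
    return [i for i, xk in enumerate(db_to_keys)
            if lower_bound(query_to_keys, xk) <= threshold]

# Two keys, two database images: the second image's bound (7.0) exceeds the
# threshold, so it is pruned without any exact distance computation.
print(prune([2.0, 5.0], [[2.1, 5.2], [9.0, 1.0]], threshold=1.0))
```

Good keys are ones whose distances to the database images are well spread, so that the bounds are tight and most non-matches are pruned; that is exactly the selection problem the paper studies.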