
Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database: Latest Publications

Viewpoint-invariant indexing for content-based image retrieval
Sven J. Dickinson, A. Pentland, S. Stevenson
Current methods for shape-based image retrieval are restricted to images containing 2-D objects. We propose a novel approach to querying images containing 3-D objects, based on a view-based encoding of a finite domain of 3-D parts used to model the 3-D objects appearing in images. To build a query, the user manually identifies the salient parts of the object in a query image. The extracted views of these parts are then used to hypothesize the 3-D identities of the parts which, in turn, are used to hypothesize other possible views of the parts. The resulting set of part views, along with their spatial relations (constraints) in the query image, form a composite query that is passed to the image database. Images containing objects with the same parts (in any view) with similar spatial relations are returned to the user. The resulting viewpoint invariant indexing technique does not require training the system for all possible views of each object. Rather, the system requires only knowledge of the possible views for a finite vocabulary of 3-D parts from which the objects are constructed.
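To make the query-expansion step above concrete, here is a toy Python sketch: each observed part view hypothesizes one or more 3-D parts from a finite vocabulary, and every view of those parts is added to the composite query. The part names, view labels, and dictionary representation are invented placeholders, not the encoding used in the paper.

```python
# Toy vocabulary mapping each 3-D part to labels of its possible 2-D views (placeholders).
PART_VIEWS = {
    "cylinder": {"cyl_side", "cyl_end", "cyl_oblique"},
    "block": {"block_face", "block_edge", "block_corner"},
}

def expand_query(observed_view_labels):
    """For every observed part view, hypothesize the 3-D parts it could belong to,
    then add all possible views of those parts to the expanded query."""
    expanded = set()
    for view in observed_view_labels:
        for part, views in PART_VIEWS.items():
            if view in views:
                expanded |= views
    return expanded

# A single side view of a cylinder expands to every view of the cylinder part.
print(expand_query({"cyl_side"}))
```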
{"title":"Viewpoint-invariant indexing for content-based image retrieval","authors":"Sven J. Dickinson, A. Pentland, S. Stevenson","doi":"10.1109/CAIVD.1998.646030","DOIUrl":"https://doi.org/10.1109/CAIVD.1998.646030","url":null,"abstract":"Current methods for shape-based image retrieval are restricted to images containing 2-D objects. We propose a novel approach to querying images containing 3-D objects, based on a view-based encoding of a finite domain of 3-D parts used to model the 3-D objects appearing in images. To build a query, the user manually identifies the salient parts of the object in a query image. The extracted views of these parts are then used to hypothesize the 3-D identities of the parts which, in turn, are used to hypothesize other possible views of the parts. The resulting set of part views, along with their spatial relations (constraints) in the query image, form a composite query that is passed to the image database. Images containing objects with the same parts (in any view) with similar spatial relations are returned to the user. The resulting viewpoint invariant indexing technique does not require training the system for all possible views of each object. Rather, the system requires only knowledge of the possible views for a finite vocabulary of 3-D parts from which the objects are constructed.","PeriodicalId":360087,"journal":{"name":"Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114794136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
Video OCR for digital news archive
Toshio Sato, T. Kanade, Ellen K. Hughes, Michael A. Smith
Video OCR is a technique that can greatly help to locate topics of interest in a large digital news video archive via the automatic extraction and reading of captions and annotations. News captions generally provide vital search information about the video being presented, the names of people and places or descriptions of objects. In this paper, two difficult problems of character recognition for videos are addressed: low resolution characters and extremely complex backgrounds. We apply an interpolation filter, multi-frame integration and a combination of four filters to solve these problems. Segmenting characters is done by a recognition-based segmentation method and intermediate character recognition results are used to improve the segmentation. The overall recognition results are good enough for use in news indexing. Performing video OCR on news video and combining its results with other video understanding techniques will improve the overall understanding of the news video content.
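One way to picture the multi-frame integration step is a per-pixel extremum over the frames during which a caption stays on screen: bright characters are bright in every frame, so a minimum keeps them while pushing the changing background toward its darkest value. A minimal NumPy sketch of that general idea (not the paper's exact filter chain) follows.

```python
import numpy as np

def integrate_caption_frames(frames, bright_text=True):
    """frames: grayscale arrays of equal size covering the interval a caption is on screen.
    Bright characters stay bright in every frame, so a per-pixel minimum preserves them
    while the varying background is pushed toward its darkest value, raising contrast."""
    stack = np.stack([np.asarray(f, dtype=np.float32) for f in frames], axis=0)
    return stack.min(axis=0) if bright_text else stack.max(axis=0)
```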
{"title":"Video OCR for digital news archive","authors":"Toshio Sato, T. Kanade, Ellen K. Hughes, Michael A. Smith","doi":"10.1109/CAIVD.1998.646033","DOIUrl":"https://doi.org/10.1109/CAIVD.1998.646033","url":null,"abstract":"Video OCR is a technique that can greatly help to locate topics of interest in a large digital news video archive via the automatic extraction and reading of captions and annotations. News captions generally provide vital search information about the video being presented, the names of people and places or descriptions of objects. In this paper, two difficult problems of character recognition for videos are addressed: low resolution characters and extremely complex backgrounds. We apply an interpolation filter, multi-frame integration and a combination of four filters to solve these problems. Segmenting characters is done by a recognition-based segmentation method and intermediate character recognition results are used to improve the segmentation. The overall recognition results are good enough for use in news indexing. Performing video OCR on news video and combining its results with other video understanding techniques will improve the overall understanding of the news video content.","PeriodicalId":360087,"journal":{"name":"Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126246403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 278
Retrieval of paintings using effects induced by color features
J. Corridoni, A. Bimbo, P. Pala
Image databases are now a subject of increasing attention in multimedia, for archiving and retrieval of images in the fields of art, history, medicine and industry, among others. From the psychological point of view, color perception is related to several factors including color features (brightness, chromaticity and saturation), surrounding colors, color spatial organization, observer memory/knowledge/experience, etc. Paintings are an example where the message is contained more in the high-level color qualities and spatial arrangements than in the physical properties of colors. Starting from this observation, Johannes Itten (1961) introduced a formalism to analyze the use of color in art and the effects that this induces on the user's psyche. We present a system which translates the Itten theory into a formal language for expressing the semantics associated with combinations of chromatic properties of color images. Fuzzy sets are used to represent low-level region properties. A formal language and a set of model-checking rules are implemented that allow semantic clauses to be defined and the degree of truth to which they hold over an image to be verified.
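As a small illustration of representing a low-level region property with a fuzzy set, here is a generic linear membership function in Python; the property name and the 0.35/0.6 breakpoints are illustrative assumptions, not values from the paper.

```python
def ramp(x, low, high):
    """Degree of membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

# Degree to which a region's mean saturation (0..1) counts as "highly saturated";
# the breakpoints are chosen only for illustration.
def highly_saturated(mean_saturation):
    return ramp(mean_saturation, 0.35, 0.6)
```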
{"title":"Retrieval of paintings using effects induced by color features","authors":"J. Corridoni, A. Bimbo, P. Pala","doi":"10.1109/CAIVD.1998.646028","DOIUrl":"https://doi.org/10.1109/CAIVD.1998.646028","url":null,"abstract":"Image databases are now a subject of increasing attention in multimedia, for archiving and retrieval of images in the fields of art, history, medicine and industry, among others. From the psychological point of view, color perception is related to several factors including color features (brightness, chromaticity and saturation), surrounding colors, color spatial organization, observer memory/knowledge/experience etc. Paintings are an example where the message is contained more in the high-level color qualities and spatial arrangements than in the physical properties of colors. Starting from this observation, Johannes Itten (1961) introduced a formalism to analyze the use of color in art and the effects that this induces on the user's psyche. We present a system which translates the Itten theory into a formal language that allows to express the semantics associated with the combination of chromatic properties of color images. Fuzzy sets are used to represent low-level region properties. A formal language and a set of model checking rules are implemented that allow to define semantic clauses and verify the degree of truth by which they hold over an image.","PeriodicalId":360087,"journal":{"name":"Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132581253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
Video skimming and characterization through the combination of image and language understanding
Michael A. Smith, T. Kanade
Digital video is rapidly becoming important for education, entertainment and a host of multimedia applications. With the size of video collections growing to thousands of hours, technology is needed to effectively browse segments in a short time without losing the content of the video. We propose a method to extract the significant audio and video information and create a skim video which represents a very short synopsis of the original. The goal of this work is to show the utility of integrating language and image understanding techniques for video skimming by extraction of significant information, such as specific objects, audio keywords and relevant video structure. The resulting skim video is much shorter, with compaction as high as 20:1, yet retains the essential content of the original segment. We have conducted a user study to test the content summarization and the effectiveness of the skim as a browsing tool.
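A toy greedy selector helps make the 20:1 compaction figure concrete: rank candidate segments by whatever significance score has been computed and keep the best ones until the skim budget is spent. The segment representation and selection rule below are assumptions for illustration, not the paper's algorithm.

```python
def select_skim_segments(segments, target_ratio=20.0):
    """segments: list of (start_sec, end_sec, score) tuples.
    Greedily keep the highest-scoring segments until the skim reaches roughly
    1/target_ratio of the original running time, then return them in temporal order."""
    total = sum(end - start for start, end, _ in segments)
    budget = total / target_ratio
    chosen, used = [], 0.0
    for start, end, score in sorted(segments, key=lambda seg: seg[2], reverse=True):
        if used + (end - start) <= budget:
            chosen.append((start, end))
            used += end - start
    return sorted(chosen)
```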
{"title":"Video skimming and characterization through the combination of image and language understanding","authors":"Michael A. Smith, T. Kanade","doi":"10.1109/CAIVD.1998.646034","DOIUrl":"https://doi.org/10.1109/CAIVD.1998.646034","url":null,"abstract":"Digital video is rapidly becoming important for education, entertainment and a host of multimedia applications. With the size of the video collections growing to thousands of hours, technology is needed to effectively browse segments in a short time without losing the content of the video. We propose a method to extract the significant audio and video information and create a skim video which represents a very short synopsis of the original. The goal of this work is to show the utility of integrating language and image understanding techniques for video skimming by extraction of significant information, such as specific objects, audio keywords and relevant video structure. The resulting skim video is much shorter; where compaction is as high as 20:1, and yet retains the essential content of the original segment. We have conducted a user-study to test the content summarization and effectiveness of the skim as a browsing tool.","PeriodicalId":360087,"journal":{"name":"Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124987494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 202
Image organization and retrieval using a flexible shape model
Weiping Zhu, T. Syeda-Mahmood
In content-based access of image databases, there is a need for a shape formalism that allows a precise description and recognition of a wider class of shape variations that evoke the same overall perceptual similarity in appearance. Such a description not only allows images of a database to be organized into shape categories for efficient indexing, but also makes a wider class of shape-similarity queries possible. This paper presents a region topology-based shape model, called the constrained affine shape model, which captures the spatial layout similarity between members of a class by a set of constrained affine deformations from a prototypical member. The shape model is proposed for use in organizing images of a database into shape categories represented by prototypical members and the associated shape constraints. An efficient matching algorithm is presented for use in shape categorization and querying. The effect of global pose changes on the constraints of the shape model is analyzed to make shape matching robust to global pose changes. An application of the model to document retrieval based on document shape genres is presented. Finally, the effectiveness of the shape model in content-based access of such databases is evaluated.
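To illustrate the idea of a constrained affine deformation from a prototype, a minimal NumPy sketch: fit a least-squares 2-D affine map between corresponding prototype and candidate landmarks, then accept the match only if the fitted deformation stays within simple scale and shear bounds. The landmark correspondence and the particular bounds are illustrative assumptions.

```python
import numpy as np

def fit_affine(prototype_pts, candidate_pts):
    """Least-squares 2-D affine map (A, t) minimizing ||A p + t - c|| over correspondences.
    Both inputs are (N, 2) arrays with N >= 3 non-collinear points."""
    P = np.hstack([prototype_pts, np.ones((len(prototype_pts), 1))])  # (N, 3)
    X, *_ = np.linalg.lstsq(P, candidate_pts, rcond=None)             # (3, 2)
    return X[:2].T, X[2]                                              # A is 2x2, t is length-2

def within_constraints(A, scale_range=(0.5, 2.0), max_shear=0.3):
    """Accept the deformation only if column scales and shear stay within illustrative bounds."""
    sx, sy = np.linalg.norm(A[:, 0]), np.linalg.norm(A[:, 1])
    shear = abs(A[:, 0] @ A[:, 1]) / (sx * sy)
    return (scale_range[0] <= sx <= scale_range[1]
            and scale_range[0] <= sy <= scale_range[1]
            and shear <= max_shear)
```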
{"title":"Image organization and retrieval using a flexible shape model","authors":"Weiping Zhu, T. Syeda-Mahmood","doi":"10.1109/CAIVD.1998.646031","DOIUrl":"https://doi.org/10.1109/CAIVD.1998.646031","url":null,"abstract":"In content-based access of image databases, there is a need for a shape formalism that allows a precise description and recognition of a wider class of shape variations that evoke the same overall perceptual similarity in appearance. Such a description not only allows images of a database to be organized into shape categories for efficient indexing, but also makes a wider class of shape-similarity queries possible. This paper presents a region topology-based shape model called the constrained affine shape model, that captures the spatial layout similarity between members of a class by a set of constrained affine deformations from a prototypical member. The shape model is proposed for use in organizing images of a database into shape categories represented by prototypical members and the associated shape constraints. An efficient matching algorithm is presented for use in shape categorization and querying. The effect of global pose changes on the constraints of the shape model are analyzed to make shape matching robust to global pose changes. An application of the model for document retrieval based on document shape genres is presented. Finally, the effectiveness of the shape model in content-based access of such databases is evaluated.","PeriodicalId":360087,"journal":{"name":"Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116190975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Content-based 3D neuroradiologic image retrieval: preliminary results
Yanxi Liu, W. Rothfus, T. Kanade
A content-based 3D neuroradiologic image retrieval system is being developed at the Robotics Institute of CMU. The special characteristics of this system include: directly dealing with multimodal 3D images (MR/CT); image similarity based on anatomical structures of the human brain; and combining both visual and collateral information for indexing and retrieval. A testbed has been implemented for using detected salient visual features for indexing and retrieving 3D images.
{"title":"Content-based 3D neuroradiologic image retrieval: preliminary results","authors":"Yanxi Liu, W. Rothfus, T. Kanade","doi":"10.1109/CAIVD.1998.646037","DOIUrl":"https://doi.org/10.1109/CAIVD.1998.646037","url":null,"abstract":"A content-based 3D neuroradiologic image retrieval system is being developed at the Robotics Institute of CMU. The special characteristics of this system include: directly dealing with multimodal 3D images (MR/CT); image similarity based on anatomical structures of the human brain; and combining both visual and collateral information for indexing and retrieval. A testbed has been implemented for using detected salient visual features for indexing and retrieving 3D images.","PeriodicalId":360087,"journal":{"name":"Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database","volume":"15 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125828512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 43
Automatic classification of tennis video for high-level content-based retrieval
G. Sudhir, J. C. Lee, Anil K. Jain
We present our techniques and results on automatic analysis of tennis video to facilitate content-based retrieval. Our approach is based on the generation of an image model for the tennis court lines. We derive this model by using knowledge about the dimensions and connectivity (form) of a tennis court and the typical camera geometry used when capturing a tennis video. We use this model to develop a court-line detection algorithm and a robust player-tracking algorithm that tracks the tennis players over the image sequence. We also present a color-based algorithm to select tennis court clips from raw input footage of tennis video. Automatically extracted tennis court lines and the players' location information are analyzed in a high-level reasoning module and related to useful high-level tennis play events. Results on real tennis video data are presented, demonstrating the validity and performance of the approach.
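As a rough illustration of a color-based court-shot test (a hedged sketch only, not the paper's algorithm), one can threshold the fraction of pixels that lie near a reference court color; the reference color, tolerance, and minimum fraction below are placeholder assumptions.

```python
import numpy as np

def is_court_frame(frame_rgb, court_color=(160, 90, 60), tol=40.0, min_fraction=0.5):
    """Return True if at least `min_fraction` of pixels lie within `tol` (Euclidean RGB
    distance) of the reference court color. All three parameters are illustrative."""
    diff = np.linalg.norm(
        frame_rgb.astype(np.float32) - np.asarray(court_color, dtype=np.float32), axis=-1
    )
    return float((diff < tol).mean()) >= min_fraction
```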
{"title":"Automatic classification of tennis video for high-level content-based retrieval","authors":"G. Sudhir, J. C. Lee, Anil K. Jain","doi":"10.1109/CAIVD.1998.646036","DOIUrl":"https://doi.org/10.1109/CAIVD.1998.646036","url":null,"abstract":"We present our techniques and results on automatic analysis of tennis video to facilitate content-based retrieval. Our approach is based on the generation of an image model for the tennis court-lines. We derive this model by using the knowledge about dimensions and connectivity (form) of a tennis court and typical camera geometry used when capturing a tennis video. We use this model to develop: a court line detection algorithm; and a robust player tracking algorithm to track the tennis players over the image sequence. We also present a color-based algorithm to select tennis court clips from an input raw footage of tennis video. Automatically extracted tennis court lines and the players' location information are analyzed in a high-level reasoning module and related to useful high-level tennis play events. Results on real tennis video data are presented demonstrating the validity and performance of the approach.","PeriodicalId":360087,"journal":{"name":"Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128574343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 254
Commercial video retrieval by induced semantics
M. Caliani, C. Colombo, A. Bimbo, P. Pala
Video information processing and retrieval is a key challenge for future multimedia technologies and applications. Commercial videos encode several planes of expression through a rich and dense use of colors, editing effects, viewpoints and rhythms, which are exploited together to attract potential purchasers. In this paper, previous research in the marketing and semiotics fields is translated into a multimedia engineering perspective, and a link is formalized between visual features of commercials at the perceptual level and the specific information that is being vehiculated to the audience. The link allows us to define higher level semantic features capturing the main narrative structures of the video, and embed them in a video retrieval system supporting access to a database of commercials based on four different semiotic categories.
{"title":"Commercial video retrieval by induced semantics","authors":"M. Caliani, C. Colombo, A. Bimbo, P. Pala","doi":"10.1109/CAIVD.1998.646035","DOIUrl":"https://doi.org/10.1109/CAIVD.1998.646035","url":null,"abstract":"Video information processing and retrieval is a key challenge for future multimedia technologies and applications. Commercial videos encode several planes of expression through a rich and dense use of colors, editing effects, viewpoints and rhythms, which are exploited together to attract potential purchasers. In this paper, previous research in the marketing and semiotics fields is translated into a multimedia engineering perspective, and a link is formalized between visual features of commercials at the perceptual level and the specific information that is being vehiculated to the audience. The link allows us to define higher level semantic features capturing the main narrative structures of the video, and embed them in a video retrieval system supporting access to a database of commercials based on four different semiotic categories.","PeriodicalId":360087,"journal":{"name":"Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database","volume":"254 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121280470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Indoor-outdoor image classification
M. Szummer, Rosalind W. Picard
We show how high-level scene properties can be inferred from classification of low-level image features, specifically for the indoor-outdoor scene retrieval problem. We systematically studied the features of: histograms in the Ohta color space; multiresolution, simultaneous autoregressive model parameters; and coefficients of a shift-invariant DCT. We demonstrate that performance is improved by computing features on subblocks, classifying these subblocks, and then combining these results in a way reminiscent of stacking. State of the art single-feature methods are shown to result in about 75-86% performance, while the new method results in 90.3% correct classification, when evaluated on a diverse database of over 1300 consumer images provided by Kodak.
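A minimal NumPy sketch of subblock color-histogram features: convert RGB to Ohta's I1/I2/I3 channels and histogram each channel inside every cell of a grid of subblocks. The 4x4 grid and 32 bins are assumptions; the paper further classifies each subblock and combines the subblock decisions.

```python
import numpy as np

def ohta_channels(rgb):
    """Ohta's approximately decorrelated color features I1, I2, I3."""
    r, g, b = (rgb[..., i].astype(np.float32) for i in range(3))
    return np.stack([(r + g + b) / 3.0, r - b, (2.0 * g - r - b) / 2.0], axis=-1)

def subblock_histograms(image_rgb, grid=(4, 4), bins=32):
    """Histogram each Ohta channel inside every cell of a grid of subblocks and
    concatenate the normalized histograms into one feature vector."""
    ohta = ohta_channels(image_rgb)
    h, w, _ = ohta.shape
    feats = []
    for by in range(grid[0]):
        for bx in range(grid[1]):
            block = ohta[by * h // grid[0]:(by + 1) * h // grid[0],
                         bx * w // grid[1]:(bx + 1) * w // grid[1]]
            for c in range(3):
                hist, _ = np.histogram(block[..., c], bins=bins)
                feats.append(hist.astype(np.float32) / max(hist.sum(), 1))
    return np.concatenate(feats)
```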
{"title":"Indoor-outdoor image classification","authors":"M. Szummer, Rosalind W. Picard","doi":"10.1109/CAIVD.1998.646032","DOIUrl":"https://doi.org/10.1109/CAIVD.1998.646032","url":null,"abstract":"We show how high-level scene properties can be inferred from classification of low-level image features, specifically for the indoor-outdoor scene retrieval problem. We systematically studied the features of: histograms in the Ohta color space; multiresolution, simultaneous autoregressive model parameters; and coefficients of a shift-invariant DCT. We demonstrate that performance is improved by computing features on subblocks, classifying these subblocks, and then combining these results in a way reminiscent of stacking. State of the art single-feature methods are shown to result in about 75-86% performance, while the new method results in 90.3% correct classification, when evaluated on a diverse database of over 1300 consumer images provided by Kodak.","PeriodicalId":360087,"journal":{"name":"Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database","volume":"2016 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127425147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 785
Selecting good keys for triangle-inequality-based pruning algorithms
A. Berman, L. Shapiro
A new class of algorithms based on triangle inequality has been proposed for use in content-based image retrieval. These algorithms rely on comparing a set of key images to the database images, and storing the computed distances. Query images are later compared to the keys, and the triangle inequality is used to speedily compute lower bounds on the distance from the query to each of the database images. This paper addresses the question of increasing performance of this algorithm by the selection of appropriate key images. Several algorithms for key selection are proposed and tested.
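The pruning bound that the abstract builds on is easy to state in code. A minimal NumPy sketch, assuming a metric dist and a precomputed database-to-key distance table (the key-selection strategies that are the paper's actual topic are not shown), follows.

```python
import numpy as np

def key_distance_table(database_feats, key_feats, dist):
    """table[i, k] = dist(database image i, key image k), computed once offline."""
    return np.array([[dist(d, k) for k in key_feats] for d in database_feats])

def lower_bounds(query_feat, key_feats, table, dist):
    """Triangle inequality: dist(q, x) >= |dist(q, k) - dist(x, k)| for every key k,
    so the maximum over keys is a valid lower bound on the true query-to-image distance."""
    q_to_keys = np.array([dist(query_feat, k) for k in key_feats])
    return np.abs(table - q_to_keys).max(axis=1)

def prune(query_feat, key_feats, table, dist, threshold):
    """Indices of database images whose lower bound does not already exceed the threshold;
    only these candidates need the expensive exact distance computation."""
    return np.nonzero(lower_bounds(query_feat, key_feats, table, dist) <= threshold)[0]
```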
{"title":"Selecting good keys for triangle-inequality-based pruning algorithms","authors":"A. Berman, L. Shapiro","doi":"10.1109/CAIVD.1998.646029","DOIUrl":"https://doi.org/10.1109/CAIVD.1998.646029","url":null,"abstract":"A new class of algorithms based on triangle inequality has been proposed for use in content-based image retrieval. These algorithms rely on comparing a set of key images to the database images, and storing the computed distances. Query images are later compared to the keys, and the triangle inequality is used to speedily compute lower bounds on the distance from the query to each of the database images. This paper addresses the question of increasing performance of this algorithm by the selection of appropriate key images. Several algorithms for key selection are proposed and tested.","PeriodicalId":360087,"journal":{"name":"Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database","volume":"97 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113985516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21