Managing photos by using visual features (e.g., color and texture) is known to be a powerful, yet imprecise, retrieval paradigm because of the semantic gap problem. The same is true if search relies only on keywords (or tags), derived from either the image context or user-provided annotations. In this paper we present a new multi-faceted image search and browsing system, named Scenique, that allows the user to manage her photo collections by using both visual features and tags, possibly organized into multiple dimensions (or facets). Each facet can be seen as a coordinate of a multidimensional space describing the image content (for example, the visual appearance, the content type, the geographic location, and so on). We present the basic principles of Scenique and provide evidence of the effectiveness of its visual tools. Feedback supplied by a set of real users indicates that the proposed interface is intuitive and easy to use, and that it satisfies users' expectations for managing photo collections and quickly locating images of interest.
{"title":"A Multi-faceted Browsing Interface for Digital Photo Collections","authors":"Ilaria Bartolini","doi":"10.1109/CBMI.2009.23","DOIUrl":"https://doi.org/10.1109/CBMI.2009.23","url":null,"abstract":"Managing photos by using visual features (e.g., color and texture) is known to be a powerful, yet imprecise, retrieval paradigm because of the semantic gap problem. The same is true if search relies only on keywords (or tags), derived from either the image context or user-provided annotations. In this paper we present a new multi-faceted image search and browsing system, named Scenique, that allows the user to manage her photo collections by using both visual features and tags, possibly organized into multiple dimensions (or facets). Each facet can be seen as a coordinate of a multidimensional space describing the image content (for example, the visual appearance, the content type, the geographic location, and so on). We present the basic principles of Scenique and provide evidence of the effectiveness of its visual tools. Feedback supplied by a set of real users indicates that the proposed interface is intuitive, easy-to-use, and that satisfies users' expectations in managing photo collections and quickly locating images of interest.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126972701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multimedia analysis and reuse of raw, unedited audio-visual content, known as rushes, is gaining acceptance among a large number of research labs and companies. Several projects in the context of European-funded research address multimedia indexing, annotation, search, and retrieval, but only the FP6 project RUSHES focuses on automatic semantic annotation, indexing, and retrieval of raw, unedited audio-visual content. Professional content creators and providers, as well as home users, deal with this type of content, and therefore novel technologies for semantic search and retrieval are required. In this paper, we present a summary of the most relevant achievements of the RUSHES project, focusing on specific approaches for automatic annotation as well as the main features of the final RUSHES search engine.
{"title":"RUSHES Retrieval of Multimedia Semantic Units for Enhanced Reusability","authors":"O. Schreer, I. Feldmann, Isabel Alonso Mediavilla, P. Concejero, A. Sadka, M. Swash","doi":"10.1109/CBMI.2009.43","DOIUrl":"https://doi.org/10.1109/CBMI.2009.43","url":null,"abstract":"Multimedia analysis and reuse of raw un-edited audio visual content known as rushes is gaining acceptance by a large number of research labs and companies. A set of research projects are considering multimedia indexing, annotation, search and retrieval in the context of European funded research, but only the FP6 project RUSHES is focusing on automatic semantic annotation, indexing and retrieval of raw and un-edited audio-visual content. Even professional content creators and providers as well as home-users are dealing with this type of content and therefore novel technologies for semantic search and retrieval are required. In this paper, we present a summary of the most relevant achievements of the RUSHES project, focusing on specific approaches for automatic annotation as well as the main features of the final RUSHES search engine.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134017970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper improves our previous work on concept-based video shot indexing by introducing an ontological concept construction, evaluated on the TRECVid 2007 video retrieval task, that proceeds in two steps. First, each single concept is modeled independently. Second, an ontology-based representation is introduced via the influence relations between concepts and an ontological readjustment of the confidence values. The main contribution of this paper is the way inter-concept similarity is exploited in our indexing system, through three measures: co-occurrence, visual similarity, and LSCOM-lite ontology path length. The experimental results show the efficiency of the proposed scheme and the significant improvement it provides.
{"title":"Hierarchical Ontology-Based Robust Video Shots Indexation Using Global MPEG-7 Visual Descriptors","authors":"R. Benmokhtar, B. Huet","doi":"10.1109/CBMI.2009.18","DOIUrl":"https://doi.org/10.1109/CBMI.2009.18","url":null,"abstract":"This paper proposes to improve our previous work on the concept-based video shot indexing, by considering an ontological concept construction in the TRECVid 2007 video retrieval, based on two steps. First, each single concept is modeled independently. Second, an ontology-based concept is introduced via the representation of the influence relations between concepts and the ontological readjustment of the confidence values. The main contribution of this paper is in the exploitation manner of the inter-concepts similarity in our indexing system, where three measures are represented: co-occurrence, visual similarity and LSCOM-lite ontology path length contribution. The experimental results report the efficiency and the significant improvement provided by the proposed scheme.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124345485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes a technical solution for automated slideshow generation that extracts a set of high-level features from music, such as beat grid, mood, and genre, and intelligently combines it with high-level image features, such as mood, daytime, and scene classification. An advantage of this high-level concept is that it enables the user to incorporate his preferences regarding the semantic aspects of music and images. For example, the user might request the system to automatically create a slideshow that plays soft music and shows pictures with sunsets from the last 10 years of his own photo collection. The high-level feature extraction on both the audio and the visual information is based on the same underlying machine-learning core, which processes various low- and mid-level audio and visual features. This paper describes the technical realization of the algorithms and their evaluation on suitable test databases.
{"title":"Semantic High-Level Features for Automated Cross-Modal Slideshow Generation","authors":"P. Dunker, C. Dittmar, André Begau, S. Nowak, M. Gruhne","doi":"10.1109/CBMI.2009.32","DOIUrl":"https://doi.org/10.1109/CBMI.2009.32","url":null,"abstract":"This paper describes a technical solution for automated slideshow generation by extracting a set of high-level features from music, such as beat grid, mood and genre and intelligently combining this set with image high-level features, such as mood, daytime- and scene classification. An advantage of this high-level concept is to enable the user to incorporate his preferences regarding the semantic aspects of music and images. For example, the user might request the system to automatically create a slideshow, which plays soft music and shows pictures with sunsets from the last 10 years of his own photo collection.The high-level feature extraction on both, the audio and the visual information is based on the same underlying machine learning core, which processes different audio- and visual- low- and mid-level features. This paper describes the technical realization and evaluation of the algorithms with suitable test databases.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127476332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Content-based image retrieval (CBIR) is a topic that has received a lot of attention and increasing popularity due to a wide range of applications. In this paper, we present a similarity measure for CBIR in an industrial context, where images of a vibration phenomenon are obtained by Electronic Speckle Pattern Interferometry (ESPI). The resulting images have very poor visual characteristics, so traditional CBIR systems that rely on color or texture information are not effective. We propose a CBIR approach based on the one-dimensional projections of the images obtained by the Radon transform. Experiments show that this signature is relevant and enables good retrieval performance compared to a baseline image correlation.
{"title":"ESPI Image Indexing and Similarity Search in Radon Transform Domain","authors":"Rémi Vieux, J. Benois-Pineau, J. Domenger, A. Braquelaire","doi":"10.1109/CBMI.2009.38","DOIUrl":"https://doi.org/10.1109/CBMI.2009.38","url":null,"abstract":"Content Based Image Retrieval is a topic which has received a lot of attention and increasing popularity due to a wide range of applications. In this paper, we present a similarity measure for CBIR in an industrial context, where the images of a vibration phenomenon are obtained by Electronic Speckle Pattern Interferometry. Images obtained have very poor visual characteristics, and traditional CBIR systems which rely on color or texture information could not be efficient. We propose a CBIR approach based on the 1-dimensional projections of the images obtained by the Radon transform. Experiments show that this signature is relevant and enables good retrieval performances compared to a baseline image correlation.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"197 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132099993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We are interested in designing a data structure for n objects of dimension d with the following objectives: space requirements should be O(d * n) and query time should be O(d * log(n)). Such a structure corresponds to a subspace tree, which decomposes the distance computation across a hierarchy of subspaces and is realized by the hierarchical linear subspace method. By doing so, the data is divided into disjoint entities. The asymptotic upper bound on the maximum applicable number of subspaces is logarithmically constrained by the number of represented elements and their dimension. The search in such a tree starts at the subspace with the lowest dimension, where the set of all possible similar objects is determined. In the next subspace, additional metric information corresponding to a higher dimension is used to reduce this set.
{"title":"Subspace Tree","authors":"A. Wichert","doi":"10.1109/CBMI.2009.14","DOIUrl":"https://doi.org/10.1109/CBMI.2009.14","url":null,"abstract":"We are interested in designing a data structure for n objects of dimension d, with the following objectives: Space requirements should be O(d * n) and the query time should be O(d * log(n)). Such a structure corresponds to subspace trees. A subspace tree divides the distances between the subspaces. It is realized by the hierarchical linear subspace method. By doing so, the data is divided into disjoint entities. The asymptotic upper bound estimation of the maximum applicable number of subspaces is logarithmically constrained by the number of represented elements and their dimension.The search in such a tree starts at the subspace with the lowest dimension. In this subspace, the set of all possible similar objects is determined. In the next subspace, additional metric information corresponding to a higher dimension is used to reduce this set.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127895307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chant and cantillation research is particularly interesting as it explores the transition from oral to written transmission of music. The goal of this work is to create web-based computational tools that can assist the study of how diverse recitation traditions, having their origin in primarily non-notated melodies, later became codified. One of the authors is a musicologist and music theorist who has guided the system design and development by providing manual annotations and participating in the design process. We describe novel content-based visualization and analysis algorithms that can be used for problem-seeking exploration of audio recordings of chant and recitations.
{"title":"Content-Aware Web Browsing and Visualization Tools for Cantillation and Chant Research","authors":"S. Ness, G. Tzanetakis, D. Biró","doi":"10.1109/CBMI.2009.46","DOIUrl":"https://doi.org/10.1109/CBMI.2009.46","url":null,"abstract":"Chant and cantillation research is particularly interesting as it explores the transition from oral to written transmission of music. The goal of this work to create web-based computational tools that can assist the study of how diverse recitation traditions, having their origin in primarily non-notated melodies, later became codified. One of the authors is a musicologist and music theorist who has guided the system design and development by providing manual annotations and participating in the design process. We describe novel content-based visualization and analysis algorithms that can be used for problem-seeking exploration of audio recordings of chant and recitations.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125546453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Traditional content-based retrieval approaches in ad hoc networks employ either centralized or flooding strategies, which may result in low fault tolerance and high search cost, making them inefficient. To facilitate efficient video retrieval, we propose a logic-based content summary framework that represents the semantic content of video data using concise logical terms. In this method, the video data is characterized by color and wavelet coefficients, which are converted into logical terms by threshold operators. The logical terms are then summarized as node content descriptions, and nodes with similar descriptions are clustered into a virtual infrastructure according to their semantic content.
{"title":"Semantic Video Clustering in Ad Hoc Networks for Content-Based Retrieval","authors":"Bo Yang, M. Manohar","doi":"10.1109/CBMI.2009.31","DOIUrl":"https://doi.org/10.1109/CBMI.2009.31","url":null,"abstract":"Traditional content-based retrieval approaches employ either centralized or flooding strategies in ad hoc networks, which may result in low fault tolerance and high search cost making them inefficient. To facilitate an efficient video retrieval, we propose a logic-based content summary framework that is able to represent semantic contents of video data using concise logic terms. In this method the video data is characterized by color and wavelet coefficients which will be converted into logical terms by using threshold operators. The logical terms are then summarized as node content descriptions. The nodes containing similar node descriptions are clustered into a virtual infrastructure according to the semantic content.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127792381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose to use hidden Markov models (HMMs) to classify images. Images are modeled by extracting symbols corresponding to the 3x3 binary neighborhoods of interest points and ordering these symbols by decreasing saliency, thus obtaining strings of symbols. HMMs are learned from sets of strings modeling classes of images. The method has been tested on the SIMPLIcity database and shows an improvement over competing approaches based on interest points. We also evaluate these approaches for classifying thumbnail images, i.e., low-resolution images.
{"title":"Classification of Images Based on Hidden Markov Models","authors":"Marc Mouret, C. Solnon, Christian Wolf","doi":"10.1109/CBMI.2009.22","DOIUrl":"https://doi.org/10.1109/CBMI.2009.22","url":null,"abstract":"We propose to use hidden Markov models (HMMs) to classify images. Images are modeled by extracting symbols corresponding to 3x3 binary neighborhoods of interest points, and by ordering these symbols by decreasing saliency order, thus obtaining strings of symbols. HMMs are learned from sets of strings modeling classes of images. The method has been tested on the SIMPLIcity database and shows an improvement over competing approaches based on interest points. We also evaluate these approaches for classifying thumbnail images, i.e., low resolution images.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122524668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, a novel view-based approach for 3D object retrieval is introduced. A set of 2D images (multi-views) is automatically generated from a 3D object by taking views from uniformly distributed viewpoints. For each image, a set of 2D rotation-invariant shape descriptors is extracted. The global shape similarity between two 3D models is computed by applying a novel matching scheme, which effectively combines the information extracted from the multi-view representation. The proposed approach can serve as a unified framework supporting multimodal queries (such as sketches, 2D images, and 3D objects). The experimental results illustrate the superiority of the method over similar view-based approaches.
{"title":"A Compact Multi-view Descriptor for 3D Object Retrieval","authors":"P. Daras, A. Axenopoulos","doi":"10.1109/CBMI.2009.15","DOIUrl":"https://doi.org/10.1109/CBMI.2009.15","url":null,"abstract":"In this paper, a novel view-based approach for 3D object retrieval is introduced. A set of 2D images (multi-views) are automatically generated from a 3D object, by taking views from uniformly distributed viewpoints. For each image, a set of 2D rotation-invariant shape descriptors is extracted. The global shape similarity between two 3D models is achieved by applying a novel matching scheme, which effectively combines the information extracted from the multiview representation. The proposed approach can well serve as a unified framework, supporting multimodal queries (such as sketches, 2D images, 3D objects). The experimental results illustrate the superiority of the method over similar view-based approaches.","PeriodicalId":417012,"journal":{"name":"2009 Seventh International Workshop on Content-Based Multimedia Indexing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123930586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}