Animated movie genre detection using symbolic fusion of text and image descriptors
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269813
Grégory Païs, P. Lambert, Daniel Beauchêne, F. Deloule, B. Ionescu
This paper addresses automatic movie genre classification in the specific case of animated movies. Two types of information are used. The first is the movie synopsis: for each genre, a symbolic representation of thematic intensity is extracted from the synopsis. The second is visual: movie content is described with symbolic representations of several mid-level color and activity features. The text and image descriptions are fused using a set of symbolic rules conveying human expertise. The approach is tested on a set of 107 animated movies to estimate their "drama" character. The text-image fusion achieves a precision of up to 78% and a recall of 44%.
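A minimal sketch of how such a rule-based text-image fusion might look is given below. The symbolic labels, rules and thresholds are hypothetical illustrations, not the paper's actual rule base, which encodes human expertise not reproduced here.

```python
# Hypothetical sketch of symbolic text-image fusion for "drama" detection.
# Labels and rules are illustrative only, not taken from the paper.

def fuse_drama(thematic_intensity: str, color_mood: str, activity: str) -> str:
    """Combine symbolic text and image descriptions into a drama verdict."""
    # Rule 1: a strongly dramatic synopsis dominates the decision.
    if thematic_intensity == "high":
        return "drama"
    # Rule 2: a moderately dramatic synopsis needs visual support
    # (dark color mood and low activity) to be confirmed.
    if thematic_intensity == "medium" and color_mood == "dark" and activity == "low":
        return "drama"
    return "not-drama"

print(fuse_drama("medium", "dark", "low"))   # -> drama
print(fuse_drama("low", "dark", "low"))      # -> not-drama
```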
{"title":"Animated movie genre detection using symbolic fusion of text and image descriptors","authors":"Grégory Païs, P. Lambert, Daniel Beauchêne, F. Deloule, B. Ionescu","doi":"10.1109/CBMI.2012.6269813","DOIUrl":"https://doi.org/10.1109/CBMI.2012.6269813","url":null,"abstract":"This paper addresses the automatic movie genre classification in the specific case of animated movies. Two types of information are used. The first one are movie synopsis. For each genre, a symbolic representation of a thematic intensity is extracted from synopsis. Addressed visually, movie content is described with symbolic representations of different mid-level color and activity features. A fusion between the text and image descriptions is performed using a set of symbolic rules conveying human expertise. The approach is tested on a set of 107 animated movies in order to estimate their ”drama” character. It is observed that the text-image fusion achieves a precision up to 78% and a recall of 44%.","PeriodicalId":120769,"journal":{"name":"2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129814655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effective concept detection using Second order Co-occurence Flickr context similarity measure SOCFCS
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269846
Amel Ksibi, A. Ammar, C. Amar
The automatic photo annotation task aims to describe the semantic content of images by detecting high-level concepts. Most existing approaches train independent concept detectors, omitting the interdependencies between concepts, and the resulting annotations are often unsatisfactory. A process of annotation refinement is therefore mandatory to improve imprecise annotation results. Recently, harnessing the contextual correlation between concepts has been shown to be an important resource for improving concept detection. In this paper, we propose a new context-based concept detection process. For this purpose, we define a new semantic measure called Second order Co-occurence Flickr context similarity (SOCFCS), which aggregates the FCS values of the common Flickr related-tags of two target concepts in order to calculate their relative semantic context relatedness (SCR). The proposed measure is applied to build a concept network that serves as the context space. A Random Walk with Restart process is performed over this network to refine the annotation results by exploiting the contextual correlation among concepts. Experimental studies are conducted on the ImageCLEF 2011 collection containing 99 concepts. The results demonstrate the effectiveness of the proposed approach.
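The refinement step is a standard Random Walk with Restart over a concept affinity matrix. The sketch below assumes the pairwise SCR/SOCFCS values and the initial detector scores are already available; the matrix values and restart parameter are illustrative.

```python
# Sketch of annotation refinement by Random Walk with Restart (RWR) over a
# concept network. W holds pairwise SCR/SOCFCS scores (assumed precomputed);
# p0 holds the initial detector scores for one image. Values are illustrative.
import numpy as np

def rwr_refine(W: np.ndarray, p0: np.ndarray, alpha: float = 0.85,
               tol: float = 1e-6, max_iter: int = 100) -> np.ndarray:
    """Iterate p = alpha * A @ p + (1 - alpha) * p0 until convergence."""
    A = W / W.sum(axis=0, keepdims=True)          # column-normalize the affinities
    p = p0.copy()
    for _ in range(max_iter):
        p_next = alpha * A @ p + (1 - alpha) * p0
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p

W = np.array([[0.0, 0.8, 0.1],                    # toy 3-concept affinity matrix
              [0.8, 0.0, 0.3],
              [0.1, 0.3, 0.0]])
p0 = np.array([0.9, 0.2, 0.1])                    # raw detector scores
print(rwr_refine(W, p0))                          # refined, context-aware scores
```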
{"title":"Effective concept detection using Second order Co-occurence Flickr context similarity measure SOCFCS","authors":"Amel Ksibi, A. Ammar, C. Amar","doi":"10.1109/CBMI.2012.6269846","DOIUrl":"https://doi.org/10.1109/CBMI.2012.6269846","url":null,"abstract":"Automatic photo annotation task aims to describe the semantic content by detecting high level concepts. Most existing approaches are performed by training independent concept detectors omitting the interdependencies between concepts. The obtained annotations are often not so satisfactory. Therefore, a process of annotation refinement is mondatory to improve the imprecise annotation results. Recently, harnessing the contextual correlation between concepts is shown to be an important resource to improve concept detection. In this paper, we propose a new context based concept detection process. For this purpose, we define a new semantic measure called Second order Co-occurence Flickr context similarity (SOCFCS), which aggregates the FCS values of common Flickr related-tags of two target concepts in order to calculate their relative semantic context relatedness (SCR). Our proposed measure is applied to build a concept network as the context space. A Random Walk with Restart process is performed over this network to refine the annotation results by exploring the contextual correlation among concepts. Experimental studies are conducted on ImageCLEF 2011 Collection containing 99 concepts. The results demonstrate the effectiveness of our proposed approach.","PeriodicalId":120769,"journal":{"name":"2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133482781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quaero-MSSE: A content based multimedia indexing prototype
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269805
Jean-Philippe Cabanal
Quaero-MSSE is an applicative project of the Quaero collaborative program. The project develops a multimedia search and navigation demonstrator that gives access to several types of content (catch-up TV, archives, foreign videos, music) and illustrates the benefits of advanced audio-video analysis technologies.
{"title":"Quaero-MSSE: A content based multimedia indexing prototype","authors":"Jean-Philippe Cabanal","doi":"10.1109/CBMI.2012.6269805","DOIUrl":"https://doi.org/10.1109/CBMI.2012.6269805","url":null,"abstract":"Quaero-MSSE is an applicative project of the Quaero collaborative program. This project develops a multimedia search and navigation demonstrator, which gives access to several types of contents (catch-up TV, archives, foreign videos, music) and which illustrates the benefits of advanced audio-video analysis technologies.","PeriodicalId":120769,"journal":{"name":"2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117331129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An empirical study of fusion operators for multimodal image retrieval
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269843
G. Csurka, S. Clinchant
In this paper we present an empirical study of late fusion operators for multimodal image retrieval. We consider two experts, one based on textual and one on visual similarities between documents, and study how to go beyond simple score averaging. The main idea is to exploit the correlation between the two experts by efficiently encoding, explicitly or implicitly, an "and" and an "or" operator. We show through several experiments that operators combining both of these aspects generally outperform those that rely on only one of them. Based on this observation, we propose generalized versions of the most classical fusion operators and compare them on ImageCLEF benchmark datasets in both an unsupervised and a supervised framework.
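One common way to interpolate between "and"-like and "or"-like behavior is a generalized (power) mean of the two expert scores. The sketch below uses that operator as a stand-in; it is not the paper's specific family of generalized operators, and the exponent values are illustrative.

```python
# Sketch of simple late-fusion operators over a text score and a visual score,
# both assumed normalized to [0, 1]. The generalized (power) mean interpolates
# between an "and"-like operator (min, p -> -inf) and an "or"-like one
# (max, p -> +inf); the paper's own generalized operators are not reproduced.
import numpy as np

def fuse(text_score: np.ndarray, visual_score: np.ndarray, p: float) -> np.ndarray:
    """Generalized mean of the two expert scores with exponent p."""
    s = np.stack([text_score, visual_score])
    if p == 0:                                   # geometric mean (limit case)
        return np.sqrt(s[0] * s[1])
    return ((s ** p).mean(axis=0)) ** (1.0 / p)

t = np.array([0.9, 0.2, 0.6])
v = np.array([0.7, 0.8, 0.1])
print(fuse(t, v, p=-5))    # close to min: both experts must agree ("and")
print(fuse(t, v, p=1))     # arithmetic mean (plain score averaging)
print(fuse(t, v, p=5))     # close to max: one strong expert suffices ("or")
```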
{"title":"An empirical study of fusion operators for multimodal image retrieval","authors":"G. Csurka, S. Clinchant","doi":"10.1109/CBMI.2012.6269843","DOIUrl":"https://doi.org/10.1109/CBMI.2012.6269843","url":null,"abstract":"In this paper we propose an empirical study of late fusion operators for multimodal image retrieval. Therefore, we consider two experts, one based on textual and one on visual similarities between documents and study the possibilities to go beyond simple score averaging. The main idea is to exploit the correlation between the two experts by encoding explicitly or implicitly an \"and\" and an \"or\" operator in an efficient way. We show through several experiments that the operators that combine both of these two aspects generally outperform the ones that look only to one of them. Based on this observation we propose several generalized version of most classical fusion operators and compare them using ImageClef benchmark datasets both in an unsupervised and in a supervised framework.","PeriodicalId":120769,"journal":{"name":"2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127655691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
3D model retrieval using the 2D Poisson equation
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269797
Fattah Alizadeh, Alistair Sutherland
3D model retrieval is one of the most popular topics in computer vision, and considerable effort is dedicated to improving retrieval accuracy. Defining an efficient and effective way to describe 3D models plays a critical role in the retrieval process. In this paper we propose a view-based shape signature to search and retrieve 3D objects using the 2D Poisson equation. The proposed method uses 60 different 2D silhouettes, automatically extracted from different view-angles of the 3D models. Solving the Poisson equation for each silhouette assigns a number to each pixel as the pixel's signature. Counting and accumulating these pixel signatures generates a histogram-based signature for each silhouette (Silhouette Poisson Histogram, or SilPH). With a few preprocessing steps, the signature becomes insensitive to rotation, scaling and translation. The results show strong discriminative power on the McGill dataset and demonstrate that the proposed method outperforms existing methods.
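The per-silhouette signature can be illustrated by solving the Poisson equation on a binary mask and histogramming the per-pixel solution. The sketch below uses plain Jacobi iterations on a toy mask; the grid size, iteration count and bin count are illustrative choices, not the paper's implementation.

```python
# Sketch of a per-silhouette Poisson signature: solve  laplacian(u) = -1  inside
# a binary silhouette (u = 0 on the background) with Jacobi iterations, then
# histogram the per-pixel values into a normalized, SilPH-style signature.
import numpy as np

def poisson_histogram(mask: np.ndarray, n_iter: int = 500, bins: int = 16) -> np.ndarray:
    u = np.zeros_like(mask, dtype=float)
    for _ in range(n_iter):
        # Jacobi update: average of the 4 neighbours plus the source term 1/4.
        nb = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
              np.roll(u, 1, 1) + np.roll(u, -1, 1))
        u = np.where(mask, 0.25 * (nb + 1.0), 0.0)
    vals = u[mask]
    hist, _ = np.histogram(vals, bins=bins, range=(0, vals.max()))
    return hist / hist.sum()

mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True                        # toy square "silhouette"
print(poisson_histogram(mask))
```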
{"title":"3D model retrieval using the 2D Poisson equation","authors":"Fattah Alizadeh, Alistair Sutherland","doi":"10.1109/CBMI.2012.6269797","DOIUrl":"https://doi.org/10.1109/CBMI.2012.6269797","url":null,"abstract":"3D Model Retrieval is one of the most popular topics in computer vision and huge efforts are dedicated to finding a way to improve retrieval accuracy. Defining a new efficient and effective way to describe 3D models plays a critical role in the retrieval process. In this paper we propose a view-based shape signature to search and retrieve 3D objects using the 2D Poisson equation. Our proposed method uses 60 different 2D silhouettes, which are automatically extracted from different view-angles of 3D models. Solving the Poisson equation for each Silhouette assigns a number to each pixel as the pixel's signature. Counting and accumulating these pixel signatures generates a histogram-based signature for each silhouette (Silhouette Poisson Histogram or simply SilPH). By doing some preprocessing steps one can see that the signature is insensitive to rotation, scaling and translation. The results show a high power of discrimination on the McGill dataset and demonstrate that the proposed method outperforms other existing methods.","PeriodicalId":120769,"journal":{"name":"2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121090437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Supervised models for multimodal image retrieval based on visual, semantic and geographic information
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269806
Duc-Tien Dang-Nguyen, G. Boato, Alessandro Moschitti, F. D. Natale
Nowadays, large-scale networked social media need better search technologies to achieve suitable performance. Multimodal approaches are promising for improving image ranking, particularly when metadata are not completely reliable, which is rather common for user annotations, time and location. In this paper, we propose to combine visual information with additional multi-faceted information to define a novel multimodal similarity measure. More specifically, we combine visual features, which relate strongly to the image content, with semantic information represented by manually annotated concepts and with geo-tagging, very often available in the form of object/subject location. Furthermore, we propose a supervised machine learning approach, based on Support Vector Machines (SVMs), to automatically learn optimized weights for combining the above features. The resulting model is used as a ranking function to sort the results of a multimodal query.
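The weight-learning step can be pictured as a linear SVM over per-modality similarities. The sketch below uses scikit-learn with a hypothetical toy training set (three similarities per query-image pair, binary relevance labels); it only illustrates the idea of learning fusion weights and then ranking with them, not the paper's exact training setup.

```python
# Sketch of learning fusion weights with a linear SVM: each (query, image) pair
# is described by three similarities (visual, semantic, geo) and a relevance
# label; the learned weights then serve as a ranking function. Toy data only.
import numpy as np
from sklearn.svm import LinearSVC

X = np.array([[0.9, 0.8, 0.7],    # relevant pairs: high similarities
              [0.8, 0.9, 0.6],
              [0.2, 0.3, 0.1],    # irrelevant pairs: low similarities
              [0.1, 0.2, 0.4]])
y = np.array([1, 1, 0, 0])

svm = LinearSVC(C=1.0).fit(X, y)
w = svm.coef_.ravel()                            # learned modality weights

candidates = np.array([[0.7, 0.6, 0.9],
                       [0.4, 0.9, 0.2]])
scores = candidates @ w + svm.intercept_         # ranking scores
print(candidates[np.argsort(-scores)])           # candidates sorted by score
```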
{"title":"Supervised models for multimodal image retrieval based on visual, semantic and geographic information","authors":"Duc-Tien Dang-Nguyen, G. Boato, Alessandro Moschitti, F. D. Natale","doi":"10.1109/CBMI.2012.6269806","DOIUrl":"https://doi.org/10.1109/CBMI.2012.6269806","url":null,"abstract":"Nowadays, large-scale networked social media need better search technologies to achieve suitable performance. Multimodal approaches are promising technologies to improve image ranking. This is particularly true when metadata are not completely reliable, which is a rather common case as far as user annotation, time and location are concerned. In this paper, we propose to properly combine visual information with additional multi-faceted information, to define a novel multimodal similarity measure. More specifically, we combine visual features, which strongly relate to the image content, with semantic information represented by manually annotated concepts, and geo tagging, very often available in the form of object/subject location. Furthermore, we propose a supervised machine learning approach, based on Support Vector Machines (SVMs), to automatically learn optimized weights to combine the above features. The resulting models is used as a ranking function to sort the results of a multimodal query.","PeriodicalId":120769,"journal":{"name":"2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126312327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A mobile visual search application for content based image retrieval in the fashion domain
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269838
Angelo Nodari, Matteo Ghiringhelli, Alessandro Zamberletti, M. Vanetti, S. Albertini, I. Gallo
In this study we propose a mobile application that interfaces with a Content-Based Image Retrieval engine for online shopping in the fashion domain. With this application it is possible to take a picture of a garment and retrieve its most similar products. The proposed method is first presented as an application in which the user manually selects the name of the subject framed by the camera before sending the request to the server. In the second part we propose an advanced approach that automatically classifies the object of interest, minimizing the effort required from the user during the query process. To evaluate the performance of the proposed method, we collected three datasets: the first contains clothing images of products taken from different online shops, whereas for the other datasets we used images and video frames of clothes taken by Internet users. The results show the feasibility of using the proposed mobile application in a real scenario.
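The client-side flow described above (classify the garment automatically, then query the retrieval server) can be sketched as follows. The endpoint URL, the classifier and the payload format are hypothetical; they only illustrate the sequence of steps, not the application's real API.

```python
# Sketch of the query flow: classify the garment category on the device, then
# send the photo and the predicted category to a CBIR server. The endpoint,
# classifier and payload format are hypothetical placeholders.
import requests

def query_similar_products(image_path: str, classify) -> list:
    category = classify(image_path)              # e.g. "t-shirt", predicted automatically
    with open(image_path, "rb") as f:
        resp = requests.post(
            "http://example.org/cbir/search",    # hypothetical retrieval endpoint
            files={"image": f},
            data={"category": category},
        )
    return resp.json()                           # list of most similar products

# usage: query_similar_products("photo.jpg", classify=my_garment_classifier)
```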
{"title":"A mobile visual search application for content based image retrieval in the fashion domain","authors":"Angelo Nodari, Matteo Ghiringhelli, Alessandro Zamberletti, M. Vanetti, S. Albertini, I. Gallo","doi":"10.1109/CBMI.2012.6269838","DOIUrl":"https://doi.org/10.1109/CBMI.2012.6269838","url":null,"abstract":"In this study we propose a mobile application which interfaces with a Content-Based Image Retrieval engine for online shopping in the fashion domain. Using this application it is possible to take a picture of a garment to retrieve its most similar products. The proposed method is firstly presented as an application in which the user manually select the name of the subject framed by the camera, before sending the request to the server. In the second part we propose an advanced approach which automatically classifies the object of interest, in this way it is possible to minimize the effort required by the user during the query process. In order to evaluate the performance of the proposed method, we have collected three datasets: the first contains clothing images of products taken from different online shops, whereas for the other datasets we have used images and video frames of clothes taken by Internet users. The results show the feasibility in the use of the proposed mobile application in a real scenario.","PeriodicalId":120769,"journal":{"name":"2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125497898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feasibility of the detection of choirs for ethnomusicologic music indexing
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269845
M. L. Coz, R. André-Obrecht, J. Pinquier
Music is commonly structured in terms of three classical categories: instrumental, singing, and singing-instrumental parts. To refine this notion, the number of singers and/or instruments is sought. An important difficulty appears when a choir sings in unison: several singers try to reach the same note at the same time and classical pitch analysis fails. This paper presents a method to detect such situations in an a cappella context (without instruments). The approach is based on a temporal segmentation followed by frequency tracking inside located frequency bands; it exploits the apparent splitting of the high harmonics due to small differences between the singers. The first results obtained on ethnomusicological corpora are quite satisfying and offer interesting perspectives for our work.
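The cue being exploited, the broadening of high harmonics when several singers deviate slightly from the same note, can be measured very simply as the spectral spread around each harmonic of an analysis frame. The sketch below is not the paper's segmentation-and-tracking algorithm; the frame length, band width and f0 are illustrative assumptions.

```python
# Sketch of a unison-choir cue: measure the spectral spread around each
# harmonic of a frame; larger spread suggests several slightly detuned voices.
import numpy as np

def harmonic_spread(frame: np.ndarray, sr: int, f0: float, n_harm: int = 10) -> float:
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    widths = []
    for k in range(1, n_harm + 1):
        band = (freqs > k * f0 * 0.9) & (freqs < k * f0 * 1.1)   # band around k-th harmonic
        if band.any() and spec[band].sum() > 0:
            f_band, s_band = freqs[band], spec[band]
            centroid = (f_band * s_band).sum() / s_band.sum()
            widths.append(np.sqrt(((f_band - centroid) ** 2 * s_band).sum() / s_band.sum()))
    return float(np.mean(widths)) if widths else 0.0

# usage: harmonic_spread(audio_frame, sr=44100, f0=220.0), compared to a solo-voice baseline
```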
{"title":"Feasibility of the detection of choirs for ethnomusicologic music indexing","authors":"M. L. Coz, R. André-Obrecht, J. Pinquier","doi":"10.1109/CBMI.2012.6269845","DOIUrl":"https://doi.org/10.1109/CBMI.2012.6269845","url":null,"abstract":"The music is commonly structured in terms of the three classical categories which are instrumental, singing or singing-instrumental parts. To refine this notion, the number of singers and/or instruments is searched. An important difficulty appears when a choir in unison is observed: several singers try to reach the same note at the same time and the classical pitch analysis fails. This paper presents a method to detect such situation in an a capella context (without instrument). The approach is based on a temporal segmentation followed by a frequency tracking inside located frequency bands; it exploits the apparent splitting of the high harmonics due to small difference between the singers. The first results obtained on ethnomusicological corpora are quite satisfying and offer interesting perspectives to our work.","PeriodicalId":120769,"journal":{"name":"2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129172928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Search of objects of interest in videos
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269809
Boris Mansencal, J. Benois-Pineau, Rémi Vieux, J. Domenger
The paper addresses the problem of object search in video content. Both the Query-By-Example (QBE) paradigm and context search are explored. In the QBE paradigm, the object of interest is searched for by matching object signatures built from SURF descriptors against signatures computed on the fly in frames. The "context" search is understood as a query on the whole frame, with features extracted after a region-based segmentation. Both kinds of features are transcribed in a Bag-of-Words framework. The combination of Bag-of-Visual-Words and Bag-of-Region-Words gives promising results on the TRECVID'2011 Instance Search Task.
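The Bag-of-Visual-Words step can be illustrated independently of the descriptor extractor: local descriptors (the paper uses SURF; random vectors stand in for them below) are quantized against a k-means codebook and the resulting histograms are compared. The codebook size and the histogram-intersection score are illustrative choices, not the paper's exact matching scheme.

```python
# Sketch of the Bag-of-Visual-Words signature: quantize local descriptors
# against a k-means codebook and compare signatures by histogram intersection.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train_desc = rng.random((1000, 64))              # stand-in for 64-D SURF descriptors
codebook = KMeans(n_clusters=50, n_init=10, random_state=0).fit(train_desc)

def bovw_signature(descriptors: np.ndarray) -> np.ndarray:
    words = codebook.predict(descriptors)        # assign each descriptor to a visual word
    hist = np.bincount(words, minlength=50).astype(float)
    return hist / hist.sum()

def intersection(h1: np.ndarray, h2: np.ndarray) -> float:
    return float(np.minimum(h1, h2).sum())       # 1.0 = identical signatures

obj = bovw_signature(rng.random((80, 64)))       # query object signature
frame = bovw_signature(rng.random((300, 64)))    # on-the-fly frame signature
print(intersection(obj, frame))
```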
{"title":"Search of objects of interest in videos","authors":"Boris Mansencal, J. Benois-Pineau, Rémi Vieux, J. Domenger","doi":"10.1109/CBMI.2012.6269809","DOIUrl":"https://doi.org/10.1109/CBMI.2012.6269809","url":null,"abstract":"The paper addresses the problem of object search in video content. Both Query-By-Example paradigm and context search are explored. In QBE paradigm the object of interest is searched by matching of object signatures built from SURF descriptors with on-the-fly computed signatures in frames. The ”context” search is understood as a query on the whole frame with features extracted after a region-based segmentation. Both kinds of features are transcribed in Bag-Of-Words framework. The combination of Bag-of-Visual-Words and Bag-of-Region-Words gives promising results in TRECVID'2011 Instance Search Task.","PeriodicalId":120769,"journal":{"name":"2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127352532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Distributed high-dimensional index creation using Hadoop, HDFS and C++
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269848
G. Gudmundsson, L. Amsaleg, B. Jónsson
This paper describes an initial study in which the open-source Hadoop parallel and distributed run-time environment is used to speed up the construction phase of a large high-dimensional index. The paper first discusses the typical practical problems developers may run into when porting their code to Hadoop. It then presents early experimental results showing that the performance gains are substantial when indexing large data sets.
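To picture how an index-construction step maps onto Hadoop, the sketch below shows a Hadoop Streaming mapper/reducer pair: mappers assign each descriptor to its nearest cluster centroid, reducers write one index partition per cluster. The paper's implementation is in C++; this Python version, the "id TAB vector" input format, the centroid file and the output layout are all assumptions made for illustration.

```python
# Hadoop Streaming sketch (not the paper's C++ code): mappers emit
# (cluster_id, descriptor) pairs, reducers append descriptors to one index
# partition per cluster. Input lines are assumed to be "id<TAB>v1,v2,...,vd".
import sys
import numpy as np

CENTROIDS = np.load("centroids.npy")             # assumed shipped via the distributed cache

def mapper():
    for line in sys.stdin:
        vec_id, values = line.rstrip("\n").split("\t")
        v = np.array(values.split(","), dtype=float)
        cluster = int(np.argmin(np.linalg.norm(CENTROIDS - v, axis=1)))
        print(f"{cluster}\t{vec_id}\t{values}")  # key = cluster id

def reducer():
    for line in sys.stdin:                       # Hadoop groups and sorts lines by key
        cluster, vec_id, values = line.rstrip("\n").split("\t")
        with open(f"part-{cluster}.idx", "a") as f:
            f.write(f"{vec_id},{values}\n")      # append to this cluster's partition

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```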
{"title":"Distributed high-dimensional index creation using Hadoop, HDFS and C++","authors":"G. Gudmundsson, L. Amsaleg, B. Jónsson","doi":"10.1109/CBMI.2012.6269848","DOIUrl":"https://doi.org/10.1109/CBMI.2012.6269848","url":null,"abstract":"This paper describes an initial study where the open-source Hadoop parallel and distributed run-time environment is used to speedup the construction phase of a large high-dimensional index. This paper first discusses the typical practical problems developers may run into when porting their code to Hadoop. It then presents early experimental results showing that the performance gains are substantial when indexing large data sets.","PeriodicalId":120769,"journal":{"name":"2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116741427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}