Toward plot de-interlacing in TV series using scenes clustering
Philippe Ercolessi, Christine Sénac, H. Bredin
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269836
Multiple sub-stories usually coexist in every episode of a TV series. We propose several variants of an approach for plot de-interlacing based on scenes clustering, with the ultimate goal of providing the end-user with tools for fast and easy overview of one episode, one season or the whole TV series. Each scene can be described in three different ways (based on color histograms, speaker diarization or automatic speech recognition outputs) and four clustering approaches are investigated, one of them based on a graphical representation of the video. Experiments are performed on two TV series of different lengths and formats. We show that semantic descriptors (such as speaker diarization) give the best results and underline that our approach provides useful information for plot de-interlacing.
Detecting politician speech in TV broadcast news shows
Delphine Charlet, Géraldine Damnati
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269842
Politician speaker turn detection in TV broadcast news shows is addressed in this paper. After a first role labeling pass that classifies speaker turns as anchor, reporter or other, the turns labeled as other are submitted to a politician speech detection process. The proposed approach combines acoustic and lexical cues as well as contextual information, and does not use any specific politician model (it is person-independent). Experiments on a set of 101 TV broadcast news shows show that the proposed approach, which relies on fully automatic processing, detects politician speech with an equal error rate of 12.1%, corresponding to a maximal F-measure of 70.3% owing to the unbalanced distribution between politicians and non-politicians.
A new intersection tree for content-based image retrieval
Zineddine Kouahla, José Martinez
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269793
Retrieving images based on their content requires comparing a given query image with virtually all the images stored in a database with respect to a given distance function, which is impractical for large databases. The main goals are to focus the search on as few images as possible and to limit the number of costly distance computations between them. Here, we introduce a variant of a metric tree data structure for indexing and querying such data, in both sequential and parallel versions. The efficiency of our proposal is studied through experiments on real-world datasets.
Data pre-processing to improve SVM video classification
L. Capodiferro, Luca Costantini, F. Mangiatordi, E. Pallotti
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269801
In this work, a pre-processing strategy to improve the performance of SVM-based video clip classification is proposed. The segmentation of a video clip and the extraction of key frames, whose low-level feature representations form the basic elements of the SVM data sets, are generally performed automatically. This may introduce noisy data, which should therefore be removed. Noisy key frames typically occur when the video includes color bars, test cards or other homogeneous frames. Duplicated key frames, generated when the video is static for a long time, also need to be removed. We propose a data clustering method that automatically pre-processes the SVM data sets to minimize the presence of noise. Our experiments on the classification of historical sports video clips demonstrate that the proposed pre-processing strategy improves the overall performance of the SVM.
Using PhotoCube as an extensible demonstration platform for advanced image analysis techniques
G. Tómasson, G. Olafsson, Hlynur Sigurþórsson, B. Jónsson, K. Runarsson, L. Amsaleg
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269792
As digital image collections have been growing ever larger, the multimedia community has put emphasis on methods for image content analysis and presentation. To facilitate extensive user studies of these methods, a single platform is needed that can uniformly incorporate all the analysis and presentation methods under study. Due to its extensibility features, a plug-in API for image analysis methods and a browsing mode API for presentation methods, we believe that the PhotoCube browser can be that platform. We propose a demonstration focusing primarily on these features, allowing participants to appreciate the full potential of PhotoCube as a demonstration platform.
Comparing retrieval effectiveness of alternative content segmentation methods for Internet video search
Maria Eskevich, G. Jones, Christian Wartena, M. Larson, Robin Aly, T. Verschoor, R. Ordelman
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269810
We present an exploratory study of the retrieval of semi-professional user-generated Internet video. The study is based on the MediaEval 2011 Rich Speech Retrieval (RSR) task, for which the dataset was taken from the Internet sharing platform blip.tv and the search queries were associated with specific speech acts occurring in the video. We compare results from three participant groups using automatic speech recognition (ASR) transcripts, metadata manually assigned to each video by the user who uploaded it, and their combination. RSR 2011 was a known-item search for a single manually identified ideal jump-in point in each video, for each query, where playback should begin. Retrieval effectiveness is measured using the MRR and mGAP metrics. Using different transcript segmentation methods, the participants tried to maximize the rank of the relevant item and to locate the nearest match to the ideal jump-in point. Results indicate that the best overall results are obtained for topically homogeneous segments which have a strong overlap with the relevant region associated with the jump-in point, and that use of metadata can be beneficial when segments are unfocused or cover more than one topic.
Detecting complex events in user-generated video using concept classifiers
Jinlin Guo, David Scott, F. Hopfgartner, C. Gurrin
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269799
Automatic detection of complex events in user-generated video (UGV) is a challenging task because its characteristics differ from those of broadcast video. In this work, we first summarize the characteristics of UGV and then explore how to use concept classifiers to recognize complex events in UGV content. The method starts by manually selecting a variety of relevant concepts and constructing classifiers for these concepts. Complex event detectors are then learned using the concatenated probabilistic scores of these concept classifiers as features. We also compare three fusion operations over the probabilistic scores, namely Maximum, Average and Minimum fusion. Experimental results suggest that our method is promising and that Maximum fusion tends to give better performance for most complex events.
Structural and visual similarity learning for Web page archiving
M. Law, Carlos Sureda Gutiérrez, Nicolas Thome, Stéphane Gançarski
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269849
In this paper we present a Web page archiving approach that combines image-based and structural techniques. Our main goal is to learn a similarity measure between Web pages in order to detect whether successive versions of a page are similar or not. Our system is based on a visual similarity measure designed for Web pages, combined with a structural analysis of Web page source code; a supervised feature selection method adapted to Web archiving is proposed. Experiments on real Web archives are reported, including scalability issues.
Fitting Gaussian copulae for efficient visual codebooks generation
Miriam Redi, B. Mérialdo
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269794
The Bag of Words (BoW) model is probably one of the most effective ways to represent images based on the aggregation of locally extracted descriptors. It uses clustering techniques to build visual dictionaries that map each image into a fixed-length signature. Despite its effectiveness, one major drawback of this model lies in the informativeness of the codebook and the computational complexity of building it. In this paper we propose Copula-BoW (C-BoW), an efficient local feature aggregator inspired by copula theory. In C-BoW, we build, in quadratic time, an efficient codebook for vector quantization based on the correlation of the marginal distributions of the local features. Our experimental results show that the C-BoW signature is much more efficient and just as discriminative as the traditional BoW for scene recognition and video retrieval (TRECVID [14] data). Moreover, we show that our new model provides complementary information when combined with existing local feature aggregators, substantially improving the final retrieval performance.
Insertion of tags in urban scenes in real time on smartphone
Thibault Tournier, S. Bres, Elöd Egyed-Zsigmond
Pub Date: 2012-06-27 | DOI: 10.1109/CBMI.2012.6269847
This paper presents a new system, running on smartphones, that provides augmented reality in unconstrained contexts such as urban environments. Our approach is mainly based on interest point extraction and description. We introduce two improvements to existing methods to increase overall performance: the first optimizes the matching step with a BRIEF descriptor and a corner detector; the second is a new tracker. Further improvements yield more stable interest points, especially in urban environments, and speed up the matching between interest points. Thanks to these improvements, we were able to implement real-time augmented reality on smartphones in urban environments. Demonstration videos are provided with the paper.