
Latest publications from the 2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI)

Toward plot de-interlacing in TV series using scenes clustering
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269836
Philippe Ercolessi, Christine Sénac, H. Bredin
Multiple sub-stories usually coexist in every episode of a TV series. We propose several variants of an approach for plot de-interlacing based on scenes clustering - with the ultimate goal of providing the end-user with tools for fast and easy overview of one episode, one season or the whole TV series. Each scene can be described in three different ways (based on color histograms, speaker diarization or automatic speech recognition outputs) and four clustering approaches are investigated, one of them based on a graphical representation of the video. Experiments are performed on two TV series of different lengths and formats. We show that semantic descriptors (such as speaker diarization) give the best results and underline that our approach provides useful information for plot de-interlacing.
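The abstract does not give implementation details; as a rough illustration of one of the three scene description modes it mentions (color histograms) fed to a clustering step, a minimal sketch using scikit-learn is shown below. The histogram binning, number of clusters and linkage are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch (assumption, not the authors' implementation): describe each scene
# by an average color histogram and group scenes into hypothetical sub-story clusters.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def scene_color_histogram(frames, bins=8):
    """Average RGB histogram over the frames of one scene (frames: N x H x W x 3, uint8)."""
    hists = []
    for frame in frames:
        h, _ = np.histogramdd(frame.reshape(-1, 3),
                              bins=(bins, bins, bins),
                              range=((0, 256), (0, 256), (0, 256)))
        hists.append(h.ravel() / h.sum())
    return np.mean(hists, axis=0)

def cluster_scenes(scene_descriptors, n_stories=4):
    """Cluster scene descriptors; each cluster is a candidate sub-story."""
    X = np.vstack(scene_descriptors)
    model = AgglomerativeClustering(n_clusters=n_stories, linkage="average")
    return model.fit_predict(X)
```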
Citations: 13
Detecting politician speech in TV broadcast news shows
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269842
Delphine Charlet, Géraldine Damnati
Politician speaker turn detection in TV Broadcast News shows is addressed in this paper. After a first role labeling pass that classifies speaker turns as anchor, reporter or other, the turns labeled as other are submitted to a politician speech detection process. The proposed approach combines acoustical and lexical cues as well as contextual information, and does not use any specific politician model (it is person-independent). Experiments on a set of 101 TV broadcast news shows show that the proposed approach, which relies on fully automatic processing, detects politician speech with an equal error rate of 12.1%, corresponding to a maximal F-measure of 70.3% due to the unbalanced distribution between politicians and non-politicians.
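The reported equal error rate is the operating point where the miss rate and the false-alarm rate coincide. A small, hedged sketch of how such a point could be located from per-turn detection scores is shown below; the scores and labels are illustrative, not the paper's data.

```python
# Hedged sketch: approximate equal error rate from binary detection scores.
# Scores and labels below are made up for illustration only.
import numpy as np

def equal_error_rate(scores, labels):
    """Return the approximate EER for binary labels (1 = politician turn)."""
    order = np.argsort(-scores)          # sort turns by decreasing score
    labels = np.asarray(labels)[order]
    positives = labels.sum()
    negatives = len(labels) - positives
    tp = np.cumsum(labels)               # true positives as the threshold is lowered
    fp = np.cumsum(1 - labels)
    fnr = 1.0 - tp / positives           # miss rate
    fpr = fp / negatives                 # false-alarm rate
    idx = np.argmin(np.abs(fnr - fpr))   # closest crossing point
    return (fnr[idx] + fpr[idx]) / 2.0

scores = np.array([0.9, 0.8, 0.7, 0.4, 0.35, 0.1])
labels = np.array([1, 1, 0, 1, 0, 0])
print(equal_error_rate(scores, labels))
```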
Citations: 1
A new intersection tree for content-based image retrieval
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269793
Zineddine Kouahla, José Martinez
Retrieval of images based on their contents is a process that requires comparing a given query image with virtually all the images stored in a database with respect to a given distance function. This is impractical for large databases. The main difficulties and goals are to focus the search on as few images as possible and to further limit the number of expensive distance computations between them. Here, we introduce a variant of a metric tree data structure for indexing and querying such data. Both sequential and parallel versions are introduced. The efficiency of our proposal is studied through experiments on real-world datasets.
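The proposed intersection tree itself is not described in the abstract. Purely to illustrate the metric-tree family it builds on, the sketch below implements a basic vantage-point-style tree with range search under an arbitrary distance function; it is not the authors' structure, and the leaf size and pruning rule are generic assumptions.

```python
# Generic metric-tree sketch (a vantage-point tree). Shown only to illustrate the
# family of structures the paper builds on; it is NOT the proposed intersection tree.
import numpy as np

class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point, self.radius, self.inside, self.outside = point, radius, inside, outside

def build_vptree(points, dist, leaf_size=4):
    if len(points) <= leaf_size:
        return list(points)                        # leaf: plain list of points
    vantage, rest = points[0], points[1:]
    dists = np.array([dist(vantage, p) for p in rest])
    radius = np.median(dists)
    inside = [p for p, d in zip(rest, dists) if d <= radius]
    outside = [p for p, d in zip(rest, dists) if d > radius]
    return VPNode(vantage, radius,
                  build_vptree(inside, dist, leaf_size),
                  build_vptree(outside, dist, leaf_size))

def range_search(node, query, eps, dist, out):
    if isinstance(node, list):                     # leaf: check every stored point
        out.extend(p for p in node if dist(query, p) <= eps)
        return
    d = dist(query, node.point)
    if d <= eps:
        out.append(node.point)
    if d - eps <= node.radius:                     # query ball may intersect the inside
        range_search(node.inside, query, eps, dist, out)
    if d + eps > node.radius:                      # ... and/or the outside
        range_search(node.outside, query, eps, dist, out)

pts = [np.random.rand(8) for _ in range(100)]
dist = lambda a, b: float(np.linalg.norm(a - b))
hits = []
range_search(build_vptree(pts, dist), pts[0], 0.5, dist, hits)
```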
Citations: 7
Data pre-processing to improve SVM video classification
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269801
L. Capodiferro, Luca Costantini, F. Mangiatordi, E. Pallotti
In this work, a pre-processing strategy to improve the performance of SVM-based video clip classification is proposed. The segmentation of a video clip and the extraction of key frames, whose low-level feature representations constitute the basic elements for generating the SVM data sets, are generally performed automatically. This approach may produce noisy data, so it is desirable to find a removal strategy. Noise key frames are usually detected when video includes color bars, test cards or other homogeneous frames. Duplicated key frames, generated when the video is steady for a long while, also need to be removed. In this paper we propose a data clustering method that performs an automatic pre-processing of the SVM data sets to minimize the presence of noise. Our experiments show an example of classification of historical sport video clips, demonstrating that the proposed pre-processing strategy improves the overall performance of SVM.
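The paper itself uses a clustering method for this pre-processing; the sketch below instead uses simple histogram heuristics, purely to make the two kinds of noise it names concrete (near-homogeneous frames such as color bars, and duplicated key frames). The thresholds are made-up assumptions.

```python
# Hedged illustration of the two noise types mentioned in the abstract. The paper's
# method is a clustering approach; this sketch uses simple heuristics with made-up
# thresholds, only to show what "noise key frames" and "duplicates" mean here.
import numpy as np

def gray_histogram(frame, bins=32):
    gray = frame.mean(axis=2)                        # frame: H x W x 3, uint8
    h, _ = np.histogram(gray, bins=bins, range=(0, 256))
    return h / h.sum()

def is_homogeneous(frame, entropy_threshold=1.5):
    """Very low histogram entropy suggests a flat, test-card-like frame."""
    h = gray_histogram(frame)
    entropy = -np.sum(h[h > 0] * np.log2(h[h > 0]))
    return entropy < entropy_threshold

def remove_duplicates(key_frames, dist_threshold=0.05):
    """Keep a key frame only if its histogram differs enough from the last kept one."""
    kept, last_hist = [], None
    for frame in key_frames:
        h = gray_histogram(frame)
        if last_hist is None or np.abs(h - last_hist).sum() / 2 > dist_threshold:
            kept.append(frame)
            last_hist = h
    return kept
```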
Citations: 6
Using PhotoCube as an extensible demonstration platform for advanced image analysis techniques
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269792
G. Tómasson, G. Olafsson, Hlynur Sigurþórsson, B. Jónsson, K. Runarsson, L. Amsaleg
As digital image collections have been growing ever larger, the multimedia community has put emphasis on methods for image content analysis and presentation. To facilitate extensive user studies of these methods, a single platform is needed that can uniformly incorporate all the analysis and presentation methods under study. Due to its extensibility features, a plug-in API for image analysis methods and a browsing mode API for presentation methods, we believe that the PhotoCube browser can be that platform. We propose a demonstration focusing primarily on these features, allowing participants to appreciate the full potential of PhotoCube as a demonstration platform.
Citations: 2
Comparing retrieval effectiveness of alternative content segmentation methods for Internet video search
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269810
Maria Eskevich, G. Jones, Christian Wartena, M. Larson, Robin Aly, T. Verschoor, R. Ordelman
We present an exploratory study of the retrieval of semi-professional user-generated Internet video. The study is based on the MediaEval 2011 Rich Speech Retrieval (RSR) task, whose dataset was taken from the Internet sharing platform blip.tv, with search queries associated with specific speech acts occurring in the video. We compare results from three participant groups using: automatic speech recognition (ASR) transcripts, metadata manually assigned to each video by the user who uploaded it, and their combination. RSR 2011 was a known-item search task: for each query, a single manually identified ideal jump-in point marks where playback in the video should begin. Retrieval effectiveness is measured using the MRR and mGAP metrics. Using different transcript segmentation methods, the participants tried to maximize the rank of the relevant item and to locate the nearest match to the ideal jump-in point. Results indicate that the best overall results are obtained for topically homogeneous segments which have a strong overlap with the relevant region associated with the jump-in point, and that the use of metadata can be beneficial when segments are unfocused or cover more than one topic.
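MRR is the standard mean reciprocal rank for known-item search, where each query has exactly one relevant item; mGAP additionally penalizes the distance between the returned entry point and the ideal jump-in point. A minimal MRR sketch is given below; the mGAP weighting is task-specific and not reproduced here.

```python
# Minimal sketch of mean reciprocal rank (MRR) for a known-item task: each query
# has exactly one relevant item, and only its rank in the result list matters.
def mean_reciprocal_rank(ranked_lists, relevant_items):
    """ranked_lists[i] is the result list for query i; relevant_items[i] its single answer."""
    total = 0.0
    for results, answer in zip(ranked_lists, relevant_items):
        if answer in results:
            total += 1.0 / (results.index(answer) + 1)   # ranks are 1-based
        # queries whose answer is missing from the list contribute 0
    return total / len(ranked_lists)

print(mean_reciprocal_rank([["b", "a", "c"], ["x", "y"]], ["a", "x"]))  # (1/2 + 1) / 2 = 0.75
```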
Citations: 19
Detecting complex events in user-generated video using concept classifiers
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269799
Jinlin Guo, David Scott, F. Hopfgartner, C. Gurrin
Automatic detection of complex events in user-generated videos (UGV) is a challenging task because UGV characteristics differ from those of broadcast video. In this work, we first summarize the new characteristics of UGV, and then explore how to utilize concept classifiers to recognize complex events in UGV content. The method starts by manually selecting a variety of relevant concepts, followed by constructing classifiers for these concepts. Finally, complex event detectors are learned by using the concatenated probabilistic scores of these concept classifiers as features. Further, we also compare three different fusion operations over the probabilistic scores, namely Maximum, Average and Minimum fusion. Experimental results suggest that our method provides promising results. They also show that Maximum fusion tends to give better performance for most complex events.
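The three fusion operators compared in the abstract reduce to simple aggregations of concept-classifier probabilities across a video. The sketch below shows them applied to per-keyframe scores; the array layout (keyframes by concepts) is an assumption made for illustration.

```python
# Hedged sketch of the Maximum / Average / Minimum fusion operators mentioned in the
# abstract, applied to per-keyframe concept-classifier probabilities. The layout
# (n_keyframes x n_concepts) is an illustrative assumption.
import numpy as np

def fuse_concept_scores(scores, mode="max"):
    """scores: array of shape (n_keyframes, n_concepts) with probabilistic outputs.
    Returns one video-level feature vector of length n_concepts."""
    if mode == "max":
        return scores.max(axis=0)
    if mode == "avg":
        return scores.mean(axis=0)
    if mode == "min":
        return scores.min(axis=0)
    raise ValueError(f"unknown fusion mode: {mode}")

scores = np.array([[0.2, 0.9, 0.1],
                   [0.6, 0.7, 0.3]])
print(fuse_concept_scores(scores, "max"))   # video-level feature fed to the event detector
```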
Citations: 17
Structural and visual similarity learning for Web page archiving
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269849
M. Law, Carlos Sureda Gutiérrez, Nicolas Thome, Stéphane Gançarski
We present in this paper a Web page archiving approach combining image and structural techniques. Our main goal is to learn a similarity between Web pages in order to detect whether successive versions of pages are similar or not. Our system is based on a visual similarity measure designed for Web pages. Combined with a structural analysis of Web page source codes, a supervised feature selection method adapted to Web archiving is proposed. Experiments on real Web archives are reported including scalability issues.
Citations: 15
Fitting Gaussian copulae for efficient visual codebooks generation
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269794
Miriam Redi, B. Mérialdo
The Bag of Words model is probably one of the most effective ways to represent images based on the aggregation of locally extracted descriptors. It uses clustering techniques to build visual dictionaries that map each image into a fixed-length signature. Despite its effectiveness, one major drawback of this model lies in the informativeness of the codebook and its computational complexity. In this paper we propose Copula-BoW (C-BoW), an efficient local feature aggregator inspired by Copula theory. In C-BoW, we build, in quadratic time, an efficient codebook for vector quantization based on the correlation of the marginal distributions of the local features. Our experimental results prove that the C-BoW signature is much more efficient than, and as discriminative as, the traditional BoW for scene recognition and video retrieval (TRECVID [14] data). Moreover, we also show that our new model provides complementary information when combined with existing local feature aggregators, substantially improving the final retrieval performance.
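The abstract does not detail the C-BoW codebook construction; as background only, fitting a Gaussian copula to descriptor dimensions amounts to mapping each marginal through its empirical CDF, applying the standard normal quantile function, and estimating the correlation matrix of the resulting scores. A sketch of that standard copula-fitting step (not the C-BoW codebook itself) follows.

```python
# Background sketch: fitting a Gaussian copula to the marginals of local descriptors
# (empirical CDF -> normal quantiles -> correlation matrix). This illustrates the
# copula machinery the paper relies on, not the C-BoW codebook construction itself.
import numpy as np
from scipy.stats import norm, rankdata

def fit_gaussian_copula(X):
    """X: array of shape (n_samples, n_dims). Returns the copula correlation matrix."""
    n = X.shape[0]
    # Empirical CDF of each marginal, kept strictly inside (0, 1).
    U = rankdata(X, axis=0) / (n + 1)
    # Map uniforms to standard-normal scores and estimate their correlation.
    Z = norm.ppf(U)
    return np.corrcoef(Z, rowvar=False)

X = np.random.rand(500, 8) ** 2           # toy descriptors with non-Gaussian marginals
print(fit_gaussian_copula(X).shape)       # (8, 8)
```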
Citations: 2
Insertion of tags in urban scenes in real time on smartphone
Pub Date : 2012-06-27 DOI: 10.1109/CBMI.2012.6269847
Thibault Tournier, S. Bres, Elöd Egyed-Zsigmond
This paper presents a new system, running on a smartphone, that provides augmented reality in unconstrained contexts such as an urban environment. Our approach is mainly based on interest point extraction and description. We use two new improvements of existing methods to increase the overall performance. The first one optimizes the matching step with a BRIEF descriptor and a corner detector. The second one is a new tracker. We add other improvements to obtain more stable points, especially in our urban environment, and to speed up the matching process between interest points. Thanks to these improvements, we were able to implement real-time augmented reality on smartphones in an urban environment. Demonstration videos are provided with the paper.
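The abstract mentions a corner detector combined with BRIEF descriptors for matching. As a rough desktop illustration of that style of pipeline (using OpenCV's ORB, which pairs a FAST corner detector with rotated BRIEF descriptors), a sketch is given below; it is not the authors' optimized mobile implementation, and the file names and parameters are placeholders.

```python
# Rough OpenCV sketch of a corner-detector + BRIEF-style matching pipeline (here ORB,
# i.e. FAST corners with rotated BRIEF descriptors). Not the authors' optimized
# smartphone implementation; file names and nfeatures are placeholders.
import cv2

def match_images(path_a, path_b, max_matches=50):
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=1000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    # Hamming distance is the natural metric for binary (BRIEF-like) descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    return kp_a, kp_b, matches[:max_matches]
```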
Citations: 1