
Latest publications from the 2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI)

Efficient indexing structures for fast media search and browsing
Pub Date : 2011-06-13 DOI: 10.1109/CBMI.2011.5972533
Marco Teixeira, João Magalhães
Fast media search and browsing is a growing need today, driven by the challenge of managing large collections of personal media. Traditional databases (e.g. MySQL) and text databases (e.g. Lucene) do not address an important aspect of multimedia data: its high dimensionality. In this paper, we describe the implementation and evaluation of high-dimensional data indexing structures for fast search and browsing. Index structures for high-dimensional data have been widely researched, and several proposals exist in the literature. We compare five popular index structures in a large-scale image retrieval scenario with visual features of varying dimensionality. Both indexing and search aspects are evaluated: indexing time, search time, and the trade-off between precision and performance.
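The exhaustive linear scan that such index structures aim to outperform can be sketched as follows. This is a minimal illustrative baseline, not the paper's implementation; the 128-D vectors stand in for visual features such as SIFT descriptors.

```python
import heapq
import random

def knn_bruteforce(query, vectors, k=5):
    """Exact k-NN by scanning every vector: O(n * d) per query.

    This linear scan is the baseline that high-dimensional index
    structures (trees, hashing, etc.) try to beat."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return heapq.nsmallest(k, range(len(vectors)),
                           key=lambda i: sq_dist(query, vectors[i]))

random.seed(0)
dim = 128  # e.g. a SIFT descriptor
db = [[random.random() for _ in range(dim)] for _ in range(1000)]
q = db[42]
neighbours = knn_bruteforce(q, db, k=3)
print(neighbours[0])  # → 42: the query is its own nearest neighbour
```

Every query touches all n·d components, which is exactly the cost that becomes prohibitive at scale and motivates dedicated high-dimensional index structures.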
Citations: 0
Spoken WordCloud: Clustering recurrent patterns in speech
Pub Date : 2011-06-13 DOI: 10.1109/CBMI.2011.5972534
Rémi Flamary, Xavier Anguera Miró, Nuria Oliver
The automatic summarization of speech recordings is typically carried out as a two-step process: the speech is first decoded using an automatic speech recognition system, and the resulting text transcripts are processed to create a summary. However, this approach might not be suitable in adverse acoustic conditions or when applied to languages with limited training resources. In order to address these limitations, in this paper we propose an automatic speech summarization method based on the automatic discovery of recurrent patterns in the speech: recurrent acoustic patterns are first extracted from the audio and then clustered and ranked according to the number of repetitions, creating an approximate acoustic summary of what was spoken. This approach allows us to build what we call a “Spoken WordCloud”, named for its similarity to text-based word clouds. We present an algorithm that achieves a cluster purity of up to 90% and an inverse purity of 71% in preliminary experiments using a small dataset of connected spoken words.
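The purity and inverse-purity figures quoted above can be computed from a clustering and its ground-truth labels as follows. This is a generic sketch of the standard definitions, not the authors' evaluation code.

```python
from collections import Counter

def purity(clusters, labels):
    """Fraction of items whose cluster's majority label matches the item.

    `clusters` and `labels` are parallel lists: the cluster id and the
    ground-truth class of each item."""
    by_cluster = {}
    for c, l in zip(clusters, labels):
        by_cluster.setdefault(c, []).append(l)
    majority = sum(Counter(ls).most_common(1)[0][1]
                   for ls in by_cluster.values())
    return majority / len(labels)

def inverse_purity(clusters, labels):
    # Same measure with the roles swapped: classes scored against clusters.
    return purity(labels, clusters)

clusters = [0, 0, 0, 1, 1, 2]
labels   = ['a', 'a', 'b', 'b', 'b', 'a']
print(purity(clusters, labels))          # 5/6 ≈ 0.833
print(inverse_purity(clusters, labels))  # 4/6 ≈ 0.667
```

High purity alone can be gamed by making many tiny clusters; reporting inverse purity alongside it, as the paper does, penalizes fragmenting one class across many clusters.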
Citations: 24
Unsupervised anchorpersons differentiation in news video
Pub Date : 2011-06-13 DOI: 10.1109/CBMI.2011.5972531
M. Broilo, A. Basso, F. D. Natale
The automatic extraction of video structure from content is of key importance in enabling a variety of multimedia services, spanning from search and retrieval to content manipulation. This paper presents an unsupervised, independent unimodal clustering method for anchorperson detection and differentiation in newscasts. The algorithm exploits audio, frame and face information to identify the major cast in the content. These three components are first processed independently during the cluster analysis and then jointly in a compositional mining phase. A differentiation of the roles played by the people in the video is achieved by exploiting the temporal characteristics of the detected anchorpersons. Experiments show significant precision/recall results, opening further research directions in video analysis, particularly when the content is highly structured, as in TV newscasts.
Citations: 8
Efficient video summarization and retrieval tools
Pub Date : 2011-06-13 DOI: 10.1109/CBMI.2011.5972518
Víctor Valdés, J. Sanchez
In this paper we describe the video browsing and retrieval techniques included in the ASSETS project system, focused on providing enhanced access to video repositories. The proposed mechanisms aim to provide efficient and reusable techniques for browsing and retrieval, minimizing the computational and storage cost of the approach while offering novel functionalities such as personalized/real-time video summarization. The system is under design and development within the ASSETS project, which deals with advanced tools for accessing cultural content.
Citations: 8
Using LIDO to handle 3D cultural heritage documentation data provenance
Pub Date : 2011-06-13 DOI: 10.1109/CBMI.2011.5972517
D. Pitzalis, F. Niccolucci, M. Cord
It is important for Digital Libraries (DLs) to be flexible in exposing their content. Typically a DL provides a search/browse interface that allows resources to be found, and a service that makes the data available for harvesting from/to other DLs. This kind of communication is possible because the structures of different DLs are expressed following formal specifications. This holds in particular for Cultural Heritage, where we need to describe an extremely heterogeneous environment: some metadata standards are emerging, and mappings have been proposed to allow metadata exchange and enrichment. CIDOC-CRM is an ontology designed to mediate contents in the area of tangible cultural heritage, published as the ISO 21127:2006 standard. Recently, an extension of CIDOC-CRM known as CRMdig has made it possible to document information about data provenance and digital surrogates in a very precise way. Another metadata schema suitable for handling museum-related data is LIDO. In this paper we propose a case study showing how CIDOC-CRMdig and LIDO handle the digital information of an object, and especially its data provenance.
Citations: 3
Unsupervised scene detection in Olympic video using multi-modal chains
Pub Date : 2011-06-13 DOI: 10.1109/CBMI.2011.5972529
Gert-Jan Poulisse, Marie-Francine Moens
This paper presents a novel unsupervised method for identifying the semantic structure in long semi-structured video streams. We identify ‘chains’, local clusters of repeated features from both the video stream and audio transcripts. Each chain serves as an indicator that the temporal interval it demarcates is part of the same semantic event. By layering all the chains over each other, dense regions emerge from the overlapping chains, from which we can identify the semantic structure of the video. We analyze two clustering strategies that accomplish this task.
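The layering step — finding dense regions where many chains overlap in time — can be sketched as a sweep over chain endpoints. This is an illustrative reconstruction of that one step, not the authors' implementation; chains are reduced to plain (start, end) intervals.

```python
def dense_regions(chains, min_depth=2):
    """Return maximal time intervals covered by at least `min_depth`
    overlapping chains, via a sweep-line over interval endpoints.

    `chains` is a list of (start, end) intervals."""
    events = []
    for s, e in chains:
        events.append((s, 1))    # a chain opens
        events.append((e, -1))   # a chain closes
    events.sort()                # ends sort before starts at equal times
    regions, depth, start = [], 0, None
    for t, delta in events:
        depth += delta
        if depth >= min_depth and start is None:
            start = t            # entering a dense region
        elif depth < min_depth and start is not None:
            regions.append((start, t))  # leaving a dense region
            start = None
    return regions

chains = [(0, 10), (5, 20), (8, 12), (30, 40)]
print(dense_regions(chains, min_depth=2))  # [(5, 12)]
```

The isolated chain (30, 40) contributes no dense region; only the span where at least two chains coincide survives, mirroring how overlapping chains vote for a common semantic event.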
Citations: 4
Interactive social, spatial and temporal querying for multimedia retrieval
Pub Date : 2011-06-13 DOI: 10.1109/CBMI.2011.5972512
G. C. D. Silva, K. Aizawa, Yuki Arase, Xing Xie
We propose a scheme for faster and more effective retrieval of temporal, spatial and social multimedia from large collections. We define interactive multimedia queries that allow simultaneous query refinement along multiple search dimensions. User interaction techniques based on line and iconic sketches allow queries to be specified based on the above definition. We prototype a multi-user travel media network and implement the proposed user interaction techniques for retrieving the locomotion patterns of the users. The proposed scheme facilitates easy input and refinement of queries, and efficient retrieval.
Citations: 2
Binary SIFT: Fast image retrieval using binary quantized SIFT features
Pub Date : 2011-06-13 DOI: 10.1109/CBMI.2011.5972548
K. A. Peker
SIFT features are widely used in content-based image retrieval. Typically, a few thousand keypoints are extracted from each image. Image matching involves distance computations across all pairs of SIFT feature vectors from both images, which is quite costly. We show that SIFT features perform surprisingly well even after quantizing each component to binary, when the medians are used as the quantization thresholds. Quantized features preserve both distinctiveness and matching properties. Almost all of the features in our 5.4 million feature test set map to distinct binary patterns after quantization. Furthermore, the numbers of matches between images using the original and the binary quantized SIFT features are quite similar. We investigate the distribution of SIFT features and observe that the space of 128-D binary vectors has sufficient capacity for the current performance of SIFT features. We use component median values as quantization thresholds and show, through vector-to-vector distance comparisons and image-to-image matches, that the resulting binary vectors perform comparably to the original SIFT vectors. We also discuss computational and storage gains: binary vector distance computation reduces to bit-wise operations, and the square operation is eliminated. Fast and efficient indexing techniques, such as the signatures used for chemical databases, can also be considered.
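The median-threshold binarization and bitwise distance described above can be sketched as follows. This is a toy illustration, not the authors' code; 4-D vectors stand in for 128-D SIFT descriptors.

```python
def binarize(descriptors):
    """Quantize each descriptor component to one bit, thresholding at the
    per-component median, and pack the bits into a single integer."""
    dim = len(descriptors[0])
    medians = []
    for d in range(dim):
        col = sorted(v[d] for v in descriptors)
        medians.append(col[len(col) // 2])
    def pack(v):
        bits = 0
        for d in range(dim):
            bits = (bits << 1) | (1 if v[d] > medians[d] else 0)
        return bits
    return [pack(v) for v in descriptors]

def hamming(a, b):
    # Distance between two packed descriptors: one XOR plus a popcount,
    # replacing 128 subtract-square-accumulate steps.
    return bin(a ^ b).count('1')

descs = [[0.1, 0.9, 0.4, 0.7],   # similar to the next vector
         [0.2, 0.8, 0.5, 0.6],
         [0.9, 0.1, 0.8, 0.2]]   # dissimilar to both
codes = binarize(descs)
print(hamming(codes[0], codes[1]), hamming(codes[0], codes[2]))  # 2 4
```

The similar pair ends up closer in Hamming distance than the dissimilar pair, which is the property the paper verifies at scale; the storage gain is likewise visible, as each descriptor shrinks from dim floats to dim bits.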
Citations: 32
Semi-supervised object recognition using flickr images
Pub Date : 2011-06-13 DOI: 10.1109/CBMI.2011.5972550
E. Chatzilari, S. Nikolopoulos, S. Papadopoulos, Christos Zigkolis, Y. Kompatsiaris
In this work we present an algorithm for extracting region-level annotations from flickr images, using a small set of manually labelled regions to guide the selection process. More specifically, we construct a set of flickr images that focuses on a certain concept and apply a novel graph-based clustering algorithm to their regions. Then, we select the cluster or clusters that correspond to the examined concept, guided by the manually labelled data. Experimental results show that although the obtained regions are of lower quality compared to the manually labelled regions, the gain in effort compensates for the loss in performance.
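The clustering stage can be illustrated with a generic graph-based scheme: connect two regions when their similarity exceeds a threshold, then take connected components. This is only a stand-in sketch for the idea; the paper's clustering algorithm is its own novel contribution, and the similarity matrix and threshold here are invented for illustration.

```python
def cluster_regions(similarity, threshold=0.5):
    """Group region indices into clusters via connected components of the
    thresholded similarity graph, using union-find."""
    n = len(similarity)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] > threshold:
                union(i, j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

sim = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
print(cluster_regions(sim))  # [[0, 1], [2]]
```

Once regions are grouped, a handful of manually labelled regions suffice to pick out which cluster carries the target concept, which is the labour saving the abstract describes.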
Citations: 15
ImmEx: IMMersive text documents exploration system
Pub Date : 2011-06-13 DOI: 10.1109/CBMI.2011.5972511
Mario Cataldi, Luigi Di Caro, C. Schifanella
Common search engines, especially web-based ones, rely on standard keyword-based queries and matching algorithms using word frequencies, topic recency, document authority and/or thesauri. However, even when these systems offer efficient retrieval algorithms, they are not able to lead the user into an intuitive exploration of large data collections because of their cumbersome presentation of results (e.g. long lists of entries). Moreover, these methods do not provide any mechanism to retrieve other relevant information associated with those contents, and even when query refinement methods are offered, it is hard for users to express a refinement, given their inexperience and common lack of familiarity with the terminology. Therefore, we propose ImmEx, a novel visual navigational system for the immersive exploration of text documents that overcomes these problems by leveraging the intuitiveness of semantically related images, retrieved in real time from popular image sharing services. ImmEx lets users independently explore large text collections through a novel approach that exploits the directness of the images and their user-generated metadata. Finally, we analyze the efficiency and usability of the proposed system through case and user studies.
Citations: 4