
2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI): Latest Publications

An evolutionary confidence measurement for spoken term detection
Pub Date: 2011-06-13 DOI: 10.1109/CBMI.2011.5972537
Javier Tejedor, A. Echeverría, Dong Wang
We propose a new discriminative confidence measurement approach for spoken term detection (STD) based on an evolution strategy. Our evolutionary algorithm, named evolutionary discriminant analysis (EDA), optimizes classification errors directly, a salient advantage over conventional discriminative models such as MLPs and SVMs, which optimize objective functions based on a particular class encoding. In addition, the intrinsic randomness of the evolution strategy means EDA largely reduces the risk of converging to local minima during model training. This is particularly valuable when the decision boundary is complex, as is the case when dealing with out-of-vocabulary (OOV) terms in STD. Experimental results on the English meeting domain demonstrate considerable performance improvement with the EDA-based confidence for OOV terms compared with MLP- and SVM-based confidences; for in-vocabulary terms, however, no significant difference is observed among the three models. This confirms our conjecture that EDA is most advantageous for tasks with complex decision boundaries.
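The abstract's key point is that an evolution strategy can optimize the non-differentiable 0-1 classification error directly, where gradient-trained MLPs and SVMs must optimize a surrogate loss. A minimal sketch of that idea follows; it is not the authors' EDA (whose operators and confidence features are not described here) but a generic (mu + lambda) evolution strategy over the weights of a linear confidence classifier, with invented toy data:

```python
import numpy as np

rng = np.random.default_rng(0)

def error_rate(w, X, y):
    # Direct 0-1 classification error: non-differentiable, so gradient
    # methods need a surrogate loss, but an evolution strategy does not.
    return float(np.mean((X @ w > 0).astype(int) != y))

def evolve(X, y, mu=20, lam=60, sigma=0.3, generations=100):
    # Minimal (mu + lambda) evolution strategy over linear weights.
    parents = rng.normal(size=(mu, X.shape[1]))
    for _ in range(generations):
        picks = parents[rng.integers(mu, size=lam)]
        children = picks + rng.normal(scale=sigma, size=picks.shape)
        pool = np.vstack([parents, children])
        fitness = np.array([error_rate(w, X, y) for w in pool])
        parents = pool[np.argsort(fitness)[:mu]]  # truncation selection
    return parents[0]

# Toy confidence features for false-alarm (0) vs correct (1) detections;
# a constant column provides the bias term of the linear classifier.
y = rng.integers(0, 2, size=200)
X = np.hstack([rng.normal(size=(200, 2)) + 1.5 * y[:, None],
               np.ones((200, 1))])
w = evolve(X, y)
print("training 0-1 error:", error_rate(w, X, y))
```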
Citations: 9
Real-time single-view video event recognition in controlled environments
Pub Date: 2011-06-13 DOI: 10.1109/CBMI.2011.5972527
Juan C. Sanmiguel, Marcos Escudero-Viñolo, J. Sanchez, Jesús Bescós
This paper presents a real-time video event recognition system for controlled environments. It recognizes human activities and interactions with objects in the environment by exploiting cues such as trajectory analysis, skin detection, and people recognition applied to the foreground blobs of the scene. The temporal variations of these features are analyzed and combined using Bayesian inference to detect events. Contextual information, including fixed objects' locations, object types, and hierarchical event definitions, is formally incorporated into the system. A corpus of video sequences covering different levels of object-extraction complexity has been designed and recorded. Experimental results show that our approach recognizes five kinds of events (two activities and three human-object interactions) with high precision while operating in real time.
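The abstract does not spell out the fusion rule, but combining per-cue evidence with Bayesian inference typically reduces to multiplying cue likelihoods into a prior over candidate events. A minimal naive-Bayes sketch, with event names, cue names, and likelihood values all invented for illustration:

```python
import numpy as np

EVENTS = ["leave_object", "use_object", "hand_interaction"]  # invented labels

def fuse(cue_likelihoods, prior):
    # Naive-Bayes fusion: treat cues as conditionally independent given
    # the event, multiply their likelihoods into the prior, normalize.
    posterior = prior.copy()
    for lik in cue_likelihoods:          # one array per cue
        posterior *= lik
    return posterior / posterior.sum()

prior = np.full(len(EVENTS), 1.0 / len(EVENTS))
trajectory_lik = np.array([0.7, 0.2, 0.1])  # blob stopped near fixed object
skin_lik = np.array([0.2, 0.3, 0.5])        # skin detected inside the blob
posterior = fuse([trajectory_lik, skin_lik], prior)
print(dict(zip(EVENTS, posterior.round(3))))
```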
Citations: 5
A region-dependent image matching method for image and video annotation
Pub Date: 2011-06-13 DOI: 10.1109/CBMI.2011.5972532
Golnaz Abdollahian, M. Birinci, F. Díaz-de-María, M. Gabbouj, E. Delp
In this paper we propose an image matching approach that selects the matching method for each region of the image based on the region's properties. The method can be used to find images similar to a query image in a database, which is useful for automatic image and video annotation. In this approach, each image is first divided into large homogeneous areas, identified as "texture areas", and non-texture areas. Local descriptors are then used to match keypoints in the non-texture areas, while texture regions are matched using low-level visual features. Experimental results show that excluding texture areas from local-descriptor matching increases the efficiency of the whole process, and that using measures appropriate to each region type also improves overall performance.
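A sketch of the routing idea, under stated assumptions: the variance-based texture test, the histogram measure for texture regions, and the correlation stand-in for keypoint matching are all simplifications of whatever descriptors the paper actually uses.

```python
import numpy as np

def is_texture_region(patch, var_threshold=0.02):
    # Crude homogeneity test standing in for the paper's texture
    # identification: normalized patches with low variance are
    # treated as "texture areas".
    return (patch / 255.0).var() < var_threshold

def texture_similarity(a, b, bins=16):
    # Low-level match for texture regions: histogram intersection.
    ha, _ = np.histogram(a, bins=bins, range=(0, 255), density=True)
    hb, _ = np.histogram(b, bins=bins, range=(0, 255), density=True)
    return float(np.minimum(ha, hb).sum() / ha.sum())

def keypoint_similarity(a, b):
    # Stand-in for local-descriptor (SIFT-style) matching on
    # non-texture regions; here plain normalized correlation.
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def region_similarity(a, b):
    # Route each region pair to the matcher suited to its type.
    if is_texture_region(a) and is_texture_region(b):
        return texture_similarity(a, b)
    return keypoint_similarity(a, b)

rng = np.random.default_rng(1)
flat = np.full((32, 32), 128.0) + rng.normal(scale=5, size=(32, 32))
busy = rng.uniform(0, 255, size=(32, 32))
print(region_similarity(flat, flat.copy()), region_similarity(busy, busy.copy()))
```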
Citations: 6
dpikt — Automatic illustration system for media content
Pub Date: 2011-06-13 DOI: 10.1109/CBMI.2011.5972552
Filipe Coelho, Cristina Ribeiro
Journalists and bloggers need high-quality photos to illustrate news stories and blog entries. The dpikt text illustration system uses multimedia information retrieval to support this content-enrichment task. Users query the system with text fragments and get collections of candidate photos. Images in the results can be visually sorted with respect to a selected photo, or used as a seed for interactive searches over the entire collection. dpikt incorporates a recent visual descriptor, the Joint Composite Descriptor, and an approximate indexing scheme designed for large-scale image collections, the Permutation-Prefix Index. We used the SAPO-Labs large-scale news-story photo collection, containing almost two million high-quality photos with short descriptions, as the resource for the illustration task.
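The Permutation-Prefix Index mentioned here builds on the idea that an object can be described by the order in which a fixed set of reference objects appears around it, indexing only the first few entries of that ordering (the prefix). A toy sketch of the principle; the real PP-Index organizes prefixes in a tree and searches more elaborately, and all data below is invented:

```python
import numpy as np

rng = np.random.default_rng(1)

def permutation_prefix(x, refs, prefix_len=3):
    # Describe x by the identities of its nearest reference objects,
    # ordered by increasing distance: the permutation "prefix".
    d = np.linalg.norm(refs - x, axis=1)
    return tuple(np.argsort(d)[:prefix_len])

refs = rng.normal(size=(20, 8))    # reference objects (pivots)
db = rng.normal(size=(1000, 8))    # stand-ins for image feature vectors

# Bucket database vectors by prefix; similar vectors tend to share one.
index = {}
for i, v in enumerate(db):
    index.setdefault(permutation_prefix(v, refs), []).append(i)

# At query time only the bucket sharing the query's prefix is scanned.
q = db[42] + 0.01 * rng.normal(size=8)   # near-duplicate of item 42
candidates = index.get(permutation_prefix(q, refs), [])
best = min(candidates, key=lambda i: float(np.linalg.norm(db[i] - q)),
           default=None)
print(len(candidates), "candidates scanned; best match:", best)
```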
Citations: 2
Combining local and global visual feature similarity using a text search engine
Pub Date: 2011-06-13 DOI: 10.1109/CBMI.2011.5972519
Giuseppe Amato, Paolo Bolettieri, F. Falchi, C. Gennaro, F. Rabitti
In this paper we propose a novel approach that allows processing content-based image queries expressed as arbitrary combinations of local and global visual features, using a single index realized as an inverted file. The index was implemented on top of the Lucene retrieval engine. This is particularly useful in content-based retrieval systems, as it lets users efficiently and interactively check the quality of retrieval results by exploiting combinations of various features.
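The abstract does not detail how visual features are mapped onto a text engine's inverted file; a common trick is to quantize features into "visual words" and encode each word as a surrogate text term, which a Lucene-style index can then serve. A minimal sketch with an in-memory inverted file; the term prefixes, weights, and data are this sketch's assumptions, not the paper's encoding:

```python
from collections import defaultdict

def to_terms(local_words, global_words):
    # Encode quantized visual features as surrogate text terms, so a
    # text engine's inverted file can serve both feature types at once.
    return [f"loc{w}" for w in local_words] + [f"glob{w}" for w in global_words]

# Inverted file: term -> postings list of (doc_id, term_frequency).
inverted = defaultdict(list)

def add_document(doc_id, terms):
    counts = {}
    for t in terms:
        counts[t] = counts.get(t, 0) + 1
    for t, tf in counts.items():
        inverted[t].append((doc_id, tf))

def search(terms, w_local=0.5, w_global=0.5):
    # Score by weighted term overlap; the weights blend the two feature
    # families, mirroring a boosted boolean query in a text engine.
    scores = defaultdict(float)
    for t in set(terms):
        weight = w_local if t.startswith("loc") else w_global
        for doc_id, tf in inverted[t]:
            scores[doc_id] += weight * tf
    return sorted(scores.items(), key=lambda kv: -kv[1])

add_document("img1", to_terms([3, 7, 7], [12]))
add_document("img2", to_terms([7, 9], [12, 4]))
print(search(to_terms([7], [12])))
```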
Citations: 22
Generic R-transform for invariant pattern representation
Pub Date: 2011-06-13 DOI: 10.1109/CBMI.2011.5972538
Thai V. Hoang, S. Tabbone
The beneficial properties of the Radon transform make it a useful intermediate representation for extracting invariant features from pattern images for indexing and matching. This paper revisits the problem with a generic view of a popular Radon-based pattern descriptor, the R-signature, introducing a class of descriptors that spatially describe patterns in all directions and at different levels. The domain of this class and the selection of its representative are also discussed. Theoretical arguments validate the robustness of the generic R-signature to additive noise, and experimental results show its effectiveness.
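For reference, the Radon transform and the classical R-signature it feeds are commonly written as below. The "generic" family plausibly parameterizes the exponent, shown here as m; that form is an assumption and should be checked against the paper's exact definition:

```latex
% Radon transform of a pattern f along lines parameterized by (rho, theta)
\mathcal{R}_f(\rho,\theta) = \int_{\mathbb{R}^2} f(x,y)\,
  \delta(x\cos\theta + y\sin\theta - \rho)\,\mathrm{d}x\,\mathrm{d}y

% Classical R-signature: integrate the squared Radon transform over rho;
% translation leaves it unchanged, rotation circularly shifts theta.
R_f(\theta) = \int_{-\infty}^{\infty} \mathcal{R}_f(\rho,\theta)^2 \,\mathrm{d}\rho

% A generic family obtained by varying the exponent (assumed form):
R_f^{(m)}(\theta) = \int_{-\infty}^{\infty} \mathcal{R}_f(\rho,\theta)^m \,\mathrm{d}\rho
```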
Citations: 2
On modality classification and its use in text-based image retrieval in medical databases
Pub Date: 2011-06-13 DOI: 10.1109/CBMI.2011.5972530
Pierre Tirilly, Kun Lu, Xiangming Mu, Tian Zhao, Yu Cao
Medical databases have been a popular application field for image retrieval techniques during the last decade. More recently, much attention has been paid to predicting medical image modality (X-rays, MRI…) and to integrating the predicted modality into image retrieval systems. This paper addresses these two issues. On the one hand, we believe it is possible to design specific visual descriptors that determine image modality much more efficiently than the traditional image descriptors currently used for this task. We propose very light image descriptors that better capture modality properties and show promising results. On the other hand, we present a comparison of different existing and new modality integration methods. This comprehensive study provides insights into the behavior of these models with respect to the initial classification and retrieval systems. The results can be extended to other applications with a similar framework. All experiments in this work were performed on datasets provided for the 2009 and 2010 ImageCLEF medical tracks.
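The abstract argues that modality cues are cheap to compute compared with general-purpose descriptors. As an illustration of what a "very light" descriptor might look like, the features below (a colorfulness score plus a coarse intensity histogram) are this sketch's invention, not the paper's:

```python
import numpy as np

def light_modality_descriptor(img):
    # Cheap cues in the spirit of "very light" descriptors:
    # near-grayscale images (X-rays, MRI) have low colorfulness, and a
    # coarse intensity histogram captures global contrast.
    # img: H x W x 3 uint8 array.
    img = img.astype(float)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    colorfulness = np.mean(np.abs(r - g)) + np.mean(np.abs(g - b))
    hist, _ = np.histogram(img.mean(axis=2), bins=8, range=(0, 255),
                           density=True)
    return np.concatenate([[colorfulness], hist])

rng = np.random.default_rng(2)
xray_like = np.repeat(rng.integers(0, 256, (64, 64, 1)), 3, axis=2)
photo_like = rng.integers(0, 256, (64, 64, 3))
print("colorfulness:", light_modality_descriptor(xray_like)[0],
      light_modality_descriptor(photo_like)[0])
```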
Citations: 22
Content-based image retrieval for Alzheimer's disease detection
Pub Date: 2011-06-13 DOI: 10.1109/CBMI.2011.5972513
Mayank Agarwal, Javed Mostafa
This paper describes ViewFinder Medicine (vfM), an application of content-based image retrieval to the domain of Alzheimer's disease and medical imaging in general. The system follows a multi-tier architecture, which provides flexibility in experimenting with different representation, classification, ranking, and feedback techniques. Classification is central to the system: besides estimating which stage of the disease the input query may belong to, it also helps adapt and rank the search results. With our multi-level approach, classification performance matched the best results reported in the medical imaging literature. Up to 87% of patients were correctly assigned to their respective classes, leading to an average precision of about 0.8 without any relevance feedback from the user. To encourage engagement and leverage physicians' knowledge, a relevance feedback function was subsequently added; as a result, precision improved to 0.89.
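The abstract reports that relevance feedback lifted average precision from about 0.8 to 0.89 but does not say which update rule vfM uses. The classic Rocchio update is the standard illustration of the mechanism: move the query vector toward examples the physician marks relevant and away from those marked non-relevant. The names, weights, and data below are generic, not vfM's:

```python
import numpy as np

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    # Classic Rocchio relevance feedback over feature vectors: shift
    # the query toward the centroid of relevant examples and away from
    # the centroid of non-relevant ones.
    q = alpha * query
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q -= gamma * np.mean(nonrelevant, axis=0)
    return q

rng = np.random.default_rng(3)
query = rng.normal(size=16)
relevant = rng.normal(size=(3, 16)) + 1.0     # feedback-marked scans
nonrelevant = rng.normal(size=(2, 16)) - 1.0
print(np.round(rocchio(query, relevant, nonrelevant)[:4], 2))
```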
Citations: 26
Applying soft links to diversify video recommendations
Pub Date: 2011-06-13 DOI: 10.1109/CBMI.2011.5972523
D. Vallet, Martin Halvey, J. Jose, P. Castells
In this paper we present a study of exploratory video search tasks and of recommendation techniques based on a graph representation of past user-community interactions with the system, a representation that has been used in a number of multimedia retrieval systems. We propose an extension of such graph-based usage representations based on the creation of additional soft links between nodes. We demonstrate how soft links can be incorporated into a graph-based representation and how different state-of-the-art techniques can be adapted to use them. Our evaluation, based on a simulation-oriented technique and on real interaction data gathered from users, shows how soft links can improve the diversity and, in some cases, the accuracy of the studied recommendation techniques.
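The abstract leaves the construction of soft links open; one natural reading is to add weighted edges between items that were never linked by usage but are similar under some content measure, with a discount so that genuine interaction evidence dominates. A toy sketch along those lines; the edge data, similarity measure, discount, and threshold are all assumptions:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)

# Hard links: videos co-accessed in past user sessions (invented data).
edges = {("v1", "v2"): 1.0, ("v2", "v3"): 1.0}
features = {v: rng.normal(size=8) for v in ["v1", "v2", "v3", "v4"]}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Soft links: discounted edges between items never co-accessed but
# similar in content, so usage evidence still dominates the graph.
SOFT_WEIGHT, THRESHOLD = 0.3, 0.2
for a, b in combinations(features, 2):
    if (a, b) not in edges and (b, a) not in edges:
        sim = cosine(features[a], features[b])
        if sim > THRESHOLD:
            edges[(a, b)] = SOFT_WEIGHT * sim

def recommend(seed):
    # One-hop neighborhood scores; soft links widen the candidate set,
    # which is where the diversity gain would come from.
    scores = {}
    for (a, b), w in edges.items():
        if a == seed:
            scores[b] = scores.get(b, 0.0) + w
        if b == seed:
            scores[a] = scores.get(a, 0.0) + w
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(recommend("v1"))
```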
Citations: 1
People indexing in TV-content using lip-activity and unsupervised audio-visual identity verification
Pub Date: 2011-06-13 DOI: 10.1109/CBMI.2011.5972535
Meriem Bendris, Delphine Charlet, G. Chollet
Our goal is to structure TV content by person, allowing a user to navigate through the sequences featuring the same person. To let users browse content without restrictions on the people appearing in it, this structuring has to be done without any pre-defined dictionary of people. To this end, most methods index people independently in the audio and visual modalities, then associate the indexes to obtain a talking-face index. Unfortunately, this approach combines the clustering errors made in each modality. In this work, we propose a scheme for the mutual correction of audio and visual clustering errors. First, clustering errors are detected using indicators that suggest the presence of a talking face. Then, the incorrect label is corrected according to an automatic modification scheme. Two modification schemes are proposed and evaluated: one systematically corrects the modality assumed a priori to be less reliable, while the second compares scores from unsupervised audio-visual models to determine which modality failed. Experiments on a TV-show database show that the proposed correction schemes yield significant performance improvements, mainly due to a substantial reduction in missed talking faces.
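A compact sketch of the second correction scheme as described: where lip activity suggests a talking face but the audio and face labels disagree, keep the label whose identity model scores the segment higher. The segment format and the scoring callables are this sketch's assumptions, not the paper's API:

```python
def correct_labels(segments, score_audio, score_face):
    # Cross-modal correction: on suspicious segments (talking face
    # suspected, modalities disagree), trust whichever unsupervised
    # identity model gives its own label the higher score.
    corrected = []
    for seg in segments:
        if seg["lip_activity"] and seg["audio_id"] != seg["face_id"]:
            if score_audio(seg) >= score_face(seg):
                seg = {**seg, "face_id": seg["audio_id"]}   # trust audio
            else:
                seg = {**seg, "audio_id": seg["face_id"]}   # trust video
        corrected.append(seg)
    return corrected

segments = [
    {"lip_activity": True,  "audio_id": "spk1", "face_id": "spk2"},
    {"lip_activity": False, "audio_id": "spk3", "face_id": "spk3"},
]
# Constant scorers stand in for real model likelihoods.
print(correct_labels(segments, lambda s: 0.8, lambda s: 0.4))
```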
Citations: 6