
2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI): Latest Publications

Effective training of convolutional networks using noisy Web images
Pub Date: 2015-06-10 DOI: 10.1109/CBMI.2015.7153607
Phong D. Vo, A. Gînsca, H. Borgne, Adrian Daniel Popescu
Deep convolutional networks have recently shown strong performance on a variety of computer vision tasks. Besides network architecture optimization, a key contribution to their success is the availability of training data. Network training is usually done with manually validated data, but this approach has a significant cost and poses a scalability problem. Here we introduce an innovative pipeline that combines weakly-supervised image reranking methods and network fine-tuning to effectively train convolutional networks from noisy Web collections. We evaluate the proposed training method against conventional supervised training on cross-domain classification tasks. Results show that our method outperforms the conventional one on all three datasets. Our findings open opportunities for researchers and practitioners to use convolutional networks at low training cost.
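The abstract gives no pseudo-code, so the following is a minimal sketch of one common weakly-supervised reranking heuristic: score each noisy web image by its distance to the class centroid in feature space and keep only the closest fraction for fine-tuning. The centroid rule, the `keep_ratio` parameter, and the toy features are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def rerank_by_centroid(features, keep_ratio=0.7):
    """Weakly-supervised reranking (hypothetical variant): score each
    image of a noisy class by its distance to the class centroid in
    feature space and keep the closest fraction as cleaner training data."""
    centroid = features.mean(axis=0)
    dists = np.linalg.norm(features - centroid, axis=1)
    order = np.argsort(dists)                      # closest first
    n_keep = max(1, int(keep_ratio * len(features)))
    return order[:n_keep]                          # indices of retained images

# Usage: 100 web images for one query, represented by 128-D CNN features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 128))
kept = rerank_by_centroid(feats)
print(f"kept {len(kept)} of {len(feats)} images for fine-tuning")
```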
Citations: 10
GPU implementation of an audio fingerprints similarity search algorithm
Pub Date: 2015-06-10 DOI: 10.1109/CBMI.2015.7153625
Chahid Ouali, P. Dumouchel, Vishwa Gupta
This paper describes a parallel implementation of a promising similarity search algorithm for an audio fingerprinting system. Efficient parallel implementation on a GPU accelerates the search over a dataset containing more than 61 million audio fingerprints. The similarity between two fingerprints is defined as the intersection of their elements. We evaluate GPU implementations of two intersection algorithms on this dataset. We show that intelligent use of GPU memory spaces (shared memory in particular) to maximize the number of concurrent threads has a significant impact on overall compute time when using fingerprints of varying dimensions. With simple modifications, we obtain up to 4 times better GPU performance when using GPU memory to maximize concurrent threads. Compared to CPU-only implementations, the proposed GPU implementation reduces run times by up to 150 times for one intersection algorithm and by up to 379 times for the other.
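The similarity measure itself is stated in the abstract: the intersection of the two fingerprints' element sets. The sketch below implements that measure and a plain linear scan in Python; the toy fingerprints are assumptions, and the paper's contribution, parallelizing this scan on the GPU, is only noted in comments.

```python
def intersection_similarity(query, reference):
    """Similarity between two fingerprints, defined (as in the paper)
    as the size of the intersection of their element sets."""
    return len(set(query) & set(reference))

def search(query, database):
    """Plain linear scan; the paper's GPU version parallelizes exactly
    this loop, scoring many reference fingerprints concurrently."""
    scores = [(intersection_similarity(query, ref), i)
              for i, ref in enumerate(database)]
    return max(scores)  # (best score, index of the best match)

# Hypothetical toy fingerprints: each is a list of quantized spectral peaks.
db = [[1, 4, 9, 12], [2, 4, 8, 12, 17], [3, 5, 9, 11]]
print(search([2, 4, 12, 18], db))  # -> (3, 1): fingerprint 1 shares 3 elements
```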
Citations: 7
News-oriented multimedia search over multiple social networks
Pub Date: 2015-06-10 DOI: 10.1109/CBMI.2015.7153612
Katerina Iliakopoulou, S. Papadopoulos, Y. Kompatsiaris
The paper explores the problem of focused multimedia search over multiple social media sharing platforms such as Twitter and Facebook. A multi-step multimedia retrieval framework is presented that collects relevant and diverse multimedia content from multiple social media sources given an input news story or event of interest. The framework utilizes a novel query formulation method in combination with relevance prediction. The query formulation method relies on the construction of a graph of keywords for generating refined queries about the event or news story of interest, based on the results of a first-step, high-precision query. Relevance prediction is based on supervised learning using 12 features computed from the content (text, visual) and social context (popularity, publication time) of posted items. A study carried out on 20 real-world events and breaking news stories, using six social sources as input, demonstrates the effectiveness of the proposed framework in collecting and aggregating relevant, high-quality media content from multiple social sources.
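As a rough illustration of the keyword-graph idea, the sketch below builds a keyword co-occurrence graph from the first-step results and appends the seed term's strongest neighbours to form refined queries. The whitespace tokenization, edge weighting, and `top_k` cutoff are assumptions; the 12 relevance-prediction features are not modelled here.

```python
from collections import Counter
from itertools import combinations

def refined_queries(seed, documents, top_k=3):
    """Toy keyword-graph query formulation: count keyword co-occurrence
    in the first-step results, then extend the seed query with the
    keywords most strongly connected to it."""
    edges = Counter()
    for doc in documents:
        words = set(doc.lower().split())
        for a, b in combinations(sorted(words), 2):
            edges[(a, b)] += 1
    neighbours = Counter()                 # seed's neighbours by edge weight
    for (a, b), weight in edges.items():
        if a == seed:
            neighbours[b] += weight
        elif b == seed:
            neighbours[a] += weight
    return [f"{seed} {kw}" for kw, _ in neighbours.most_common(top_k)]

docs = ["earthquake nepal rescue", "nepal earthquake damage",
        "rescue teams nepal"]
print(refined_queries("nepal", docs))  # e.g. ['nepal earthquake', ...]
```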
Citations: 2
Duplicate image detection in a stream of web visual data
Pub Date: 2015-06-10 DOI: 10.1109/CBMI.2015.7153614
Etienne Gadeski, H. Borgne, Adrian Daniel Popescu
We consider the problem of indexing and searching image duplicates in streaming visual data. This task requires a fast image descriptor, a small memory footprint for each signature, and a quick search algorithm. To this end, we propose a new descriptor satisfying the aforementioned requirements. We evaluate our method on two different datasets augmented with different sets of distractor images, leading to large-scale image collections (up to 85 million images). We compare our method to the state of the art and show that it is among the best in detection performance while being much faster (by one to two orders of magnitude).
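The abstract does not specify the descriptor, so the sketch below only illustrates the general recipe such duplicate detectors follow: a compact binary signature (here, signs of random projections, 64 bits per image) compared with a fast Hamming distance. Every size and name in it is hypothetical.

```python
import numpy as np

def binary_signature(feature, projections):
    """Hypothetical compact signature: signs of random projections,
    packed into 64 bits, so each image costs only 8 bytes of memory."""
    bits = (feature @ projections) > 0
    return np.packbits(bits).tobytes()

def hamming(a, b):
    """Hamming distance between two packed signatures."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")

rng = np.random.default_rng(1)
proj = rng.normal(size=(128, 64))                  # 128-D feature -> 64-bit code
img = rng.normal(size=128)
near_dup = img + rng.normal(scale=0.05, size=128)  # slightly altered copy
other = rng.normal(size=128)                       # unrelated image
sig = binary_signature(img, proj)
print(hamming(sig, binary_signature(near_dup, proj)))  # small distance
print(hamming(sig, binary_signature(other, proj)))     # distance near 32
```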
Citations: 1
Permutation based indexing for high dimensional data on GPU architectures
Pub Date: 2015-06-10 DOI: 10.1109/CBMI.2015.7153619
Martin Kruliš, Hasmik Osipyan, S. Marchand-Maillet
Permutation-based indexing is one of the most popular techniques for the approximate nearest-neighbor search problem in high-dimensional spaces. Due to the exponential growth of multimedia data, the time required to index this data has become a serious constraint on indexing techniques. One possible step towards faster index construction is the utilization of massively parallel platforms such as GPGPU architectures. In this paper, we analyze the computational costs of the individual steps of permutation-based index construction in a high-dimensional feature space and propose a hybrid solution, in which the computational power of the GPU is utilized for distance computations whilst the host CPU performs the postprocessing and sorting steps. Despite the fact that computing the distances is a naturally data-parallel task, an efficient implementation is quite challenging due to various GPU limitations and a complex memory hierarchy. We have tested possible approaches to work division and data caching to utilize the GPU to the best of its abilities. We summarize our empirical results and point out the optimal solution.
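The data structure itself is standard enough to sketch: each point is represented by the ordered identities of its nearest reference pivots. The NumPy version below mirrors the paper's split, with the distance matrix as the GPU-friendly step and the per-point sort as the CPU post-processing step, although here both run on the CPU.

```python
import numpy as np

def permutation_prefix(points, pivots, prefix_len=4):
    """Permutation-based index entries: each point is represented by the
    ids of its `prefix_len` closest pivots, in order of proximity."""
    # Distance matrix, shape (n_points, n_pivots): the data-parallel,
    # GPU-friendly step in the paper's hybrid scheme.
    d = np.linalg.norm(points[:, None, :] - pivots[None, :, :], axis=2)
    # Per-point sort of pivot ids: the CPU post-processing step.
    return np.argsort(d, axis=1)[:, :prefix_len]

rng = np.random.default_rng(2)
pts = rng.normal(size=(5, 16))   # 5 points in a 16-D feature space
piv = rng.normal(size=(8, 16))   # 8 reference pivots
print(permutation_prefix(pts, piv))  # one pivot-id prefix per point
```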
Citations: 4
Object detection and depth estimation for 3D trajectory extraction
Pub Date: 2015-06-10 DOI: 10.1109/CBMI.2015.7153632
Zeyd Boukhers, Kimiaki Shirahama, Frédéric Li, M. Grzegorzek
To detect an event defined by the interaction of objects in a video, it is necessary to capture their spatio-temporal relation. However, a video displays only the original 3D space projected onto a 2D image plane. This paper introduces a method that extracts 3D trajectories of objects from 2D videos. Each trajectory represents the transition of an object's positions in 3D space. We extract such trajectories by combining object detection with depth estimation, which estimates depth information in 2D videos. The major problem here is the inconsistency between object detection and depth estimation results. For example, significantly different depths may be estimated for the region of the same object, and an object region that is appropriately shaped by estimated depths may be missed. To overcome this, we first initialise the 3D position of an object by selecting the frame with the highest consistency between the object detection and depth estimation results. Then, we track the object in 3D space using a particle filter, where the 3D position of the object is modelled as a hidden state that generates its 2D visual appearance. Experimental results demonstrate the effectiveness of our method.
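As a hedged illustration of the tracking step, the sketch below runs predict/update/resample cycles of a generic particle filter over hypothesised 3D positions. The Gaussian likelihood `observe` stands in for the paper's appearance-based observation model and is purely an assumption.

```python
import numpy as np

def particle_filter_step(particles, weights, observe, motion_noise=0.1):
    """One predict/update/resample cycle. Each particle is a hypothesised
    3D object position; `observe` scores how well that hypothesis explains
    the current frame (a stand-in for the paper's appearance likelihood)."""
    rng = np.random.default_rng()
    # Predict: diffuse particles with a random-walk motion model.
    particles = particles + rng.normal(scale=motion_noise, size=particles.shape)
    # Update: reweight each particle by its observation likelihood.
    weights = weights * np.array([observe(p) for p in particles])
    weights = weights / weights.sum()
    # Resample: draw particles in proportion to their weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Hypothetical likelihood peaked at a true object position of (1, 2, 5) m.
true_pos = np.array([1.0, 2.0, 5.0])
observe = lambda p: np.exp(-np.sum((p - true_pos) ** 2))
particles = np.random.default_rng(3).normal(size=(200, 3)) + true_pos
weights = np.full(200, 1.0 / 200)
for _ in range(10):
    particles, weights = particle_filter_step(particles, weights, observe)
print(particles.mean(axis=0))  # estimate converges towards true_pos
```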
Citations: 3
Introducing FoxPersonTracks: A benchmark for person re-identification from TV broadcast shows
Pub Date: 2015-06-01 DOI: 10.1109/CBMI.2015.7153630
Rémi Auguste, Pierre Tirilly, J. Martinet
This paper introduces a novel person-track dataset dedicated to person re-identification. The dataset is built from a set of real-life TV shows broadcast on the French channels BFMTV and LCP, provided during the REPERE challenge. It contains a total of 4,604 person tracks (short video sequences featuring an individual with no background) from 266 persons. The dataset has been built from the REPERE dataset by following several automated processing and manual selection/filtering steps. It is meant to serve as a benchmark for person re-identification from images and videos. The dataset also provides re-identification results using space-time histograms as a baseline, together with an evaluation tool, in order to ease comparison with other re-identification methods.
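To make the baseline concrete, the sketch below ranks gallery tracks by histogram intersection against a query track, using an averaged per-frame histogram as a simplified stand-in for the dataset's space-time histograms. The bin count, frame sizes, and toy pixel data are assumptions.

```python
import numpy as np

def histogram_signature(track_frames, bins=16):
    """Simplified track signature: an averaged grey-level histogram over
    the track's frames (a stand-in for the space-time histogram baseline)."""
    hists = [np.histogram(f, bins=bins, range=(0, 256), density=True)[0]
             for f in track_frames]
    return np.mean(hists, axis=0)

def reidentify(query_sig, gallery_sigs):
    """Rank gallery tracks by histogram intersection with the query."""
    scores = [np.minimum(query_sig, g).sum() for g in gallery_sigs]
    return int(np.argmax(scores))

rng = np.random.default_rng(4)
person_a = [rng.integers(0, 120, (32, 16)) for _ in range(5)]    # darker clothing
person_b = [rng.integers(120, 256, (32, 16)) for _ in range(5)]  # brighter clothing
query = [rng.integers(0, 120, (32, 16)) for _ in range(5)]       # same person as A
gallery = [histogram_signature(person_a), histogram_signature(person_b)]
print(reidentify(histogram_signature(query), gallery))  # -> 0 (matches person A)
```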
Citations: 1
Visual information retrieval in endoscopic video archives
Pub Date: 2015-04-29 DOI: 10.1109/CBMI.2015.7153618
Jennifer Roldan-Carlos, M. Lux, Xavier Giró-i-Nieto, P. Muñoz, N. Anagnostopoulos
In endoscopic procedures, surgeons work with live video streams from the inside of their subjects. A main source of documentation for these procedures is still frames from the video, identified and captured during surgery. However, with growing demands and technical means, the streams are saved to storage servers, and surgeons need to retrieve parts of the videos on demand. In this submission, we present a demo application for video retrieval based on visual features and late fusion, which allows surgeons to re-find shots taken during a procedure.
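Late fusion is straightforward to illustrate: each visual feature scores the stored frames independently, and the scores are merged only at ranking time. The sketch below assumes a simple weighted sum; the demo's actual features and fusion weights are not given in the abstract.

```python
def late_fusion(rankings, weights=None):
    """Late fusion: every feature scores the stored frames on its own,
    and the per-feature scores are combined only at ranking time."""
    weights = weights or [1.0] * len(rankings)
    frames = rankings[0].keys()
    return sorted(frames,
                  key=lambda f: -sum(w * r[f] for w, r in zip(weights, rankings)))

# Hypothetical per-feature similarity scores for three stored frames.
colour_scores = {"frame_017": 0.9, "frame_204": 0.4, "frame_391": 0.7}
edge_scores   = {"frame_017": 0.5, "frame_204": 0.8, "frame_391": 0.6}
print(late_fusion([colour_scores, edge_scores]))  # best match first
```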
Citations: 13