Effective training of convolutional networks using noisy Web images
Pub Date: 2015-06-10 | DOI: 10.1109/CBMI.2015.7153607
Phong D. Vo, A. Gînsca, H. Borgne, Adrian Daniel Popescu
Deep convolutional networks have recently shown strong performance on a variety of computer vision tasks. Besides network architecture optimization, a key contribution to their success is the availability of training data. Network training is usually done with manually validated data, but this approach has a significant cost and poses a scalability problem. Here we introduce a pipeline that combines weakly-supervised image reranking methods and network fine-tuning to effectively train convolutional networks from noisy Web collections. We evaluate the proposed training method against conventional supervised training on cross-domain classification tasks. Results show that our method outperforms the conventional one on all three datasets. Our findings open opportunities for researchers and practitioners to train convolutional networks at low cost.
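The abstract does not detail the reranking step. As a hedged illustration only, a weakly-supervised reranker might score each Web image by its visual density within its concept and keep the most consistent fraction before fine-tuning; the function names, density criterion, and keep ratio below are all hypothetical, not the paper's actual method.

```python
import numpy as np

def rerank_by_density(features, keep_ratio=0.7):
    """Score each image by its mean similarity to the other images
    collected for the same concept and keep only the densest fraction.
    A hypothetical weakly-supervised filter for noisy Web images."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = feats @ feats.T                # pairwise cosine similarities
    np.fill_diagonal(sims, 0.0)           # ignore self-similarity
    density = sims.mean(axis=1)           # noisy outliers score low
    n_keep = int(len(feats) * keep_ratio)
    return np.argsort(-density)[:n_keep]  # indices of retained images

# The retained images per concept would then be used to fine-tune a
# pretrained CNN (fine-tuning step omitted here).
```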
{"title":"Effective training of convolutional networks using noisy Web images","authors":"Phong D. Vo, A. Gînsca, H. Borgne, Adrian Daniel Popescu","doi":"10.1109/CBMI.2015.7153607","DOIUrl":"https://doi.org/10.1109/CBMI.2015.7153607","url":null,"abstract":"Deep convolutional networks have recently shown very interesting performance in a variety of computer vision tasks. Besides network architecture optimization, a key contribution to their success is the availability of training data. Network training is usually done with manually validated data but this approach has a significant cost and poses a scalability problem. Here we introduce an innovative pipeline that combines weakly-supervised image reranking methods and network fine-tuning to effectively train convolutional networks from noisy Web collections. We evaluate the proposed training method versus the conventional supervised training on cross-domain classification tasks. Results show that our method outperforms the conventional method in all of the three datasets. Our findings open opportunities for researchers and practitioners to use convolutional networks with inexpensive training cost.","PeriodicalId":387496,"journal":{"name":"2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126706014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GPU implementation of an audio fingerprints similarity search algorithm
Pub Date: 2015-06-10 | DOI: 10.1109/CBMI.2015.7153625
Chahid Ouali, P. Dumouchel, Vishwa Gupta
This paper describes a parallel implementation of a promising similarity search algorithm for an audio fingerprinting system. An efficient parallel implementation on a GPU accelerates the search over a dataset containing more than 61 million audio fingerprints. The similarity between two fingerprints is defined as the intersection of their elements. We evaluate GPU implementations of two intersection algorithms on this dataset. We show that intelligent use of the GPU memory spaces (shared memory in particular), maximizing the number of concurrent threads, has a significant impact on the overall compute time when using fingerprints of varying dimensions. With simple modifications that use GPU memory to maximize concurrency, we obtain up to 4 times better GPU performance. Compared to CPU-only implementations, the proposed GPU implementation reduces run times by up to 150 times for one intersection algorithm and up to 379 times for the other.
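The similarity measure itself is simple to state in code. Below is a serial reference sketch of intersection-based fingerprint matching; the paper's contribution is parallelizing this scan on the GPU with careful shared-memory use, which the sketch does not attempt. Names are illustrative.

```python
def intersection_similarity(fp_a, fp_b):
    """Similarity between two audio fingerprints, defined -- as in the
    paper -- as the size of the intersection of their elements."""
    return len(set(fp_a) & set(fp_b))

def search(query_fp, database):
    """Rank all database fingerprints by intersection with the query.
    Serial reference only; the paper parallelizes this scan on a GPU,
    tuning shared-memory use to maximize concurrent threads."""
    scores = [(intersection_similarity(query_fp, fp), idx)
              for idx, fp in enumerate(database)]
    return sorted(scores, reverse=True)
```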
{"title":"GPU implementation of an audio fingerprints similarity search algorithm","authors":"Chahid Ouali, P. Dumouchel, Vishwa Gupta","doi":"10.1109/CBMI.2015.7153625","DOIUrl":"https://doi.org/10.1109/CBMI.2015.7153625","url":null,"abstract":"This paper describes a parallel implementation of a promising similarity search algorithm for an audio fingerprinting system. Efficient parallel implementation on a GPU accelerates the search on a dataset containing over 61 million audio fingerprints. The similarity between two fingerprints is defined as the intersection of their elements. We evaluate GPU implementations of two intersection algorithms for this dataset. We show that intelligent use of the GPU memory spaces (shared memory in particular) that maximizes the number of concurrent threads has a significant impact on the overall compute time when using fingerprints of varying dimensions. With simple modifications we obtain up to 4 times better GPU performance when using GPU memory to maximize concurrent threads. Compared to the CPU only implementations, the proposed GPU implementation reduces run times by up to 150 times for one intersection algorithm and by up to 379 times for the other intersection algorithm.","PeriodicalId":387496,"journal":{"name":"2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124079491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
News-oriented multimedia search over multiple social networks
Pub Date: 2015-06-10 | DOI: 10.1109/CBMI.2015.7153612
Katerina Iliakopoulou, S. Papadopoulos, Y. Kompatsiaris
The paper explores the problem of focused multimedia search over multiple social media sharing platforms such as Twitter and Facebook. A multi-step multimedia retrieval framework is presented that collects relevant and diverse multimedia content from multiple social media sources given an input news story or event of interest. The framework combines a novel query formulation method with relevance prediction. The query formulation method builds a graph of keywords to generate refined queries about the event/news story of interest based on the results of a first-step high-precision query. Relevance prediction is based on supervised learning using 12 features computed from the content (text, visual) and social context (popularity, publication time) of posted items. A study carried out on 20 real-world events and breaking news stories, using six social sources as input, demonstrates the effectiveness of the proposed framework in collecting and aggregating relevant, high-quality media content from multiple social sources.
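As a hedged reading of the graph-based query formulation, the sketch below builds a keyword co-occurrence graph from the first-step results and expands the seed with its strongest neighbors. The tokenization, edge weighting, and single-keyword seed are simplifying assumptions; the paper's exact construction is not specified in the abstract.

```python
from collections import Counter
from itertools import combinations

def refined_queries(seed_keyword, result_texts, n_queries=5):
    """Build a keyword co-occurrence graph from first-step results and
    expand the seed keyword with its strongest graph neighbors."""
    edges = Counter()
    for text in result_texts:
        words = {w.lower().strip(".,!?") for w in text.split() if len(w) > 3}
        for a, b in combinations(sorted(words), 2):
            edges[(a, b)] += 1            # co-occurrence edge weight
    seed = seed_keyword.lower()
    neighbors = Counter()
    for (a, b), w in edges.items():
        if a == seed:
            neighbors[b] += w
        elif b == seed:
            neighbors[a] += w
    return [f"{seed_keyword} {kw}" for kw, _ in neighbors.most_common(n_queries)]
```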
{"title":"News-oriented multimedia search over multiple social networks","authors":"Katerina Iliakopoulou, S. Papadopoulos, Y. Kompatsiaris","doi":"10.1109/CBMI.2015.7153612","DOIUrl":"https://doi.org/10.1109/CBMI.2015.7153612","url":null,"abstract":"The paper explores the problem of focused multimedia search over multiple social media sharing platforms such as Twitter and Facebook. A multi-step multimedia retrieval framework is presented that collects relevant and diverse multimedia content from multiple social media sources given an input news story or event of interest. The framework utilizes a novel query formulation method in combination with relevance prediction. The query formulation method relies on the construction of a graph of keywords for generating refined queries about the event/news story of interest based on the results of a firststep high precision query. Relevance prediction is based on supervised learning using 12 features computed from the content (text, visual) and social context (popularity, publication time) of posted items. A study is carried out on 20 real-world events and breaking news stories, using six social sources as input, and demonstrating the effectiveness of the proposed framework to collect and aggregate relevant high-quality media content from multiple social sources.","PeriodicalId":387496,"journal":{"name":"2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121373941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Duplicate image detection in a stream of web visual data
Pub Date: 2015-06-10 | DOI: 10.1109/CBMI.2015.7153614
Etienne Gadeski, H. Borgne, Adrian Daniel Popescu
We consider the problem of indexing and searching image duplicates in streaming visual data. This task requires a fast image descriptor, a small memory footprint per signature, and a quick search algorithm. To this end, we propose a new descriptor satisfying these requirements. We evaluate our method on two different datasets, each extended with different sets of distractor images, leading to large-scale image collections (up to 85 million images). We compare our method to the state of the art and show that its detection performance is among the best while being much faster (by one to two orders of magnitude).
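The paper's descriptor is its own contribution, so the sketch below substitutes a generic compact signature (a difference hash) purely to illustrate the small-footprint, fast-comparison workflow the abstract describes; the hash choice and threshold are assumptions.

```python
import numpy as np

def dhash_signature(gray, hash_size=8):
    """Compact binary signature from horizontal gradients (difference
    hash). `gray` is a 2D array already resized to
    (hash_size, hash_size + 1); 64 bits -> 8 bytes per image."""
    bits = gray[:, 1:] > gray[:, :-1]
    return np.packbits(bits.flatten())

def is_duplicate(sig_a, sig_b, max_hamming=5):
    """Two images count as duplicates if their signatures differ in at
    most `max_hamming` bits (threshold is an assumption)."""
    return np.unpackbits(np.bitwise_xor(sig_a, sig_b)).sum() <= max_hamming
```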
{"title":"Duplicate image detection in a stream of web visual data","authors":"Etienne Gadeski, H. Borgne, Adrian Daniel Popescu","doi":"10.1109/CBMI.2015.7153614","DOIUrl":"https://doi.org/10.1109/CBMI.2015.7153614","url":null,"abstract":"We consider the problem of indexing and searching image duplicates in streaming visual data. This task requires a fast image descriptor, a small memory footprint for each signature and a quick search algorithm. To this end, we propose a new descriptor satisfying the aforementioned requirements. We evaluate our method on two different datasets with the use of different sets of distractor images, leading to large-scale image collections (up to 85 million images). We compare our method to the state of the art and show it exhibits among the best detection performances but is much faster (one to two orders of magnitude).","PeriodicalId":387496,"journal":{"name":"2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129625324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Permutation based indexing for high dimensional data on GPU architectures
Pub Date: 2015-06-10 | DOI: 10.1109/CBMI.2015.7153619
Martin Kruliš, Hasmik Osipyan, S. Marchand-Maillet
Permutation-based indexing is one of the most popular techniques for approximate nearest-neighbor search in high-dimensional spaces. Due to the exponential growth of multimedia data, the time required to build the index has become a serious constraint on indexing techniques. One possible step towards faster index construction is the use of massively parallel platforms such as GPGPU architectures. In this paper, we analyze the computational costs of the individual steps of permutation-based index construction in a high-dimensional feature space and propose a hybrid solution in which the computational power of the GPU is used for distance computations while the host CPU performs the post-processing and sorting steps. Although computing the distances is a naturally data-parallel task, an efficient implementation is quite challenging due to various GPU limitations and the complex memory hierarchy. We test several approaches to work division and data caching to use the GPU to its full potential. We summarize our empirical results and point out the optimal solution.
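A minimal sketch of the permutation representation follows, mirroring the paper's division of labor between the distance computations (the GPU-offloaded part) and the sorting/post-processing (kept on the CPU). The sketch itself runs entirely on the CPU with NumPy; the footrule comparison is standard for permutation-based indexing but is an assumption with respect to this paper.

```python
import numpy as np

def permutation_of(point, pivots):
    """Represent a point by the ranks of a fixed pivot set ordered by
    distance. The distance computation is the step the paper offloads
    to the GPU; the argsort is the CPU-side post-processing."""
    dists = np.linalg.norm(pivots - point, axis=1)
    order = np.argsort(dists)
    ranks = np.empty(len(pivots), dtype=np.int64)
    ranks[order] = np.arange(len(pivots))
    return ranks

def footrule(perm_a, perm_b):
    """Spearman footrule distance between two permutations; points whose
    permutations are close to the query's are promising candidates."""
    return int(np.abs(perm_a - perm_b).sum())
```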
{"title":"Permutation based indexing for high dimensional data on GPU architectures","authors":"Martin Kruliš, Hasmik Osipyan, S. Marchand-Maillet","doi":"10.1109/CBMI.2015.7153619","DOIUrl":"https://doi.org/10.1109/CBMI.2015.7153619","url":null,"abstract":"Permutation-based indexing is one of the most popular techniques for the approximate nearest-neighbor search problem in high-dimensional spaces. Due to the exponential increase of multimedia data, the time required to index this data has become a serious constraint of the indexing techniques. One of the possible steps towards faster index construction is utilization of massively parallel platforms such as the GPGPU architectures. In this paper, we have analyzed the computational costs of individual steps of the permutation-based index construction in a high-dimensional feature space and proposed a hybrid solution, where computational power of GPU is utilized for distance computations whilst the host CPU performs the postprocessing and sorting steps. Despite the fact that computing the distances is a naturally data-parallel task, an efficient implementation is quite challenging due to various GPU limitations and complex memory hierarchy. We have tested possible approaches to work division and data caching to utilize the GPU to its best abilities. We summarize our empirical results and point out the optimal solution.","PeriodicalId":387496,"journal":{"name":"2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129643997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Object detection and depth estimation for 3D trajectory extraction
Pub Date: 2015-06-10 | DOI: 10.1109/CBMI.2015.7153632
Zeyd Boukhers, Kimiaki Shirahama, Frédéric Li, M. Grzegorzek
To detect an event defined by the interaction of objects in a video, it is necessary to capture their spatio-temporal relations. However, the video only shows the original 3D space projected onto a 2D image plane. This paper introduces a method that extracts 3D trajectories of objects from 2D videos. Each trajectory represents the transition of an object's positions in 3D space. We extract such trajectories by combining object detection with depth estimation, which infers depth information from 2D videos. The major problem here is the inconsistency between object detection and depth estimation results. For example, significantly different depths may be estimated for the region of the same object, and an object region that is appropriately shaped by the estimated depths may be missed. To overcome this, we first initialise the 3D position of an object by selecting the frame with the highest consistency between the object detection and depth estimation results. Then, we track the object in 3D space using a particle filter, in which the object's 3D position is modelled as a hidden state that generates its 2D visual appearance. Experimental results demonstrate the effectiveness of our method.
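One predict/update/resample cycle of such a particle filter over 3D position hypotheses might look like the sketch below. The Gaussian likelihood on a fused 3D observation (detection center plus estimated depth) is a placeholder assumption; the paper instead weights particles by the 2D appearance generated from each 3D hypothesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, observation, motion_std=0.05):
    """One predict/update/resample cycle for 3D position hypotheses.
    `particles` has shape (N, 3); `observation` is a 3D point built
    from a detection box center and its estimated depth (placeholder
    likelihood, not the paper's appearance-based weighting)."""
    # Predict: diffuse particles with Gaussian motion noise.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Update: weight each particle by its closeness to the observation.
    errs = np.linalg.norm(particles - observation, axis=1)
    weights = np.exp(-0.5 * (errs / motion_std) ** 2) + 1e-12
    weights /= weights.sum()
    # Resample: draw a new particle set proportionally to the weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]
```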
{"title":"Object detection and depth estimation for 3D trajectory extraction","authors":"Zeyd Boukhers, Kimiaki Shirahama, Frédéric Li, M. Grzegorzek","doi":"10.1109/CBMI.2015.7153632","DOIUrl":"https://doi.org/10.1109/CBMI.2015.7153632","url":null,"abstract":"To detect an event which is defined by the interaction of objects in a video, it is necessary to capture their spatio-temporal relation. However, the video only displays the original 3D space which is projected onto a 2D image plane. This paper introduces a method which extracts 3D trajectories of objects from 2D videos. Each trajectory represents the transition of an object's positions in the 3D space. We extract such trajectories by combining object detection with depth estimation that estimates the depth information in 2D videos. The major problem for this is the inconsistency between object detection and depth estimation results. For example, significantly different depths may be estimated for the region of the same object, and an object region that is appropriately shaped by estimated depths may be missed. To overcome this, we first initialise the 3D position of an object by selecting the frame with the highest consistency between the object detection and depth estimation results. Then, we track the object in the 3D space using particle filter, where the 3D position of this object is modelled as a hidden state to generate its 2D visual appearance. Experimental results demonstrate the effectiveness of our method.","PeriodicalId":387496,"journal":{"name":"2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115546613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introducing FoxPersonTracks: A benchmark for person re-identification from TV broadcast shows
Pub Date: 2015-06-01 | DOI: 10.1109/CBMI.2015.7153630
Rémi Auguste, Pierre Tirilly, J. Martinet
This paper introduces a novel person track dataset dedicated to person re-identification. The dataset is built from a set of real-life TV shows broadcast on the French channels BFMTV and LCP, provided during the REPERE challenge. It contains a total of 4,604 person tracks (short video sequences featuring an individual with no background) from 266 persons. The dataset has been built from the REPERE dataset through several automated processing and manual selection/filtering steps. It is meant to serve as a benchmark for person re-identification from images/videos. The dataset also provides re-identification results using space-time histograms as a baseline, together with an evaluation tool to ease comparison with other re-identification methods.
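A baseline in the spirit of the dataset's space-time histogram results could rank gallery tracks by histogram intersection with the query track, as sketched below; the feature extraction and the exact baseline protocol are assumptions.

```python
import numpy as np

def reidentify(query_hist, gallery_hists):
    """Rank gallery person tracks against a query track by histogram
    intersection; all histograms are assumed L1-normalized."""
    scores = np.array([np.minimum(query_hist, h).sum() for h in gallery_hists])
    return np.argsort(-scores)  # indices of gallery tracks, best first
```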
{"title":"Introducing FoxPersonTracks: A benchmark for person re-identification from TV broadcast shows","authors":"Rémi Auguste, Pierre Tirilly, J. Martinet","doi":"10.1109/CBMI.2015.7153630","DOIUrl":"https://doi.org/10.1109/CBMI.2015.7153630","url":null,"abstract":"This paper introduces a novel person track dataset dedicated to person re-identification. The dataset is built from a set of real life TV shows broadcasted from BFMTV and LCP TV French channels, provided during the REPERE challenge. It contains a total of 4,604 persontracks (short video sequences featuring an individual with no background) from 266 persons. The dataset has been built from the REPERE dataset by following several automated processing and manual selection/filtering steps. It is meant to serve as a benchmark in person re-identification from images/videos. The dataset also provides re-identifications results using space-time histograms as a baseline, together with an evaluation tool in order to ease the comparison to other re-identification methods.","PeriodicalId":387496,"journal":{"name":"2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"61 19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127756134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Visual information retrieval in endoscopic video archives
Pub Date: 2015-04-29 | DOI: 10.1109/CBMI.2015.7153618
Jennifer Roldan-Carlos, M. Lux, Xavier Giró-i-Nieto, P. Muñoz, N. Anagnostopoulos
In endoscopic procedures, surgeons work with live video streams from the inside of their subjects. A main source of documentation for these procedures is still frames from the video, identified and captured during surgery. However, with growing demands and technical means, the streams are now saved to storage servers, and surgeons need to retrieve parts of the videos on demand. In this submission we present a demo application for video retrieval based on visual features and late fusion, which allows surgeons to re-find shots taken during the procedure.
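The abstract does not state the fusion rule; reciprocal-rank fusion is one common late-fusion scheme and is used below purely as a hedged stand-in for combining the ranked lists produced by the individual visual features.

```python
def late_fusion(rankings, k=60):
    """Fuse per-feature ranked lists with reciprocal-rank fusion: each
    shot scores 1 / (k + rank) in every list that retrieves it."""
    scores = {}
    for ranking in rankings:  # each ranking: shot ids, best match first
        for rank, shot in enumerate(ranking, start=1):
            scores[shot] = scores.get(shot, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```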
{"title":"Visual information retrieval in endoscopic video archives","authors":"Jennifer Roldan-Carlos, M. Lux, Xavier Giró-i-Nieto, P. Muñoz, N. Anagnostopoulos","doi":"10.1109/CBMI.2015.7153618","DOIUrl":"https://doi.org/10.1109/CBMI.2015.7153618","url":null,"abstract":"In endoscopic procedures, surgeons work with live video streams from the inside of their subjects. A main source for documentation of procedures are still frames from the video, identified and taken during the surgery. However, with growing demands and technical means, the streams are saved to storage servers and the surgeons need to retrieve parts of the videos on demand. In this submission we present a demo application allowing for video retrieval based on visual features and late fusion, which allows surgeons to re-find shots taken during the procedure.","PeriodicalId":387496,"journal":{"name":"2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122163025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}