Beyond Bag-of-Words: Fast video classification with Fisher Kernel Vector of Locally Aggregated Descriptors

2015 IEEE International Conference on Multimedia and Expo (ICME). Pub Date: 2015-06-01. DOI: 10.1109/ICME.2015.7177489

Ionut Mironica, Ionut Cosmin Duta, B. Ionescu, N. Sebe

Citations: 7

Abstract

In this paper we introduce a new video description framework that replaces the traditional Bag-of-Words with a combination of Fisher Kernels (FK) and Vector of Locally Aggregated Descriptors (VLAD). The main contributions are: (i) a fast algorithm to densely extract global frame features, which are easier and faster to compute than spatio-temporal local features; (ii) replacement of the traditional k-means based vocabulary with a Random Forest approach that allows a significant speedup; (iii) use of a modified VLAD and FK representation in place of the classic Bag-of-Words, which yields better performance. We show that our framework is highly general and does not depend on a particular type of descriptor. It achieves state-of-the-art results in several classification scenarios.
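For readers unfamiliar with VLAD, the aggregation step can be illustrated with a minimal sketch. This is the standard VLAD encoding, not the modified variant the abstract refers to, and it assumes a vocabulary of centroids that is already given (the paper builds its vocabulary with a Random Forest, which is not reproduced here). The function name `vlad_encode`, the toy descriptors, and the toy centroids are all hypothetical.

```python
import math

def vlad_encode(descriptors, centroids):
    """Standard VLAD (illustrative): sum the residuals between each
    descriptor and its nearest centroid, then normalize."""
    k, d = len(centroids), len(centroids[0])
    residual_sums = [[0.0] * d for _ in range(k)]

    for x in descriptors:
        # Assign the descriptor to its nearest centroid (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        # Accumulate the residual (descriptor minus centroid) for that centroid.
        for j in range(d):
            residual_sums[nearest][j] += x[j] - centroids[nearest][j]

    # Concatenate the per-centroid residual sums into one k*d vector.
    flat = [v for row in residual_sums for v in row]
    # Signed square-root (power) normalization, a common post-processing step.
    flat = [math.copysign(math.sqrt(abs(v)), v) for v in flat]
    # L2 normalization; an all-zero vector is returned unchanged.
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Hypothetical toy data: four 2-D frame features and a 2-word vocabulary.
frame_features = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0], [11.0, 12.0]]
toy_centroids = [[0.0, 0.0], [10.0, 10.0]]
video_vector = vlad_encode(frame_features, toy_centroids)
```

The resulting `video_vector` has length `k * d` (here 4) regardless of how many frames the video contains, which is what makes this kind of representation convenient as input to a standard classifier.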
