超越词袋:局部聚合描述子的Fisher核向量快速视频分类

2015 IEEE International Conference on Multimedia and Expo (ICME) Pub Date : 2015-06-01 DOI:10.1109/ICME.2015.7177489

Ionut Mironica, Ionut Cosmin Duta, B. Ionescu, N. Sebe

{"title":"超越词袋:局部聚合描述子的Fisher核向量快速视频分类","authors":"Ionut Mironica, Ionut Cosmin Duta, B. Ionescu, N. Sebe","doi":"10.1109/ICME.2015.7177489","DOIUrl":null,"url":null,"abstract":"In this paper we introduce a new video description framework that replaces traditional Bag-of-Words with a combination of Fisher Kernels (FK) and Vector of Locally Aggregated Descriptors (VLAD). The main contributions are: (i) a fast algorithm to densely extract global frame features, easier and faster to compute than spatio-temporal local features; (ii) replacing the traditional k-means based vocabulary with a Random Forest approach that allows significant speedup; (iii) use of a modified VLAD and FK representation to replace the classic Bag-of-Words and obtaining better performance. We show that our framework is highly general and is not dependent on a particular type of descriptor. It achieves state-of-the-art results in several classification scenarios.","PeriodicalId":146271,"journal":{"name":"2015 IEEE International Conference on Multimedia and Expo (ICME)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Beyond Bag-of-Words: Fast video classification with Fisher Kernel Vector of Locally Aggregated Descriptors\",\"authors\":\"Ionut Mironica, Ionut Cosmin Duta, B. Ionescu, N. Sebe\",\"doi\":\"10.1109/ICME.2015.7177489\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we introduce a new video description framework that replaces traditional Bag-of-Words with a combination of Fisher Kernels (FK) and Vector of Locally Aggregated Descriptors (VLAD). The main contributions are: (i) a fast algorithm to densely extract global frame features, easier and faster to compute than spatio-temporal local features; (ii) replacing the traditional k-means based vocabulary with a Random Forest approach that allows significant speedup; (iii) use of a modified VLAD and FK representation to replace the classic Bag-of-Words and obtaining better performance. We show that our framework is highly general and is not dependent on a particular type of descriptor. It achieves state-of-the-art results in several classification scenarios.\",\"PeriodicalId\":146271,\"journal\":{\"name\":\"2015 IEEE International Conference on Multimedia and Expo (ICME)\",\"volume\":\"44 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE International Conference on Multimedia and Expo (ICME)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICME.2015.7177489\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Conference on Multimedia and Expo (ICME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICME.2015.7177489","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

摘要

本文提出了一种新的视频描述框架，它将Fisher核(FK)和局部聚合描述子向量(VLAD)相结合，取代了传统的词袋描述框架。主要贡献有:(1)采用快速算法密集提取全局帧特征，比时空局部特征计算更简单、更快;(ii)用随机森林方法取代传统的基于k均值的词汇表，从而实现显著的加速;(iii)使用改进的VLAD和FK表示来取代经典的Bag-of-Words，获得更好的性能。我们展示了我们的框架是高度通用的，并且不依赖于特定类型的描述符。它在几个分类场景中实现了最先进的结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Beyond Bag-of-Words: Fast video classification with Fisher Kernel Vector of Locally Aggregated Descriptors

In this paper we introduce a new video description framework that replaces traditional Bag-of-Words with a combination of Fisher Kernels (FK) and Vector of Locally Aggregated Descriptors (VLAD). The main contributions are: (i) a fast algorithm to densely extract global frame features, easier and faster to compute than spatio-temporal local features; (ii) replacing the traditional k-means based vocabulary with a Random Forest approach that allows significant speedup; (iii) use of a modified VLAD and FK representation to replace the classic Bag-of-Words and obtaining better performance. We show that our framework is highly general and is not dependent on a particular type of descriptor. It achieves state-of-the-art results in several classification scenarios.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2015 IEEE International Conference on Multimedia and Expo (ICME)

自引率

0.00%

发文量

期刊最新文献

Affect-expressive hand gestures synthesis and animation VTouch: Vision-enhanced interaction for large touch displays Egocentric hand pose estimation and distance recovery in a single RGB image A hybrid approach for retrieving diverse social images of landmarks Spatial perception reproduction of sound events based on sound property coincidences