Learned features versus engineered features for semantic video indexing

2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI). Pub Date: 2015-06-10. DOI: 10.1109/CBMI.2015.7153637

Mateusz Budnik, Efrain-Leonardo Gutierrez-Gomez, Bahjat Safadi, G. Quénot

Citations: 15

Abstract

In this paper, we compare “traditional” engineered (hand-crafted) features (or descriptors) and learned features for content-based semantic indexing of video documents. Learned (or semantic) features are obtained by training classifiers for other target concepts on other data. These classifiers are then applied to the current collection. The vector of classification scores is the new feature used for training a classifier for the current target concepts on the current collection. If the classifiers used on the other collection are of the Deep Convolutional Neural Network (DCNN) type, it is possible to use as a new feature not only the score values provided by the last layer but also the intermediate values corresponding to the output of all the hidden layers. We made an extensive comparison of the performance of such features with traditional engineered ones as well as with combinations of them. The comparison was made in the context of the TRECVid semantic indexing task. Our results confirm those obtained for still images: features learned from other training data generally outperform engineered features for concept recognition. Additionally, we found that directly training SVM classifiers on these features performs significantly better than partially retraining the DCNN to adapt it to the new data. We also found that, even though the learned features performed better than the engineered ones, the fusion of both performs significantly better, indicating that engineered features are still useful, at least in this case.
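The two ideas in the abstract, building a semantic feature from the scores of classifiers trained for other concepts and fusing learned with engineered features, can be illustrated with a minimal sketch. Everything named here is hypothetical: the toy logistic scorers merely stand in for the source classifiers (the paper uses DCNNs and SVMs), and the weighted score averaging is one common late-fusion scheme assumed for illustration, since the abstract does not state which fusion method was used.

```python
import math
from typing import Callable, List, Sequence

# A scorer maps a descriptor to a classification score (hypothetical type alias).
Scorer = Callable[[Sequence[float]], float]


def make_logistic_scorer(weights: Sequence[float], bias: float) -> Scorer:
    """Return a toy classifier giving a score in (0, 1).

    Assumption: this stands in for a classifier trained for another
    concept on another collection; it is not the paper's model.
    """
    def score(x: Sequence[float]) -> float:
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return score


def semantic_feature(x: Sequence[float], source_scorers: Sequence[Scorer]) -> List[float]:
    """Apply every source classifier to x; the score vector is the new feature."""
    return [scorer(x) for scorer in source_scorers]


def late_fusion(learned_scores: Sequence[float],
                engineered_scores: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Weighted average of per-shot scores from two classifiers.

    Assumption: a simple linear late fusion, chosen only for illustration.
    """
    if len(learned_scores) != len(engineered_scores):
        raise ValueError("score lists must have the same length")
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(learned_scores, engineered_scores)]


if __name__ == "__main__":
    # Two hypothetical source-concept classifiers applied to one shot descriptor.
    source_scorers = [
        make_logistic_scorer([1.0, -1.0], 0.0),
        make_logistic_scorer([0.5, 0.5], -1.0),
    ]
    print(semantic_feature([0.2, 0.7], source_scorers))
    # Fuse target-concept scores from a learned-feature classifier and an
    # engineered-feature classifier for two shots.
    print(late_fusion([0.9, 0.1], [0.7, 0.3]))
```

In a real pipeline the vector returned by `semantic_feature` would be used to train a classifier for the target concepts on the current collection, and `weight` would typically be tuned on held-out data.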
