Comparing and combining unimodal methods for multimodal recognition

2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI), published 2016-06-01, DOI: 10.1109/CBMI.2016.7500253

S. Ishikawa, Jorma T. Laaksonen


Citations: 1

Abstract

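Multimodal recognition has recently become an increasingly attractive and common method in multimedia information retrieval. In many cases it yields better recognition results than unimodal methods alone. Most current multimodal recognition methods still depend on unimodal recognition results. Therefore, to achieve better recognition performance, it is important to choose suitable features and classification models for each unimodal recognition task. In this paper, we study several unimodal recognition methods, their features, and techniques for combining them, in the application setting of concept detection in image-text data. For image features, we use GoogLeNet deep convolutional neural network (DCNN) activation features and semantic concept vectors. For text features, we use simple binary vectors for tags and word2vec vectors. As concept detection models, we apply the Multimodal Deep Boltzmann Machine (DBM) model and the Support Vector Machine (SVM), both with the linear homogeneous kernel map and with the non-linear radial basis function (RBF) kernel. Experimental results on the MIRFLICKR-1M data set show that the Multimodal DBM and the non-linear SVM approaches produce equally good results within the margins of statistical variation.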
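The abstract contrasts an SVM using a linear homogeneous kernel map with one using a non-linear RBF kernel. The Python sketch below illustrates that difference under stated assumptions: it assumes the chi-squared additive kernel and the sampled-spectrum construction of homogeneous kernel maps described by Vedaldi and Zisserman. The abstract does not say which additive kernel, map order, or sampling period the authors used, and every function name and parameter value here (`homogeneous_kernel_map`, `order=3`, `period=0.5`, `gamma=1.0`) is hypothetical. The idea is that a plain dot product of mapped features approximates the non-linear chi-squared kernel, so a fast linear SVM can be trained on the mapped vectors, whereas the RBF kernel is typically evaluated inside a kernel SVM.

```python
import math


def chi2_kernel(x, y):
    """Exact additive chi-squared kernel: sum_i 2 * x_i * y_i / (x_i + y_i)."""
    total = 0.0
    for a, b in zip(x, y):
        if a + b > 0:
            total += 2.0 * a * b / (a + b)
    return total


def homogeneous_kernel_map(x, order=3, period=0.5):
    """Hypothetical explicit feature map approximating the chi-squared kernel.

    Assumes non-negative inputs. Each input dimension becomes 2 * order + 1
    output dimensions, obtained by sampling the kernel's spectrum
    kappa(lam) = sech(pi * lam) at lam = j * period for j = 0..order.
    """
    features = []
    for value in x:
        if value <= 0.0:
            # sqrt(value) would be 0, so every component vanishes
            features.extend([0.0] * (2 * order + 1))
            continue
        log_v = math.log(value)
        # j = 0 component, where kappa(0) = sech(0) = 1
        features.append(math.sqrt(value * period))
        for j in range(1, order + 1):
            lam = j * period
            kappa = 1.0 / math.cosh(math.pi * lam)
            scale = math.sqrt(2.0 * value * period * kappa)
            features.append(scale * math.cos(lam * log_v))
            features.append(scale * math.sin(lam * log_v))
    return features


def dot(u, v):
    """Linear kernel, i.e. what a linear SVM computes on mapped features."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(x, y, gamma=1.0):
    """Non-linear RBF kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    # Two toy non-negative, L1-normalised vectors (hypothetical data)
    x = [0.3, 0.5, 0.2]
    y = [0.5, 0.3, 0.2]
    exact = chi2_kernel(x, y)
    approx = dot(homogeneous_kernel_map(x), homogeneous_kernel_map(y))
    print(f"exact chi2: {exact:.4f}  mapped linear: {approx:.4f}")
    print(f"rbf: {rbf_kernel(x, y):.4f}")
```

Each input dimension grows to `2 * order + 1` output dimensions, so the choice of `order` and `period` trades feature length against how closely the linear dot product tracks the exact kernel.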




Source journal

2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI)

Self-citation rate

0.00%

Publications