Proposing Multimodal Integration Model Using LSTM and Autoencoder

Wataru Noguchi, H. Iizuka, Masahito Yamamoto
{"title":"利用LSTM和自编码器提出多模态集成模型","authors":"Wataru Noguchi, H. Iizuka, Masahito Yamamoto","doi":"10.4108/eai.3-12-2015.2262505","DOIUrl":null,"url":null,"abstract":"We propose an architecture of neural network that can learn and integrate sequential multimodal information using Long Short Term Memory. Our model consists of encoder and decoder LSTMs and multimodal autoencoder. For integrating sequential multimodal information, firstly, the encoder LSTM encodes a sequential input to a fixed range feature vector for each modality. Secondly, the multimodal autoencoder integrates the feature vectors from each modality and generate a fused feature vector which contains sequential multimodal information in a mixed form. The original feature vectors from each modality are re-generated from the fused feature vector in the multimodal autoencoder. The decoder LSTM decodes the sequential inputs from the regenerated feature vector. Our model is trained with the visual and motion sequences of humans and is tested by recall tasks. The experimental results show that our model can learn and remember the sequential multimodal inputs and decrease the ambiguity generated at the learning stage of LSTMs using integrated multimodal information. Our model can also recall the visual sequences from the only motion sequences and vice versa.","PeriodicalId":335727,"journal":{"name":"EAI Endorsed Trans. Security Safety","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Proposing Multimodal Integration Model Using LSTM and Autoencoder\",\"authors\":\"Wataru Noguchi, H. Iizuka, Masahito Yamamoto\",\"doi\":\"10.4108/eai.3-12-2015.2262505\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose an architecture of neural network that can learn and integrate sequential multimodal information using Long Short Term Memory. Our model consists of encoder and decoder LSTMs and multimodal autoencoder. For integrating sequential multimodal information, firstly, the encoder LSTM encodes a sequential input to a fixed range feature vector for each modality. Secondly, the multimodal autoencoder integrates the feature vectors from each modality and generate a fused feature vector which contains sequential multimodal information in a mixed form. The original feature vectors from each modality are re-generated from the fused feature vector in the multimodal autoencoder. The decoder LSTM decodes the sequential inputs from the regenerated feature vector. Our model is trained with the visual and motion sequences of humans and is tested by recall tasks. The experimental results show that our model can learn and remember the sequential multimodal inputs and decrease the ambiguity generated at the learning stage of LSTMs using integrated multimodal information. Our model can also recall the visual sequences from the only motion sequences and vice versa.\",\"PeriodicalId\":335727,\"journal\":{\"name\":\"EAI Endorsed Trans. Security Safety\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-05-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"EAI Endorsed Trans. 
Security Safety\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4108/eai.3-12-2015.2262505\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"EAI Endorsed Trans. Security Safety","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4108/eai.3-12-2015.2262505","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2

Abstract

We propose a neural network architecture that can learn and integrate sequential multimodal information using Long Short-Term Memory (LSTM). Our model consists of encoder and decoder LSTMs and a multimodal autoencoder. To integrate sequential multimodal information, the encoder LSTM first encodes the sequential input of each modality into a fixed-length feature vector. The multimodal autoencoder then integrates the feature vectors from the modalities and generates a fused feature vector that contains the sequential multimodal information in a mixed form. The original feature vectors of each modality are regenerated from the fused feature vector inside the multimodal autoencoder, and the decoder LSTM decodes the sequential inputs from the regenerated feature vectors. Our model is trained on visual and motion sequences of humans and tested on recall tasks. The experimental results show that our model can learn and remember sequential multimodal inputs and, by using the integrated multimodal information, reduce the ambiguity that arises in the learning stage of the LSTMs. Our model can also recall the visual sequences from the motion sequences alone, and vice versa.
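To make the data flow concrete, below is a minimal PyTorch sketch of the pipeline the abstract describes: one encoder LSTM per modality, an autoencoder that fuses the per-modality feature vectors and regenerates them, and one decoder LSTM per modality. The layer sizes, fusion by concatenation, tanh activations, and the use of the final hidden state as the fixed-length feature vector are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class MultimodalLSTMAutoencoder(nn.Module):
    """Sketch of the encoder-LSTM / multimodal-autoencoder / decoder-LSTM
    pipeline from the abstract. Dimensions and fusion scheme are assumed."""

    def __init__(self, vis_dim, mot_dim, feat_dim=128, fused_dim=128):
        super().__init__()
        # Encoder LSTMs: one per modality, sequence -> fixed-length vector.
        self.vis_enc = nn.LSTM(vis_dim, feat_dim, batch_first=True)
        self.mot_enc = nn.LSTM(mot_dim, feat_dim, batch_first=True)
        # Multimodal autoencoder: fuse both feature vectors into one code,
        # then regenerate a per-modality feature vector from the fused code.
        self.fuse = nn.Linear(2 * feat_dim, fused_dim)
        self.defuse = nn.Linear(fused_dim, 2 * feat_dim)
        # Decoder LSTMs: regenerated vector (repeated per step) -> sequence.
        self.vis_dec = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.mot_dec = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.vis_out = nn.Linear(feat_dim, vis_dim)
        self.mot_out = nn.Linear(feat_dim, mot_dim)

    def forward(self, vis_seq, mot_seq):
        T = vis_seq.size(1)
        # Use each encoder's final hidden state as the fixed-length feature.
        _, (h_vis, _) = self.vis_enc(vis_seq)
        _, (h_mot, _) = self.mot_enc(mot_seq)
        feats = torch.cat([h_vis[-1], h_mot[-1]], dim=1)
        fused = torch.tanh(self.fuse(feats))  # integrated multimodal code
        # Regenerate the per-modality feature vectors from the fused code.
        re_vis, re_mot = torch.tanh(self.defuse(fused)).chunk(2, dim=1)
        # Repeat each regenerated vector at every time step to drive decoding.
        dec_vis, _ = self.vis_dec(re_vis.unsqueeze(1).expand(-1, T, -1))
        dec_mot, _ = self.mot_dec(re_mot.unsqueeze(1).expand(-1, T, -1))
        return self.vis_out(dec_vis), self.mot_out(dec_mot)
```

Training would presumably minimize the reconstruction error of both decoded sequences (and, per the abstract, of the feature vectors regenerated inside the autoencoder), for example with a mean-squared-error loss; the paper's exact objective is not given here.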
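The cross-modal recall reported in the abstract can be exercised against the sketch above by presenting only one modality. Zeroing out the withheld modality's input is an assumed stand-in; the paper may suppress the missing modality differently.

```python
import torch

# Hypothetical cross-modal recall: present motion only and read off the
# reconstructed visual sequence. Zeroing the visual input is an assumption.
model = MultimodalLSTMAutoencoder(vis_dim=64, mot_dim=12)
motion = torch.randn(8, 30, 12)        # 8 motion sequences, 30 time steps
blank_vision = torch.zeros(8, 30, 64)  # withheld visual modality
recalled_vision, _ = model(blank_vision, motion)
print(recalled_vision.shape)           # torch.Size([8, 30, 64])
```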