Multi-View and Multi-Modal Action Recognition with Learned Fusion

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Pub Date: 2018-11-01. DOI: 10.23919/APSIPA.2018.8659539

Sandy Ardianto, H. Hang

Citations: 9

Abstract

In this paper, we study a multi-modal, multi-view action recognition system based on deep learning. We extend the Temporal Segment Network with an additional data fusion stage that combines information from different sources. We use several types of data from different modalities, such as RGB, depth, and infrared, to detect predefined human actions. We test various combinations of these data sources to examine their impact on the final detection accuracy. We design three information fusion methods to generate the final decision, of which the most interesting is our Learned Fusion Net. The Learned Fusion structure gives the best results but requires more training.
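The abstract does not name the three fusion methods, so the following is only a minimal sketch of what a fusion stage over per-modality outputs might look like, assuming score-level (late) fusion. The function names, the fixed-average baseline, and the single linear layer standing in for a learned fusion are all hypothetical illustrations, not the authors' Learned Fusion Net. The weights shown are untrained placeholders; in practice they would be fitted on training data.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_fusion(stream_scores):
    """Fixed fusion (assumed baseline): mean of per-stream class probabilities."""
    n = len(stream_scores)
    return [sum(col) / n for col in zip(*stream_scores)]


def learned_fusion(stream_scores, weights, bias):
    """Hypothetical learned fusion: one linear layer over concatenated scores.

    weights has shape [num_classes][num_streams * num_classes];
    bias has shape [num_classes]. Both would normally be trained.
    """
    features = [p for scores in stream_scores for p in scores]
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


def predict(fused):
    """Return the index of the highest-scoring class."""
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up per-stream outputs for three action classes (illustration only).
rgb = softmax([2.0, 1.0, 0.1])
depth = softmax([0.5, 2.5, 0.3])
infrared = softmax([0.2, 2.0, 0.1])
streams = [rgb, depth, infrared]

avg_scores = average_fusion(streams)

# Untrained placeholder weights: each class sums its own score across streams.
num_classes, num_streams = 3, 3
weights = [
    [1.0 if j % num_classes == c else 0.0
     for j in range(num_streams * num_classes)]
    for c in range(num_classes)
]
bias = [0.0] * num_classes
learned_scores = learned_fusion(streams, weights, bias)
```

A fixed average has no parameters to fit, whereas the linear layer must be trained after the per-stream networks. That is one plausible reading of why a learned fusion stage would require more training.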
