长短期记忆卷积神经网络在第一人称视觉活动识别中的应用

2017 IEEE International Conference on Computer Vision Workshops (ICCVW) Pub Date : 2017-10-01 DOI:10.1109/ICCVW.2017.159

Girmaw Abebe, A. Cavallaro

{"title":"长短期记忆卷积神经网络在第一人称视觉活动识别中的应用","authors":"Girmaw Abebe, A. Cavallaro","doi":"10.1109/ICCVW.2017.159","DOIUrl":null,"url":null,"abstract":"Temporal information is the main source of discriminating characteristics for the recognition of proprioceptive activities in first-person vision (FPV). In this paper, we propose a motion representation that uses stacked spectrograms. These spectrograms are generated over temporal windows from mean grid-optical-flow vectors and the displacement vectors of the intensity centroid. The stacked representation enables us to use 2D convolutions to learn and extract global motion features. Moreover, we employ a long short-term memory (LSTM) network to encode the temporal dependency among consecutive samples recursively. Experimental results show that the proposed approach achieves state-of-the-art performance in the largest public dataset for FPV activity recognition.","PeriodicalId":149766,"journal":{"name":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"A Long Short-Term Memory Convolutional Neural Network for First-Person Vision Activity Recognition\",\"authors\":\"Girmaw Abebe, A. Cavallaro\",\"doi\":\"10.1109/ICCVW.2017.159\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Temporal information is the main source of discriminating characteristics for the recognition of proprioceptive activities in first-person vision (FPV). In this paper, we propose a motion representation that uses stacked spectrograms. These spectrograms are generated over temporal windows from mean grid-optical-flow vectors and the displacement vectors of the intensity centroid. The stacked representation enables us to use 2D convolutions to learn and extract global motion features. Moreover, we employ a long short-term memory (LSTM) network to encode the temporal dependency among consecutive samples recursively. Experimental results show that the proposed approach achieves state-of-the-art performance in the largest public dataset for FPV activity recognition.\",\"PeriodicalId\":149766,\"journal\":{\"name\":\"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCVW.2017.159\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Conference on Computer Vision Workshops (ICCVW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCVW.2017.159","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 20

摘要

时间信息是第一人称视觉(FPV)本体感觉活动识别的主要鉴别特征来源。在本文中，我们提出了一种使用堆叠谱图的运动表示。这些谱图是在时间窗口上由平均网格光流矢量和强度质心的位移矢量生成的。堆叠表示使我们能够使用二维卷积来学习和提取全局运动特征。此外，我们采用长短期记忆(LSTM)网络对连续样本间的时间依赖性进行递归编码。实验结果表明，该方法在最大的公共数据集中达到了最先进的FPV活动识别性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A Long Short-Term Memory Convolutional Neural Network for First-Person Vision Activity Recognition

Temporal information is the main source of discriminating characteristics for the recognition of proprioceptive activities in first-person vision (FPV). In this paper, we propose a motion representation that uses stacked spectrograms. These spectrograms are generated over temporal windows from mean grid-optical-flow vectors and the displacement vectors of the intensity centroid. The stacked representation enables us to use 2D convolutions to learn and extract global motion features. Moreover, we employ a long short-term memory (LSTM) network to encode the temporal dependency among consecutive samples recursively. Experimental results show that the proposed approach achieves state-of-the-art performance in the largest public dataset for FPV activity recognition.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 IEEE International Conference on Computer Vision Workshops (ICCVW)

自引率

0.00%

发文量