Human Action Recognition in First Person Videos using Verb-Object Pairs

Zeynep Gökce, Selen Pehlivan
Published in: 2019 27th Signal Processing and Communications Applications Conference (SIU)
Publication date: 2019-04-01
DOI: 10.1109/SIU.2019.8806562 (https://doi.org/10.1109/SIU.2019.8806562)
Citation count: 2

Abstract

Human action recognition is important for distinguishing the rich variety of human activities in first-person videos. Although egocentric action recognition has improved, the space of action categories is large, and labeling training data for every category appears impractical. In this work, we decompose action models into verb and noun model pairs and propose a method to combine them with a simple fusion strategy. In particular, we use a 3D convolutional neural network, C3D, for the verb stream to model video-based features, and an object detection model, YOLO, for the noun stream to model the objects that the person interacts with. We present experiments on the recently introduced large-scale EGTEA Gaze+ dataset with 106 action classes and show that our model is comparable to state-of-the-art action recognition models.
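The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one plausible "simple fusion strategy": each valid (verb, noun) action is scored by the product of the verb-stream and noun-stream probabilities, then renormalized over the known action list. The function name `fuse_verb_noun`, the product rule, the labels, and all numbers are assumptions made for illustration; they are not taken from the paper, and in practice the two probability vectors would come from the C3D and YOLO streams.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str]


def fuse_verb_noun(
    verb_probs: Dict[str, float],
    noun_probs: Dict[str, float],
    actions: List[Action],
) -> Dict[Action, float]:
    """Score each listed (verb, noun) action by the product of its stream
    probabilities, then renormalize over the listed actions.

    This product rule is an assumed late-fusion scheme, not the paper's.
    """
    raw = {
        (verb, noun): verb_probs.get(verb, 0.0) * noun_probs.get(noun, 0.0)
        for verb, noun in actions
    }
    total = sum(raw.values())
    if total == 0.0:
        # Neither stream supports any listed action: fall back to uniform.
        return {action: 1.0 / len(raw) for action in raw}
    return {action: score / total for action, score in raw.items()}


# Hypothetical stream outputs (made-up numbers, not results from the paper).
verb_probs = {"cut": 0.7, "take": 0.2, "wash": 0.1}
noun_probs = {"tomato": 0.6, "knife": 0.3, "plate": 0.1}

# Only verb-noun pairs that form a known action class are scored.
actions = [("cut", "tomato"), ("take", "knife"), ("wash", "plate"), ("take", "plate")]

fused = fuse_verb_noun(verb_probs, noun_probs, actions)
best_action = max(fused, key=fused.get)
print(best_action, round(fused[best_action], 3))
```

Restricting the scores to a fixed list of valid pairs is what lets the two streams be trained on verb and noun labels separately while still predicting a joint action class; a weighted sum of log-probabilities would be an equally reasonable alternative under the same assumptions.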