Attention Boosted Deep Networks For Video Classification

2020 IEEE International Conference on Image Processing (ICIP) Pub Date : 2020-10-01 DOI:10.1109/ICIP40778.2020.9190996

Junyong You, J. Korhonen


Citations: 6

Abstract

Video classification can be performed by using deep neural networks, e.g., CNN and LSTM, to summarize the image contents of individual frames into a single class. Human interpretation of video content is influenced by the attention mechanism. In other words, some information contributes more to deciding the video class than other information. In this paper, we propose to integrate an attention mechanism into deep networks for video classification. The proposed framework employs 2D CNNs with ImageNet-pretrained weights to extract features from video frames, which are then fed to a bidirectional LSTM network for video classification. An attention block has been developed that can be added after the LSTM network in the proposed framework. Several different 2D CNN architectures were tested in the experiments. Results on two publicly available datasets demonstrate that integrating attention boosts the performance of deep networks in video classification compared with not applying the attention block. We also found that applying attention to the LSTM outputs with the VGG19 architecture provides the highest classification accuracy in the proposed framework.
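To illustrate the idea of weighting some frames more than others, the following is a minimal sketch in plain Python of attention pooling over per-frame state vectors. The abstract does not specify the form of the attention block, so dot-product scoring against a query vector is an assumption, and the names `attention_pool`, `softmax`, and `query` are hypothetical. The input vectors stand in for bidirectional LSTM outputs; in a trained model the query would be learned rather than fixed.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    frame_states: Sequence[Vector], query: Vector
) -> Tuple[Vector, List[float]]:
    """Collapse per-frame states into one video-level vector.

    frame_states: one vector per frame (stand-ins for BiLSTM outputs).
    query: scoring vector (assumed; learned in a real model, fixed here).
    Returns the pooled vector and the per-frame attention weights.
    """
    # Score each frame by its dot product with the query vector.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in frame_states]
    # Normalize the scores so the weights are positive and sum to 1.
    weights = softmax(scores)
    dim = len(frame_states[0])
    # Weighted sum of the frame states, one output dimension at a time.
    pooled = [
        sum(w * h[d] for w, h in zip(weights, frame_states)) for d in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    # Three toy "frames" with two-dimensional states.
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pooled, weights = attention_pool(states, query=[2.0, 0.0])
    print("weights:", weights)
    print("pooled:", pooled)
```

With an all-zero query every frame receives the same weight, so the pooled vector reduces to the plain average of the frame states. That average is roughly the situation without an attention block; a non-zero query lets frames that score higher contribute more to the video-level vector passed to the classifier.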



