Joint Event Detection and Description in Continuous Video Streams

2019 IEEE Winter Applications of Computer Vision Workshops (WACVW). Pub Date: 2018-02-28. DOI: 10.1109/WACV.2019.00048

Huijuan Xu, Boyang Albert Li, Vasili Ramanishka, L. Sigal, Kate Saenko

Citations: 48

Abstract

Dense video captioning involves first localizing events in a video and then generating captions for the identified events. We present the Joint Event Detection and Description Network (JEDDi-Net), which solves this task end-to-end: it encodes the input video stream with three-dimensional convolutional layers, proposes variable-length temporal events based on pooled features, and then uses a two-level hierarchical LSTM module with context modeling to transcribe the event proposals into captions. We demonstrate the effectiveness of the proposed JEDDi-Net on the large-scale ActivityNet Captions dataset.
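
The abstract says that event proposals have variable length but are described by pooled features. One common way to turn a variable-length span of a convolutional feature map into a fixed-size input for a downstream captioner is adaptive max-pooling over a fixed number of temporal bins. The plain-Python sketch below illustrates that idea only. The function name `pool_segment`, the bin-boundary rule (floor for a bin's start, ceiling for its end), and the choice of max-pooling are illustrative assumptions, not details confirmed by the abstract or taken from the authors' implementation.

```python
from typing import List, Sequence

Vector = List[float]


def pool_segment(features: Sequence[Vector], start: int, end: int,
                 num_bins: int) -> List[Vector]:
    """Max-pool the feature steps in [start, end) into num_bins fixed bins.

    Hypothetical helper for illustration (not the JEDDi-Net code).
    features: one feature vector per temporal step, e.g. the output of a
              3D convolutional encoder (assumed layout).
    start, end: proposal boundaries in feature-map steps.
    Returns exactly num_bins vectors, so proposals of any length map to
    the same output size.
    """
    if not 0 <= start < end <= len(features):
        raise ValueError("proposal must satisfy 0 <= start < end <= len(features)")
    if num_bins < 1:
        raise ValueError("num_bins must be positive")

    length = end - start
    pooled: List[Vector] = []
    for b in range(num_bins):
        # Bin b spans [floor(b*L/n), ceil((b+1)*L/n)) relative to start.
        # This rule always yields a non-empty bin, even when L < n.
        lo = start + (b * length) // num_bins
        hi = start - (-((b + 1) * length) // num_bins)  # ceiling division
        bin_steps = features[lo:hi]
        # Element-wise max across all steps that fall in this bin.
        pooled.append([max(values) for values in zip(*bin_steps)])
    return pooled


if __name__ == "__main__":
    feats = [[1.0, 0.0], [3.0, 2.0], [2.0, 5.0], [0.0, 1.0], [4.0, 4.0]]
    print(pool_segment(feats, 1, 5, 2))
```

Running the example pools the four-step proposal `[1, 5)` into two bins and prints `[[3.0, 5.0], [4.0, 4.0]]`. Because bins may overlap when the proposal length does not divide evenly, short proposals still produce `num_bins` outputs, which keeps the input size of a subsequent captioning module constant.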
