A study of evaluation metrics and datasets for video captioning

2017 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS). Pub Date: 2017-11-01. DOI: 10.1109/ICIIBMS.2017.8279760

Jaehui Park, C. Song, Ji-Hyeong Han

Citations: 6

Abstract

With the fast-growing interest in deep learning, various applications and machine learning tasks have emerged in recent years. Video captioning in particular is gaining a lot of attention from both the computer vision and natural language processing fields. Captions are usually generated by jointly learning from different types of data modalities that share common themes in the video. Learning joint representations of different modalities is very challenging because of the inherent heterogeneity of the mixed information in visual scenes, speech dialogs, music, sounds, and so on. Consequently, it is hard to evaluate the quality of video captioning results. In this paper, we introduce well-known metrics and datasets for the evaluation of video captioning. We compare the existing metrics and datasets to derive a new research proposal for the evaluation of video descriptions.
