Similar scene retrieval in soccer videos with weak annotations by multimodal use of bidirectional LSTM

Proceedings of the 2nd ACM International Conference on Multimedia in Asia Pub Date : 2021-03-07 DOI:10.1145/3444685.3446280

T. Haruyama, Sho Takahashi, Takahiro Ogawa, M. Haseyama

{"title":"Similar scene retrieval in soccer videos with weak annotations by multimodal use of bidirectional LSTM","authors":"T. Haruyama, Sho Takahashi, Takahiro Ogawa, M. Haseyama","doi":"10.1145/3444685.3446280","DOIUrl":null,"url":null,"abstract":"This paper presents a novel method to retrieve similar scenes in soccer videos with weak annotations via multimodal use of bidirectional long short-term memory (BiLSTM). The significant increase in the number of different types of soccer videos with the development of technology brings valid assets for effective coaching, but it also increases the work of players and training staff. We tackle this problem with a nontraditional combination of pre-trained models for feature extraction and BiLSTMs for feature transformation. By using the pre-trained models, no training data is required for feature extraction. Then effective feature transformation for similarity calculation is performed by applying BiLSTM trained with weak annotations. This transformation allows for highly accurate capture of soccer video context from less annotation work. In this paper, we achieve an accurate retrieval of similar scenes by multimodal use of this BiLSTM-based transformer trainable with less human effort. The effectiveness of our method was verified by comparative experiments with state-of-the-art using actual soccer video dataset.","PeriodicalId":119278,"journal":{"name":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2nd ACM International Conference on Multimedia in Asia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3444685.3446280","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 4

Abstract

This paper presents a novel method to retrieve similar scenes in soccer videos with weak annotations via multimodal use of bidirectional long short-term memory (BiLSTM). The significant increase in the number of different types of soccer videos with the development of technology brings valid assets for effective coaching, but it also increases the work of players and training staff. We tackle this problem with a nontraditional combination of pre-trained models for feature extraction and BiLSTMs for feature transformation. By using the pre-trained models, no training data is required for feature extraction. Then effective feature transformation for similarity calculation is performed by applying BiLSTM trained with weak annotations. This transformation allows for highly accurate capture of soccer video context from less annotation work. In this paper, we achieve an accurate retrieval of similar scenes by multimodal use of this BiLSTM-based transformer trainable with less human effort. The effectiveness of our method was verified by comparative experiments with state-of-the-art using actual soccer video dataset.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于双向LSTM的弱注释足球视频相似场景检索

提出了一种基于双向长短期记忆(BiLSTM)的多模态弱注释足球视频相似场景检索方法。随着技术的发展，不同类型的足球视频的数量显著增加，为有效的教练带来了有效的资产，但同时也增加了球员和训练人员的工作量。我们采用了一种非传统的组合方法来解决这个问题，即使用预训练模型进行特征提取，使用bilstm进行特征转换。通过使用预训练模型，特征提取不需要训练数据。然后利用弱标注训练的BiLSTM进行有效的特征变换，进行相似度计算。这种转换允许从较少的注释工作中高度准确地捕获足球视频上下文。在本文中，我们通过多模态使用这种基于bilstm的可训练变压器实现了相似场景的准确检索，并且减少了人工的工作量。通过实际足球视频数据集与最新技术的对比实验，验证了该方法的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Proceedings of the 2nd ACM International Conference on Multimedia in Asia

自引率

0.00%

发文量

期刊最新文献

Storyboard relational model for group activity recognition Objective object segmentation visual quality evaluation based on pixel-level and region-level characteristics Multiplicative angular margin loss for text-based person search Distilling knowledge in causal inference for unbiased visual question answering A large-scale image retrieval system for everyday scenes