Automatic Content Curation System for Multiple Live Sport Video Streams

2015 IEEE International Symposium on Multimedia (ISM). Pub Date: 2015-12-01. DOI: 10.1109/ISM.2015.17

Kazuki Fujisawa, Yuko Hirabe, H. Suwa, Yutaka Arakawa, K. Yasumoto


Citations: 5

Abstract

In this paper, we aim to develop a method for creating personalized, high-presence multi-channel content for a sports game through real-time curation of the various media streams captured or created by spectators. We use the live TV broadcast as ground-truth data and construct a machine-learning-based model that automatically curates multiple videos that spectators captured from different angles and at different zoom levels. The live TV broadcast of a baseball game follows curation rules that select a camera at a specific angle for specific scenes (e.g., a pitcher throwing a ball). As inputs for constructing the model, we use metadata such as image feature data (e.g., whether a pitcher is on the screen) for each fixed interval of the baseball videos and game progress data (e.g., the inning number and the batting order). The output is the camera ID (among the spectators' multiple cameras) at each point in time. For evaluation, we targeted Spring-Selection high-school baseball games. As training data, we used image features, game progress data, and the camera position at each point in time in the TV broadcast. We used videos of a baseball game captured from 7 different points in Hanshin Koshien Stadium with handheld video cameras and generated a sample data set by dividing the videos into fixed-interval segments. We divided the sample data set into a training data set and a test data set and evaluated our method with two validation methods: (1) 10-fold cross-validation and (2) hold-out validation (e.g., training on the first and second innings and testing on the third inning). As a result, our method predicted the camera switching timings with an accuracy (F-measure) of 72.53% on weighted average for the base camera work and 92.1% for the fixed camera work.
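The hold-out evaluation and the weighted-average F-measure mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the segment fields (`inning`, `pitcher_on_screen`, `camera_id`), the toy data, and the majority-vote predictor, which is a trivial stand-in and not the model used in the paper.

```python
from collections import Counter, defaultdict

def hold_out_by_inning(segments, train_innings, test_innings):
    """Split fixed-interval segments into train/test sets by inning number."""
    train = [s for s in segments if s["inning"] in train_innings]
    test = [s for s in segments if s["inning"] in test_innings]
    return train, test

def fit_majority(train):
    """Toy stand-in model: most frequent camera ID per feature value."""
    counts = defaultdict(Counter)
    for s in train:
        counts[s["pitcher_on_screen"]][s["camera_id"]] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def weighted_f_measure(true_ids, pred_ids):
    """Per-camera F-measure, averaged with weights equal to class support."""
    support = Counter(true_ids)
    total = len(true_ids)
    score = 0.0
    for cam, n in support.items():
        tp = sum(1 for t, p in zip(true_ids, pred_ids) if t == cam and p == cam)
        fp = sum(1 for t, p in zip(true_ids, pred_ids) if t != cam and p == cam)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n / total) * f1
    return score

# Hypothetical toy segments; the real data set is not reproduced here.
segments = [
    {"inning": 1, "pitcher_on_screen": True, "camera_id": 1},
    {"inning": 1, "pitcher_on_screen": False, "camera_id": 2},
    {"inning": 2, "pitcher_on_screen": True, "camera_id": 1},
    {"inning": 2, "pitcher_on_screen": False, "camera_id": 3},
    {"inning": 2, "pitcher_on_screen": False, "camera_id": 2},
    {"inning": 3, "pitcher_on_screen": True, "camera_id": 1},
    {"inning": 3, "pitcher_on_screen": False, "camera_id": 2},
    {"inning": 3, "pitcher_on_screen": False, "camera_id": 3},
    {"inning": 3, "pitcher_on_screen": True, "camera_id": 1},
]

# Train on innings 1 and 2, test on inning 3, as in the abstract's example.
train, test = hold_out_by_inning(segments, {1, 2}, {3})
model = fit_majority(train)
true_ids = [s["camera_id"] for s in test]
pred_ids = [model.get(s["pitcher_on_screen"], 1) for s in test]
score = weighted_f_measure(true_ids, pred_ids)
print(f"weighted F-measure: {score:.3f}")
```

Weighting each camera's F-measure by its support keeps rarely selected cameras from dominating the average, which is one plausible reading of "on weighted average" in the reported results.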



