Content Extraction from Lecture Video via Speaker Action Classification Based on Pose Information

Fei Xu, Kenny Davila, S. Setlur, V. Govindaraju
{"title":"Content Extraction from Lecture Video via Speaker Action Classification Based on Pose Information","authors":"Fei Xu, Kenny Davila, S. Setlur, V. Govindaraju","doi":"10.1109/ICDAR.2019.00171","DOIUrl":null,"url":null,"abstract":"Online lecture videos are increasingly important e-learning materials for students. Automated content extraction from lecture videos facilitates information retrieval applications that improve access to the lecture material. A significant number of lecture videos include the speaker in the image. Speakers perform various semantically meaningful actions during the process of teaching. Among all the movements of the speaker, key actions such as writing or erasing potentially indicate important features directly related to the lecture content. In this paper, we present a methodology for lecture video content extraction using the speaker actions. Each lecture video is divided into small temporal units called action segments. Using a pose estimator, body and hands skeleton data are extracted and used to compute motion-based features describing each action segment. Then, the dominant speaker action of each of these segments is classified using Random forests and the motion-based features. With the temporal and spatial range of these actions, we implement an alternative way to draw key-frames of handwritten content from the video. In addition, for our fixed camera videos, we also use the skeleton data to compute a mask of the speaker writing locations for the subtraction of the background noise from the binarized key-frames. Our method has been tested on a publicly available lecture video dataset, and it shows reasonable recall and precision results, with a very good compression ratio which is better than previous methods based on content analysis.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Document Analysis and Recognition (ICDAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.2019.00171","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

Online lecture videos are increasingly important e-learning materials for students. Automated content extraction from lecture videos facilitates information retrieval applications that improve access to the lecture material. A significant number of lecture videos include the speaker in the image. Speakers perform various semantically meaningful actions during the process of teaching. Among all the movements of the speaker, key actions such as writing or erasing potentially indicate important features directly related to the lecture content. In this paper, we present a methodology for lecture video content extraction using speaker actions. Each lecture video is divided into small temporal units called action segments. Using a pose estimator, body and hand skeleton data are extracted and used to compute motion-based features describing each action segment. The dominant speaker action of each segment is then classified using Random Forests and the motion-based features. Using the temporal and spatial extent of these actions, we implement an alternative way to extract key-frames of handwritten content from the video. In addition, for our fixed-camera videos, we also use the skeleton data to compute a mask of the speaker's writing locations, which is used to subtract background noise from the binarized key-frames. Our method has been tested on a publicly available lecture video dataset, and it shows reasonable recall and precision, with a compression ratio better than that of previous methods based on content analysis.
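The abstract describes a pipeline in which pose-based motion features computed over each action segment are fed to a Random Forest that predicts the segment's dominant speaker action. Below is a minimal sketch of that classification step, assuming per-segment 2D skeleton keypoints are already available from an off-the-shelf pose estimator; the feature definitions, label set, and function names are illustrative assumptions, not the authors' implementation.

# Illustrative sketch (not the paper's exact code): classify the dominant
# speaker action of a temporal segment from pose-based motion features
# using a Random Forest. Feature choices and labels are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

ACTIONS = ["writing", "erasing", "talking", "other"]  # assumed label set

def motion_features(keypoints):
    # keypoints: array of shape (frames, joints, 2) holding per-frame 2D
    # skeleton coordinates for one action segment. Returns a fixed-length
    # vector summarizing joint motion over the segment.
    disp = np.diff(keypoints, axis=0)           # frame-to-frame joint displacement
    speed = np.linalg.norm(disp, axis=-1)       # (frames-1, joints) joint speeds
    return np.concatenate([
        speed.mean(axis=0),                     # average speed per joint
        speed.std(axis=0),                      # speed variability per joint
        speed.max(axis=0),                      # peak speed per joint
    ])

def train_action_classifier(segments, labels):
    # segments: list of keypoint arrays, one per action segment
    # labels: dominant action label for each segment
    X = np.stack([motion_features(s) for s in segments])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, labels)
    return clf

# Usage (hypothetical data):
#   clf = train_action_classifier(train_segments, train_labels)
#   pred = clf.predict(np.stack([motion_features(s) for s in test_segments]))

Segments predicted as writing or erasing would then delimit the temporal and spatial regions from which handwritten-content key-frames are extracted, as the abstract describes.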