A Framework for Lecture Video Segmentation from Extracted Speech Content

Dipesh Chand, H. Oğul
DOI: 10.1109/SAMI50585.2021.9378632
Venue: 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI)
Published: 2021-01-21
Citations: 3

Abstract

Increasing demand for lecture videos in digital libraries has raised the challenge of automatic annotation of lecture content for effective navigation of lectures by users. One direction is the prior segmentation of lecture videos to simplify several applications such as indexing, keyword spotting, and targeted search. In this study, we present a lecture video segmentation framework based on the speech content of the instructors. The framework is built upon a model that extracts textual and acoustic features from speech and uses them to identify topical segment boundaries of the lecture video. To evaluate our proposed model, we collected our own dataset containing a diverse set of 37 lecture videos and also manually created ground truth. The performance was measured by using metrics like Precision, Recall, and F1 Score and obtained 0.69, 0.58, and 0.63 respectively. We also compared our model with some previously known similar models where our model outperformed others. The overall results of the study are presented as a lecture video segmentation model, integrating various tools and techniques, and showing promising performance. Findings can be used further for research in content-based search and retrieval using speech content.
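The abstract describes detecting topical segment boundaries from transcript text and scoring them with Precision, Recall, and F1. The paper itself does not include code; the sketch below is a minimal, hypothetical illustration of a classic lexical-cohesion approach to boundary detection (TextTiling-style comparison of adjacent transcript windows), together with boundary-level Precision/Recall/F1 with an optional tolerance window. The function names, window size, and similarity threshold are illustrative assumptions, not the authors' actual model.

```python
# Hypothetical sketch: topic-boundary detection from transcript sentences
# via cosine similarity of adjacent bag-of-words windows (TextTiling-style),
# plus boundary-level Precision/Recall/F1. Not the paper's actual method.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words frequency vectors."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def detect_boundaries(sentences, window=2, threshold=0.1):
    """Place a boundary after sentence i when the windows of `window`
    sentences on either side share little vocabulary (similarity below
    `threshold`). Returns the list of boundary indices."""
    bags = [Counter(s.lower().split()) for s in sentences]
    boundaries = []
    for i in range(window, len(bags) - window + 1):
        left = sum((bags[j] for j in range(i - window, i)), Counter())
        right = sum((bags[j] for j in range(i, i + window)), Counter())
        if cosine(left, right) < threshold:
            boundaries.append(i - 1)
    return boundaries

def prf(predicted, truth, tol=0):
    """Boundary-level Precision, Recall, F1; a predicted boundary matches
    a true one if their positions differ by at most `tol` sentences."""
    matched = {t for t in truth if any(abs(p - t) <= tol for p in predicted)}
    tp = len(matched)
    prec = tp / len(predicted) if predicted else 0.0
    rec = tp / len(truth) if truth else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

On a toy transcript whose vocabulary shifts sharply between topics, `detect_boundaries` marks the shift, and `prf` scores the prediction against a hand-labelled boundary list in the same way the abstract reports its 0.69/0.58/0.63.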