Educational documentary video segmentation and access through combination of visual, audio and text understanding
Authors: Aijuan Dong, Honglin Li
Published in: Proceedings of the Fifth IEEE International Symposium on Signal Processing and Information Technology, 2005
Publication date: 2005-12-21
DOI: https://doi.org/10.1109/ISSPIT.2005.1577174
Citations: 2
Abstract
Educational documentary videos play an important role in enriching the learning experience. However, because of their unstructured and linear nature, documentary videos are much more difficult to access than text-based documents and have not been effectively utilized. In this paper, we propose a multimodal, hierarchical documentary video segmentation procedure based on image, audio and text understanding. Documentary video scenes/paragraphs are determined by the coincidence of scene-level audio breaks and text (transcript) breaks, the latter obtained from domain-independent text segmentation. Each video scene/paragraph is further segmented into video shots based on visual features. To make effective use of the composite documentary video learning materials generated, we propose a documentary video access platform that supports hierarchical organization of video content, multimodal presentation of information, augmented video content and multi-level flexible search. A prototype platform is implemented to demonstrate the idea.
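The scene-level step described in the abstract, where a scene boundary is accepted only when an audio break and a transcript break coincide, can be illustrated with a minimal sketch. The abstract does not say how coincidence is measured, so the function name `coincident_breaks`, the use of timestamps in seconds, and the `tolerance` window are all assumptions made for illustration and are not taken from the paper.

```python
from bisect import bisect_left


def coincident_breaks(audio_breaks, text_breaks, tolerance=2.0):
    """Return text-break times that have an audio break within `tolerance` seconds.

    Hypothetical helper: the tolerance window and the timestamp representation
    are illustrative assumptions, not the authors' method.
    """
    audio = sorted(audio_breaks)
    scenes = []
    for t in sorted(text_breaks):
        i = bisect_left(audio, t)
        # Only the audio breaks immediately before and after t can be nearest.
        neighbours = audio[max(i - 1, 0):i + 1]
        if any(abs(a - t) <= tolerance for a in neighbours):
            scenes.append(t)
    return scenes


# Example with made-up break times (seconds from the start of the video).
audio_breaks = [10.0, 45.5, 90.0, 130.0]
text_breaks = [11.0, 60.0, 91.5, 200.0]
scene_boundaries = coincident_breaks(audio_breaks, text_breaks)
print(scene_boundaries)
```

In this sketch the accepted boundaries are reported at the transcript-break times; each resulting scene would then be passed to a separate visual shot-detection step, which is not shown here.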