Educational documentary video segmentation and access through combination of visual, audio and text understanding
Aijuan Dong, Honglin Li
Proceedings of the Fifth IEEE International Symposium on Signal Processing and Information Technology, 2005 (published 2005-12-21)
DOI: 10.1109/ISSPIT.2005.1577174
Educational documentary videos play an important role in enriching the learning experience. However, because they are unstructured and linear, documentary videos are much more difficult to access than text-based documents and have not been used effectively. In this paper, we propose a multimodal, hierarchical documentary video segmentation procedure based on image, audio and text understanding. Documentary video scenes/paragraphs are determined by the coincidence of scene-level audio breaks with transcript breaks obtained from domain-independent text segmentation. Each video scene/paragraph is then further segmented into video shots based on visual features. To make effective use of the resulting composite documentary video learning materials, we propose a documentary video access platform that supports hierarchical organization of video content, multimodal presentation of information, augmented video content and flexible multi-level search. A prototype platform has been implemented to demonstrate the idea.
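The abstract does not specify how audio and transcript breaks are matched or which visual features drive shot detection. The following Python sketch shows one plausible reading of the two-level procedure, under stated assumptions: "coincidence" is modeled as a transcript break falling within a fixed time window (`tolerance`, a hypothetical parameter) of an audio break, and shots are found by thresholding the difference between consecutive color histograms (`threshold`, also hypothetical). Histogram differencing is a standard hard-cut detector swapped in here for illustration; it is not presented as the authors' method. All function names are illustrative, and the inputs (break timestamps in seconds, normalized per-frame histograms) are assumed to come from upstream audio, text and image analysis.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence


def coincident_breaks(audio_breaks: Sequence[float],
                      text_breaks: Sequence[float],
                      tolerance: float = 2.0) -> List[float]:
    """Keep audio breaks that have a transcript break within `tolerance` seconds.

    Assumption: coincidence = symmetric time window; `tolerance` is hypothetical.
    """
    text_sorted = sorted(text_breaks)
    confirmed = []
    for t in sorted(set(audio_breaks)):
        i = bisect_left(text_sorted, t)
        # Only the nearest transcript breaks on either side of t can match.
        neighbours = text_sorted[max(i - 1, 0):i + 1]
        if any(abs(t - b) <= tolerance for b in neighbours):
            confirmed.append(t)
    return confirmed


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance between two normalized histograms, in [0, 1]."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def shot_boundaries(histograms: Sequence[Sequence[float]],
                    threshold: float = 0.4) -> List[int]:
    """Frame indices where a new shot starts (hypothetical hard-cut rule)."""
    return [i for i in range(1, len(histograms))
            if histogram_distance(histograms[i - 1], histograms[i]) > threshold]


def segment(duration: float, fps: float,
            audio_breaks: Sequence[float], text_breaks: Sequence[float],
            histograms: Sequence[Sequence[float]],
            tolerance: float = 2.0, threshold: float = 0.4) -> List[Dict]:
    """Build a scene -> shot hierarchy as (start, end) spans in seconds."""
    # Level 1: scenes are bounded by audio breaks confirmed by the transcript.
    inner_cuts = [t for t in coincident_breaks(audio_breaks, text_breaks, tolerance)
                  if 0.0 < t < duration]
    cuts = [0.0] + inner_cuts + [duration]
    # Level 2: shot starts from visual change, converted from frames to seconds.
    shot_starts = [i / fps for i in shot_boundaries(histograms, threshold)]
    hierarchy = []
    for start, end in zip(cuts, cuts[1:]):
        inner = [s for s in shot_starts if start < s < end]
        bounds = [start] + inner + [end]
        hierarchy.append({"scene": (start, end),
                          "shots": list(zip(bounds, bounds[1:]))})
    return hierarchy


if __name__ == "__main__":
    # Toy input: 30 one-second frames with two hard cuts (frames 5 and 15).
    frames = [[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 10 + [[1.0, 0.0]] * 15
    for node in segment(30.0, 1.0, [10.2, 20.0], [10.0, 25.0], frames):
        print(node)
```

In this toy run, audio breaks at 10.2 s and 20.0 s are compared with transcript breaks at 10.0 s and 25.0 s; only the 10.2 s break is confirmed, so the clip splits into two scenes, each subdivided where the histogram changes sharply. Requiring agreement between two modalities favors precision: an audio pause without a matching topic shift in the transcript does not open a new scene.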