Seung-Bo Park, Kyung-Jin Oh, Heung-Nam Kim, Geun-Sik Jo
Title: Automatic Subtitles Localization through Speaker Identification in Multimedia System
Published in: 2008 IEEE International Workshop on Semantic Computing and Applications, 2008-07-10
DOI: 10.1109/IWSCA.2008.28
Citations: 7
Abstract
With the increasing popularity of online video, efficient captioning and the display of captioned text (subtitles) have become accessibility concerns. In most cases, however, subtitles are shown in a separate area below the screen, so some viewers miss condensed information about the video's content. To improve readability and visibility for viewers, this paper presents a framework for displaying synchronized text near the speaker in a video. The proposed approach first identifies speakers using face detection techniques and then detects a subtitle region. In addition, we adopt DFXP, the W3C's interoperable timed-text format, to support interchange with existing legacy systems. To achieve smooth playback of multimedia presentations such as SMIL and DFXP, a prototype system, MoNaPlayer, has been implemented. Our case studies show that the proposed system is feasible for several multimedia applications.
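To illustrate the kind of output such a pipeline could produce, the sketch below generates a minimal timed-text document whose subtitle region is positioned near a detected speaker, in the spirit of DFXP. This is not the paper's implementation: the function name, the region coordinates, and the use of the later W3C TTML namespace (early DFXP drafts used different namespace URIs) are all assumptions for illustration.

```python
# Sketch (assumption-laden): build a minimal timed-text document with a
# positioned subtitle region, roughly in the style of DFXP/TTML.
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"            # assumed namespace (TTML); early DFXP drafts differ
TTS_NS = "http://www.w3.org/ns/ttml#styling"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def make_subtitle_doc(text, begin, end, origin, extent):
    """Place `text` in a region at `origin` with size `extent` for [begin, end).

    `origin`/`extent` would come from the face-detection step in a real system;
    here they are plain percentage strings, e.g. "10% 20%".
    """
    ET.register_namespace("", TT_NS)
    ET.register_namespace("tts", TTS_NS)
    tt = ET.Element(f"{{{TT_NS}}}tt")
    head = ET.SubElement(tt, f"{{{TT_NS}}}head")
    layout = ET.SubElement(head, f"{{{TT_NS}}}layout")
    ET.SubElement(layout, f"{{{TT_NS}}}region", {
        f"{{{XML_NS}}}id": "speakerRegion",   # hypothetical region id
        f"{{{TTS_NS}}}origin": origin,
        f"{{{TTS_NS}}}extent": extent,
    })
    body = ET.SubElement(tt, f"{{{TT_NS}}}body")
    div = ET.SubElement(body, f"{{{TT_NS}}}div")
    p = ET.SubElement(div, f"{{{TT_NS}}}p",
                      {"region": "speakerRegion", "begin": begin, "end": end})
    p.text = text
    return ET.tostring(tt, encoding="unicode")


doc = make_subtitle_doc("Hello.", "00:00:05.000", "00:00:08.000",
                        "10% 20%", "30% 10%")
```

A player supporting region-based timed text would render "Hello." inside the box anchored at 10%/20% of the video frame, rather than in a fixed strip below the screen.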