An efficient coarse-to-fine scheme for text detection in videos
Liuan Wang, Lin-Lin Huang, Yang Wu
The First Asian Conference on Pattern Recognition, November 2011. DOI: https://doi.org/10.1109/ACPR.2011.6166605
To achieve fast and accurate text detection in videos, we propose an efficient coarse-to-fine scheme comprising three stages: key frame extraction, candidate text line detection, and fine text detection. Key frames, which are assumed to carry text, are extracted based on the multi-threshold difference of color histogram (MDCH). From the key frames, candidate text lines are detected by morphological operations and connected component analysis. Sliding-window classification is then performed on the candidate text lines to obtain refined text lines. To improve classification accuracy, we use two types of features, histogram of oriented gradients (HOG) and locally assembled binary (LAB), and two classifiers, Real AdaBoost and a polynomial neural network (PNN). Experimental results on a large video dataset demonstrate the effectiveness of the proposed method and confirm the benefits of key frame extraction and of combining multiple features and classifiers.
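The first (coarsest) stage can be illustrated with a minimal sketch of key frame selection from color histogram differences. The abstract does not say how MDCH applies its multiple thresholds, so everything specific here is an assumption made for illustration: the joint RGB histogram with `bins` cells per channel, the L1 distance, the two thresholds `high` and `low`, the `persist` count, and all function names are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each in 0..255
Frame = Sequence[Pixel]       # a frame flattened to a sequence of pixels


def color_histogram(frame: Frame, bins: int = 4) -> List[float]:
    """Normalized joint RGB histogram with bins**3 cells."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in frame:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(ri * bins + gi) * bins + bi] += 1.0
    total = float(len(frame)) or 1.0  # avoid division by zero on empty frames
    return [h / total for h in hist]


def histogram_difference(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two normalized histograms, in [0, 2]."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def select_key_frames(
    frames: Sequence[Frame],
    high: float = 0.5,   # assumed threshold for an abrupt change
    low: float = 0.1,    # assumed threshold for a gradual change
    persist: int = 3,    # assumed number of consecutive "low" hits required
    bins: int = 4,
) -> List[int]:
    """Return indices of frames chosen as key frames.

    Hypothetical two-threshold rule: a frame becomes a key frame when its
    histogram differs from the last key frame by more than `high`, or by
    more than `low` for `persist` consecutive frames.
    """
    if not frames:
        return []
    keys = [0]  # the first frame always serves as the initial reference
    ref = color_histogram(frames[0], bins)
    streak = 0
    for i in range(1, len(frames)):
        hist = color_histogram(frames[i], bins)
        d = histogram_difference(ref, hist)
        if d > high:
            is_key = True
        elif d > low:
            streak += 1
            is_key = streak >= persist
        else:
            streak = 0
            is_key = False
        if is_key:
            keys.append(i)
            ref = hist
            streak = 0
    return keys
```

In a coarse-to-fine pipeline of the kind described, only the frames returned by `select_key_frames` would be passed on to candidate text line detection and sliding-window classification, which is where the speed benefit of this stage would come from.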