Video Text Extraction Using the Fusion of Color Gradient and Log-Gabor Filter
Zhike Zhang, Weiqiang Wang, K. Lu
2014 22nd International Conference on Pattern Recognition, 2014-08-24
DOI: 10.1109/ICPR.2014.506 (https://doi.org/10.1109/ICPR.2014.506)
Citation count: 7
Abstract
Video text, which contains rich semantic information, can be used for video indexing and summarization. However, compared with scanned documents, recognizing video text remains challenging because of complex backgrounds. Segmenting a text line into single characters before text extraction can achieve higher recognition accuracy, since the background of a single character is less complex than that of the whole text line. Therefore, we first perform character segmentation, which accurately locates the character gaps in the text line. More specifically, we compute a fusion map that fuses the results of the color gradient and a log-Gabor filter. Candidate segmentation points are then obtained by vertical projection analysis of the fusion map, and the final segmentation points are selected by finding the candidate with the minimum projection value within a limited range. Finally, we obtain a binary image of each single-character image by applying K-means clustering and combine the results to form the binary image of the whole text line. The binary image is further refined by inward filling and the fusion map. Experimental results on a large amount of data show that the proposed method yields better binarization results, which lead to a higher character recognition rate for the OCR engine.
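The segmentation and binarization steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give these details, so several things here are assumptions: the fusion map is taken as already computed (a 2-D list of non-negative values), candidates are local minima of the vertical projection below its mean, the "limited range" is a window of `search_radius` columns around each expected cut, and K-means runs with two clusters on grayscale intensity with the brighter cluster treated as text. All function and parameter names are hypothetical. The refinement by inward filling is not covered.

```python
from statistics import mean


def vertical_projection(fusion_map):
    """Sum the fusion map down each column, giving one value per x position."""
    return [sum(column) for column in zip(*fusion_map)]


def candidate_points(projection):
    """Assumed rule: local minima of the projection that fall below its mean."""
    avg = mean(projection)
    return [
        x for x in range(1, len(projection) - 1)
        if projection[x] < avg
        and projection[x] <= projection[x - 1]
        and projection[x] <= projection[x + 1]
    ]


def segmentation_points(projection, candidates, char_width, search_radius):
    """Near each expected cut, keep the candidate with the smallest projection."""
    cuts = []
    expected = char_width
    while expected < len(projection) - 1:
        last = cuts[-1] if cuts else 0
        window = [x for x in candidates
                  if x > last and abs(x - expected) <= search_radius]
        if window:
            best = min(window, key=lambda x: projection[x])
            cuts.append(best)
            expected = best + char_width
        else:
            expected += char_width  # no gap found here; try the next position
    return cuts


def kmeans_binarize(gray, iters=20):
    """Two-cluster K-means on intensity; True marks the brighter cluster."""
    pixels = [v for row in gray for v in row]
    lo, hi = min(pixels), max(pixels)  # initial cluster centers
    for _ in range(iters):
        bright = [v for v in pixels if abs(v - hi) < abs(v - lo)]
        dark = [v for v in pixels if abs(v - hi) >= abs(v - lo)]
        if not bright or not dark:
            break  # constant image: nothing to separate
        new_lo, new_hi = mean(dark), mean(bright)
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return [[abs(v - hi) < abs(v - lo) for v in row] for row in gray]


def binarize_line(gray_line, cuts):
    """Binarize each character slice separately, then rejoin the slices."""
    bounds = [0, *cuts, len(gray_line[0])]
    binary = [[] for _ in gray_line]
    for left, right in zip(bounds, bounds[1:]):
        piece = kmeans_binarize([row[left:right] for row in gray_line])
        for out_row, piece_row in zip(binary, piece):
            out_row.extend(piece_row)
    return binary
```

Clustering each character slice on its own, rather than the whole line at once, reflects the abstract's argument that a single character's background is simpler than the background of the full text line.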