Robust Text Detection in Natural Scene Images by Generalized Color-Enhanced Contrasting Extremal Region and Neural Networks

2014 22nd International Conference on Pattern Recognition. Pub Date: 2014-08-24. DOI: 10.1109/ICPR.2014.469

Lei Sun, Qiang Huo, Wei Jia, Kai Chen


Citations: 43

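Abstract

This paper presents a robust text detection approach based on generalized color-enhanced contrasting extremal regions (CERs) and neural networks. Given a color natural scene image, six component-trees are built: one each from its grayscale image, its hue and saturation channel images in a perception-based illumination-invariant color space, and the inverted versions of these three images. From each component-tree, generalized color-enhanced CERs are extracted as character candidates. Using a "divide-and-conquer" strategy, each candidate image patch is reliably labeled by rules as one of five types (Long, Thin, Fill, Square-large, and Square-small) and classified as text or non-text by a corresponding neural network, which is trained with an ambiguity-free learning strategy. After non-text components are pruned, repeating components in each component-tree are pruned using color and area information to obtain a component graph, from which candidate text-lines are formed and then verified by another set of neural networks. Finally, the results from the six component-trees are combined, and a post-processing step recovers lost characters and splits text lines into words as appropriate. The proposed method achieves 85.72% recall, 87.03% precision, and an 86.37% F-score on the ICDAR-2013 "Reading Text in Scene Images" test set.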
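The six input images named in the abstract (grayscale, hue, saturation, and their inverses) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the perception-based illumination-invariant color space, so standard HSV from Python's `colorsys` module is used as a stand-in, and the grayscale weights (ITU-R BT.601) and the function name `six_channel_images` are likewise assumptions.

```python
import colorsys


def six_channel_images(rgb_image):
    """Return six single-channel images with integer values in 0..255.

    rgb_image: list of rows, each row a list of (r, g, b) tuples in 0..255.
    Hypothetical helper; HSV is a stand-in for the paper's color space.
    """
    gray, hue, sat = [], [], []
    for row in rgb_image:
        gray_row, hue_row, sat_row = [], [], []
        for r, g, b in row:
            # Assumed BT.601 luma weights; the paper's conversion may differ.
            gray_row.append(round(0.299 * r + 0.587 * g + 0.114 * b))
            # colorsys expects floats in [0, 1] and returns h, s, v in [0, 1].
            h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            hue_row.append(round(h * 255))
            sat_row.append(round(s * 255))
        gray.append(gray_row)
        hue.append(hue_row)
        sat.append(sat_row)

    channels = {"gray": gray, "hue": hue, "saturation": sat}
    # Each inverted image maps a value v to 255 - v, so dark-on-light and
    # light-on-dark text both appear as extremal regions in some channel.
    for name in ("gray", "hue", "saturation"):
        channels[name + "_inverted"] = [
            [255 - v for v in row] for row in channels[name]
        ]
    return channels
```

In the pipeline the abstract describes, a component-tree would then be built from each of the six images; that step, the CER extraction, and the neural-network classifiers are not shown here.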


