Rui Zhu, Xiao-Jiao Mao, Qi-Hai Zhu, Ning Li, Yubin Yang
{"title":"Text detection based on convolutional neural networks with spatial pyramid pooling","authors":"Rui Zhu, Xiao-Jiao Mao, Qi-Hai Zhu, Ning Li, Yubin Yang","doi":"10.1109/ICIP.2016.7532514","DOIUrl":null,"url":null,"abstract":"Text detection is a difficult task due to the significant diversity of the texts appearing in natural scene images. In this paper, we propose a novel text descriptor, SPP-net, extracted by equipping the Convolutional Neural Network (CNN) with spatial pyramid pooling. We first compute the feature maps from the original text lines without any cropping or warping, and then generate the fixed-size representations for text discrimination. Experimental results on the latest ICDAR 2011 and 2013 datasets have proven that the proposed descriptor outperforms the state-of-the-art methods by a noticeable margin on F-measure with its merit of incorporating multi-scale text information and its flexibility of describing text regions with different sizes and shapes.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"167 1","pages":"1032-1036"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Conference on Image Processing (ICIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIP.2016.7532514","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 15
Abstract
Text detection is a difficult task due to the significant diversity of the texts appearing in natural scene images. In this paper, we propose a novel text descriptor, SPP-net, extracted by equipping the Convolutional Neural Network (CNN) with spatial pyramid pooling. We first compute the feature maps from the original text lines without any cropping or warping, and then generate the fixed-size representations for text discrimination. Experimental results on the latest ICDAR 2011 and 2013 datasets have proven that the proposed descriptor outperforms the state-of-the-art methods by a noticeable margin on F-measure with its merit of incorporating multi-scale text information and its flexibility of describing text regions with different sizes and shapes.