TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting

2019 IEEE/CVF International Conference on Computer Vision (ICCV). Publication date: 2019-10-01. DOI: 10.1109/ICCV.2019.00917

Wei Feng, Wenhao He, Fei Yin, Xu-Yao Zhang, Cheng-Lin Liu

Pages: 9075-9084

Citations: 146

Abstract

Most existing text spotting methods either focus on horizontal or oriented text, or perform arbitrary-shaped text spotting with character-level annotations. In this paper, we propose a novel text spotting framework that detects and recognizes text of arbitrary shapes in an end-to-end manner, using only word/line-level annotations for training. Inspired by the name of TextSnake, which is only a detection model, we call the proposed text spotting framework TextDragon. In TextDragon, a text detector describes the shape of text with a series of quadrangles, which allows it to handle text of arbitrary shapes. To extract arbitrary text regions from feature maps, we propose a new differentiable operator named RoISlide, which is the key to connecting arbitrary-shaped text detection and recognition. On top of the features extracted by RoISlide, a CNN- and CTC-based text recognizer is introduced, which frees the framework from needing character location labels. The proposed method achieves state-of-the-art performance on two curved-text benchmarks, CTW1500 and Total-Text, and competitive results on the ICDAR 2015 dataset.
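To illustrate why a CTC-based recognizer does not need character locations, here is a minimal sketch of greedy (best-path) CTC decoding. It is not the authors' implementation: the function name `ctc_greedy_decode`, the blank index `BLANK = 0`, and the toy alphabet are assumptions made for this example only. The idea is that the recognizer emits one class distribution per feature frame, and the transcript is obtained by merging repeated labels and dropping blanks, so no per-character position is ever required.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def ctc_greedy_decode(frame_scores, alphabet):
    """Best-path CTC decoding of per-frame class scores into a string.

    frame_scores: list of per-frame score lists; class 0 is the blank,
                  class k (k >= 1) corresponds to alphabet[k - 1].
    """
    # 1. Take the highest-scoring class in every frame.
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    # 2. Merge consecutive repeats of the same class.
    collapsed = [k for k, _ in groupby(best)]
    # 3. Drop blanks and map the remaining classes to characters.
    return "".join(alphabet[k - 1] for k in collapsed if k != BLANK)


if __name__ == "__main__":
    # Toy input: one-hot scores for the frame labels 1,1,0,1,2,2,0,3.
    labels = [1, 1, 0, 1, 2, 2, 0, 3]
    scores = [[1.0 if i == k else 0.0 for i in range(4)] for k in labels]
    print(ctc_greedy_decode(scores, "abc"))  # prints "aabc"
```

The blank between the two runs of class 1 is what lets the repeated letter survive the merge step; without it, the two "a" frames would collapse into one.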
