An Attention-Based End-to-End Model for Multiple Text Lines Recognition in Japanese Historical Documents

2019 International Conference on Document Analysis and Recognition (ICDAR), Pub Date: 2019-09-01, DOI: 10.1109/ICDAR.2019.00106

N. Ly, C. Nguyen, M. Nakagawa


Citations: 12

Abstract

This paper presents an attention-based convolutional sequence-to-sequence (ACseq2seq) model that recognizes an input image containing multiple text lines from Japanese historical documents without explicit line segmentation. The recognition system has three main parts: a feature extractor that uses a Convolutional Neural Network (CNN) to extract a feature sequence from the input image; an encoder that uses a bidirectional Long Short-Term Memory (BLSTM) network to encode the feature sequence; and a decoder that uses a unidirectional LSTM with an attention mechanism to generate the target text from the attended features. We also introduce a residual LSTM network between the attention vector and the softmax layer in the decoder. The system can be trained end-to-end with a standard cross-entropy loss function. In the experiments, we evaluate the ACseq2seq model on the anomalously deformed Kana datasets from the PRMU contest. The results show that the proposed model achieves higher recognition accuracy than state-of-the-art recognition methods on these datasets.
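The decoder step described in the abstract (attend over the encoded feature sequence, then score the predicted character with cross-entropy) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the dot-product scoring function, the toy vectors, and the names `softmax`, `attend`, and `cross_entropy` are assumptions made for illustration, and the CNN, BLSTM, and residual LSTM layers are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention (assumed scoring; the abstract does not name one).

    Returns the attention weights over encoder positions and the
    weighted-sum context vector.
    """
    scores = [
        sum(d * e for d, e in zip(decoder_state, enc))
        for enc in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


def cross_entropy(probs, target_index):
    """Cross-entropy loss for one predicted character distribution."""
    return -math.log(probs[target_index])


# Toy data: three encoded feature vectors and one decoder hidden state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]

weights, context = attend(decoder_state, encoder_states)
loss = cross_entropy(softmax([0.0, 0.0]), 0)  # uniform prediction over 2 classes
```

In this toy case the first and third encoder positions get equal, larger weights because they align with the decoder state, which is how attention lets a decoder pick out relevant image regions without a separate line-segmentation step.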


