Title: An Attention-Based End-to-End Model for Multiple Text Lines Recognition in Japanese Historical Documents
Authors: N. Ly, C. Nguyen, M. Nakagawa
Venue: 2019 International Conference on Document Analysis and Recognition (ICDAR)
Publication date: 2019-09-01
DOI: 10.1109/ICDAR.2019.00106
URL: https://doi.org/10.1109/ICDAR.2019.00106
Citation count: 12
Abstract
This paper presents an attention-based convolutional sequence-to-sequence (ACseq2seq) model that recognizes an input image containing multiple text lines from Japanese historical documents without explicit line segmentation. The recognition system has three main parts: a feature extractor that uses a convolutional neural network (CNN) to extract a feature sequence from the input image; an encoder that uses a bidirectional Long Short-Term Memory (BLSTM) network to encode the feature sequence; and a decoder that uses a unidirectional LSTM with an attention mechanism to generate the target text from the attended features. We also introduce a residual LSTM network between the attention vector and the softmax layer in the decoder. The system can be trained end-to-end with a standard cross-entropy loss function. We evaluate the ACseq2seq model on the anomalously deformed Kana datasets from the PRMU contest. The experimental results show that the proposed model achieves higher recognition accuracy than state-of-the-art recognition methods on these datasets.
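
As a rough illustration of the attention and loss steps named in the abstract, the following pure-Python sketch computes attention weights over a toy encoder output, forms the attended context vector, and evaluates the cross-entropy for one decoding step. It is not the authors' implementation: the dot-product score function, the function names (`softmax`, `attend`, `cross_entropy`), and all numeric values are assumptions made for illustration, since the abstract does not specify them. The CNN, BLSTM, and residual LSTM layers are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Assumption: dot-product scoring between the decoder state and each
    encoder state; the abstract does not name the score function.
    """
    scores = [
        sum(q * k for q, k in zip(decoder_state, h)) for h in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


def cross_entropy(probs, target_index):
    """Negative log-probability of the correct character at one step."""
    return -math.log(probs[target_index])


# Toy values (hypothetical): three encoder positions, 2-dimensional features.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]

weights, context = attend(decoder_state, encoder_states)

# Hypothetical softmax output over a 3-character vocabulary; in the model
# described above this would come from the layers after the attention vector.
char_probs = [0.1, 0.7, 0.2]
loss = cross_entropy(char_probs, target_index=1)
```

In training, a per-step loss of this kind would be summed or averaged over all output characters, which is what allows the whole pipeline to be optimized end-to-end without line-level segmentation labels.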