Training transformer architectures on few annotated data: an application to historical handwritten text recognition

IF 1.8 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE International Journal on Document Analysis and Recognition Pub Date : 2024-01-25 DOI:10.1007/s10032-023-00459-2

Killian Barrere, Yann Soullard, Aurélie Lemaitre, Bertrand Coüasnon

{"title":"Training transformer architectures on few annotated data: an application to historical handwritten text recognition","authors":"Killian Barrere, Yann Soullard, Aurélie Lemaitre, Bertrand Coüasnon","doi":"10.1007/s10032-023-00459-2","DOIUrl":null,"url":null,"abstract":"<p>Transformer-based architectures show excellent results on the task of handwritten text recognition, becoming the standard architecture for modern datasets. However, they require a significant amount of annotated data to achieve competitive results. They typically rely on synthetic data to solve this problem. Historical handwritten text recognition represents a challenging task due to degradations, specific handwritings for which few examples are available and ancient languages that vary over time. These limitations also make it difficult to generate realistic synthetic data. Given sufficient and appropriate data, Transformer-based architectures could alleviate these concerns, thanks to their ability to have a global view of textual images and their language modeling capabilities. In this paper, we propose the use of a lightweight Transformer model to tackle the task of historical handwritten text recognition. To train the architecture, we introduce realistic looking synthetic data reproducing the style of historical handwritings. We present a specific strategy, both for training and prediction, to deal with historical documents, where only a limited amount of training data are available. We evaluate our approach on the ICFHR 2018 READ dataset which is dedicated to handwriting recognition in specific historical documents. The results show that our Transformer-based approach is able to outperform existing methods.</p>","PeriodicalId":50277,"journal":{"name":"International Journal on Document Analysis and Recognition","volume":"26 1","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2024-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal on Document Analysis and Recognition","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10032-023-00459-2","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Transformer-based architectures show excellent results on the task of handwritten text recognition, becoming the standard architecture for modern datasets. However, they require a significant amount of annotated data to achieve competitive results. They typically rely on synthetic data to solve this problem. Historical handwritten text recognition represents a challenging task due to degradations, specific handwritings for which few examples are available and ancient languages that vary over time. These limitations also make it difficult to generate realistic synthetic data. Given sufficient and appropriate data, Transformer-based architectures could alleviate these concerns, thanks to their ability to have a global view of textual images and their language modeling capabilities. In this paper, we propose the use of a lightweight Transformer model to tackle the task of historical handwritten text recognition. To train the architecture, we introduce realistic looking synthetic data reproducing the style of historical handwritings. We present a specific strategy, both for training and prediction, to deal with historical documents, where only a limited amount of training data are available. We evaluate our approach on the ICFHR 2018 READ dataset which is dedicated to handwriting recognition in specific historical documents. The results show that our Transformer-based approach is able to outperform existing methods.

Abstract Image

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

在少量注释数据上训练转换器架构：应用于历史手写文本识别

基于变换器的架构在手写文本识别任务中显示出卓越的效果，已成为现代数据集的标准架构。然而，它们需要大量的注释数据才能获得有竞争力的结果。它们通常依靠合成数据来解决这个问题。历史手写文本识别是一项具有挑战性的任务，原因包括退化、可用示例很少的特定手写体以及随时间变化的古代语言。这些局限性也使得生成真实的合成数据变得困难。如果有足够和适当的数据，基于变换器的架构可以缓解这些问题，这要归功于它们对文本图像的全局视图能力和语言建模能力。在本文中，我们建议使用轻量级 Transformer 模型来处理历史手写文本识别任务。为了训练该架构，我们引入了逼真的合成数据，再现了历史手写体的风格。我们提出了一种用于训练和预测的特定策略，以处理训练数据量有限的历史文件。我们在 ICFHR 2018 READ 数据集上评估了我们的方法，该数据集专门用于特定历史文件中的手写识别。结果表明，我们基于变换器的方法能够超越现有方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

International Journal on Document Analysis and Recognition 工程技术-计算机：人工智能

CiteScore

6.20

自引率

4.30%

发文量

审稿时长

7.5 months

期刊介绍： The large number of existing documents and the production of a multitude of new ones every year raise important issues in efficient handling, retrieval and storage of these documents and the information which they contain. This has led to the emergence of new research domains dealing with the recognition by computers of the constituent elements of documents - including characters, symbols, text, lines, graphics, images, handwriting, signatures, etc. In addition, these new domains deal with automatic analyses of the overall physical and logical structures of documents, with the ultimate objective of a high-level understanding of their semantic content. We have also seen renewed interest in optical character recognition (OCR) and handwriting recognition during the last decade. Document analysis and recognition are obviously the next stage. Automatic, intelligent processing of documents is at the intersections of many fields of research, especially of computer vision, image analysis, pattern recognition and artificial intelligence, as well as studies on reading, handwriting and linguistics. Although quality document related publications continue to appear in journals dedicated to these domains, the community will benefit from having this journal as a focal point for archival literature dedicated to document analysis and recognition.