Generative adversarial networks for handwriting image generation: a review
Randa Elanwar, Margrit Betke
The Visual Computer, published 2024-07-02 (Journal Article)
DOI: 10.1007/s00371-024-03534-9
Abstract
Handwriting synthesis, the task of automatically generating realistic images of handwritten text, has gained increasing attention in recent years, both as a challenge in itself and as a task that supports handwriting recognition research. The latter use case involves synthesizing large image datasets that can then be used to train deep learning models to recognize handwritten text without the need for human-provided annotations. While early attempts at developing handwriting generators yielded limited results [1], more recent works involving deep neural network generative models have been shown to produce realistic imitations of human handwriting [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]. In this review, we focus on one of the most prevalent and successful architectures in the field of handwriting synthesis, the generative adversarial network (GAN). We describe the capabilities, architecture specifics, and performance of the GAN-based models that have been introduced to the literature since 2019 [2,3,4,5,6,7,8,9,10,11,12,13,14]. These models can generate random handwriting styles, imitate reference styles, and produce realistic images of arbitrary text that was not in the training lexicon. The generated images have been shown to improve handwriting recognition results when used to augment the training samples of recognition models. The synthetic images were often hard for even human examiners to identify as non-real, yet they could also be implausible or limited in style. The review includes a discussion of the characteristics of the GAN architecture in comparison with other paradigms in the image-generation domain and highlights the remaining challenges for handwriting synthesis.
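As background for readers unfamiliar with the adversarial setup the review centers on, the standard GAN objective can be sketched numerically. The following is a minimal toy illustration in numpy of the original minimax losses (discriminator trained to separate real from generated images, generator trained with the common non-saturating loss); it is not code from any of the reviewed handwriting models, and the logit values are invented for illustration:

```python
import numpy as np

def sigmoid(x):
    """Map a raw discriminator logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(d_real_logits, d_fake_logits):
    """Binary cross-entropy: real images are labeled 1, generated images 0.
    The discriminator minimizes this, i.e. it learns to tell the two apart."""
    loss_real = -np.log(sigmoid(d_real_logits)).mean()
    loss_fake = -np.log(1.0 - sigmoid(d_fake_logits)).mean()
    return loss_real + loss_fake

def generator_loss(d_fake_logits):
    """Non-saturating generator loss: the generator tries to make the
    discriminator output 1 ("real") on its synthetic images."""
    return -np.log(sigmoid(d_fake_logits)).mean()

# Hypothetical logits: the discriminator is currently confident,
# scoring real samples high and generated samples low.
d_real = np.array([2.0, 1.5, 2.5])
d_fake = np.array([-2.0, -1.5, -2.5])

print(discriminator_loss(d_real, d_fake))  # low: D is winning
print(generator_loss(d_fake))              # high: G must improve
```

Training alternates gradient steps on these two losses; at the (theoretical) equilibrium the generator's output distribution matches the data distribution and the discriminator outputs 0.5 everywhere.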