Deformity removal from handwritten text documents using variable cycle GAN

IF 2.5 | Zone 4 (Computer Science) | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | International Journal on Document Analysis and Recognition | Pub Date: 2024-05-07 | DOI: 10.1007/s10032-024-00466-x

Shivangi Nigam, Adarsh Prasad Behera, Shekhar Verma, P. Nagabhushan


Citations: 0

Abstract

Text recognition systems typically work well for printed documents but struggle with handwritten ones because of varied writing styles, complex backgrounds, noise introduced during image acquisition, and text deformities such as strike-offs and underlines. These deformities alter the structural information, making it difficult to restore the deformed images while maintaining that structure and preserving the semantic dependencies among local pixels. Current adversarial networks cannot preserve these structural and semantic dependencies because they focus on individual pixel-to-pixel variation and encourage non-meaningful aspects of the images. To address this, we propose a Variable Cycle Generative Adversarial Network (VCGAN) that takes the perceptual quality of the images into account. Using a variable content loss, the Top-k Variable Loss (\(TV_{k}\)), VCGAN preserves the interdependence of spatially close pixels while removing the strike-off strokes. Image similarity is computed with \(TV_{k}\), considering the intensity variations that do not interfere with the semantic structure of the image. Our results show that VCGAN removes most deformities, reaching an F1 score of \(97.40\%\), and outperforms current state-of-the-art algorithms, with a character error rate of \(7.64\%\) and a word accuracy of \(81.53\%\) when tested on a handwritten text recognition system.
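The abstract reports a character error rate and a word accuracy but does not define them. The sketch below shows one common way such figures are computed, and it is not the authors' evaluation code. It assumes the character error rate is the total Levenshtein edit distance divided by the total number of reference characters, and that word accuracy is the fraction of word images whose recognized text matches the reference exactly. The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """Assumed definition: total character edits / total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


def word_accuracy(references, hypotheses):
    """Assumed definition: fraction of words recognized exactly."""
    pairs = list(zip(references, hypotheses))
    return sum(r == h for r, h in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical recognizer output on two restored word images.
    refs = ["hello", "world"]
    hyps = ["hallo", "world"]
    print(character_error_rate(refs, hyps))
    print(word_accuracy(refs, hyps))
```

Under these definitions a lower character error rate and a higher word accuracy both mean the restored images are easier for the recognizer to read, which matches how the abstract uses the two figures.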

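Source journal

International Journal on Document Analysis and Recognition (Engineering & Technology - Computer Science: Artificial Intelligence)

CiteScore

6.20

Self-citation rate

4.30%

Articles published

Review time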

7.5 months

Journal description: The large number of existing documents and the production of a multitude of new ones every year raise important issues in the efficient handling, retrieval and storage of these documents and the information they contain. This has led to the emergence of new research domains dealing with the recognition by computers of the constituent elements of documents, including characters, symbols, text, lines, graphics, images, handwriting, signatures, etc. In addition, these new domains deal with automatic analyses of the overall physical and logical structures of documents, with the ultimate objective of a high-level understanding of their semantic content. We have also seen renewed interest in optical character recognition (OCR) and handwriting recognition during the last decade. Document analysis and recognition are obviously the next stage. Automatic, intelligent processing of documents is at the intersection of many fields of research, especially computer vision, image analysis, pattern recognition and artificial intelligence, as well as studies on reading, handwriting and linguistics. Although quality document-related publications continue to appear in journals dedicated to these domains, the community will benefit from having this journal as a focal point for archival literature dedicated to document analysis and recognition.