论八度卷积递归神经网络对手写文本行识别的改进

IF 1.8 4区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE International Journal on Document Analysis and Recognition Pub Date : 2024-02-20 DOI:10.1007/s10032-024-00460-3
Dayvid Castro, Cleber Zanchettin, Luís A. Nunes Amaral
{"title":"论八度卷积递归神经网络对手写文本行识别的改进","authors":"Dayvid Castro, Cleber Zanchettin, Luís A. Nunes Amaral","doi":"10.1007/s10032-024-00460-3","DOIUrl":null,"url":null,"abstract":"<p>Off-line handwritten text recognition (HTR) poses a significant challenge due to the complexities of variable handwriting styles, background degradation, and unconstrained word sequences. This work tackles the handwritten text line recognition problem using octave convolutional recurrent neural networks (OctCRNN). Our approach requires no word segmentation, preprocessing, or explicit feature extraction and leverages octave convolutions to process multiscale features without increasing the number of learnable parameters. We investigate the OctCRNN under different settings, including an octave design that efficiently balances computational cost and recognition performance. We thoroughly investigate the OctCRNN under different settings by formulating an experimental pipeline with a visualization step to get intuitions about how the model works compared to a counterpart based on traditional convolutions. The system becomes complete by adding a language model to increase linguistic knowledge. Finally, we assess the performance of our solution using character and word error rates against established handwritten text recognition benchmarks: IAM, RIMES, and ICFHR 2016 READ. According to the results, our proposal achieves state-of-the-art performance while reducing the computational requirements. Our findings suggest that the architecture provides a robust framework for building HTR systems.</p>","PeriodicalId":50277,"journal":{"name":"International Journal on Document Analysis and Recognition","volume":"3 1","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"On the improvement of handwritten text line recognition with octave convolutional recurrent neural networks\",\"authors\":\"Dayvid Castro, Cleber Zanchettin, Luís A. Nunes Amaral\",\"doi\":\"10.1007/s10032-024-00460-3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Off-line handwritten text recognition (HTR) poses a significant challenge due to the complexities of variable handwriting styles, background degradation, and unconstrained word sequences. This work tackles the handwritten text line recognition problem using octave convolutional recurrent neural networks (OctCRNN). Our approach requires no word segmentation, preprocessing, or explicit feature extraction and leverages octave convolutions to process multiscale features without increasing the number of learnable parameters. We investigate the OctCRNN under different settings, including an octave design that efficiently balances computational cost and recognition performance. We thoroughly investigate the OctCRNN under different settings by formulating an experimental pipeline with a visualization step to get intuitions about how the model works compared to a counterpart based on traditional convolutions. The system becomes complete by adding a language model to increase linguistic knowledge. Finally, we assess the performance of our solution using character and word error rates against established handwritten text recognition benchmarks: IAM, RIMES, and ICFHR 2016 READ. According to the results, our proposal achieves state-of-the-art performance while reducing the computational requirements. Our findings suggest that the architecture provides a robust framework for building HTR systems.</p>\",\"PeriodicalId\":50277,\"journal\":{\"name\":\"International Journal on Document Analysis and Recognition\",\"volume\":\"3 1\",\"pages\":\"\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2024-02-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal on Document Analysis and Recognition\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s10032-024-00460-3\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal on Document Analysis and Recognition","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10032-024-00460-3","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

离线手写文本识别(HTR)因手写风格多变、背景退化和无约束单词序列等复杂问题而面临巨大挑战。这项研究利用八度卷积递归神经网络(OctCRNN)解决了手写文本行识别问题。我们的方法无需进行单词分割、预处理或显式特征提取,并利用倍频卷积处理多尺度特征,同时不增加可学习参数的数量。我们研究了不同设置下的 OctCRNN,包括有效平衡计算成本和识别性能的倍频程设计。我们通过制定一个具有可视化步骤的实验流水线,深入研究了 OctCRNN 在不同设置下的工作原理。通过添加语言模型来增加语言知识,系统变得更加完整。最后,我们使用字符和单词错误率评估了我们的解决方案与既定手写文本识别基准(IAM、RIMES 和 ICFHR 2016 READ)的性能。结果表明,我们的方案在降低计算要求的同时实现了最先进的性能。我们的研究结果表明,该架构为构建 HTR 系统提供了一个稳健的框架。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

摘要图片

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
On the improvement of handwritten text line recognition with octave convolutional recurrent neural networks

Off-line handwritten text recognition (HTR) poses a significant challenge due to the complexities of variable handwriting styles, background degradation, and unconstrained word sequences. This work tackles the handwritten text line recognition problem using octave convolutional recurrent neural networks (OctCRNN). Our approach requires no word segmentation, preprocessing, or explicit feature extraction and leverages octave convolutions to process multiscale features without increasing the number of learnable parameters. We investigate the OctCRNN under different settings, including an octave design that efficiently balances computational cost and recognition performance. We thoroughly investigate the OctCRNN under different settings by formulating an experimental pipeline with a visualization step to get intuitions about how the model works compared to a counterpart based on traditional convolutions. The system becomes complete by adding a language model to increase linguistic knowledge. Finally, we assess the performance of our solution using character and word error rates against established handwritten text recognition benchmarks: IAM, RIMES, and ICFHR 2016 READ. According to the results, our proposal achieves state-of-the-art performance while reducing the computational requirements. Our findings suggest that the architecture provides a robust framework for building HTR systems.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
International Journal on Document Analysis and Recognition
International Journal on Document Analysis and Recognition 工程技术-计算机:人工智能
CiteScore
6.20
自引率
4.30%
发文量
30
审稿时长
7.5 months
期刊介绍: The large number of existing documents and the production of a multitude of new ones every year raise important issues in efficient handling, retrieval and storage of these documents and the information which they contain. This has led to the emergence of new research domains dealing with the recognition by computers of the constituent elements of documents - including characters, symbols, text, lines, graphics, images, handwriting, signatures, etc. In addition, these new domains deal with automatic analyses of the overall physical and logical structures of documents, with the ultimate objective of a high-level understanding of their semantic content. We have also seen renewed interest in optical character recognition (OCR) and handwriting recognition during the last decade. Document analysis and recognition are obviously the next stage. Automatic, intelligent processing of documents is at the intersections of many fields of research, especially of computer vision, image analysis, pattern recognition and artificial intelligence, as well as studies on reading, handwriting and linguistics. Although quality document related publications continue to appear in journals dedicated to these domains, the community will benefit from having this journal as a focal point for archival literature dedicated to document analysis and recognition.
期刊最新文献
A survey on artificial intelligence-based approaches for personality analysis from handwritten documents In-domain versus out-of-domain transfer learning for document layout analysis Deep learning-based modified-EAST scene text detector: insights from a novel multiscript dataset Towards fully automated processing and analysis of construction diagrams: AI-powered symbol detection GAN-based text line segmentation method for challenging handwritten documents
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1