Arabic Historical Documents Layout Analysis using Mask RCNN

Procedia Computer Science · Pub Date: 2024-01-01 · Epub Date: 2024-10-25 · DOI: 10.1016/j.procs.2024.10.220

Latifa Aljiffry, Hassanin Al-Barhamtoshy, Felwa Abukhodair, Amani Jamal

Procedia Computer Science, Volume 244, Pages 453-460. Source: https://www.sciencedirect.com/science/article/pii/S1877050924030217


Abstract

Research interest in document analysis and optical character recognition (OCR) has grown markedly in recent years, and OCR engines have advanced considerably across many languages for both printed and handwritten documents. Arabic documents, however, have received less attention than documents in languages such as English, owing to the inherent challenges of the Arabic language and the limited availability of Arabic document datasets. Before an OCR engine can process an image, the layout of that image must be analyzed. This work addresses layout analysis for historical Arabic documents with a deep learning (DL) approach based on the Mask Region-based Convolutional Neural Network (Mask RCNN). The dataset consists of historical Arabic documents, mainly early printed ones, which vary in size and structure and have differing processing requirements. Historical documents are inherently harder to process because of factors such as layout structure, the distinctive handwriting styles of their authors, paper aging, the period of production, and ink properties. The model achieves an accuracy of 51.14%, which the authors report as state of the art when compared with existing models.
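The abstract does not say how the reported accuracy is computed. As an illustration only, the sketch below shows intersection over union (IoU), an overlap measure commonly used to compare a predicted region mask with its ground-truth mask in segmentation-style layout analysis. The function name `mask_iou` and the toy masks are assumptions made for this example; they are not taken from the paper.

```python
def mask_iou(pred, truth):
    """Intersection over union of two binary masks.

    Both masks are equal-sized nested lists of 0/1 values
    (hypothetical toy format, not the paper's data format).
    """
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1

    # Two empty masks have no overlap to measure; return 0.0 by convention here.
    return intersection / union if union else 0.0


# Toy 3x4 page: a ground-truth text region and a prediction shifted one column right.
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]

iou = mask_iou(pred, truth)
print(f"IoU = {iou:.3f}")
```

In this toy case the two regions share 2 pixels out of 6 covered in total, so the overlap is one third. Detection benchmarks typically count a prediction as correct only when its IoU exceeds a chosen threshold.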


