Arabic Historical Documents Layout Analysis using Mask RCNN
Latifa Aljiffry, Hassanin Al-Barhamtoshy, Felwa Abukhodair, Amani Jamal
Procedia Computer Science, vol. 244, pp. 453-460, 2024
DOI: 10.1016/j.procs.2024.10.220
https://www.sciencedirect.com/science/article/pii/S1877050924030217
Citations: 0
Abstract
Interest in document analysis and optical character recognition (OCR) has grown markedly in recent years, and OCR engines have advanced considerably across many languages for both printed and handwritten documents. Documents written in Arabic, however, have received far less attention than those in languages such as English. This gap stems from several factors, including the inherent challenges posed by the Arabic language and the limited availability of Arabic document datasets. Before an image can be passed to an OCR engine, its layout must first be analyzed. This work addresses layout analysis for historical Arabic documents using a deep learning (DL) approach based on the Mask Region-based Convolutional Neural Network (Mask RCNN). The dataset consists of historical Arabic documents, mainly early printed ones, which differ in size, structure, and processing requirements. Historical documents are inherently harder to process because of factors such as layout structure, the distinctive handwriting styles of their authors, paper aging, the historical period, and ink properties. The model achieves an accuracy of 51.14%. Compared with existing models, this result is state of the art.
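The abstract notes that layout analysis comes before OCR. As a rough illustration of that hand-off, the sketch below takes region detections of the kind an instance-segmentation model such as Mask RCNN produces, drops low-confidence ones, and orders the rest top-to-bottom and right-to-left, as Arabic reading direction would suggest. Everything here is an assumption made for illustration and not the authors' pipeline: the `Region` structure, the class names, the 0.5 score threshold, the 20-pixel row tolerance, and the sample detections are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    """One detected layout region (hypothetical structure, not from the paper)."""
    label: str                      # assumed class name, e.g. "text" or "title"
    score: float                    # detector confidence in [0, 1]
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def select_regions(regions: List[Region], min_score: float = 0.5) -> List[Region]:
    """Keep only detections at or above an assumed confidence threshold."""
    return [r for r in regions if r.score >= min_score]


def reading_order(regions: List[Region], row_tolerance: int = 20) -> List[Region]:
    """Order regions top-to-bottom, then right-to-left within each row.

    A region joins the current row when its top edge lies within
    `row_tolerance` pixels of the top edge of the row's first region.
    This is a simple heuristic, not a full reading-order algorithm.
    """
    rows: List[List[Region]] = []
    for region in sorted(regions, key=lambda r: r.box[1]):
        if rows and abs(region.box[1] - rows[-1][0].box[1]) <= row_tolerance:
            rows[-1].append(region)
        else:
            rows.append([region])

    ordered: List[Region] = []
    for row in rows:
        # Right-to-left: the region with the largest x_max comes first.
        ordered.extend(sorted(row, key=lambda r: r.box[2], reverse=True))
    return ordered


# Hypothetical detections for a two-column page with a title.
detections = [
    Region("text", 0.91, (40, 200, 300, 600)),    # left column
    Region("text", 0.88, (320, 205, 580, 600)),   # right column
    Region("title", 0.95, (100, 30, 520, 90)),
    Region("figure", 0.30, (10, 10, 30, 30)),     # below threshold, dropped
]

ordered = reading_order(select_regions(detections))
for region in ordered:
    print(region.label, region.box)
```

In this sketch the title comes first, followed by the right column and then the left column; each region could then be cropped and passed to an OCR engine in that order.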