OCR for Greek polytonic (multi accent) historical printed documents: development, optimization and quality control
Anna-Maria Sichani, Panagiotis Kaddas, Georgios K. Mikros, B. Gatos
Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage. Published 2019-05-08.
DOI: 10.1145/3322905.3322926 (https://doi.org/10.1145/3322905.3322926)
Citations: 1
Abstract
This paper presents the development and implementation of a robust OCR tool and a comprehensive workflow for recognizing printed Greek polytonic script. The project was initiated and developed by an interdisciplinary team with expertise in document image processing, character segmentation and recognition, machine learning, corpus creation, and digital humanities. We describe the design and development of the workflow around the project: data gathering and structuring, OCR tool development, user interface development, experiments on the tool's training procedure, evaluation, post-correction, and quality control of the results.