V. Rezanezhad, Konstantin Baierer, Clemens Neudecker
A hybrid CNN-Transformer model for Historical Document Image Binarization
Proceedings of the 7th International Workshop on Historical Document Imaging and Processing
Publication date: 2023-08-25
DOI: 10.1145/3604951.3605508 (https://doi.org/10.1145/3604951.3605508)
Citation count: 1
Abstract
Document image binarization is one of the main preprocessing steps in document image analysis for text recognition. Noise, faint characters, bad scanning conditions, uneven lighting, or paper aging can cause artifacts that negatively impact text recognition algorithms. The task of binarization is to segment the foreground (text) from these degradations in order to improve optical character recognition (OCR) results. Convolutional Neural Networks (CNNs) are a popular method for binarization, but they are limited by their focus on the local context in a document image. We apply a hybrid CNN-Transformer model to convert a document image into a binary output. For model training, we used datasets from the Document Image Binarization Contests (DIBCO). On the DIBCO-2012, DIBCO-2017, and DIBCO-2018 datasets, our model outperforms the state-of-the-art algorithms.
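To make the task concrete, the following is a minimal sketch of binarization as a mapping from a grayscale image to a binary foreground mask. It uses a classical global Otsu threshold, which is not the hybrid CNN-Transformer model described above; it is shown only as a simple baseline. The function names `otsu_threshold` and `binarize`, the toy image, and the convention that 1 marks dark ink are all assumptions made for illustration.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    Classical Otsu baseline, shown for illustration only; it is NOT the
    paper's model.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is already in the dark class
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image):
    """Map a grayscale image (rows of ints 0..255) to a 0/1 mask.

    Assumed convention: 1 = foreground (dark ink), 0 = background.
    """
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if p <= t else 0 for p in row] for row in image]


# Hypothetical 4x4 patch: a dark 2x2 stroke on bright paper.
image = [
    [200, 210, 205, 198],
    [202, 40, 35, 207],
    [199, 38, 42, 203],
    [201, 204, 206, 200],
]
mask = binarize(image)
for row in mask:
    print(row)
```

A single global threshold like this ignores spatial context entirely, which is why it tends to fail under the uneven lighting and paper aging mentioned in the abstract; learned models are meant to address exactly those cases.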