Adaptive Enhancement for Scanned Historical Document Images
Authors: Farouk Suleiman, Chris J. Hughes, E. Obio
Published in: 2021 IEEE 4th International Conference on Electronics and Communication Engineering (ICECE), 2021-12-17
DOI: https://doi.org/10.1109/ICECE54449.2021.9674392
Citations: 0
Abstract
In this paper, we propose a novel adaptive histogram matching method to remove low-contrast, smeared-ink, bleed-through and uneven-illumination artefacts from scanned images of historical documents. The goal is to provide a better representation of document images and thereby improve both their readability and their quality as source images for Optical Character Recognition (OCR). Unlike other methods, which are designed for a single artefact, our proposed method handles multiple artefacts (low contrast, smeared ink, bleed-through and uneven illumination). The method starts by taking the bimodal peaks of the histogram of the original grayscale image and multiplying them by generated Gaussian windows to create an ideal histogram whose distribution is weighted by importance. This ideal histogram serves as the reference histogram, to which the original image is matched to produce an enhanced image. Median filtering is also incorporated in the method to remove salt-and-pepper noise. We demonstrate the technique on the European Newspapers Project (ENP) dataset from the Pattern Recognition and Image Analysis (PRImA) research lab, and the results show that the proposed method significantly reduces background noise and improves image quality across multiple artefacts compared with the other enhancement methods tested. To evaluate the performance of the proposed method, we use several criteria: Signal-to-Noise Ratio (SNR), Mean Opinion Score (MOS), and the Visual Document Image Quality Assessment (VDIQA) metric. The proposed method performs best on all evaluation metrics, with a 42.6% improvement over the average score of the other methods for MOS, a 44.3% improvement over their average score for SNR, and a 61.11% better VDIQA score than the other methods.
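The pipeline outlined in the abstract (bimodal peaks, Gaussian windows, reference histogram, histogram matching, median filtering) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not give the peak-detection rule, the window width, or where the median filter sits in the pipeline, so the following are assumptions made only for illustration: one peak is searched in each half of the grey-level range, the Gaussian width is `sigma=12.0`, the median filter is 3x3 and applied after matching, and all function names are hypothetical.

```python
import numpy as np


def smooth_histogram(hist, width=9):
    """Moving-average smoothing so isolated spikes are not taken for modes."""
    kernel = np.ones(width) / width
    return np.convolve(hist, kernel, mode="same")


def find_bimodal_peaks(smooth):
    """Assumed rule: the ink mode is in levels 0..127, the paper mode in 128..255."""
    dark = int(np.argmax(smooth[:128]))
    bright = 128 + int(np.argmax(smooth[128:]))
    return dark, bright


def reference_histogram(hist, sigma=12.0):
    """Sum of two Gaussian windows, each scaled by the height of its peak."""
    levels = np.arange(256)
    smooth = smooth_histogram(hist)
    ref = np.zeros(256)
    for peak in find_bimodal_peaks(smooth):
        window = np.exp(-0.5 * ((levels - peak) / sigma) ** 2)
        ref += smooth[peak] * window  # peak height acts as the weight
    return ref / ref.sum()  # normalise to a probability distribution


def match_histogram(image, ref):
    """Map grey levels so the image CDF follows the reference CDF."""
    hist = np.bincount(image.ravel(), minlength=256)
    src_cdf = np.cumsum(hist) / image.size
    ref_cdf = np.cumsum(ref)
    # For each source level, take the first reference level whose CDF reaches it.
    lut = np.searchsorted(ref_cdf, src_cdf, side="left")
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[image]


def median_filter3(image):
    """3x3 median filter with edge replication (salt-and-pepper removal)."""
    padded = np.pad(image, 1, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    return np.median(windows, axis=(2, 3)).astype(np.uint8)


def enhance(image, sigma=12.0):
    """Expects a 2-D uint8 grayscale image; returns an image of the same shape."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    ref = reference_histogram(hist, sigma)
    return median_filter3(match_histogram(image, ref))


# Synthetic low-contrast "page": grey background with a darker text block.
rng = np.random.default_rng(0)
page = rng.normal(170, 20, (64, 64))
page[20:40, 10:50] = rng.normal(90, 20, (20, 40))
page = np.clip(page, 0, 255).astype(np.uint8)
enhanced = enhance(page)
print(enhanced.shape, enhanced.dtype)
```

Because the reference histogram is built from the image's own peaks, the mapping adapts to each scan rather than relying on a fixed target distribution. A smaller `sigma` concentrates pixels more tightly around the two modes, and the value used here is a placeholder, not a tuned parameter.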