Automatic Parameter Tuning of K-Means Algorithm for Document Binarization
A. Gattal, Faycel Abbas, Mohamed Ridda Laouar
Proceedings of the 7th International Conference on Software Engineering and New Technologies, published 2018-12-26
DOI: 10.1145/3330089.3330124
Citations: 11
Abstract
Document binarization is a primary processing step in document recognition systems. It aims to separate the foreground from the document background. In this paper, we propose an algorithm for binarizing degraded document images using the K-Means clustering algorithm with automatic parameter tuning. K-Means is used to classify the document image into three classes, labeled background, foreground, and noise. Experimental results on the recent H-DIBCO 2016 benchmark dataset show that our method is more robust than state-of-the-art methods.
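The abstract does not specify the features, the initialization, or the automatic parameter-tuning rule. The following is therefore only a minimal sketch of the three-class idea under stated assumptions, not the authors' method. K-Means (k = 3) runs on raw grayscale intensities. The darkest cluster is taken as foreground (dark ink on light paper), the brightest as background, and the middle one as noise, which is merged into the background to produce a binary mask. The names `kmeans_1d` and `binarize` are hypothetical, and no parameter tuning is attempted.

```python
from typing import List, Tuple


def _assign(values: List[float], centroids: List[float]) -> List[int]:
    """Assignment step: index of the nearest centroid for every value."""
    return [min(range(len(centroids)), key=lambda c: abs(v - centroids[c])) for v in values]


def kmeans_1d(values: List[float], k: int = 3, iters: int = 50) -> Tuple[List[float], List[int]]:
    """Plain Lloyd's K-Means on scalar values (here: grayscale intensities)."""
    lo, hi = min(values), max(values)
    # Deterministic init (an assumption): spread centroids evenly over the intensity range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        labels = _assign(values, centroids)
        # Update step: each centroid moves to the mean of its members; empty clusters stay put.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:  # converged: the next assignment would be identical
            break
        centroids = updated
    return centroids, _assign(values, centroids)


def binarize(gray: List[List[int]], iters: int = 50) -> List[List[int]]:
    """Return a mask of the same shape: 1 = foreground (ink), 0 = background.

    Assumes a non-empty, rectangular grayscale image with dark ink on light paper.
    """
    height, width = len(gray), len(gray[0])
    flat = [float(p) for row in gray for p in row]
    if min(flat) == max(flat):
        # A uniform image has nothing to separate; treat it all as background.
        return [[0] * width for _ in range(height)]
    centroids, labels = kmeans_1d(flat, k=3, iters=iters)
    # Hypothetical label mapping: darkest centroid = foreground, brightest = background,
    # middle = noise. For simplicity, noise pixels are merged into the background here.
    foreground = min(range(3), key=lambda c: centroids[c])
    return [[1 if labels[r * width + col] == foreground else 0 for col in range(width)]
            for r in range(height)]
```

For example, `binarize([[250, 245, 30, 240], [20, 128, 250, 25], [248, 130, 35, 252]])` converges to clusters centered at 27.5, 129 and 247.5 and marks only the four darkest pixels as foreground. Merging the noise cluster into the background is a simplification chosen for this sketch; the abstract does not state how the paper resolves noise pixels.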