Correcting broken characters in the recognition of historical printed documents
M. Droettboom
2003 Joint Conference on Digital Libraries, 2003. Proceedings. Published 2003-05-27.
DOI: 10.1109/JCDL.2003.1204889 (https://doi.org/10.1109/JCDL.2003.1204889)
Citations: 47
Abstract
We present a new technique for dealing with broken characters, one of the major challenges in the optical character recognition (OCR) of degraded historical printed documents. The technique uses graph combinatorics to rejoin the connected components that belong to the same character. It has been applied to real data with successful results.
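The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a graph-combinatorial rejoining step could look, not the author's implementation. It assumes that each connected component is represented by its bounding box, that two components are linked in the graph when their boxes lie within a hypothetical `max_gap` pixels of each other, and that candidate characters are all connected subgraphs of up to `max_size` nodes. All function names and parameters here are illustrative assumptions; scoring each candidate with a classifier and picking the best non-overlapping grouping is left out.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Bounding box of one connected component: (x0, y0, x1, y1).
Box = Tuple[int, int, int, int]


def box_gap(a: Box, b: Box) -> int:
    """Gap in pixels between two boxes; 0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def build_graph(boxes: List[Box], max_gap: int) -> Dict[int, Set[int]]:
    """Link every pair of components whose boxes are at most max_gap apart."""
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(boxes))}
    for i, j in combinations(range(len(boxes)), 2):
        if box_gap(boxes[i], boxes[j]) <= max_gap:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def connected_subgraphs(
    graph: Dict[int, Set[int]], max_size: int
) -> Set[FrozenSet[int]]:
    """Enumerate every connected node set with at most max_size nodes."""
    found: Set[FrozenSet[int]] = set()
    frontier = [frozenset([node]) for node in graph]
    while frontier:
        group = frontier.pop()
        if group in found:
            continue
        found.add(group)
        if len(group) < max_size:
            # Grow the group by one neighbouring node at a time,
            # which keeps every generated set connected.
            neighbours = set().union(*(graph[n] for n in group)) - group
            frontier.extend(group | {n} for n in neighbours)
    return found


def merge_boxes(boxes: List[Box], group: Iterable[int]) -> Box:
    """Bounding box of a candidate character built from several components."""
    x0s, y0s, x1s, y1s = zip(*(boxes[i] for i in group))
    return (min(x0s), min(y0s), max(x1s), max(y1s))


if __name__ == "__main__":
    # Two fragments of one broken glyph, plus a distant separate glyph.
    boxes = [(0, 0, 4, 10), (5, 0, 9, 10), (30, 0, 34, 10)]
    graph = build_graph(boxes, max_gap=2)
    for group in sorted(connected_subgraphs(graph, max_size=2), key=sorted):
        print(sorted(group), merge_boxes(boxes, group))
```

In a full pipeline each candidate group would be passed to a symbol classifier, and the grouping with the best overall confidence would be kept. Capping `max_size` matters because the number of connected subgraphs grows quickly with the number of nearby fragments.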