Reducing OCR Errors in Gothic-Script Documents
Lenz Furrer, M. Volk
ERCIM News, volume 2011 1. Published 2011-09-16.
DOI: 10.5167/UZH-49812 (https://doi.org/10.5167/UZH-49812)
In order to improve OCR quality in texts originally typeset in Gothic script, we have built an automated correction system which is highly specialized for the given text. Our approach includes external dictionary resources as well as information derived from the text itself. The focus lies on testing and improving different methods for classifying words as correct or erroneous. Also, different techniques are applied to find and rate correction candidates. In addition, we are working on a web application that enables users to read and edit the digitized text online.
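
The two steps named in the abstract, classifying words as correct or erroneous and then finding and rating correction candidates, can be illustrated with a minimal Python sketch. It is not the authors' system: the function names, the frequency threshold, the toy dictionary, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration. The sketch treats a word as trusted if it appears in an external dictionary or occurs repeatedly in the text itself, and it rates candidates first by string similarity and then by how often they occur in the text.

```python
from collections import Counter
from difflib import SequenceMatcher


def build_text_vocab(tokens, min_count=2):
    """Collect words that occur at least min_count times in the text itself.

    Assumption: a repeated form is more likely to be a real word than an OCR slip.
    """
    counts = Counter(t.lower() for t in tokens)
    return {w: c for w, c in counts.items() if c >= min_count}


def is_suspicious(word, dictionary, text_vocab):
    """Flag a word that is neither in the dictionary nor frequent in the text."""
    w = word.lower()
    return w not in dictionary and w not in text_vocab


def rank_candidates(word, dictionary, text_vocab, cutoff=0.6, limit=3):
    """Return up to `limit` candidates, best first.

    Candidates are rated by similarity to the OCR output, then by their
    frequency in the text, then alphabetically to keep the order stable.
    """
    w = word.lower()
    scored = []
    for cand in set(dictionary) | set(text_vocab):
        sim = SequenceMatcher(None, w, cand).ratio()
        if sim >= cutoff:
            scored.append((sim, text_vocab.get(cand, 0), cand))
    scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return [cand for _, _, cand in scored[:limit]]


# Hypothetical toy data: "Waffer" stands for "Wasser" with the long s misread as f.
dictionary = {"wasser", "berg", "haus", "und", "das"}
tokens = ["Der", "Berg", "und", "das", "Waffer", "der", "Berg"]
text_vocab = build_text_vocab(tokens)

for token in tokens:
    if is_suspicious(token, dictionary, text_vocab):
        print(token, rank_candidates(token, dictionary, text_vocab))
```

A generic similarity ratio is the simplest stand-in here. A system tuned to Gothic script would more plausibly weight the confusions typical of that typeface (for example long s read as f) above arbitrary character substitutions.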