Retrieval of degraded Chinese document based on fuzzy coding strategy
Yong Xia, Xuhui Jia, Kuanquan Wang
DOI: 10.1109/ICSAI.2012.6223602 (https://doi.org/10.1109/ICSAI.2012.6223602)
Published: 2012-05-19
Because the recognition rate for degraded Chinese documents is low, retrieval that relies directly on OCR output performs poorly. This paper presents a fuzzy coding strategy to improve retrieval performance: character classes with similar shapes are clustered, and each cluster is indexed by a single pseudo code. To ease testing, the paper also presents a way to generate ground truth for document images and to synthesize degraded document images. One real OCR text collection and two synthesized document image collections are used for performance evaluation, and the results confirm the validity of the method.
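
The idea of indexing clusters of similar-looking characters under a shared pseudo code can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the hand-picked `CLUSTERS`, the `C0`, `C1`, ... code names, the bigram inverted index, and the helper names `fuzzy_encode`, `build_index`, and `search`. The abstract does not say how the clusters are derived or how the index is organized.

```python
from collections import defaultdict

# Hypothetical clusters of visually similar characters (illustrative only;
# the paper's real clusters are not given in the abstract).
CLUSTERS = [
    "己已巳",
    "未末",
    "土士",
    "日曰",
]

# Every character in a cluster maps to that cluster's pseudo code.
CHAR_TO_CODE = {ch: f"C{i}" for i, cluster in enumerate(CLUSTERS) for ch in cluster}


def fuzzy_encode(text):
    """Replace clustered characters by their pseudo code; keep others as-is."""
    return [CHAR_TO_CODE.get(ch, ch) for ch in text]


def build_index(docs, n=2):
    """Build an inverted index from pseudo-code n-grams to document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        codes = fuzzy_encode(text)
        for i in range(len(codes) - n + 1):
            index[tuple(codes[i:i + n])].add(doc_id)
    return index


def search(index, query, n=2):
    """Return ids of documents that contain every pseudo-code n-gram of the query.

    Queries shorter than n characters yield an empty set in this sketch.
    """
    codes = fuzzy_encode(query)
    grams = [tuple(codes[i:i + n]) for i in range(len(codes) - n + 1)]
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


# Toy collection: doc1 simulates an OCR error (末 misread as 未).
docs = {
    "doc1": "周未愉快",
    "doc2": "周末愉快",
    "doc3": "未来已来",
}
index = build_index(docs)
hits = search(index, "周末")  # matches doc1 and doc2 despite the misread character
```

Because the query and the documents pass through the same encoding, an OCR confusion inside a cluster no longer blocks a match. The trade-off is lower precision, since distinct words whose characters fall in the same clusters also become indistinguishable.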