{"title":"基于语义聚类词的汉字识别语言模型","authors":"Hsi-Jian Lee, Cheng-Huang Tung","doi":"10.1109/ICDAR.1995.599033","DOIUrl":null,"url":null,"abstract":"This paper presents a new method for clustering the words in a dictionary into word groups, which are applied in a Chinese character recognition system with a language model to describe the contextual information. The Chinese synonym dictionary Tong2yi4ci2 ci2lin2 providing the semantic features is used to train the weights of the semantic attributes of the character-based word classes. The weights of the semantic attributes are next updated according to the words of the behavior dictionary, which has a rather complete word set. Then, the updated word classes are clustered into m groups according to the semantic measurement by a greedy method. The words in the behavior dictionary can finally be assigned into the m groups. The parameter space for bigram contextual information of the character recognition system is m/sup 2/. From the experimental results, the recognition system with the proposed model has shown better performance than that of a character-based bigram language model.","PeriodicalId":273519,"journal":{"name":"Proceedings of 3rd International Conference on Document Analysis and Recognition","volume":"172 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"A language model based on semantically clustered words in a Chinese character recognition system\",\"authors\":\"Hsi-Jian Lee, Cheng-Huang Tung\",\"doi\":\"10.1109/ICDAR.1995.599033\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents a new method for clustering the words in a dictionary into word groups, which are applied in a Chinese character recognition system with a language model to describe the contextual information. The Chinese synonym dictionary Tong2yi4ci2 ci2lin2 providing the semantic features is used to train the weights of the semantic attributes of the character-based word classes. The weights of the semantic attributes are next updated according to the words of the behavior dictionary, which has a rather complete word set. Then, the updated word classes are clustered into m groups according to the semantic measurement by a greedy method. The words in the behavior dictionary can finally be assigned into the m groups. The parameter space for bigram contextual information of the character recognition system is m/sup 2/. From the experimental results, the recognition system with the proposed model has shown better performance than that of a character-based bigram language model.\",\"PeriodicalId\":273519,\"journal\":{\"name\":\"Proceedings of 3rd International Conference on Document Analysis and Recognition\",\"volume\":\"172 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1995-08-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of 3rd International Conference on Document Analysis and Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDAR.1995.599033\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 3rd International Conference on Document Analysis and Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.1995.599033","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A language model based on semantically clustered words in a Chinese character recognition system
This paper presents a new method for clustering the words in a dictionary into word groups, which are applied in a Chinese character recognition system with a language model to describe the contextual information. The Chinese synonym dictionary Tong2yi4ci2 ci2lin2 providing the semantic features is used to train the weights of the semantic attributes of the character-based word classes. The weights of the semantic attributes are next updated according to the words of the behavior dictionary, which has a rather complete word set. Then, the updated word classes are clustered into m groups according to the semantic measurement by a greedy method. The words in the behavior dictionary can finally be assigned into the m groups. The parameter space for bigram contextual information of the character recognition system is m/sup 2/. From the experimental results, the recognition system with the proposed model has shown better performance than that of a character-based bigram language model.