{"title":"一种基于形态学的汉语分词方法","authors":"Xiaojun Lin, Liang Zhao, Meng Zhang, Xihong Wu","doi":"10.1109/NLPKE.2010.5587786","DOIUrl":null,"url":null,"abstract":"This paper proposes a novel method of Chinese word segmentation utilizing morphology information. The method introduces morphology into statistical model to capture structural relationship within word. It improves the conventional Conditional Random Fields (CRFs) models on the ability of representing the structure information. Firstly, a word-segmented Chinese corpus is annotated with morphology tags by a semi-automatic method. The resulting structure-related tags are integrated into the CRFs model. Secondly, a joint CRFs model is trained, which generates both morphology tags and word boundaries. Experiments are carried out on several SIGHAN Bakeoff corpus and show that the morphology information can improve the performance of Chinese word segmentation significantly, especially for the segmentation of out-of-vocabulary words.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A morphology-based Chinese word segmentation method\",\"authors\":\"Xiaojun Lin, Liang Zhao, Meng Zhang, Xihong Wu\",\"doi\":\"10.1109/NLPKE.2010.5587786\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper proposes a novel method of Chinese word segmentation utilizing morphology information. The method introduces morphology into statistical model to capture structural relationship within word. It improves the conventional Conditional Random Fields (CRFs) models on the ability of representing the structure information. Firstly, a word-segmented Chinese corpus is annotated with morphology tags by a semi-automatic method. The resulting structure-related tags are integrated into the CRFs model. Secondly, a joint CRFs model is trained, which generates both morphology tags and word boundaries. Experiments are carried out on several SIGHAN Bakeoff corpus and show that the morphology information can improve the performance of Chinese word segmentation significantly, especially for the segmentation of out-of-vocabulary words.\",\"PeriodicalId\":259975,\"journal\":{\"name\":\"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)\",\"volume\":\"97 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/NLPKE.2010.5587786\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NLPKE.2010.5587786","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A morphology-based Chinese word segmentation method
This paper proposes a novel method of Chinese word segmentation utilizing morphology information. The method introduces morphology into statistical model to capture structural relationship within word. It improves the conventional Conditional Random Fields (CRFs) models on the ability of representing the structure information. Firstly, a word-segmented Chinese corpus is annotated with morphology tags by a semi-automatic method. The resulting structure-related tags are integrated into the CRFs model. Secondly, a joint CRFs model is trained, which generates both morphology tags and word boundaries. Experiments are carried out on several SIGHAN Bakeoff corpus and show that the morphology information can improve the performance of Chinese word segmentation significantly, especially for the segmentation of out-of-vocabulary words.