Construction of Khmer entity annotation corpus based on constrained conditional random fields
Authors: Shuhui Huang, Xin Yan, Zhengtao Yu, Qingling Lei
Venue: 2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)
Publication date: 2016-08-01
DOI: https://doi.org/10.1109/FSKD.2016.7603531
Out-of-vocabulary words cause considerable segmentation errors that are hard to eliminate, so precision on the segmentation task based on the Constrained Conditional Random Fields model is relatively low. To address this problem, this paper proposes a Constrained Conditional Random Fields model that exploits constraints on Khmer word segmentation and named entity recognition. Through a series of segmentation, POS tagging, and NER tasks, a Khmer entity annotation corpus containing more entities is constructed. Several groups of experiments are run at each stage of building the entity annotation corpus. The experimental results indicate that the proposed method is feasible.
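The abstract does not say which constraints the model uses or how they are enforced. As a rough illustration of the general idea only, the sketch below shows one common way to apply hard constraints when decoding a linear-chain model: Viterbi search that skips forbidden label transitions and honors labels pinned at given positions. The BMES tag set, the `ALLOWED` transition table, the `forced` argument, and the function name `constrained_viterbi` are all assumptions made for this example, not details taken from the paper.

```python
import math

# Hypothetical segmentation tag set: Begin / Middle / End / Single-unit word.
LABELS = ["B", "M", "E", "S"]

# Assumed structural constraints: which label may follow which.
ALLOWED = {
    "B": {"M", "E"},
    "M": {"M", "E"},
    "E": {"B", "S"},
    "S": {"B", "S"},
}
START_OK = {"B", "S"}  # a sequence must open a word
END_OK = {"E", "S"}    # a sequence must close a word

NEG_INF = -math.inf


def constrained_viterbi(emissions, transitions=None, forced=None):
    """Return the highest-scoring label path that satisfies every constraint.

    emissions:   list of dicts (one per position), label -> log-domain score.
    transitions: optional dict (prev_label, label) -> score; missing pairs score 0.
    forced:      optional dict position -> label pinned in advance
                 (for example by a lexicon match; purely illustrative).
    """
    transitions = transitions or {}
    forced = forced or {}
    n = len(emissions)
    if n == 0:
        return []

    def local(i, lab):
        # Score of label `lab` at position i, or -inf if a constraint forbids it.
        if i in forced and forced[i] != lab:
            return NEG_INF
        if i == 0 and lab not in START_OK:
            return NEG_INF
        if i == n - 1 and lab not in END_OK:
            return NEG_INF
        return emissions[i].get(lab, NEG_INF)

    score = [{lab: local(0, lab) for lab in LABELS}]
    back = [{}]
    for i in range(1, n):
        score.append({})
        back.append({})
        for lab in LABELS:
            best_prev, best = None, NEG_INF
            for prev in LABELS:
                if lab not in ALLOWED[prev]:
                    continue  # forbidden transition: never considered
                cand = score[i - 1][prev] + transitions.get((prev, lab), 0.0)
                if cand > best:
                    best_prev, best = prev, cand
            score[i][lab] = best + local(i, lab)
            back[i][lab] = best_prev

    last = max(LABELS, key=lambda lab: score[n - 1][lab])
    if score[n - 1][last] == NEG_INF:
        raise ValueError("no label sequence satisfies the constraints")

    # Follow back-pointers from the best final label to recover the path.
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy scores: the per-position argmax would be M, E, B, which is not a valid
# segmentation, so the constraints change the decoded path.
example = [
    {"B": 0.0, "M": 1.0, "E": 0.0, "S": 0.5},
    {"B": 0.2, "M": 0.1, "E": 0.3, "S": 0.0},
    {"B": 1.0, "M": 0.0, "E": 0.4, "S": 0.5},
]
print(constrained_viterbi(example))
print(constrained_viterbi(example, forced={1: "S"}))
```

With the toy scores above, the first call yields `['S', 'B', 'E']` and pinning position 1 to `S` yields `['S', 'S', 'S']`. In a trained CRF the emission and transition scores would come from learned feature weights. Here they are hand-written numbers, so the sketch shows only how hard constraints restrict the search space.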