Construction of Khmer entity annotation corpus based on constrained conditional random fields
Authors: Shuhui Huang, Xin Yan, Zhengtao Yu, Qingling Lei
Published in: 2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)
Publication date: 2016-08-01
DOI: 10.1109/FSKD.2016.7603531
Citation count: 2
Abstract
Segmentation errors caused by out-of-vocabulary (OOV) words are difficult to eliminate, so the precision of word segmentation based on a conditional random field (CRF) model is relatively low. To address this problem, this paper proposes a Constrained Conditional Random Fields model that exploits constraints on Khmer word segmentation and named entity recognition (NER). Through a series of word segmentation, part-of-speech (POS) tagging, and NER tasks, a Khmer entity annotation corpus containing more entities is constructed. Several groups of experiments are conducted at each stage of corpus construction, and the experimental results show that the proposed method is feasible.
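The abstract does not say which constraints the model uses or where they are applied. One common way to add hard constraints to a linear-chain CRF is at decoding time: run Viterbi search but assign a score of −∞ to every label sequence that breaks a constraint. The minimal sketch below illustrates that general idea for character-level word segmentation. The BMES tag set, the transition rules in `ALLOWED_NEXT`, the lexicon-style `forced` positions, and all scores are hypothetical assumptions for illustration, not the authors' method.

```python
from math import inf
from typing import Dict, List, Optional, Sequence

# Hypothetical BMES tag set for character-level segmentation:
# B = word begin, M = word middle, E = word end, S = single-character word.
TAGS = ["B", "M", "E", "S"]

# Assumed structural constraints: which tag may follow which.
ALLOWED_NEXT = {
    "B": {"M", "E"},
    "M": {"M", "E"},
    "E": {"B", "S"},
    "S": {"B", "S"},
}
ALLOWED_START = {"B", "S"}
ALLOWED_END = {"E", "S"}


def constrained_viterbi(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    forced: Optional[Dict[int, str]] = None,
) -> List[str]:
    """Return the best-scoring tag sequence that satisfies every constraint.

    emissions[i][t]   : score of tag t at position i (e.g. from CRF features)
    transitions[a][b] : score of moving from tag a to tag b
    forced            : hypothetical fixed tags, e.g. from a lexicon match
    """
    forced = forced or {}
    n = len(emissions)
    if n == 0:
        return []

    def allowed_at(i: int, tag: str) -> bool:
        # A forced position admits only its fixed tag.
        if i in forced and forced[i] != tag:
            return False
        if i == 0 and tag not in ALLOWED_START:
            return False
        if i == n - 1 and tag not in ALLOWED_END:
            return False
        return True

    # score[i][t]: best score of a valid prefix ending with tag t at position i.
    score = [{t: -inf for t in TAGS} for _ in range(n)]
    back: List[Dict[str, Optional[str]]] = [{t: None for t in TAGS} for _ in range(n)]

    for t in TAGS:
        if allowed_at(0, t):
            score[0][t] = emissions[0][t]

    for i in range(1, n):
        for t in TAGS:
            if not allowed_at(i, t):
                continue
            for p in TAGS:
                # Skip unreachable predecessors and forbidden transitions.
                if score[i - 1][p] == -inf or t not in ALLOWED_NEXT[p]:
                    continue
                cand = score[i - 1][p] + transitions[p][t] + emissions[i][t]
                if cand > score[i][t]:
                    score[i][t] = cand
                    back[i][t] = p

    last = max(TAGS, key=lambda t: score[n - 1][t])
    if score[n - 1][last] == -inf:
        raise ValueError("no tag sequence satisfies the constraints")

    # Follow back-pointers from the best final tag to recover the path.
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy scores for a 4-character input; every value here is made up.
emissions = [
    {"B": 2.0, "M": 0.0, "E": 0.0, "S": 1.0},
    {"B": 0.0, "M": 0.5, "E": 2.0, "S": 1.0},
    {"B": 2.0, "M": 0.0, "E": 0.0, "S": 0.5},
    {"B": 0.0, "M": 0.0, "E": 1.5, "S": 1.0},
]
transitions = {a: {b: 0.0 for b in TAGS} for a in TAGS}

print(constrained_viterbi(emissions, transitions))
# → ['B', 'E', 'B', 'E']
print(constrained_viterbi(emissions, transitions, forced={0: "B", 1: "M", 2: "E"}))
# → ['B', 'M', 'E', 'S']
```

Forcing positions 0 to 2 to `B`, `M`, `E` mimics a dictionary match that fixes a three-character word (for instance a known entity name); the decoder then chooses the best labels around it while still respecting the BMES transition rules. In a real system, trained CRF weights would replace the made-up emission and transition scores.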