Title: Chinese base phrases chunking based on latent semi-CRF model
Authors: Xiao Sun, Xiaoli Nan
Published in: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010)
Publication date: 2010-09-30
DOI: https://doi.org/10.1109/NLPKE.2010.5587802
Citations: 13
Abstract
In Chinese natural language processing, recognizing simple, non-recursive base phrases is an important task for applications such as information processing and machine translation. Instead of a rule-based model, we adopt a statistical machine learning method, the newly proposed Latent semi-CRF model, to solve the Chinese base phrase chunking problem. Chinese base phrase chunking can be treated as a sequence labeling problem, which involves predicting a class label for each frame in an unsegmented sequence. Chinese base phrases have sub-structures that cannot be observed in the training data. We propose a latent discriminative model called Latent semi-CRF (Latent Semi Conditional Random Fields), which combines the advantages of LDCRF (Latent Dynamic Conditional Random Fields) and semi-CRF, modeling the sub-structure of a class sequence and learning the dynamics between class labels, to detect Chinese base phrases. Our results demonstrate that the latent dynamic discriminative model compares favorably to Support Vector Machines, the Maximum Entropy Model, and Conditional Random Fields (including LDCRF and semi-CRF) on Chinese base phrase chunking.
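To make the sequence-labeling framing concrete, the sketch below shows how per-token labels map to phrase segments, which is the unit a semi-CRF scores directly. It is only an illustration: the BIO tag scheme, the `bio_to_chunks` helper, and the example sentence are assumptions chosen for clarity and are not taken from the paper or its corpus, and no latent variables or learning are modeled here.

```python
from typing import List, Optional, Tuple


def bio_to_chunks(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert per-token BIO tags into (label, start, end) segments.

    `end` is exclusive. The BIO scheme is an assumed encoding for this sketch.
    """
    chunks: List[Tuple[str, int, int]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        # The open chunk continues only on an "I-" tag carrying the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                chunks.append((label, start, i))  # close the open chunk
            if tag == "O":
                start, label = None, None         # outside any base phrase
            else:
                start, label = i, tag[2:]         # open a new chunk
    if label is not None:
        chunks.append((label, start, len(tags)))  # close a chunk at sentence end
    return chunks


# Hypothetical example: "we / adopt / statistical / machine / learning / method"
tokens = ["我们", "采用", "统计", "机器", "学习", "方法"]
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP"]
chunks = bio_to_chunks(tags)
phrases = [(label, "".join(tokens[s:e])) for label, s, e in chunks]
```

Under this encoding a token-level model such as a CRF predicts `tags` one position at a time, whereas a segment-level model assigns a label to each `(start, end)` span as a whole.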