A conditional random fields model for overlapping ambiguity resolution in Chinese word segmentation
Authors: Yan Liang, Yaoting Zhu
Venue: 2009 IEEE International Conference on Granular Computing
Published: 2009-09-22
DOI: 10.1109/GRC.2009.5255092 (https://doi.org/10.1109/GRC.2009.5255092)
Citations: 0
Abstract
Overlapping ambiguity is one kind of ambiguity that arises in Chinese word segmentation. Until now, research on overlapping ambiguity has focused on 3-character overlapping ambiguity strings. In this paper, the distribution and forms of overlapping ambiguity strings are examined empirically. A conditional random fields model is used so that overlapping ambiguity strings of different forms can be handled together in a single model. Several features for overlapping ambiguity resolution are explored, including component independency probability, component co-occurrence probability, the in-word probability of a component, and string structure. Experimental results show that precision reaches 93.81% in the open test.
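For readers unfamiliar with the term, the sketch below illustrates what an overlapping ambiguity string is: a span in which two lexicon words share characters, so that more than one segmentation is possible. It is not the paper's method (the paper resolves such strings with a conditional random fields model); it only detects candidates by lexicon lookup. The function name `find_overlapping_ambiguities`, the toy lexicon, the `max_word_len` parameter, and the example sentence are all assumptions made for illustration.

```python
def find_overlapping_ambiguities(text, lexicon, max_word_len=4):
    """Return (ambiguous_string, left_word, right_word) triples found in text.

    Hypothetical helper: a pair of lexicon words is reported when the
    second word starts inside the first and ends after it.
    """
    # Collect every multi-character lexicon word in the text as a (start, end) span.
    spans = [
        (i, j)
        for i in range(len(text))
        for j in range(i + 2, min(len(text), i + max_word_len) + 1)
        if text[i:j] in lexicon
    ]
    results = []
    for s1, e1 in spans:
        for s2, e2 in spans:
            # The two words cross: they share at least one character,
            # and neither contains the other.
            if s1 < s2 < e1 < e2:
                results.append((text[s1:e2], text[s1:e1], text[s2:e2]))
    return results


# Toy lexicon (an assumption for this example, not data from the paper).
toy_lexicon = {"从小", "小学", "电脑"}

# "从小学" can be split as 从小 / 学 or as 从 / 小学.
print(find_overlapping_ambiguities("他从小学电脑", toy_lexicon))
# prints [('从小学', '从小', '小学')]
```

This toy case is a 3-character string; the same crossing test also flags longer forms when the lexicon contains longer words, which is the kind of variety in form that the abstract says the model is meant to cover.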