Anh-Cuong Le, Phuong-Thai Nguyen, Hoai-Thu Vuong, Minh-Thu Pham, T. Ho
An Experimental Study on Lexicalized Statistical Parsing for Vietnamese. 2009 International Conference on Knowledge and Systems Engineering, published 2009-10-13. DOI: 10.1109/KSE.2009.41
Citations: 12
Abstract
Syntactic parsing is a central problem and a challenge in the field of natural language processing. It has attracted many studies, and consequently effective parsers exist for several popular languages such as English and Chinese. For Vietnamese, only a few studies have addressed this problem; they do not apply modern techniques, and no widely used parser has been released. This paper presents the first study on developing a wide-coverage Vietnamese parser based on lexicalized probabilistic context-free grammar (LPCFG) and using a standard parsed corpus (similar to the Penn Treebank). In this paper, Bikel's parser is modified to analyze Vietnamese. We also provide a comparison of different parsing models and different linguistic features. The best configuration achieves an F-score of around 78%.
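The abstract reports parser quality as an F-score. For constituency parsers this is conventionally the labeled-bracket (PARSEVAL-style) F-score, as computed by tools such as evalb; the abstract does not state the exact scoring setup, so that is an assumption here. The sketch below is a simplified illustration only: it compares (label, start, end) constituent spans between a gold and a predicted tree, skips part-of-speech brackets, and does not reproduce evalb's parameter-file rules (punctuation deletion, label equivalences). The tree encoding, the helper names `brackets` and `bracket_f1`, and the example sentence and tags are all hypothetical.

```python
from collections import Counter


def brackets(tree, start=0):
    """Collect labeled constituent spans from a tree.

    Hypothetical encoding: a tree is a tuple (label, child, ...);
    a leaf is a plain string (one word). Preterminal (POS) brackets
    are skipped, as in PARSEVAL-style scoring.
    Returns (Counter of (label, start, end) spans, end position).
    """
    label, *children = tree
    spans = Counter()
    pos = start
    for child in children:
        if isinstance(child, str):
            pos += 1  # a word advances the span index by one
        else:
            child_spans, pos = brackets(child, pos)
            spans += child_spans
    is_preterminal = all(isinstance(c, str) for c in children)
    if not is_preterminal:
        spans[(label, start, pos)] += 1
    return spans, pos


def bracket_f1(gold, test):
    """Return (precision, recall, F1) over matched labeled brackets."""
    g, _ = brackets(gold)
    t, _ = brackets(test)
    matched = sum((g & t).values())  # multiset intersection of spans
    precision = matched / sum(t.values()) if t else 0.0
    recall = matched / sum(g.values()) if g else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example: "tôi đọc sách" ("I read books").
gold = ("S", ("NP", ("P", "tôi")), ("VP", ("V", "đọc"), ("NP", ("N", "sách"))))
# The predicted tree attaches the object NP to S instead of inside the VP.
test = ("S", ("NP", ("P", "tôi")), ("VP", ("V", "đọc")), ("NP", ("N", "sách")))

print(bracket_f1(gold, test))  # (0.75, 0.75, 0.75)
```

Both trees contain four non-POS brackets, three of which (S, and the two NPs) match, so precision, recall, and F-score are all 3/4. Using a `Counter` rather than a set keeps duplicate identical spans (e.g. unary chains with the same label) from being over-credited.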