Building Vietnamese dependency treebank based on Chinese-Vietnamese bilingual word alignment
Authors: Ying Li, Jianyi Guo, Zhengtao Yu, Hongbin Wang, Yonghua Wen
Published in: 2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)
Publication date: 2016-08-01
DOI: 10.1109/FSKD.2016.7603371 (https://doi.org/10.1109/FSKD.2016.7603371)
Citations: 0
Abstract
Treebanks are an important resource in natural language processing. Compared with Chinese, for which rich and mature corpora exist, Vietnamese syntactic analysis is much more difficult. This paper presents a new approach that uses a Chinese-Vietnamese bilingual word-aligned corpus to build a Vietnamese dependency treebank. First, word alignment is performed on Chinese-Vietnamese aligned sentence pairs. Second, the Chinese sentences are dependency parsed. Finally, the Vietnamese dependency treebank is generated from the Chinese-Vietnamese alignment relations and the Chinese dependency trees. In addition, converting Vietnamese phrase-structure trees into dependency trees can significantly improve the accuracy of dependency analysis. Experimental results show that this approach simplifies the manual collection and annotation of a Vietnamese treebank, saving the manpower and time needed to build it, and that its accuracy is significantly higher than that of machine learning methods.
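
The projection step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `project_dependencies`, the one-to-one alignment dictionary, the relation labels, and the example phrase are all assumptions chosen for illustration. The sketch copies each Chinese dependency arc to the Vietnamese side whenever both the dependent and its head have an aligned Vietnamese word, and leaves other Vietnamese tokens unattached.

```python
from typing import Dict, List, Optional, Tuple

Arc = Optional[Tuple[int, str]]  # (head index, relation label), or None


def project_dependencies(
    zh_heads: List[int],
    zh_labels: List[str],
    alignment: Dict[int, int],
    vi_length: int,
) -> List[Arc]:
    """Project Chinese dependency arcs onto a Vietnamese sentence.

    Tokens are 1-indexed and head 0 is the artificial root.
    `alignment` maps a Chinese token index to a Vietnamese token index;
    a one-to-one alignment is assumed here for simplicity.
    """
    vi_arcs: List[Arc] = [None] * vi_length
    for zh_dep, (zh_head, label) in enumerate(zip(zh_heads, zh_labels), start=1):
        vi_dep = alignment.get(zh_dep)
        if vi_dep is None:
            continue  # the dependent has no Vietnamese counterpart
        if zh_head == 0:
            vi_arcs[vi_dep - 1] = (0, label)  # the root maps to the root
            continue
        vi_head = alignment.get(zh_head)
        if vi_head is None:
            continue  # the head is unaligned, so the arc cannot be projected
        vi_arcs[vi_dep - 1] = (vi_head, label)
    return vi_arcs


# Hypothetical example: Chinese "我 的 书" aligned with Vietnamese "sách của tôi".
zh_heads = [3, 1, 0]                 # 我 -> 书, 的 -> 我, 书 -> root
zh_labels = ["nmod", "case", "root"]  # illustrative labels only
alignment = {1: 3, 2: 2, 3: 1}       # 我 -> tôi, 的 -> của, 书 -> sách
print(project_dependencies(zh_heads, zh_labels, alignment, vi_length=3))
```

Vietnamese tokens that receive `None` would still need a head assigned by some other means, such as manual annotation or a fallback rule; the abstract does not say how the authors handle unaligned or many-to-one cases.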