Building Vietnamese dependency treebank based on Chinese-Vietnamese bilingual word alignment

2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). Pub Date: 2016-08-01. DOI: 10.1109/FSKD.2016.7603371

Ying Li, Jianyi Guo, Zhengtao Yu, Hongbin Wang, Yonghua Wen

Abstract

Treebanks are an important resource in natural language processing. Compared with Chinese, for which rich and mature corpora exist, syntactic analysis of Vietnamese is much more difficult. This paper presents a new approach that uses a Chinese-Vietnamese bilingual word-aligned corpus to build a Vietnamese dependency treebank. First, words are aligned on the basis of Chinese-Vietnamese sentence alignment. Second, the Chinese sentences are dependency-parsed. Finally, the Vietnamese dependency treebank is generated from the Chinese-Vietnamese alignment relations and the Chinese dependency trees. In addition, converting Vietnamese phrase-structure trees into a dependency treebank can significantly improve the accuracy of dependency analysis. Experimental results show that this approach simplifies the manual collection and annotation of a Vietnamese treebank, saving the manpower and time needed to build it, and that its accuracy is significantly higher than that of machine learning methods.
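The three steps in the abstract (align words, parse the Chinese side, transfer the tree) can be illustrated with a minimal sketch of dependency projection. This is not the authors' implementation: the function name `project_dependencies`, the one-to-one alignment assumption, the `-1` root convention, the relation labels, and the toy sentence pair are all assumptions made here for illustration only.

```python
from typing import Dict, List, Optional, Tuple

ROOT = -1  # assumed convention: head index of the root token


def project_dependencies(
    src_heads: List[int],
    src_labels: List[str],
    alignment: Dict[int, int],
    tgt_len: int,
) -> List[Optional[Tuple[int, str]]]:
    """Copy source-side arcs onto target tokens via a one-to-one alignment.

    Returns one entry per target token: (head_index, label), or None when
    the token or its head has no aligned counterpart.
    """
    projected: List[Optional[Tuple[int, str]]] = [None] * tgt_len
    for src_dep, src_head in enumerate(src_heads):
        if src_dep not in alignment:
            continue  # unaligned source word: nothing to project
        tgt_dep = alignment[src_dep]
        if src_head == ROOT:
            projected[tgt_dep] = (ROOT, src_labels[src_dep])
        elif src_head in alignment:
            projected[tgt_dep] = (alignment[src_head], src_labels[src_dep])
    return projected


# Toy example (hypothetical data): "I like red flowers".
zh_tokens = ["我", "喜欢", "红", "花"]
vi_tokens = ["Tôi", "thích", "hoa", "đỏ"]
zh_heads = [1, ROOT, 3, 1]                  # assumed Chinese parse
zh_labels = ["nsubj", "root", "amod", "obj"]
alignment = {0: 0, 1: 1, 2: 3, 3: 2}        # Chinese index -> Vietnamese index

vi_arcs = project_dependencies(zh_heads, zh_labels, alignment, len(vi_tokens))
for token, arc in zip(vi_tokens, vi_arcs):
    print(token, arc)
```

Note how the adjective-noun order differs between the two languages ("红 花" versus "hoa đỏ"), yet the projected arc still attaches "đỏ" to "hoa" because the heads are mapped through the alignment rather than by position. Unaligned or many-to-one words, which a real corpus would contain, are simply left as `None` in this sketch.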
