The Chinese-English Bilingual Sentence Alignment Based on Length

2011 International Conference on Asian Language Processing. Pub Date: 2011-11-15. DOI: 10.1109/IALP.2011.70

Huafu Ding, Li Quan, Haoliang Qi


Citations: 4

Abstract

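Bilingual sentence pairs are a key resource for statistical machine translation. Most existing sentence-aligned corpora pair English with French or with German, and few dedicated English-Chinese sentence-alignment datasets exist. Our aim is therefore to build a large-scale, high-precision collection of aligned English-Chinese sentences. We apply a length-based method to align bilingual paragraphs extracted from CNKI (China National Knowledge Infrastructure), one of the largest academic websites, which contains a very large number of Chinese-English bilingual paragraphs. Our method adapts and combines word-based and hybrid alignment approaches, and the best alignment is then selected by dynamic programming. Experiments on the CNKI dataset show that the method achieves satisfactory recall and precision.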
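The abstract does not give the scoring function, so the sketch below only illustrates the general technique it names. It implements the classic length-based aligner of Gale and Church (1993): each candidate "bead" (1-1, 1-0, 0-1, 2-1, 1-2, 2-2) is scored by a prior plus the negative log-probability of the normalized length difference, and dynamic programming picks the cheapest path. It is not the authors' adapted word-based/hybrid method. The length ratio `c`, the variance `s2`, and the bead priors are assumptions. The priors are Gale and Church's English-French estimates, split evenly between symmetric beads, and all of these values would need to be re-estimated on Chinese-English data such as CNKI.

```python
import math
from typing import Dict, List, Optional, Tuple

# Bead priors: Gale & Church's English-French estimates, with each combined
# probability (1-0/0-1 and 2-1/1-2) split evenly. These are placeholders,
# not values reported for the CNKI Chinese-English data.
BEAD_PRIORS: Dict[Tuple[int, int], float] = {
    (1, 1): 0.89,
    (1, 0): 0.0099 / 2,
    (0, 1): 0.0099 / 2,
    (2, 1): 0.089 / 2,
    (1, 2): 0.089 / 2,
    (2, 2): 0.011,
}


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def length_cost(len_zh: int, len_en: int, c: float, s2: float) -> float:
    """-log P(delta | match) for the normalized length difference delta."""
    if len_zh == 0 and len_en == 0:
        return 0.0
    mean = (len_zh + len_en / c) / 2.0
    delta = (c * len_zh - len_en) / math.sqrt(s2 * mean)
    p = 2.0 * (1.0 - norm_cdf(abs(delta)))  # two-tailed probability
    return -math.log(max(p, 1e-300))  # guard against log(0) for extreme delta


def align(zh: List[int], en: List[int], c: float = 3.0,
          s2: float = 6.8) -> List[Tuple[int, int]]:
    """Align two sequences of sentence lengths.

    Returns beads as (number of zh sentences, number of en sentences).
    c (expected English length per unit of Chinese length) and s2 (variance)
    are hypothetical defaults and should be estimated from real data.
    """
    n, m = len(zh), len(en)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor of (i, j) is final first.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in BEAD_PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = -math.log(prior) + length_cost(
                    sum(zh[i:ni]), sum(en[j:nj]), c, s2)
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (di, dj)
    # Trace the cheapest path back from (n, m) to (0, 0).
    beads: List[Tuple[int, int]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((di, dj))
        i, j = i - di, j - dj
    beads.reverse()
    return beads
```

For example, `align([10, 20, 15], [30, 60, 45])` returns `[(1, 1), (1, 1), (1, 1)]`, and `align([10, 20], [90])` returns `[(2, 1)]`, which merges two short Chinese sentences into one English sentence. The lengths here are made-up illustrations. Working in negative log space turns the product of bead probabilities into a sum, which is what lets the dynamic program find the most probable alignment with simple additions.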
