Looking for Candidate Translational Equivalents in Specialized, Comparable Corpora

Proceedings of the 19th International Conference on Computational Linguistics - Pub Date: 2002-08-24 - DOI: 10.3115/1071884.1071904

Yun-Chuang Chiao, Pierre Zweigenbaum

Citations: 146

Abstract

Previous attempts at identifying translational equivalents in comparable corpora have dealt with very large 'general language' corpora and words. We address this task in a specialized domain, medicine, starting from smaller non-parallel, comparable corpora and an initial bilingual medical lexicon. We compare the distributional contexts of source and target words, testing several weighting factors and similarity measures. On a test set of frequently occurring words, for the best combination (the Jaccard similarity measure with or without tf.idf weighting), the correct translation is ranked first for 20% of our test words, and is found in the top 10 candidates for 50% of them. An additional reverse-translation filtering step improves the precision of the top candidate translation up to 74%, with a 33% recall.
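
The context-comparison idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes tokens are already lemmatized and stop-word filtered, uses raw co-occurrence counts in place of the weighting factors the paper tests, and uses the min/max form of weighted Jaccard, which may differ from the paper's exact formula. The function names, window size, toy corpora, and toy lexicon are all hypothetical.

```python
from collections import Counter, defaultdict


def context_vectors(sentences, window=3):
    """Count, for each word, the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def translate_vector(vector, lexicon):
    """Map context words to the target language; words not in the lexicon are dropped."""
    translated = Counter()
    for word, weight in vector.items():
        for target in lexicon.get(word, ()):
            translated[target] += weight
    return translated


def weighted_jaccard(u, v):
    """Weighted Jaccard: sum of minima over sum of maxima (assumed variant)."""
    keys = set(u) | set(v)
    denom = sum(max(u.get(k, 0), v.get(k, 0)) for k in keys)
    if denom == 0:
        return 0.0
    return sum(min(u.get(k, 0), v.get(k, 0)) for k in keys) / denom


def rank_candidates(source_word, src_vectors, tgt_vectors, lexicon, top_n=10):
    """Rank target words by similarity to the translated source context vector."""
    query = translate_vector(src_vectors.get(source_word, Counter()), lexicon)
    scored = [(weighted_jaccard(query, vec), cand) for cand, vec in tgt_vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by name
    return [cand for _, cand in scored[:top_n]]


# Toy, hypothetical French and English "corpora" and seed lexicon.
src_corpus = [["patient", "fièvre", "toux"], ["fièvre", "infection", "traitement"]]
tgt_corpus = [
    ["patient", "fever", "cough"],
    ["fever", "infection", "treatment"],
    ["surgery", "heart", "valve"],
]
seed_lexicon = {
    "patient": ["patient"],
    "toux": ["cough"],
    "infection": ["infection"],
    "traitement": ["treatment"],
}

src_vectors = context_vectors(src_corpus)
tgt_vectors = context_vectors(tgt_corpus)
ranking = rank_candidates("fièvre", src_vectors, tgt_vectors, seed_lexicon, top_n=3)
print(ranking)
```

In this toy setup "fièvre" is absent from the seed lexicon, yet its translated context matches that of "fever", so "fever" is ranked first. A reverse-translation filter of the kind the abstract mentions could plausibly be approximated by running the same ranking from target to source and keeping a candidate only when the original source word reappears near the top, though the abstract does not specify the exact procedure.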


