Reinforcing language model for speech translation with auxiliary data

2009 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2009-12-01. DOI: 10.1109/ASRU.2009.5373308

Jia Cui, Yonggang Deng, Bowen Zhou

Citations: 1

Abstract

Language model domain adaptation usually draws on a large quantity of auxiliary data from different genres and domains. Data selection has mostly relied on scoring functions and is typically independent of the intended application, such as machine translation. In this paper, we present a novel domain adaptation approach that is directly motivated by the needs of the translation engine. We first identify interesting phrases by examining phrase translation tables, and then use those phrases as anchors to select useful and relevant sentences from general-domain data, with the goal of improving domain coverage or providing additional contextual information. Experimental results on Farsi-to-English translation in the military force protection domain and Chinese-to-English translation in the travel domain show statistically significant gains for the reinforced language models over the baseline.
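
The two-step selection described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how a phrase qualifies as "interesting", so the criterion used here is purely an assumption: a target-side phrase from the phrase table counts as an anchor if it never occurs in the in-domain language model training data. The function names (`ngrams`, `anchor_phrases`, `select_sentences`), the `max_len` parameter, and whitespace tokenization are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Set, Tuple

Phrase = Tuple[str, ...]


def ngrams(tokens: List[str], max_len: int) -> Set[Phrase]:
    """Return every contiguous n-gram of length 1..max_len in tokens."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def anchor_phrases(
    target_phrases: Iterable[str],
    in_domain: Iterable[str],
    max_len: int = 4,
) -> Set[Phrase]:
    """Pick anchors from the target side of a phrase table.

    ASSUMPTION (not from the paper): a phrase is "interesting" when it
    does not occur anywhere in the in-domain LM training sentences.
    """
    seen: Set[Phrase] = set()
    for sentence in in_domain:
        seen |= ngrams(sentence.split(), max_len)
    candidates = {tuple(p.split()) for p in target_phrases}
    return {p for p in candidates if 0 < len(p) <= max_len and p not in seen}


def select_sentences(
    general: Iterable[str],
    anchors: Set[Phrase],
    max_len: int = 4,
) -> List[str]:
    """Keep general-domain sentences containing at least one anchor phrase."""
    return [s for s in general if ngrams(s.split(), max_len) & anchors]


# Toy data, invented for illustration only.
phrase_table_targets = ["check point", "open the trunk", "thank you"]
in_domain_data = ["thank you very much", "please open the door"]
general_data = [
    "the soldiers set up a check point",
    "stocks fell sharply today",
    "he asked the driver to open the trunk",
]

anchors = anchor_phrases(phrase_table_targets, in_domain_data)
selected = select_sentences(general_data, anchors)
```

In this toy run, "thank you" is already covered by the in-domain data, so only "check point" and "open the trunk" become anchors, and the two general-domain sentences containing them are selected. The selected sentences would then be added to the language model training data; how the paper weights or interpolates them is not stated in the abstract and is left out of the sketch.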
