
2011 International Conference on Asian Language Processing: Latest Publications

Optimal Translation Boundaries for BTG-Based Decoding
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.73
Xiangyu Duan, Min Zhang
This paper proposes a method for inducing translation boundaries as soft constraints for Bracketing Transduction Grammar (BTG) based decoding. Translation boundaries used in previous research are extracted from left-most synchronous trees generated by a deterministic algorithm. Translation boundaries in this research are extracted from induced synchronous trees, which are statistically optimal and more balanced than the left-most synchronous trees. Experiments show that induced translation boundaries are more consistent than those extracted from left-most synchronous trees, resulting in significantly better performance than the strong baseline.
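The soft-constraint idea can be illustrated with a small sketch: predicted translation boundaries contribute a feature score to candidate BTG spans rather than hard-pruning them. The names below (boundary_prob, penalty_weight) are hypothetical and the scoring is a simplification for illustration, not the authors' actual decoder feature.

```python
# A minimal sketch (not the paper's implementation) of turning predicted
# translation boundaries into a soft constraint on BTG chart spans.

def boundary_feature(span, boundary_prob, penalty_weight=1.0):
    """Soft-constraint score for a candidate source span (i, j).

    boundary_prob maps a source position to the probability that a
    translation boundary falls there.  Spans whose endpoints coincide with
    likely boundaries are rewarded; spans cutting through unlikely boundary
    positions are penalised, so the decoder prefers, but is not forced, to
    respect the predicted boundaries.
    """
    i, j = span
    score = 0.0
    for pos in (i, j):
        p = boundary_prob.get(pos, 0.0)
        score += penalty_weight * (p - 0.5)
    return score

# Boundaries predicted around positions 3 and 7 of a 9-word source sentence
probs = {0: 1.0, 3: 0.9, 7: 0.8, 9: 1.0}
print(boundary_feature((3, 7), probs))   # well-aligned span: positive score
print(boundary_feature((2, 5), probs))   # span crossing boundaries: negative score
```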
Citations: 0
Character-Level System Combination: An Empirical Study for English-to-Chinese Spoken Language Translation
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.47
Jinhua Du
This paper proposes a character-level system combination strategy for English-to-Chinese spoken language translation. For languages like Chinese, in which word boundaries are not orthographically marked, word segmentation, which segments a sentence into a sequence of words, is often required for many Natural Language Processing tasks. In this paper we evaluate the impact of segmentation (of spoken data) on the performance of system combination, and show that using inappropriate segmentation in system combination can result in inferior performance compared to single systems. We further demonstrate that using characters as the basic translation unit in system combination on the IWSLT ASR translation task leads to significant gains in translation quality in terms of BLEU and NIST scores.
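A small sketch of the character-level idea, under the assumption that it amounts to re-tokenising word-segmented Chinese hypotheses into single characters before combination and scoring; this illustrates the principle only, not the paper's combination pipeline.

```python
def to_characters(segmented_sentence):
    """Split a word-segmented Chinese sentence into space-separated characters.

    ASCII tokens (numbers, Latin abbreviations) are kept whole so that
    mixed-script output is not broken apart.
    """
    chars = []
    for token in segmented_sentence.split():
        if token.isascii():
            chars.append(token)
        else:
            chars.extend(list(token))
    return " ".join(chars)

# Hypotheses from two systems with incompatible word segmentations ...
hyp_a = "我们 提出 一种 新 方法"
hyp_b = "我们提出 一 种 新方法"
# ... become identical at the character level, removing the segmentation mismatch
print(to_characters(hyp_a))
print(to_characters(hyp_b))
```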
Citations: 1
A Simplified-Traditional Chinese Character Conversion Model Based on Log-Linear Models
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.15
Yidong Chen, X. Shi, Changle Zhou
With the growth of exchange activities among the four cross-strait regions, the problem of correctly converting between Traditional Chinese (TC) and Simplified Chinese (SC) becomes more and more important. Numerous one-to-many mappings and term usage differences make it especially difficult to convert from SC to TC. This paper proposes a novel simplified-traditional Chinese character conversion model based on log-linear models, in which features such as language models and lexical semantic consistency weights are integrated. When estimating the lexical semantic consistency weights, cross-language word-based semantic spaces are used. Experiments show that the proposed model achieves better performance.
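The log-linear scoring itself is compact enough to sketch: each Traditional Chinese candidate for an ambiguous Simplified character is scored as a weighted sum of feature functions and the best candidate is kept. The toy features and weights below are placeholders, not the paper's trained model.

```python
# A minimal sketch of log-linear SC-to-TC conversion:
#   score(candidate | context) = sum_i w_i * f_i(candidate, context)

def log_linear_score(candidate, context, features, weights):
    return sum(w * f(candidate, context) for f, w in zip(features, weights))

def convert(candidates, context, features, weights):
    """Pick the highest-scoring Traditional Chinese candidate."""
    return max(candidates, key=lambda c: log_linear_score(c, context, features, weights))

# Toy features for converting SC 头发 ("hair"): the ambiguous character 发
# has the TC forms 發 ("emit") and 髮 ("hair").
lm_scores = {("头", "髮"): -1.2, ("头", "發"): -6.5}    # pretend bigram LM log-probs
def lm_feature(cand, ctx):                              # language-model feature
    return lm_scores.get((ctx, cand), -10.0)
def semantic_feature(cand, ctx):                        # lexical semantic consistency feature
    return 1.0 if (ctx, cand) == ("头", "髮") else 0.0

print(convert(["發", "髮"], "头", [lm_feature, semantic_feature], [1.0, 2.0]))   # -> 髮
```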
Citations: 3
Automatic Acquisition of Chinese-Tibetan Multi-word Equivalent Pair from Bilingual Corpora
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.33
Minghua Nuo, Huidan Liu, Long-Long Ma, Jian Wu, Zhiming Ding
This paper aims to construct a Chinese-Tibetan multi-word equivalent pair dictionary for a Chinese-Tibetan computer-aided translation system. Since Tibetan is a morphologically rich language, we propose a two-phase framework to automatically extract multi-word equivalent pairs. First, Chinese Multi-word Units (MWUs) are extracted. In this phase, we propose the CBEM model, which partitions a Chinese sentence into MWUs using two measures, collocation and binding degree. Second, Tibetan translations of the extracted Chinese MWUs are obtained. In this phase, we propose the TSIM model, which focuses on extracting 1-to-n bilingual MWUs. Preliminary experimental results show that the mixed method combining the CBEM model with the TSIM model is effective.
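How a collocation measure can partition a segmented sentence into multi-word units is sketched below; pointwise mutual information stands in for the paper's collocation and binding-degree measures, and the counts and threshold are toy assumptions.

```python
import math
from collections import Counter

def pmi(w1, w2, unigrams, bigrams, total):
    """Pointwise mutual information of an adjacent word pair."""
    p1, p2 = unigrams[w1] / total, unigrams[w2] / total
    p12 = bigrams[(w1, w2)] / total
    return math.log(p12 / (p1 * p2)) if p12 > 0 else float("-inf")

def chunk_mwus(words, unigrams, bigrams, total, threshold=2.0):
    """Greedily merge adjacent words whose association exceeds the threshold."""
    units, current = [], [words[0]]
    for prev, nxt in zip(words, words[1:]):
        if pmi(prev, nxt, unigrams, bigrams, total) >= threshold:
            current.append(nxt)           # strong association: extend the unit
        else:
            units.append("".join(current))
            current = [nxt]               # weak association: start a new unit
    units.append("".join(current))
    return units

# Toy counts standing in for corpus statistics
unigrams = Counter({"计算机": 50, "辅助": 40, "翻译": 60, "系统": 80})
bigrams = Counter({("计算机", "辅助"): 30, ("辅助", "翻译"): 25, ("翻译", "系统"): 1})
print(chunk_mwus(["计算机", "辅助", "翻译", "系统"], unigrams, bigrams, 10000))
# -> ['计算机辅助翻译', '系统']
```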
Citations: 2
The Chinese-English Bilingual Sentence Alignment Based on Length
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.70
Huafu Ding, Li Quan, Haoliang Qi
Bilingual sentence pairs are a key resource for statistical machine translation. Currently, most sentence alignment corpora are between English and French or English and German, and there are few specialized sentence alignment datasets between English and Chinese. Our aim is therefore to create large-scale, high-precision English-Chinese aligned sentences. A length-based method is used to align bilingual paragraphs extracted from CNKI (China National Knowledge Infrastructure). CNKI is one of the largest academic websites and contains a huge number of Chinese-English bilingual paragraphs. Our method adapts and combines word-based and hybrid approaches. Finally, we choose the best alignment by dynamic programming. Experiments on the CNKI dataset show that the presented method achieves satisfactory recall and precision.
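Length-based alignment with dynamic programming can be sketched as follows, in the spirit of Gale and Church: the cost of a bead depends only on the character lengths of the sentences it groups, and the minimum-cost path through the bead lattice is recovered by backtracking. The cost function here is a simplified length-ratio penalty, not the exact model used in the paper.

```python
import math

def length_cost(src_len, tgt_len):
    """Penalty for aligning segments with the given character lengths."""
    if src_len == 0 or tgt_len == 0:
        return 4.0                                  # discourage 1-0 / 0-1 beads
    return 2.0 * abs(math.log(src_len / tgt_len))   # penalise skewed length ratios

def align(src_lens, tgt_lens):
    """Minimum-cost alignment path over 1-1, 1-0, 0-1, 2-1 and 1-2 beads."""
    n, m = len(src_lens), len(tgt_lens)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    moves = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for di, dj in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + length_cost(sum(src_lens[i:ni]), sum(tgt_lens[j:nj]))
                if c < cost[ni][nj]:
                    cost[ni][nj], back[ni][nj] = c, (i, j)
    path, cell = [], (n, m)                         # backtrack the best path
    while back[cell[0]][cell[1]] is not None:
        prev = back[cell[0]][cell[1]]
        path.append((prev, cell))                   # step (i, j) -> (i', j') aligns
        cell = prev                                 # sentences i..i'-1 with j..j'-1
    return list(reversed(path))

# Character lengths of three Chinese and three English sentences in a paragraph
print(align([20, 35, 18], [15, 28, 14]))            # yields three 1-1 beads
```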
Citations: 4
Natural Language Grammar Induction of Indonesian Language Corpora Using Genetic Algorithm
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.58
Ary Hermawan, Gunawan, Joan Santoso
Grammar induction is a machine learning process for learning a grammar from corpora. This paper discusses the process of grammar induction for Indonesian language corpora using a genetic algorithm. The grammar production rules are modeled in the form of chromosomes. The fitness function counts how many sentences can be parsed. The data used are Indonesian fairy tales such as "Bawang Merah Bawang Putih" and "Malin Kundang". The paper gives detailed explanations of the steps of each process carried out for this natural language grammar problem.
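The genetic algorithm loop described here (chromosomes encode production rules, fitness counts parsable sentences) can be sketched compactly. The rule pool, tag set, toy corpus and GA settings below are assumptions for illustration, not the paper's actual grammar or data.

```python
import random

# Chromosome: bit-mask over candidate binary rules X -> A B, where A and B
# are POS tags or the induced nonterminal X.  Fitness: number of corpus
# sentences the selected rules can parse with a tiny CYK parser.
RULE_POOL = [("N", "V"), ("V", "N"), ("X", "N"), ("N", "X"), ("X", "V"), ("Adj", "N"), ("X", "X")]
CORPUS = [["N", "V", "N"], ["Adj", "N", "V"], ["N", "V", "Adj", "N"]]

def parses(sentence, rules):
    """Can the whole POS-tag sequence be reduced to X under the given rules?"""
    n = len(sentence)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tag in enumerate(sentence):
        chart[i][i + 1].add(tag)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for a in chart[i][k]:
                    for b in chart[k][j]:
                        if (a, b) in rules:
                            chart[i][j].add("X")
    return "X" in chart[0][n]

def fitness(chromosome):
    rules = {r for r, bit in zip(RULE_POOL, chromosome) if bit}
    return sum(parses(s, rules) for s in CORPUS)

def evolve(pop_size=20, generations=30, mutation=0.1):
    pop = [[random.randint(0, 1) for _ in RULE_POOL] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                   # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(RULE_POOL))    # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < mutation else g for g in child]
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return [r for r, bit in zip(RULE_POOL, best) if bit], fitness(best)

print(evolve())   # e.g. a rule set that parses all 3 toy sentences
```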
Citations: 5
Automatic Labeling and Phonetic Assessment for an Unknown Asian Language: The Case of the "Mo Piu" North Vietnamese Minority (early results)
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.81
G. Caelen-Haumont, Sam Sethserey, E. Castelli
This paper aims at assessing the automatic labeling, by an expert phonetician, of an undocumented, unknown and under-resourced unwritten language (Mo Piu) of North Vietnam. For this task, we chose 5 languages in different combinations in order to highlight the best set. Two assessments are presented: first, that of the phonetic events, and second, that of the language sets. After presenting the methods used for automatic labeling and recognition, the paper focuses on the assessment of the phonetic units and of the language sets.
Citations: 5
Theoretical Framework of Mongolian Word Segmentation Specification for Information Processing
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.45
T. Laga, Xiaobing Zhao
The establishment of a Contemporary Mongolian word segmentation specification for information processing has great significance for the standardization of information processing, the compatibility of different systems, corpus sharing, grammatical analysis, and POS tagging. The present paper studies the framework of Mongolian word segmentation, including guidelines, formulating principles, styles, the scope of segmentation units, the foundation of its establishment, and the structure of the specification, and lays the theoretical foundation for this specification.
Citations: 0
Developing Bengali Speech Corpus for Phone Recognizer Using Optimum Text Selection Technique
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.16
S. Mandal, B. Das, Pabitra Mitra, A. Basu
A speech corpus plays a key role in the construction of automatic speech recognition (ASR), text-to-speech (TTS) synthesis and phone recognition (PR) systems. PR and ASR systems are quite similar in functionality; the difference between the two is that a PR system converts the speech signal to phone text (a phone being the smallest discrete segment of sound in uttered speech), whereas an ASR system converts the speech signal to word text. A speech corpus for a PR system usually consists of a text corpus, recordings corresponding to the text corpus, a phonetic representation of the text corpus and a pronunciation dictionary. Selecting optimum text with a balanced phone distribution from the available text is an important task for developing a high-quality PR system. In this paper, we describe our text selection technique and discuss the performance of the phone recognition system.
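One common way to realise optimum text selection is a greedy criterion that repeatedly picks the sentence adding the most phone coverage to the set selected so far. The sketch below assumes such a greedy criterion; it is a standard baseline for building phonetically balanced corpora, not necessarily the paper's exact technique.

```python
from collections import Counter

def select_sentences(phonetized, target_size):
    """Greedy phone-coverage selection.

    phonetized  -- list of (sentence_id, list_of_phones) pairs
    target_size -- number of sentences to keep
    Each step prefers the sentence contributing the most not-yet-seen phone
    types, breaking ties in favour of phones that are still rare.
    """
    selected, covered = [], Counter()
    remaining = list(phonetized)
    while remaining and len(selected) < target_size:
        def gain(item):
            _, phones = item
            new_types = len(set(phones) - set(covered))
            rarity = sum(1.0 / (1 + covered[p]) for p in phones) / max(len(phones), 1)
            return (new_types, rarity)
        best = max(remaining, key=gain)
        remaining.remove(best)
        selected.append(best[0])
        covered.update(best[1])
    return selected

corpus = [
    ("s1", ["k", "a", "t", "a"]),
    ("s2", ["b", "a", "d", "i"]),
    ("s3", ["k", "a", "t", "a", "k", "a"]),
    ("s4", ["m", "o", "n", "e"]),
]
print(select_sentences(corpus, 2))   # picks the two sentences with the widest phone coverage
```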
Citations: 27
Improving Bilingual Lexicon Construction from Chinese-English Comparable Corpora via Dependency Relationship Mapping
Pub Date : 2011-11-15 DOI: 10.1109/IALP.2011.22
Hua Xu, Dandan Liu, Longhua Qian, Guodong Zhou
The context-based approach is currently a popular approach for constructing bilingual lexicons from comparable corpora. Following this line of research, this paper proposes a dependency relationship mapping model and investigates its effect on bilingual lexicon construction. The experiments show that, by simultaneously mapping context words, dependency relationship types and directions when calculating the similarity between two words in the source and target languages, our approach significantly outperforms a state-of-the-art system in bilingual lexicon construction in both the Chinese-English and English-Chinese directions. This justifies the effectiveness of our dependency relationship mapping model for bilingual lexicon construction.
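The context-vector idea with dependency information can be sketched as follows: a word is represented by counts of (relation, direction, context word) triples, the source-side context words are mapped through a seed dictionary, and candidate translations are ranked by cosine similarity. The seed dictionary and triples below are toy data, not the paper's resources.

```python
import math
from collections import Counter

def map_vector(src_vector, seed_dict):
    """Translate the context-word slot of each (relation, direction, word) triple."""
    mapped = Counter()
    for (rel, direction, word), count in src_vector.items():
        for trans in seed_dict.get(word, []):
            mapped[(rel, direction, trans)] += count
    return mapped

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Toy dependency context vectors: 苹果 in Chinese text vs. English candidates
seed_dict = {"吃": ["eat"], "红": ["red"]}
src_apple = Counter({("dobj", "as-dependent", "吃"): 12, ("amod", "as-head", "红"): 5})
tgt_candidates = {
    "apple":  Counter({("dobj", "as-dependent", "eat"): 10, ("amod", "as-head", "red"): 4}),
    "policy": Counter({("dobj", "as-dependent", "adopt"): 9, ("amod", "as-head", "new"): 6}),
}
mapped = map_vector(src_apple, seed_dict)
ranking = sorted(tgt_candidates, key=lambda w: cosine(mapped, tgt_candidates[w]), reverse=True)
print(ranking)   # 'apple' ranks above 'policy'
```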
Citations: 1