
Latest publications from the 2013 International Conference on Asian Language Processing

A Computer-Assist Algorithm to Detect Repetitive Stuttering Automatically
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.32
Junbo Zhang, Bin Dong, Yonghong Yan
We study an algorithm for the automatic detection of repetitive stuttering in Chinese speech. Based on the characteristics of repetitions in Chinese stuttered speech, we improve on previous research findings in three ways. First, a multi-span looping forced-alignment decoding network is designed to detect multi-syllable repetitions in Chinese stuttered speech. Second, a branch penalty factor is added to the network, with a recursive search used to adjust the decoding trend, in order to reduce errors arising from the complexity of the decoding network. Finally, detected stutters are re-judged using confidence scores to improve the reliability of the detection result. Experimental results show that, compared with the previous algorithm, the proposed algorithm improves system performance significantly, with a relative reduction of about 18% in average detection error rate.
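As a much-simplified, text-level illustration of what the decoder looks for (the paper scores syllable spans acoustically inside a forced-alignment network, which is not reproduced here), adjacent multi-syllable repetitions in a recognized syllable sequence can be located like this:

```python
def find_repetitions(syllables, max_span=3):
    """Scan a recognized syllable sequence for adjacent repeated spans
    of 1..max_span syllables. Returns (start, span_length, repeat_count)
    triples, preferring the longest repeating span at each position."""
    hits = []
    i = 0
    while i < len(syllables):
        for span in range(max_span, 0, -1):
            count = 1
            while syllables[i + span * count:i + span * (count + 1)] == syllables[i:i + span]:
                count += 1
            if count > 1:
                hits.append((i, span, count))
                i += span * count
                break
        else:
            i += 1           # no repetition starting here
    return hits
```

On the toy sequence `["wo", "wo", "wo", "xiang", "chi", "fan"]` this reports a three-fold repetition of the first syllable.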
Citations: 12
Subjectivity Classification of Filipino Text with Features Based on Term Frequency -- Inverse Document Frequency
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.40
Ralph Vincent J. Regalado, Jenina L. Chua, J. L. Co, Thomas James Z. Tiam-Lee
Subjectivity classification determines whether a given document contains subjective information, or identifies which portions of the document are subjective. This research reports a machine-learning approach to document-level and sentence-level subjectivity classification of Filipino texts using existing algorithms: C4.5, Naïve Bayes, k-Nearest Neighbor, and Support Vector Machines. For document-level classification, Support Vector Machines gave the best result, with 95.06% accuracy; for sentence-level classification, Naïve Bayes performed best, with 58.75% accuracy.
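The tf-idf weighting behind the feature set can be sketched in a few lines of pure Python; this assumes raw-count tf and the standard logarithmic idf, which may differ from the exact variant used in the paper:

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: tf-idf weight} dict per tokenized document,
    using raw-count tf and idf = log(N / df)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # document frequency per term
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

# toy Filipino tokens; terms unique to one document get idf = log(3)
w = tfidf([["maganda", "ang", "umaga"], ["umaga", "na"], ["ang", "ganda"]])
```

These per-document weight vectors would then feed any of the four classifiers mentioned above.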
Citations: 5
Using Mutual Information Criterion to Design an Effective Lexicon for Chinese Pinyin-to-Character Conversion
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.37
Wei Li, Jin-Song Zhang, Yanlu Xie, Xiaoyun Wang, M. Nishida, Seiichi Yamamoto
Pinyin-to-character (P2C) conversion is the most widely used method for inputting Chinese characters into a computer. Its main difficulty is homophone words, which is addressed by exploiting the contextual information provided by a lexicon and an n-gram language model (LM). Our survey of state-of-the-art P2C technologies shows that conventional optimization methods are almost all based on minimizing text perplexity, which is not directly related to P2C performance. We therefore propose a new optimization criterion, the mutual information (MI) between a text corpus and its Pinyin script, and use it to perform self-supervised word segmentation, build a lexicon, and estimate an n-gram LM, from which the P2C system is built. We implemented the system on a newspaper corpus. Compared with two baseline systems using a handcrafted lexicon and a perplexity-optimized lexicon, our system achieved relative error reductions of 19.7% and 10.3% on the test corpus, respectively. The results demonstrate the effectiveness of our proposal.
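The P2C decoding step itself, independent of how the lexicon is designed, is a search over homophone candidates; a minimal Viterbi sketch under a bigram LM (the probabilities and the `<s>` start symbol below are toy assumptions, not the paper's model) might look like:

```python
import math

def p2c_decode(pinyin, homophones, bigram, unk=1e-6):
    """Viterbi search for the most probable character sequence given a
    Pinyin syllable sequence. `homophones` maps syllable -> candidate
    characters; `bigram` maps (prev_char, char) -> probability."""
    prev_scores = {"<s>": 0.0}
    back = []                            # per-position backpointers
    for syl in pinyin:
        scores, ptr = {}, {}
        for cand in homophones[syl]:
            best_prev = max(prev_scores,
                            key=lambda p: prev_scores[p] + math.log(bigram.get((p, cand), unk)))
            scores[cand] = prev_scores[best_prev] + math.log(bigram.get((best_prev, cand), unk))
            ptr[cand] = best_prev
        prev_scores = scores
        back.append(ptr)
    cur = max(prev_scores, key=prev_scores.get)
    out = [cur]
    for ptr in reversed(back[1:]):       # first backpointers all point to <s>
        cur = ptr[cur]
        out.append(cur)
    return list(reversed(out))
```

With toy data such as `homophones = {"shi": ["是", "事"], "jian": ["间", "件"]}` and a bigram table favoring 事件, the decoder resolves the homophones accordingly.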
Citations: 0
A Tentative Study on Language Model Based Solution to Multiple Choice of CET-4
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.35
Zhihang Fan, Muyun Yang, T. Zhao, Sheng Li
The paper presents a language-model-based solution to the Multiple Choice test items of CET-4. With models trained on web-scale English data, different n-gram orders are examined under a dynamic-programming search for the best answers. Experimental results indicate that both the 4-gram and the 5-gram model achieve an average precision of 81% on 16 test items.
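The selection step, picking the option whose completed sentence scores highest under the LM, can be sketched with a toy bigram model; the `_` blank marker, add-one smoothing, and assumed vocabulary size are illustrative, not the paper's setup:

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over tokenized sentences."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi

def score(sent, uni, bi, vocab=1000):
    """Add-one-smoothed bigram log-probability of a sentence."""
    toks = ["<s>"] + sent
    return sum(math.log((bi[(a, b)] + 1) / (uni[a] + vocab))
               for a, b in zip(toks, toks[1:]))

def answer(stem, options, uni, bi):
    """Fill each option into the '_' blank and keep the best-scoring one."""
    return max(options,
               key=lambda o: score([o if w == "_" else w for w in stem], uni, bi))
```

A dynamic-programming search as in the paper would generalize this scoring beyond a single blank.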
Citations: 0
Mining Recipes in Microblog
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.13
Shengyu Liu, Qingcai Chen, Shanshan Guan, Xiaolong Wang, Huimiao Shi
Microblogging platforms are becoming more and more popular as online communication channels. Users generate volumes of data every day, and this user-generated content contains much useful knowledge, such as practical skills and technical expertise. This paper proposes a cross-data method for mining recipes from microblogs. First, snippets of text relevant to recipes are extracted from Baidu Encyclopedia. Second, the extracted snippets are used to train a domain-specific unigram language model. Third, candidate recipes in microblogs are mined with this unigram language model. Finally, heuristic rules are used to identify real recipes among the candidates. Experimental results show the effectiveness of the proposed method.
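The third step, scoring candidate posts against the domain-specific unigram LM, can be sketched as follows; the add-one smoothing and toy training tokens are assumptions, not the paper's Baidu Encyclopedia setup:

```python
import math
from collections import Counter

def train_unigram(domain_tokens):
    """Add-one-smoothed unigram LM trained on domain text."""
    counts = Counter(domain_tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1              # reserve one slot for unseen tokens
    return lambda tok: (counts[tok] + 1) / (total + vocab)

def domain_score(tokens, model):
    """Average token log-likelihood under the domain LM; higher means
    the text looks more like the target (recipe) domain."""
    return sum(math.log(model(t)) for t in tokens) / len(tokens)
```

Posts scoring above a threshold would become candidate recipes, to be filtered by the heuristic rules.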
Citations: 2
The Annotation Scheme for Uyghur Dependency Treebank
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.56
Samat Mamitimin, Turgun Ibrahim, Marhaba Eli
The paper introduces a dependency annotation effort that aims to fully annotate an Uyghur corpus, the first attempt of its kind to develop a large-scale treebank for Uyghur. We motivate the choice of dependency theory as the annotation scheme and argue that dependency grammar is better suited to modeling the various linguistic phenomena of Uyghur. In our solution, syntactic relations are encoded as labeled dependency relations among segments of lexical items and sequences of inflectional groups separated by derivational boundaries. We present the basic annotation scheme, covering both morphological and syntactic dependency relations, and show how it handles phenomena such as omissions in copula sentences, punctuation, and coordination.
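A minimal CoNLL-style encoding of such labeled dependencies might look like the following sketch; the tokens, glosses, and relation labels are illustrative, not actual entries from the Uyghur treebank:

```python
# One token per line: (index, surface form, head index, relation label).
# Head 0 marks the root. Tokens, glosses, and labels are illustrative.
sentence = [
    (1, "men",     3, "SUBJ"),   # "I"
    (2, "kitabni", 3, "OBJ"),    # "book-ACC"
    (3, "oqudum",  0, "ROOT"),   # "read-PST.1SG"
]

def children(sent, head):
    """Return the dependents attached to a given head index."""
    return [tok for tok in sent if tok[2] == head]
```

A full treebank entry would additionally split words into inflectional groups at derivational boundaries, with dependencies attaching to specific groups.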
Citations: 5
Multi-thread Multi-keywords Matching Approach for Uyghur Text
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.36
Xinyuan Zhao, Adili Abuliz
Keyword matching is a basic tool in public-opinion analysis. Uyghur is an agglutinative language in which words take suffixes to express different semantic or syntactic functions, so a word can appear in many surface forms in a text, and traditional matching algorithms cannot be applied to Uyghur text directly. In this paper, we implement an automaton-based multi-keyword matching algorithm for Uyghur text. The algorithm handles inflectional suffixes and the weakening of vowels within words by means of a reverse-suffix automaton and a vowel-restoration automaton. By partitioning the keyword automata on the first letter of each keyword, we also obtain a general multi-threaded keyword matching approach for Uyghur.
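The classic automaton construction for multi-keyword matching is Aho-Corasick; the following is a standard sketch of that construction (the paper's reverse-suffix and vowel-restoration automata for Uyghur morphology are not reproduced here):

```python
from collections import deque

def build(keywords):
    """Build an Aho-Corasick automaton: a trie with failure links and
    per-node output lists."""
    trie, out, fail = [{}], [[]], [0]
    for kw in keywords:
        node = 0
        for ch in kw:
            if ch not in trie[node]:
                trie.append({}); out.append([]); fail.append(0)
                trie[node][ch] = len(trie) - 1
            node = trie[node][ch]
        out[node].append(kw)
    q = deque(trie[0].values())          # depth-1 nodes fail to the root
    while q:
        node = q.popleft()
        for ch, nxt in trie[node].items():
            q.append(nxt)
            f = fail[node]
            while f and ch not in trie[f]:
                f = fail[f]
            fail[nxt] = trie[f].get(ch, 0)
            out[nxt] += out[fail[nxt]]   # inherit matches from the fallback
    return trie, out, fail

def search(text, trie, out, fail):
    """Report every (start, keyword) occurrence in one pass over text."""
    hits, node = [], 0
    for i, ch in enumerate(text):
        while node and ch not in trie[node]:
            node = fail[node]
        node = trie[node].get(ch, 0)
        hits += [(i - len(kw) + 1, kw) for kw in out[node]]
    return hits
```

A multi-threaded variant, as in the paper, would shard keywords by first letter and run one such automaton per shard.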
Citations: 0
Speech Recognition Research on Uyghur Accent Spoken Language
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.52
Yating Yang, Bo Ma, Xinyu Tang, Osman Turghun
This research focuses on speech recognition for accented spoken Uyghur. When spoken language with pronunciation variation is recognized by a system built for the standard spoken language, the recognition rate is not high enough. We propose a speech recognition framework for accented spoken Uyghur: we analyze its acoustic characteristics, describe the pronunciation-variation phenomena of Uyghur, and build the acoustic model and a multi-pronunciation dictionary. Preliminary experimental results show that the proposed method improves the performance of Uyghur continuous speech recognition.
Citations: 0
Rule Refinement for Spoken Language Translation by Retrieving the Missing Translation of Content Words
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.23
Linfeng Song, Jun Xie, Xing Wang, Yajuan Lü, Qun Liu
Spoken language translation often suffers from missing translations of content words, and thus fails to generate appropriate output. In this paper we propose a novel mutual-information-based method that improves spoken language translation by retrieving the missing translations of content words. We exploit several features indicating how well the inner content words of each rule are translated, allowing the MT system to select better translation rules. Experimental results show that our method improves translation performance significantly, by 1.95 to 4.47 BLEU points on different test sets.
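A simplified stand-in for such a feature is to check, with a probabilistic lexicon, which source content words of a rule have no plausible translation on the target side; the lexicon, stop list, and threshold below are toy assumptions, not the paper's mutual-information features:

```python
def uncovered_content_words(src, tgt, lexicon, stop=("的", "了"), thresh=0.1):
    """Return the source content words of a translation rule for which
    no target word has lexical translation probability above `thresh`."""
    return [s for s in src
            if s not in stop                                      # skip function words
            and not any(lexicon.get((s, t), 0.0) > thresh for t in tgt)]
```

Rules with uncovered content words would be penalized (or repaired) during rule selection.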
Citations: 0
Pronominal Resolution in Tamil Using Tree CRFs
Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.59
R. Ram, S. L. Devi
We describe our work on pronominal resolution in Tamil using Tree CRFs. Pronominal resolution is the task of identifying the referent of a pronoun. In this work we study the Tamil third-person pronouns 'avan' (he), 'aval' (she), 'athu' (it), and 'avar' (they). Tamil is a Dravidian language, morphologically rich and highly agglutinative. Tree CRFs are a machine learning method in which the data is modeled as a graph with edge weights used for learning. The learning features are derived from the morphological features of the language. The work is carried out on tourism-domain data from the Web, where we obtain 70.8% precision and 66.5% recall. The results are encouraging.
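The Tree CRF model itself is involved, but one small ingredient of such a system, filtering antecedent candidates by morphological agreement with the pronoun, can be sketched as follows (the feature values are illustrative, not the authors' feature set):

```python
def agreement_candidates(pronoun, mentions):
    """Filter antecedent candidates by gender/number agreement with the
    pronoun. Feature values here are illustrative; the paper derives
    its features from Tamil morphology."""
    feats = {"avan": ("masc", "sg"), "aval": ("fem", "sg"), "athu": ("neut", "sg")}
    g, n = feats[pronoun]
    return [m for m, (mg, mn) in mentions if mg == g and mn == n]
```

The surviving candidates would then be ranked by the learned model rather than by agreement alone.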
Citations: 11