NEWS@IJCNLP: Latest Publications

Report of NEWS 2016 Machine Transliteration Shared Task
Pub Date : 2011-11-01 DOI: 10.18653/v1/W16-2709
Xiangyu Duan, Rafael E. Banchs, Min Zhang, Haizhou Li, A. Kumaran
This report documents the Machine Transliteration Shared Task conducted as a part of the Named Entities Workshop (NEWS 2011), an IJCNLP 2011 workshop. The shared task features machine transliteration of proper names from English to 11 languages and from 3 languages to English. In total, 14 tasks are provided. 10 teams from 7 different countries participated in the evaluations. Finally, 73 standard and 4 non-standard runs are submitted, where diverse transliteration methodologies are explored and reported on the evaluation data. We report the results with 4 performance metrics. We believe that the shared task has successfully achieved its objective by providing a common benchmarking platform for the research community to evaluate the state-of-the-art technologies that benefit the future research and development.
Citations: 7
Transliteration System Using Pair HMM with Weighted FSTs
Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699731
Peter Nabende
This paper presents a transliteration system based on pair Hidden Markov Model (pair HMM) training and Weighted Finite State Transducer (WFST) techniques. Parameters used by WFSTs for transliteration generation are learned from a pair HMM. Parameters from pair-HMM training on English-Russian data sets are found to give better transliteration quality than parameters trained for WFSTs for corresponding structures. Training a pair HMM on English vowel bigrams and standard bigrams for Cyrillic Romanization, and using a few transformation rules on generated Russian transliterations to test for context improves the system's transliteration quality.
Citations: 14
Bridging Languages by SuperSense Entity Tagging
Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699740
Davide Picca, A. Gliozzo, S. Campora
This paper explores a very basic linguistic phenomenon in multilingualism: the lexicalizations of entities are very often identical within different languages while concepts are usually lexicalized differently. Since entities are commonly referred to by proper names in natural language, we measured their distribution in the lexical overlap of the terminologies extracted from comparable corpora. Results show that the lexical overlap is mostly composed by unambiguous words, which can be regarded as anchors to bridge languages: most of terms having the same spelling refer exactly to the same entities. Thanks to this important feature of Named Entities, we developed a multilingual super sense tagging system capable to distinguish between concepts and individuals. Individuals adopted for training have been extracted both by YAGO and by a heuristic procedure. The general F1 of the English tagger is over 76%, which is in line with the state of the art on super sense tagging while augmenting the number of classes. Performances for Italian are slightly lower, while ensuring a reasonable accuracy level which is capable to show effective results for knowledge acquisition.
Citations: 3
Voted NER System using Appropriate Unlabeled Data
Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699749
Asif Ekbal, Sivaji Bandyopadhyay
This paper reports a voted Named Entity Recognition (NER) system with the use of appropriate unlabeled data. The proposed method is based on the classifiers such as Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM) and has been tested for Bengali. The system makes use of the language independent features in the form of different contextual and orthographic word level features along with the language dependent features extracted from the Part of Speech (POS) tagger and gazetteers. Context patterns generated from the unlabeled data using an active learning method have been used as the features in each of the classifiers. A semi-supervised method has been used to describe the measures to automatically select effective documents and sentences from unlabeled data. Finally, the models have been combined together into a final system by weighted voting technique. Experimental results show the effectiveness of the proposed approach with the overall Recall, Precision, and F-Score values of 93.81%, 92.18% and 92.98%, respectively. We have shown how the language dependent features can improve the system performance.
Citations: 42
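The weighted voting step and the reported scores in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the vote weights were chosen, so the weights, the label sequences, and the function names `weighted_vote` and `f_score` below are hypothetical; only the precision and recall figures come from the abstract.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-token labels from several classifiers by weighted voting.

    predictions: dict mapping classifier name -> list of labels (one per token)
    weights:     dict mapping classifier name -> non-negative vote weight
    """
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all classifiers must label the same tokens")
    n_tokens = lengths.pop()

    combined = []
    for i in range(n_tokens):
        scores = defaultdict(float)
        for name, labels in predictions.items():
            scores[labels[i]] += weights[name]
        # Iterating over sorted labels makes ties resolve alphabetically.
        combined.append(max(sorted(scores), key=scores.get))
    return combined


def f_score(precision, recall):
    """Balanced F-score (harmonic mean of precision and recall)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical outputs of the three classifiers on a three-token sentence.
predictions = {
    "ME": ["B-PER", "O", "B-LOC"],
    "CRF": ["B-PER", "I-PER", "O"],
    "SVM": ["O", "I-PER", "B-LOC"],
}
weights = {"ME": 0.30, "CRF": 0.36, "SVM": 0.34}  # assumed, not from the paper

print(weighted_vote(predictions, weights))
print(round(f_score(92.18, 93.81), 2))
```

With the rounded precision and recall from the abstract, the harmonic mean comes out within about 0.01 of the reported 92.98%, which is consistent with rounding of the inputs.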
Chinese-English Organization Name Translation Based on Correlative Expansion
Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699741
Feiliang Ren, Muhua Zhu, Huizhen Wang, Jingbo Zhu
This paper presents an approach to translating Chinese organization names into English based on correlative expansion. Firstly, some candidate translations are generated by using statistical translation method. And several correlative named entities for the input are retrieved from a correlative named entity list. Secondly, three kinds of expansion methods are used to generate some expanded queries. Finally, these queries are submitted to a search engine, and the refined translation results are mined and re-ranked by using the returned web pages. Experimental results show that this approach outperforms the compared system in overall translation accuracy.
Citations: 7
Modeling Machine Transliteration as a Phrase Based Statistical Machine Translation Problem
Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699737
Taraka Rama, Karthik Gali
In this paper we use the popular phrase-based SMT techniques for the task of machine transliteration, for English-Hindi language pair. Minimum error rate training has been used to learn the model weights. We have achieved an accuracy of 46.3% on the test set. Our results show these techniques can be successfully used for the task of machine transliteration.
Citations: 47
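Treating transliteration as phrase-based SMT usually means presenting each name to the toolkit as a "sentence" whose "words" are single characters. The sketch below shows one way that preprocessing might look; it is an assumption about the setup, not the authors' pipeline. The helper names and the `_` space marker are hypothetical, and splitting Devanagari by Unicode code point (so vowel signs become separate tokens) is only one possible segmentation.

```python
def to_char_tokens(name, space_marker="_"):
    """Rewrite a name as space-separated characters for an SMT toolkit.

    Real spaces inside multi-word names are replaced by `space_marker`
    so they survive tokenization.
    """
    chars = [space_marker if ch.isspace() else ch for ch in name.strip()]
    return " ".join(chars)


def from_char_tokens(tokens, space_marker="_"):
    """Invert `to_char_tokens` on decoder output."""
    return "".join(" " if t == space_marker else t for t in tokens.split())


# One hypothetical parallel training pair (English source, Hindi target).
source = to_char_tokens("new delhi")
target = to_char_tokens("नई दिल्ली")
print(source)
print(target)
print(from_char_tokens(source))
```

Each side would be written to its own file, one name per line, and then passed to the usual phrase-based training, tuning, and decoding steps.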
Transliteration of Name Entity via Improved Statistical Translation on Character Sequences
Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699720
Yan Song, C. Kit, Xiao Chen
Transliteration of given parallel name entities can be formulated as a phrase-based statistical machine translation (SMT) process, via its routine procedure comprising training, optimization and decoding. In this paper, we present our approach to transliterating name entities using the loglinear phrase-based SMT on character sequences. Our proposed work improves the translation by using bidirectional models, plus some heuristic guidance integrated in the decoding process. Our evaluated results indicate that this approach performs well in all standard runs in the NEWS2009 Machine Transliteration Shared Task.
Citations: 18
epsilon-extension Hidden Markov Models and Weighted Transducers for Machine Transliteration
Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699736
Balakrishnan Varadarajan, D. Rao
We describe in detail a method for transliterating an English string to a foreign language string evaluated on five different languages, including Tamil, Hindi, Russian, Chinese, and Kannada. Our method involves deriving substring alignments from the training data and learning a weighted finite state transducer from these alignments. We define an ε-extension Hidden Markov Model to derive alignments between training pairs and a heuristic to extract the substring alignments. Our method involves only two tunable parameters that can be optimized on held-out data.
Citations: 3
A Language-Independent Transliteration Schema Using Character Aligned Models at NEWS 2009
Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699715
Praneeth Shishtla, S. Veeravalli, Sethuramalingam Subramaniam, Vasudeva Varma
In this paper we present a statistical transliteration technique that is language independent. This technique uses statistical alignment models and Conditional Random Fields (CRF). Statistical alignment models maximizes the probability of the observed (source, target) word pairs using the expectation maximization algorithm and then the character level alignments are set to maximum posterior predictions of the model. CRF has efficient training and decoding processes which is conditioned on both source and target languages and produces globally optimal solution.
Citations: 37
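The alignment step in the abstract above (expectation maximization over word pairs, then a maximum-posterior character alignment) can be sketched with an IBM Model 1 style lexical model over characters. The abstract does not name the alignment model, so Model 1 is an assumption here, as are the function names and the toy Latin-script data; the CRF stage is not shown.

```python
from collections import defaultdict


def train_char_model(pairs, iterations=10):
    """EM estimate of t(target_char | source_char), IBM Model 1 style.

    pairs: list of (source, target) strings, both non-empty.
    Returns a dict keyed by (target_char, source_char).
    """
    src_vocab = {s for src, _ in pairs for s in src}
    tgt_vocab = {c for _, tgt in pairs for c in tgt}
    # Start from a uniform distribution over target characters.
    t = {(tc, sc): 1.0 / len(tgt_vocab) for sc in src_vocab for tc in tgt_vocab}

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each target character over the source characters.
        for src, tgt in pairs:
            for tc in tgt:
                norm = sum(t[(tc, sc)] for sc in src)
                for sc in src:
                    frac = t[(tc, sc)] / norm
                    count[(tc, sc)] += frac
                    total[sc] += frac
        # M-step: renormalize the expected counts per source character.
        for tc, sc in t:
            t[(tc, sc)] = count[(tc, sc)] / total[sc]
    return t


def align(src, tgt, t):
    """For each target position, pick the most probable source position."""
    return [max(range(len(src)), key=lambda i: t[(tc, src[i])]) for tc in tgt]


# Toy data: "a" maps to "x" and "b" maps to "y".
pairs = [("ab", "xy"), ("a", "x"), ("b", "y")]
t = train_char_model(pairs)
print(round(t[("x", "a")], 3))
print(align("ab", "xy", t))
```

On this toy data the single-character pairs disambiguate the two-character pair, so the probability mass moves toward the x/a and y/b correspondences with each iteration.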
English to Hindi Machine Transliteration System at NEWS 2009
Pub Date : 2009-08-07 DOI: 10.3115/1699705.1699726
Amitava Das, Asif Ekbal, Tapabrata Mondal, Sivaji Bandyopadhyay
This paper reports about our work in the NEWS 2009 Machine Transliteration Shared Task held as part of ACL-IJCNLP 2009. We submitted one standard run and two non-standard runs for English to Hindi transliteration. The modified joint source-channel model has been used along with a number of alternatives. The system has been trained on the NEWS 2009 Machine Transliteration Shared Task datasets. For standard run, the system demonstrated an accuracy of 0.471 and the mean F-Score of 0.861. The non-standard runs yielded the accuracy and mean F-scores of 0.389 and 0.831 respectively in the first one and 0.384 and 0.828 respectively in the second one. The non-standard runs resulted in substantially worse performance than the standard run. The reasons for this are the ranking algorithm used for the output and the types of tokens present in the test set.
Citations: 28
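The two figures quoted in the abstract above, accuracy and mean F-score, can be illustrated with a short sketch. It assumes top-1 word accuracy and an F-score built from the longest common subsequence (LCS) between candidate and reference; taking the best F-score over multiple references is a simplification. The function names and the toy strings are hypothetical and do not come from the shared-task data.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def f_score(candidate, reference):
    """LCS-based F-score between one candidate and one reference."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


def evaluate(top1, references):
    """Return (accuracy, mean F-score) for top-ranked candidates.

    top1:       list of top-1 candidate strings, one per source name
    references: list of lists of acceptable reference strings
    """
    n = len(top1)
    acc = sum(c in refs for c, refs in zip(top1, references)) / n
    mean_f = sum(
        max(f_score(c, r) for r in refs) for c, refs in zip(top1, references)
    ) / n
    return acc, mean_f


acc, mean_f = evaluate(["abcd", "abcde"], [["abcd"], ["abde", "xyz"]])
print(acc, round(mean_f, 4))
```

Because the F-score gives partial credit for near misses, a system can have a mean F-score well above its exact-match accuracy, as with the 0.471 and 0.861 reported for the standard run.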