NEWS@ACM: Latest Papers

Regulating Orthography-Phonology Relationship for English to Thai Transliteration
Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2712
Binh Minh Nguyen, H. Ngo, Nancy F. Chen
In this paper, we describe our work for the Named Entities Workshop (NEWS) 2016 transliteration shared task, where we focus on English to Thai transliteration. The alignment between Thai orthography and phonology is not always monotonic, but few transliteration systems take this into account. In our proposed system, we exploit phonological knowledge to resolve problematic instances where the monotonic alignment assumption breaks down. We achieve a 29% relative improvement over the baseline system for the NEWS 2016 transliteration shared task.
Citations: 3
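A note on reading the "29% relative improvement" reported in the abstract above: relative and absolute gains are different quantities. The symbols below are introduced here for illustration and are not taken from the paper; $s_b$ stands for a baseline score and $s$ for the system score.

```latex
\text{absolute gain} = s - s_b,
\qquad
\text{relative improvement} = \frac{s - s_b}{s_b}
```

Under this definition, a 29% relative improvement means $s = 1.29\,s_b$. The abstract does not state the underlying scores, so the absolute gain cannot be recovered from it.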
Moses-based official baseline for NEWS 2016
Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2713
M. Costa-jussà
Transliteration is the phonetic translation of text between two languages, and many works approach it with machine translation methods. This paper describes the official baseline system for the NEWS 2016 workshop shared task. The baseline is a standard phrase-based machine translation system built with Moses. Its results fall between the best and worst results from last year’s workshop, providing a reasonable starting point for this year’s participants.
Citations: 7
Multi-source named entity typing for social media
Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2702
R. Vexler, Einat Minkov
Typed lexicons that encode knowledge about the semantic types of an entity name, e.g., that ‘Paris’ denotes a geolocation, product, or person, have proven useful for many text processing tasks. While lexicons may be derived from large-scale knowledge bases (KBs), KBs are inherently imperfect; in particular, they lack coverage of long-tail entity names. We infer the types of a given entity name using multi-source learning, considering information obtained by alignment to the Freebase knowledge base, Web-scale distributional patterns, and global semi-structured contexts retrieved by means of Web search. Evaluation in the challenging domain of social media shows that multi-source learning improves performance compared with rule-based KB lookups, boosting typing results for some semantic categories.
Citations: 2
Target-Bidirectional Neural Models for Machine Transliteration
Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2711
A. Finch, Lemao Liu, Xiaolin Wang, E. Sumita
Our purely neural network-based system represents a paradigm shift away from the techniques based on phrase-based statistical machine translation we have used in the past. The approach exploits the agreement between a pair of target-bidirectional LSTMs, in order to generate balanced targets with both good suffixes and good prefixes. The evaluation results show that the method is able to match and even surpass the current state-of-the-art on most language pairs, but also exposes weaknesses on some tasks motivating further study. The Janus toolkit that was used to build the systems used in the evaluation is publicly available at https://github.com/lemaoliu/Agtarbidir.
Citations: 34
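The agreement idea in the target-bidirectional abstract above can be illustrated with a toy rescoring step. This is a minimal sketch and not the authors' Janus implementation: the two score tables are hypothetical stand-ins for a left-to-right and a right-to-left model, and the candidate strings and probabilities are invented. The sketch pools the k-best candidates from both directions and ranks them by summed log-probability, so a candidate that only one direction likes is penalized.

```python
import math

# Hypothetical per-candidate log-probabilities from two toy "models".
# In a real system these would come from a left-to-right LSTM and a
# right-to-left LSTM; the numbers here are invented for illustration.
L2R_SCORES = {"maikeru": math.log(0.5), "maikel": math.log(0.3), "mikeru": math.log(0.2)}
R2L_SCORES = {"maikeru": math.log(0.4), "mikeru": math.log(0.5), "maikel": math.log(0.1)}

FLOOR = math.log(1e-6)  # score used when a model has no entry for a candidate


def k_best(scores, k):
    """Return the k highest-scoring candidates of one model."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def rescore_by_agreement(l2r, r2l, k=2):
    """Pool the k-best lists of both directions and rank by summed log-probability."""
    pool = set(k_best(l2r, k)) | set(k_best(r2l, k))
    joint = {c: l2r.get(c, FLOOR) + r2l.get(c, FLOOR) for c in pool}
    return sorted(joint, key=joint.get, reverse=True)


ranked = rescore_by_agreement(L2R_SCORES, R2L_SCORES)
print(ranked[0])
```

Summing log-probabilities is one simple way to express agreement; the paper's actual search and scoring procedure may differ.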
Spanish NER with Word Representations and Conditional Random Fields
Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2705
J. Copara, J. Ochoa, Camilo Thorne, Goran Glavas
Word representations such as word embeddings have been shown to significantly improve (semi-)supervised NER for the English language. In this work we investigate whether word representations can also boost (semi-)supervised NER in Spanish. To do so, we use word representations as additional features in a linear-chain Conditional Random Field (CRF) classifier. Experimental results (an F-score of 82.44 on the CoNLL-2002 corpus) show that our approach is comparable to some state-of-the-art Deep Learning approaches for Spanish, in particular when using
Citations: 18
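The Spanish NER abstract above describes adding word representations as extra features in a linear-chain CRF. The sketch below shows one way such features might be assembled, as one feature dictionary per token. It is an illustration under assumptions, not the authors' feature set: the cluster table, the embedding table, and every feature name are hypothetical, and no CRF library is called.

```python
# Hypothetical lookup tables; a real system would learn these from unlabeled text.
BROWN_CLUSTERS = {"madrid": "110100", "barcelona": "110101", "en": "0010"}
EMBEDDINGS = {"madrid": [0.12, -0.40], "barcelona": [0.10, -0.35]}


def token_features(tokens, i):
    """Build the feature dict for tokens[i]: lexical features plus word representations."""
    word = tokens[i]
    lower = word.lower()
    feats = {
        "bias": 1.0,
        "lower": lower,
        "is_title": word.istitle(),
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }
    # Word-representation features: cluster bit-string prefixes and embedding dimensions.
    cluster = BROWN_CLUSTERS.get(lower)
    if cluster is not None:
        for n in (2, 4, 6):
            feats[f"brown_{n}"] = cluster[:n]
    for d, value in enumerate(EMBEDDINGS.get(lower, [])):
        feats[f"emb_{d}"] = value
    return feats


def sentence_features(tokens):
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


features = sentence_features(["Vive", "en", "Madrid"])
print(features[2])
```

Words missing from both tables simply receive the lexical features alone, which keeps the feature extractor usable for out-of-vocabulary tokens.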
Applying Neural Networks to English-Chinese Named Entity Transliteration
Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2710
Yan Shao, Joakim Nivre
This paper presents the machine transliteration systems that we employ for our participation in the NEWS 2016 machine transliteration shared task. Based on the prevalent deep learning models developed for general sequence processing tasks, we use convolutional neural networks to extract character level information from the transliteration units and stack a simple recurrent neural network on top for sequence processing. The systems are applied to the standard runs for both English to Chinese and Chinese to English transliteration tasks. Our systems achieve competitive results according to the official evaluation.
Citations: 16
German NER with a Multilingual Rule Based Information Extraction System: Analysis and Issues
Pub Date : 2016-08-01 DOI: 10.18653/v1/W16-2704
Anna Druzhkina, A. Leontyev, M. Stepanova
This paper presents a rule-based approach to Named Entity Recognition for the German language. The approach rests upon deep linguistic parsing and has already been applied to English and Russian. In this paper we present the first results of our system, ABBYY InfoExtractor, on the GermEval 2014 Shared Task corpus. We focus on the main challenges of German NER that we encountered when adapting our system to German, and on possible solutions to them.
Citations: 0
Linguistic Issues in the Machine Transliteration of Chinese, Japanese and Arabic Names
Pub Date : 1900-01-01 DOI: 10.18653/v1/W16-2707
Jack Halpern
The romanization of non-Latin scripts is a complex computational task that is highly language dependent. This presentation will focus on three of the most challenging non-Latin scripts: Chinese, Japanese, and Arabic (CJA). Much progress has been made in personal name machine-transliteration methodologies, as documented in the various NEWS reports over the last several years. Such techniques as phrase-based SMT, RNN-based LM and CRF have emerged, leading to gradual improvements in accuracy scores. But methodology is only one aspect of the problem. Equally important is the high level of ambiguity of the CJA scripts, which poses special challenges to named entity extraction and machine transliteration. These difficulties are exacerbated by the lack of comprehensive proper noun dictionaries, the multiplicity of ambiguous transcription schemes, and orthographic variation. This presentation will clear up the differences between three basic concepts (transliteration, transcription, and romanization) that are a source of much confusion, even among computational linguists, and will focus on (1) the major linguistic issues, that is, the special characteristics of the CJA scripts that impact machine transliteration, and (2) the important role played by lexical resources such as personal name dictionaries. A major issue in romanizing Simplified Chinese (SC) is the one-to-many ambiguity of many characters (polyphones), such as /le/ and /yue/ for 乐. To disambiguate accurately, the names must be looked up in word-level (not character-level) name mapping tables. This is complicated by (1) the presence of orthographic variants in Traditional Chinese (TC), and (2) the need for cross-script conversion between SC and TC. Transcription into Chinese is even more ambiguous, since some phonemes can correspond to dozens of characters. A major characteristic of Japanese, a highly agglutinative language, is the presence of countless orthographic variants. The four Japanese scripts interact in a complex way, resulting in okurigana variants (取り扱い, 取扱い, 取扱 etc. for /toriatsukai/), cross-script variants (猫, ねこ, ネコ for /neko/), kanji variants (大幅 and 大巾 for /oohaba/), kana variants (ユーザー and ユーザ for /yuuza(a)/), and more. Another issue is the numerous kun and nanori readings (some kanji have dozens) and the various romanization systems in current use, such as the Hepburn, Kunrei and hybrid systems.
Citations: 0
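The polyphone problem described in the abstract above (乐 read as /le/ or /yue/) can be illustrated with a small lookup that consults a word-level table before falling back to character-level defaults. This is a minimal sketch under assumptions: both tables are tiny illustrative samples written for this example, tones are omitted, and the greedy longest-match strategy is a simplification chosen here rather than the method the presentation describes.

```python
# Character-level default readings (toneless pinyin); illustrative entries only.
CHAR_READINGS = {"乐": "le", "毅": "yi", "快": "kuai", "音": "yin"}

# Word-level table that overrides the character defaults; illustrative entries only.
WORD_READINGS = {"乐毅": "yue yi", "音乐": "yin yue"}


def romanize(text):
    """Greedy longest-match romanization: try word-level entries before single characters."""
    out = []
    i = 0
    max_len = max(len(w) for w in WORD_READINGS)
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            reading = WORD_READINGS.get(chunk) if size > 1 else CHAR_READINGS.get(chunk)
            if reading is not None:
                out.append(reading)
                i += size
                break
        else:
            out.append(text[i])  # unknown character: pass through unchanged
            i += 1
    return " ".join(out)


print(romanize("乐毅"))  # word-level entry selects the /yue/ reading
print(romanize("快乐"))  # no word-level entry, so 乐 falls back to /le/
```

A character-level table alone would render the name 乐毅 with the default /le/ reading, which is why the word-level lookup comes first.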
Constructing a Japanese Basic Named Entity Corpus of Various Genres
Pub Date : 1900-01-01 DOI: 10.18653/v1/W16-2706
Tomoya Iwakura, Kanako Komiya, R. Tachibana
This paper introduces a Japanese Named Entity (NE) corpus of various genres. We annotated 136 documents in the Balanced Corpus of Contemporary Written Japanese (BCCWJ) with the eight types of NE tags defined by the Information Retrieval and Extraction Exercise. The NE corpus consists of documents from six genres, such as blogs, magazines, and white papers, and contains 2,464 NE tags in total. The corpus can be reproduced with the BCCWJ corpus and the tagging information obtained from https://sites.google.com/site/projectnextnlpne/en/.
Citations: 7
Leveraging Entity Linking and Related Language Projection to Improve Name Transliteration
Pub Date : 1900-01-01 DOI: 10.18653/v1/W16-2701
Ying Lin, Xiaoman Pan, Aliya Deri, Heng Ji, Kevin Knight
Traditional name transliteration methods largely ignore source context information and inter-dependency among entities for entity disambiguation. We propose a novel approach to leverage state-of-the-art Entity Linking (EL) techniques to automatically correct name transliteration results, using collective inference from source contexts and additional evidence from a knowledge base. Experiments on transliterating names from seven languages to English demonstrate that our approach achieves 2.6% to 15.7% absolute gain over the baseline model, and significantly advances the state of the art. When contextual information exists, our approach can achieve further gains (24.2%) by collectively transliterating and disambiguating multiple related entities. We also prove that combining Entity Linking with projected resources from related languages achieves performance comparable to the method that uses the same amount of training pairs in the original languages without Entity Linking.
Citations: 13