Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019): Latest Articles

Improving the Annotations in the Turkish Universal Dependency Treebank

Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-8013

Utku Türk, Furkan Atmaca, Saziye Betül Özates, Balkiz Öztürk Basaran, Tunga Güngör, Arzucan Özgür

This study focuses on a comprehensive analysis and manual re-annotation of the Turkish IMST-UD Treebank, which was automatically converted from the IMST Treebank (Sulubacak et al., 2016b). The existing treebank was revised in accordance with the Universal Dependencies guidelines and the requirements of Turkish grammar. The study presents the revisions that were made, together with the motivations behind the major changes. It also reports the parsing results of a transition-based dependency parser and a graph-based dependency parser obtained on the previous and updated versions of the treebank. These results show that the re-annotation of the Turkish IMST-UD treebank improves dependency parsing performance.

Citations: 6

Universal Dependencies for Mbyá Guaraní

Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-8008

Guillaume Thomas

This paper presents the first treebank of Mbyá Guaraní, a Tupí-Guaraní language spoken in Argentina, Brazil and Paraguay. The Mbyá treebank is part of Universal Dependencies, a project that aims to create a set of guidelines for the consistent grammatical annotation of typologically different languages. We describe the composition of the treebank, and non-trivial choices that were made in the adaptation of Universal Dependencies guidelines to the annotation of Mbyá.

Citations: 10

Survey of Uralic Universal Dependencies development

Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-8009

N. Partanen, Jack Rueter

This paper attempts to evaluate some of the systematic differences in Uralic Universal Dependencies treebanks from a perspective that would help to introduce reasonable improvements in treebank annotation consistency within this language family. The study finds that the coverage of Uralic languages in the project is already relatively high, and the majority of typically Uralic features are already present and can be discussed on the basis of existing treebanks. Some of the idiosyncrasies found in individual treebanks stem from language-internal grammar traditions, and could be a target for harmonization in later phases.

Citations: 10

Building a treebank for Occitan: what use for Romance UD corpora?

Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-8002

A. Miletic, M. Bras, Louise Esher, J. Sibille, Marianne Vergez-Couret

This paper describes the application of delexicalized cross-lingual parsing on Occitan with a view to building the first dependency treebank of this language. Occitan is a Romance language spoken in the south of France and in parts of Italy and Spain. It is a relatively low-resourced language and does not have a syntactically annotated corpus as of yet. In order to facilitate the manual annotation process, we train parsing models on the existing Romance corpora from the Universal Dependencies project and apply them to Occitan. Special attention is given to the effect of this cross-lingual annotation on the work of human annotators in terms of annotation speed and ease.
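As a rough illustration of what "delexicalized" means in the abstract above, the sketch below blanks the word-form and lemma columns of CoNLL-U data so that a parser trained on it sees only part-of-speech tags and morphological features. The function name `delexicalize_conllu` and the toy English sentence are assumptions made for illustration; this is not the authors' pipeline, which is not described on this page.

```python
def delexicalize_conllu(text: str) -> str:
    """Replace the FORM and LEMMA columns of CoNLL-U token lines with '_'."""
    out = []
    for line in text.splitlines():
        # Comment lines and sentence-separating blank lines pass through unchanged.
        if not line or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        if len(cols) == 10:  # a well-formed CoNLL-U token line has ten columns
            cols[1] = "_"    # FORM
            cols[2] = "_"    # LEMMA
        out.append("\t".join(cols))
    return "\n".join(out)


# Toy sentence (hypothetical annotation, for illustration only).
sample = (
    "# text = The cat sleeps\n"
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tcat\tcat\tNOUN\t_\t_\t3\tnsubj\t_\t_\n"
    "3\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_\n"
)

print(delexicalize_conllu(sample))
```

Because the lexical columns are removed, a model trained this way on one language can be applied to a related language whose tag inventory matches, which is the property the abstract relies on.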

Citations: 4

Developing Universal Dependencies for Wolof

Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-8003

Cheikh M. Bamba Dione

This paper presents work on the creation of a Universal Dependency (UD) treebank for Wolof as the first UD treebank within the Northern Atlantic branch of the Niger-Congo languages. The paper reports on various issues related to word segmentation for tokenization and the mapping of PoS tags, morphological features and dependency relations to existing conventions for annotating Wolof. It also outlines some specific constructions as a starting point for discussing several more general UD annotation guidelines, in particular for noun class marking, deixis encoding, and focus marking.

Citations: 12

Recursive LSTM Tree Representation for Arc-Standard Transition-Based Dependency Parsing

Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-8012

Mohab Elkaref, Bernd Bohnet

We propose a method to represent dependency trees as dense vectors through the recursive application of Long Short-Term Memory networks to build Recursive LSTM Trees (RLTs). We show that the dense vectors produced by Recursive LSTM Trees replace the need for structural features by using them as feature vectors for a greedy Arc-Standard transition-based dependency parser. We also show that RLTs have the ability to incorporate useful information from the bi-LSTM contextualized representation used by Cross and Huang (2016) and Kiperwasser and Goldberg (2016b). The resulting dense vectors are able to express both structural information relating to the dependency tree, as well as sequential information relating to the position in the sentence. The resulting parser only requires the vector representations of the top two items on the parser stack, which is, to the best of our knowledge, the smallest feature set ever published for Arc-Standard parsers to date, while still managing to achieve competitive results.
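For readers unfamiliar with the Arc-Standard system mentioned in the abstract above, the following minimal sketch shows the three transitions and why the top two stack items are the natural place to look for features: both arc transitions only ever connect those two items. It covers the transition system only, not the Recursive LSTM Tree model or any scoring function; the function name `arc_standard` and the toy sentence are assumptions for illustration.

```python
def arc_standard(n_words, transitions):
    """Apply Arc-Standard transitions to words 1..n_words.

    Returns a dict mapping each dependent to its head (0 is the artificial root).
    """
    stack = [0]                          # start with only the root on the stack
    buffer = list(range(1, n_words + 1))
    heads = {}
    for t in transitions:
        if t == "SHIFT":                 # move the next buffer word onto the stack
            stack.append(buffer.pop(0))
        elif t == "LEFT-ARC":            # top of stack becomes head of the item below it
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        elif t == "RIGHT-ARC":           # item below the top becomes head of the top
            dep = stack.pop()
            heads[dep] = stack[-1]
        else:
            raise ValueError(f"unknown transition: {t}")
    return heads


# Toy sentence "She reads books": 1=She, 2=reads, 3=books.
sequence = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
print(arc_standard(3, sequence))
```

A greedy parser of the kind the abstract describes would choose each transition with a classifier instead of reading it from a fixed list; the sketch omits that step and the usual precondition checks (for example, that LEFT-ARC never removes the root).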

Citations: 1

Improving UD processing via satellite resources for morphology

Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-8004

K. Dobrovoljc, T. Erjavec, Nikola Ljubesic

This paper presents the conversion of the reference language resources for Croatian and Slovenian morphology processing to UD morphological specifications. We show that the newly available training corpora and inflectional dictionaries improve the baseline stanfordnlp performance obtained on officially released UD datasets for lemmatization, morphology prediction and dependency parsing, illustrating the potential value of such satellite UD resources for languages with rich morphology.

Citations: 3

Universal Dependencies in a galaxy far, far away... What makes Yoda’s English truly alien

Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-8005

N. Levshina

This paper investigates the word order used by Yoda, a character from the Star Wars universe. His clauses typically contain an object, oblique and/or non-finite part of the predicate followed by the subject and the finite predicate/auxiliary/copula, e.g. Help you it will. Using the Yodish sentences from the scripts of the Star Wars films, this paper examines three cross-linguistically common tendencies that can be explained by optimization of processing: the trade-off between the entropy of S and O order and morphological cues, the minimization of dependency lengths, and the tendency to place the verb at the end of a clause. For comparison, a standardized version of Yoda’s sentences is used, as well as the Universal Dependencies corpora. The quantitative analyses indicate that Yodish is less well adjusted to the human processor’s needs than standard English and other human languages.
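Two of the measures named in the abstract above, dependency length and the entropy of subject/object order, can be sketched in a few lines. The head indices below are a plausible UD-style analysis of the abstract's own example, written here as an assumption and not taken from the paper; the function names are likewise hypothetical, and the paper's actual measures may be normalized differently.

```python
import math
from collections import Counter


def total_dependency_length(heads):
    """Sum of linear distances between each word and its head.

    heads[i] is the 1-based position of the head of word i+1; 0 marks the root,
    which contributes no length.
    """
    return sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)


def order_entropy(orders):
    """Shannon entropy in bits of a list of order labels such as 'SO' and 'OS'."""
    counts = Counter(orders)
    total = len(orders)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Assumed analysis: "help" is the root; "you", "it" and "will" all attach to it.
yodish = [0, 1, 1, 1]      # Help(1) you(2) it(3) will(4)
standard = [3, 3, 0, 3]    # It(1) will(2) help(3) you(4)

print(total_dependency_length(yodish), total_dependency_length(standard))
print(order_entropy(["SO", "SO", "OS", "OS"]))
```

Under this assumed analysis the Yodish order yields a longer total dependency length than the standardized order for the same tree, which is the kind of contrast the abstract's comparison is built on.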

Citations: 1

Building minority dependency treebanks, dictionaries and computational grammars at the same time—an experiment in Karelian treebanking

Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-8016

Flammie A. Pirinen

Building a treebank from scratch can easily be an elaborate, highly time-consuming task, especially when working with a minority language with moderately complex morphology and no existing resources. It is also typically true that language experts and informants with suitable skill sets are a very scarce resource. In this experiment I have attempted to build NLP resources in parallel with gathering and annotating the treebank. In particular, I aim to build a morphologically annotated lexicon with decent coverage, suitable for rule-based morphological analysis, as well as accompanying rules for basic morphosyntactic analysis. I propose here a workflow that I have found useful for avoiding repeating the same work when constructing related NLP resources.

Citations: 11

Towards an adequate account of parataxis in Universal Dependencies

Pub Date : 2019-08-01 DOI: 10.18653/v1/W19-8011

Lars Ahrenberg

The parataxis relation as defined for Universal Dependencies 2.0 is general and, for this reason, sometimes hard to distinguish from competing analyses such as coordination (conj) or apposition (appos). The specific subtypes that are listed for parataxis are also quite different in character. In this study we first show, using the parallel UD (PUD) treebanks as data, that actual practice among UD annotators varies. We then review the current definitions and guidelines and suggest improvements.

Citations: 1
