This paper presents a new schema for annotating Chinese treebanks at the character level. The original Universal Dependencies (UD) and Surface-Syntactic Universal Dependencies (SUD) projects provide token-level resources with rich morphosyntactic detail. However, because there is no commonly accepted definition of a word in Chinese, dependency parsing always faces the problem of word segmentation. We therefore present a character-level annotation schema, integrated into the existing Universal Dependencies schema as an extension.
Character-level Annotation for Chinese Surface-Syntactic Universal Dependencies. Chuan-Wei Dong, Yixuan Li, Kim Gerdes. Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019), 2019-08-27. https://doi.org/10.18653/v1/W19-7726
Many linguistic theories and annotation frameworks contain a deep-syntactic and/or semantic layer. While many of these frameworks have been applied to more than one language, none of them comes anywhere near the number of languages covered by Universal Dependencies (UD). In this paper, we present a prototype of Deep Universal Dependencies. It is a two-speed concept: minimal deep annotation can be derived automatically from surface UD trees, while richer annotation can be added for datasets where appropriate resources are available. The Deep UD data are released in the Lindat repository.
Towards Deep Universal Dependencies. Kira Droganova, Daniel Zeman. Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019), 2019-08-01. https://doi.org/10.18653/v1/W19-7717
The paper studies the evolution of the syntactic valency of Chinese verbs. We construct three corpora: ancient classical Chinese, ancient vernacular Chinese, and modern vernacular Chinese. Ten main verbs are selected from these corpora to examine the evolution of their valency, namely their complements and adjuncts. The results show that syntactic structures tend to become more complex over time. Ancient classical Chinese and ancient vernacular Chinese are similar in sentence structure. In the transition from ancient vernacular to modern vernacular Chinese, however, syntactic complexity increases dramatically, which indicates drastic changes in sentence structure.
Quantitative Analysis on verb valence evolution of Chinese. Bingli Liu, Chunshan Xu. Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019), 2019-08-01. https://doi.org/10.18653/v1/W19-7721
This paper investigates the evolution of the spatial rationales of Tesnière’s syntactic diagrams (stemmas). I show that the conventions changed between his first attempts to model complete sentences and the classical stemma he uses in his Elements of structural syntax (1959). He moves from mostly symbolic representations of hierarchy (directed arrows from the dependent to the governor) to a more configurational one (connected dependents are placed below the governor).
The evolution of spatial rationales in Tesnière’s stemmas. N. Mazziotta. Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019), 2019-08-01. https://doi.org/10.18653/v1/W19-7709
How can we build Natural Language Processing models for new domains and new languages? In this talk I will survey some recent advances that address this ubiquitous challenge. These range from cross-lingual transfer to learning models under distant supervision from disparate sources, multi-task learning, and data selection.
SyntaxFest 2019 Invited talk - Transferring NLP models across languages and domains. Barbara Plank. Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019), 2019-08-01. https://doi.org/10.18653/v1/w19-7702
Maria Auxiliadora Barrios Rodriguez, I. Boguslavsky
We present a new e-dictionary of Spanish (in progress) called Diretes (DIccionario RETicular de ESpañol). It describes collocations by means of Lexical Functions (LFs), both standard and non-standard, as defined in Igor Mel’čuk’s Meaning–Text Theory. At present, Diretes contains about 50,000 collocations. This paper concentrates on collocations in which the collocate is an adjectival or adverbial phrase. Most of these collocations are extracted from the Práctico combinatorial dictionary of modern Spanish. We explain the structure of the e-dictionary, the types of information it contains, and the way it is presented. We also show how LF-interpreted collocations can be used in NLP applications. We demonstrate this with the SemETAP semantic analyzer, in which LFs are used to normalize semantic structures and make inferences.
A Spanish E-dictionary of Collocations. Maria Auxiliadora Barrios Rodriguez, I. Boguslavsky. Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019), 2019-08-01. https://doi.org/10.18653/v1/W19-7719
This paper highlights the advantages of interpreting connections in a dependency tree not as combinations between words but, more broadly, as sets of combinations between catenae. One of the most important outcomes is that a connection structure can be associated with any set of combinations that satisfies certain well-formedness properties. This provides a new way to define dependency trees, as well as other kinds of dependency structures that are not trees but “bubble graphs”. The status of the catenae of dependency trees as syntactic units is discussed.
Interpreting and defining connections in a dependency structure. Sylvain Kahane. Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019), 2019-08-01. https://doi.org/10.18653/v1/W19-7711
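Kahane's abstract relies on the notion of a catena. In the usual sense, a catena is a set of words that forms a connected subgraph of the dependency tree. Under that definition, you can test whether a set of tokens is a catena directly from a head array. In a rooted tree, a non-empty set is connected exactly when only one of its members has its head outside the set.

The following is a minimal sketch under that assumption. The CoNLL-U-style head encoding (1-based token ids, 0 for the root) and the function name `is_catena` are illustrative choices, not taken from the paper.

```python
def is_catena(heads, nodes):
    """Return True if `nodes` is a catena of the dependency tree `heads`.

    heads[i - 1] is the head of token i (1-based ids, 0 marks the root),
    following the CoNLL-U convention. A non-empty node set is a catena
    (a connected subgraph of the tree) iff exactly one member has its
    head outside the set: each connected piece has exactly one such "top".
    """
    nodes = set(nodes)
    if not nodes:
        return False
    for i in nodes:
        if not 1 <= i <= len(heads):
            raise ValueError(f"token id {i} out of range")
    # Count members whose governor lies outside the set (or is the root).
    tops = [i for i in nodes if heads[i - 1] not in nodes]
    return len(tops) == 1


# "The old man slept": The->man, old->man, man->slept, slept->root
heads = [3, 3, 4, 0]
print(is_catena(heads, {1, 3}))  # "The ... man": connected
print(is_catena(heads, {1, 2}))  # "The old": two separate dependents of "man"
```

The "exactly one top" test avoids an explicit graph traversal. It relies on the input being a well-formed tree.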
This paper presents the preliminary results of a multifactorial analysis of word order in Mbyá Guaraní, a Tupí-Guaraní language spoken in Argentina, Brazil and Paraguay. The analysis is based on a corpus of written narratives with multiple layers of annotation. Our goals are to assess the validity of previous claims about Mbyá word order (Martins, 2003; Dooley, 1982; Dooley, 2015) and to explore how different types of factors affect the position of core arguments relative to their verb. We show that SV and VO are the most frequently attested orders in matrix clauses and that subordinate clauses favour OV order. Givenness, transitivity and clause type (root vs subordinate) are found to be significant predictors of word order. We identify differences in object position between Mbyá and Paraguayan Guaraní (Tonhauser and Colijn, 2010). We argue that these differences support Dietrich’s (2009) proposal that Tupí-Guaraní languages are undergoing a change in word order from OV to VO, induced by contact with Spanish and Portuguese.
Word order variation in Mbyá Guaraní. Angelika Kiss, Guillaume Thomas. Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019), 2019-08-01. https://doi.org/10.18653/v1/W19-7714
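Counting word orders in a dependency-annotated corpus, as in the Mbyá Guaraní study above, starts by locating each core argument relative to its verb and recording the clause type.

The sketch below makes several hypothetical assumptions:
- It uses UD-style `nsubj`/`obj` labels.
- It assumes the head of such a relation is the verb.
- It applies a crude clause-type heuristic: a verb attached by `advcl`, `ccomp`, `csubj`, `acl` or `xcomp` heads a subordinate clause.

The authors' annotation layers may differ. The multifactorial model itself, with predictors such as givenness and transitivity, is not reproduced here.

```python
from collections import Counter

# Hypothetical heuristic: UD relations taken to attach a subordinate clause's verb.
SUBORDINATE_RELS = {"advcl", "ccomp", "csubj", "acl", "xcomp"}


def argument_orders(sentence):
    """Tally (clause_type, order) pairs for one sentence.

    `sentence` is a list of (id, head, deprel) triples with 1-based ids
    and head 0 for the root, as in CoNLL-U. Subtypes such as `nsubj:pass`
    are reduced to their base relation.
    """
    base_rel = {tok_id: deprel.split(":")[0] for tok_id, _, deprel in sentence}
    counts = Counter()
    for tok_id, head, deprel in sentence:
        rel = deprel.split(":")[0]
        if head == 0 or rel not in ("nsubj", "obj"):
            continue
        clause = "subordinate" if base_rel.get(head) in SUBORDINATE_RELS else "matrix"
        if rel == "nsubj":
            order = "SV" if tok_id < head else "VS"
        else:
            order = "OV" if tok_id < head else "VO"
        counts[(clause, order)] += 1
    return counts


# Toy sentence: S V O, plus an adverbial clause whose object precedes its verb.
toy = [(1, 2, "nsubj"), (2, 0, "root"), (3, 2, "obj"), (4, 5, "obj"), (5, 2, "advcl")]
print(argument_orders(toy))
```

Summing these counters over a corpus gives raw order frequencies by clause type. Those frequencies are only the starting point for a regression analysis like the one the abstract describes.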
In this paper, we present a list of dependency relations based on Pāṇini’s grammar for Sanskrit. The key feature of this list is that most of the relations represent well-defined semantics that can be extracted from the surface string without any extra-linguistic information.
Pāṇinian Syntactico-Semantic Relation Labels. Amba P. Kulkarni, D. Sharma. Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019), 2019-08-01. https://doi.org/10.18653/v1/W19-7724
Richard Futrell, Peng Qian, E. Gibson, Evelina Fedorenko, I. Blank
How is syntactic dependency structure reflected in the statistical distribution of words in corpora? Here we give empirical evidence and theoretical arguments for what we call the Head–Dependent Mutual Information (HDMI) Hypothesis. It states that syntactic heads and their dependents correspond to word pairs with especially high mutual information, an information-theoretic measure of strength of association. In support of this idea, we estimate mutual information between word pairs in dependencies using an automatically parsed corpus of 320 million tokens of English web text. We find that the mutual information between words in dependencies is robustly higher than a controlled baseline consisting of non-dependent word pairs. Next, we give a formal argument that derives the HDMI Hypothesis from a probabilistic interpretation of the postulates of dependency grammar. Our study also provides some useful empirical results about mutual information in corpora. Maximum-likelihood estimates of mutual information between raw word-forms are biased even at our large sample size, and mutual information between part-of-speech tags generally decays with distance.
Syntactic dependencies correspond to word pairs with high mutual information. Richard Futrell, Peng Qian, E. Gibson, Evelina Fedorenko, I. Blank. Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019). https://doi.org/10.18653/v1/W19-7703
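The HDMI Hypothesis compares mutual information over (head, dependent) pairs with a baseline of non-dependent pairs. The simplest way to estimate it from pair counts is the maximum-likelihood (plug-in) formula:

I(H; D) = sum over (h, d) of p(h, d) * log2( p(h, d) / (p(h) * p(d)) )

Here the probabilities are relative frequencies. The sketch below implements only this naive estimator; it is not the authors' estimation pipeline. As the abstract notes, such estimates are biased on raw word-forms even with large samples: the plug-in estimator tends to overestimate MI when counts are sparse.

```python
import math
from collections import Counter


def plugin_mutual_information(pairs):
    """Maximum-likelihood (plug-in) estimate of I(X; Y) in bits.

    `pairs` is an iterable of (x, y) observations, e.g. (head, dependent)
    word pairs read off a parsed corpus.
    """
    joint = Counter(pairs)
    n = sum(joint.values())
    if n == 0:
        raise ValueError("need at least one pair")
    left, right = Counter(), Counter()
    for (x, y), c in joint.items():
        left[x] += c   # marginal count of x
        right[y] += c  # marginal count of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))) with p = count / n
        mi += (c / n) * math.log2(c * n / (left[x] * right[y]))
    return mi


# Toy data: dependent pairs are strongly associated, baseline pairs are not.
dependent_pairs = [("eat", "apple"), ("eat", "bread"), ("drink", "water"), ("drink", "tea")]
baseline_pairs = [("eat", "the"), ("drink", "the"), ("eat", "a"), ("drink", "a")]
print(plugin_mutual_information(dependent_pairs))  # 1.0 bit
print(plugin_mutual_information(baseline_pairs))   # 0.0 bits
```

In the toy data, knowing the head fully determines one bit about the dependent, while the baseline pairs are independent by construction. On real corpora, the bias noted above means that a raw plug-in comparison like this should be treated with caution.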