Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019): Latest Papers

Character-level Annotation for Chinese Surface-Syntactic Universal Dependencies
Chuan-Wei Dong, Yixuan Li, Kim Gerdes
This paper presents a new schema for annotating Chinese treebanks at the character level. The original Universal Dependencies (UD) and Surface-Syntactic Universal Dependencies (SUD) projects provide token-level resources with rich morphosyntactic detail. However, because Chinese has no commonly accepted definition of the word, dependency parsing always faces the dilemma of word segmentation. We therefore present a character-level annotation schema that is integrated into the existing Universal Dependencies schema as an extension.
DOI: https://doi.org/10.18653/v1/W19-7726
Published: 2019-08-27
Citations: 2
Towards Deep Universal Dependencies
Kira Droganova, Daniel Zeman
Many linguistic theories and annotation frameworks contain a deep-syntactic and/or semantic layer. While many of these frameworks have been applied to more than one language, none of them comes anywhere near the number of languages covered in Universal Dependencies (UD). In this paper, we present a prototype of Deep Universal Dependencies, a two-speed concept where minimal deep annotation can be derived automatically from surface UD trees, while richer annotation can be added for datasets where appropriate resources are available. We release the Deep UD data in Lindat.
DOI: https://doi.org/10.18653/v1/W19-7717
Published: 2019-08-01
Citations: 12
Quantitative Analysis on verb valence evolution of Chinese
Bingli Liu, Chunshan Xu
This paper studies the evolution of the syntactic valency of Chinese verbs. We construct three corpora: ancient classical Chinese, ancient vernacular Chinese and modern vernacular Chinese. From these corpora, ten main verbs are selected to probe the evolution of their valency, namely their complements and adjuncts. The paper reveals that syntactic structures show a trend toward greater complexity. Ancient classical Chinese and ancient vernacular Chinese are similar in sentence structure. With the transformation from the ancient vernacular to the modern vernacular, syntactic complexity increases dramatically, indicating drastic changes in sentence structure.
DOI: https://doi.org/10.18653/v1/W19-7721
Published: 2019-08-01
Citations: 0
The evolution of spatial rationales in Tesnière’s stemmas
N. Mazziotta
This paper investigates the evolution of the spatial rationales of Tesnière’s syntactic diagrams (stemma). I show that the conventions change from his first attempts to model complete sentences up to the classical stemma he uses in his Elements of structural syntax (1959). From mostly symbolic representations of hierarchy (directed arrows from the dependent to the governor), he shifts to a more configurational one (connected dependents are placed below the governor).
DOI: https://doi.org/10.18653/v1/W19-7709
Published: 2019-08-01
Citations: 3
SyntaxFest 2019 Invited talk - Transferring NLP models across languages and domains
Barbara Plank
How can we build Natural Language Processing models for new domains and new languages? In this talk I will survey some recent advances that address this ubiquitous challenge, from cross-lingual transfer to learning models under distant supervision from disparate sources, multi-task learning and data selection.
DOI: https://doi.org/10.18653/v1/w19-7702
Published: 2019-08-01
Citations: 0
A Spanish E-dictionary of Collocations
Maria Auxiliadora Barrios Rodriguez, I. Boguslavsky
We present a new e-dictionary of Spanish (in progress) called Diretes (DIccionario RETicular de ESpañol). It contains descriptions of collocations by means of Lexical Functions (LFs), both standard and non-standard, in the sense of the Meaning–Text Theory by Igor Mel’čuk. At present, Diretes contains about 50,000 collocations. This paper concentrates on the collocations in which the collocate is an adjectival or an adverbial phrase. These collocations are mostly extracted from the Práctico combinatorial dictionary of modern Spanish. We explain the structure of the e-dictionary, the types of information it contains and the way it is presented. We also show how the LF-interpreted collocations can be used in NLP applications. We demonstrate it with the SemETAP semantic analyzer, in which LFs are used to normalize semantic structures and make inferences.
DOI: https://doi.org/10.18653/v1/W19-7719
Published: 2019-08-01
Citations: 2
Interpreting and defining connections in a dependency structure
Sylvain Kahane
This paper highlights the advantages of not interpreting connections in a dependency tree as combinations between words but of interpreting them more broadly as sets of combinations between catenae. One of the most important outcomes is the possibility of associating a connection structure to any set of combinations assuming some well-formedness properties and of providing a new way to define dependency trees and other kinds of dependency structures, which are not trees but “bubble graphs”. The status of catenae of dependency trees as syntactic units is discussed.
DOI: https://doi.org/10.18653/v1/W19-7711
Published: 2019-08-01
Citations: 0
Word order variation in Mbyá Guaraní
Angelika Kiss, Guillaume Thomas
This paper presents the preliminary results of a multifactorial analysis of word order in Mbyá Guaraní, a Tupí-Guaraní language spoken in Argentina, Brazil and Paraguay, based on a corpus of written narratives with multiple layers of annotation. Our goals are to assess the validity of previous claims about Mbyá word order (Martins, 2003; Dooley, 1982; Dooley, 2015), and to explore the effects of different types of factors on the position of core arguments relative to their verb. We show that SV and VO are the most frequently attested orders in matrix clauses and that subordinate clauses favour the OV order. Givenness, transitivity and clause type (root vs subordinate) are found to be significant predictors of word order. We identify differences in object position between Mbyá and Paraguayan Guaraní (Tonhauser and Colijn, 2010), and we argue that these differences support Dietrich (2009)’s proposal that Tupí-Guaraní languages are undergoing a change in word order from OV to VO, induced by contact with Spanish and Portuguese.
DOI: https://doi.org/10.18653/v1/W19-7714
Published: 2019-08-01
Citations: 1
Pāṇinian Syntactico-Semantic Relation Labels
Amba P. Kulkarni, D. Sharma
We present in this paper a list of dependency relations based on Pāṇini’s grammar for Sanskrit. The important feature of this list is that most of the relations represent well defined semantics that can be extracted from the surface string without any extra-linguistic information.
DOI: https://doi.org/10.18653/v1/W19-7724
Published: 2019-08-01
Citations: 6
Syntactic dependencies correspond to word pairs with high mutual information
Richard Futrell, Peng Qian, E. Gibson, Evelina Fedorenko, I. Blank
How is syntactic dependency structure reflected in the statistical distribution of words in corpora? Here we give empirical evidence and theoretical arguments for what we call the Head–Dependent Mutual Information (HDMI) Hypothesis: that syntactic heads and their dependents correspond to word pairs with especially high mutual information, an information-theoretic measure of strength of association. In support of this idea, we estimate mutual information between word pairs in dependencies based on an automatically-parsed corpus of 320 million tokens of English web text, finding that the mutual information between words in dependencies is robustly higher than a controlled baseline consisting of non-dependent word pairs. Next, we give a formal argument which derives the HDMI Hypothesis from a probabilistic interpretation of the postulates of dependency grammar. Our study also provides some useful empirical results about mutual information in corpora: we find that maximum-likelihood estimates of mutual information between raw word-forms are biased even at our large sample size, and we find that there is a general decay of mutual information between part-of-speech tags with distance.
DOI: https://doi.org/10.18653/v1/W19-7703
Citations: 22
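
The mutual-information measure named in the Futrell et al. abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `plugin_mutual_information` and the toy word pairs are assumptions made purely for illustration. It computes the maximum-likelihood (plug-in) estimate of mutual information between the two positions of a list of word pairs, substituting relative frequencies for probabilities.

```python
import math
from collections import Counter


def plugin_mutual_information(pairs):
    """Plug-in (maximum-likelihood) estimate of I(X;Y) in bits.

    `pairs` is an iterable of (x, y) tuples, e.g. (head, dependent) word pairs.
    Hypothetical helper for illustration only.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one pair")
    joint = Counter(pairs)                 # counts of (x, y)
    left = Counter(x for x, _ in pairs)    # marginal counts of x
    right = Counter(y for _, y in pairs)   # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log2(c * n / (left[x] * right[y]))
    return mi


# Toy data, not taken from the paper: pairs that always co-occur
# versus pairs that combine independently.
dependent_pairs = [("eat", "apple"), ("eat", "apple"),
                   ("drink", "water"), ("drink", "water")]
independent_pairs = [("eat", "apple"), ("eat", "water"),
                     ("drink", "apple"), ("drink", "water")]

print(plugin_mutual_information(dependent_pairs))
print(plugin_mutual_information(independent_pairs))
```

On the toy data the perfectly associated pairs carry one bit of mutual information and the independent pairs carry none. The plug-in estimator tends to overestimate mutual information when many word types are rare, which is consistent with the abstract's remark that maximum-likelihood estimates over raw word forms remain biased even in a large corpus.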