Linear-Time Calculation of the Expected Sum of Edge Lengths in Random Projective Linearizations of Trees

Computational Linguistics | Pub Date: 2021-07-07 | DOI: 10.1162/coli_a_00442 | IF 5.3 | Region 2: Computer Science | JCR Q2: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Lluís Alemany-Puig, R. Ferrer-i-Cancho

Journal: Computational Linguistics, volume 48, pages 491-516.

Citations: 5

Abstract

The syntactic structure of a sentence is often represented using syntactic dependency trees. The sum of the distances between syntactically related words has been in the limelight for the past decades. Research on dependency distances led to the formulation of the principle of dependency distance minimization whereby words in sentences are ordered so as to minimize that sum. Numerous random baselines have been defined to carry out related quantitative studies on languages. The simplest random baseline is the expected value of the sum in unconstrained random permutations of the words in the sentence, namely, when all the shufflings of the words of a sentence are allowed and equally likely. Here we focus on a popular baseline: random projective permutations of the words of the sentence, that is, permutations where the syntactic dependency structure is projective, a formal constraint that sentences satisfy often in languages. Thus far, the expectation of the sum of dependency distances in random projective shufflings of a sentence has been estimated approximately with a Monte Carlo procedure whose cost is of the order of Rn, where n is the number of words of the sentence and R is the number of samples; it is well known that the larger R is, the lower the error of the estimation but the larger the time cost. Here we present formulae to compute that expectation without error in time of the order of n. Furthermore, we show that star trees maximize it, and provide an algorithm to retrieve the trees that minimize it.
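The two ways of obtaining the projective baseline described above can be contrasted in a short Python sketch. Everything here is illustrative rather than the authors' implementation: the parent-array encoding and all function names are assumptions, and the closed form in `closed_form_expectation` is not stated in the abstract. It is a recalled expression, (Σ_v n(v)(2d(v)+1) − 1)/6 with n(v) the subtree size and d(v) the number of children of v, that agrees with hand enumeration on small trees and should be treated as an assumption to check against the paper. The unconstrained baseline (n² − 1)/3 is the standard expectation for uniformly random permutations.

```python
import random
from fractions import Fraction


def children_of(parent):
    """Return (root, child lists) for a rooted tree given as a parent array
    in which parent[root] is None. This encoding is an assumption."""
    kids = [[] for _ in parent]
    root = None
    for v, p in enumerate(parent):
        if p is None:
            root = v
        else:
            kids[p].append(v)
    return root, kids


def random_projective_order(parent, rng):
    """Sample a projective linearization: at every vertex, shuffle the vertex
    together with its child subtrees, keeping each subtree contiguous."""
    root, kids = children_of(parent)
    order, stack = [], [(root, False)]
    while stack:
        v, emit = stack.pop()
        if emit:
            order.append(v)
            continue
        block = [(v, True)] + [(c, False) for c in kids[v]]
        rng.shuffle(block)
        stack.extend(reversed(block))  # first block item is popped first
    return order


def sum_edge_lengths(parent, order):
    """Sum of |position(v) - position(parent(v))| over all edges."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[v] - pos[p]) for v, p in enumerate(parent) if p is not None)


def monte_carlo_expectation(parent, samples, seed=0):
    """Approximate baseline: R samples, each costing O(n), so O(Rn) overall."""
    rng = random.Random(seed)
    total = sum(
        sum_edge_lengths(parent, random_projective_order(parent, rng))
        for _ in range(samples)
    )
    return total / samples


def closed_form_expectation(parent):
    """Exact value in O(n) time, using the assumed closed form (see lead-in)."""
    root, kids = children_of(parent)
    bfs = [root]
    i = 0
    while i < len(bfs):  # breadth-first order: parents before children
        bfs.extend(kids[bfs[i]])
        i += 1
    size = [1] * len(parent)
    for v in reversed(bfs):  # children before parents
        if parent[v] is not None:
            size[parent[v]] += size[v]
    total = sum(size[v] * (2 * len(kids[v]) + 1) for v in range(len(parent)))
    return Fraction(total - 1, 6)


def unconstrained_expectation(n):
    """Expected sum over all n! equally likely orderings: (n^2 - 1) / 3."""
    return Fraction(n * n - 1, 3)


if __name__ == "__main__":
    tree = [None, 0, 0, 1]  # hypothetical 4-word sentence, word 0 is the root
    print("Monte Carlo :", monte_carlo_expectation(tree, samples=20000))
    print("closed form :", closed_form_expectation(tree))
    print("unconstrained:", unconstrained_expectation(len(tree)))
```

The Monte Carlo estimate fluctuates with the seed and the number of samples, whereas the closed form needs a single pass over the tree to accumulate subtree sizes, which is the contrast between the order-Rn and order-n costs drawn in the abstract.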




Source journal

Computational Linguistics (Engineering and Technology, Computer Science: Interdisciplinary Applications)

CiteScore: 15.80
Self-citation rate: 0.00%
Review time: >12 weeks

Journal description: Computational Linguistics, the longest-running publication dedicated solely to the computational and mathematical aspects of language and the design of natural language processing systems, provides university and industry linguists, computational linguists, AI and machine learning researchers, cognitive scientists, speech specialists, and philosophers with the latest insights into the computational aspects of language research.