
Journal of Language Evolution: Latest Articles

Phylogeny of the Turkic Languages Inferred from Basic Vocabulary: Limitations of the Lexicostatistical Methods in an Intensive Contact Situation
IF 2.6 Pub Date : 2022-07-06 DOI: 10.1093/jole/lzac006
Ilya M Egorov, Anna V Dybo, Alexei S Kassian
This article provides an attempt to revise the phylogenetic structure of the Turkic family using a computational lexicostatistical approach. The methodological framework of the present research is characterized by the following features: (1) wordlists with strictly controlled semantics; (2) step-by-step reconstruction using Swadesh wordlists for proto-languages; (3) three stages of post-processing of the input data (analysis of root cognacy, elimination of derivational drift, and optimization of homoplasy); (4) application of several computational algorithms (Starling neighbor-joining, Bayesian MCMC, and maximum parsimony). The analysis provided confirms the status of Chuvash as the first outlier and suggests a subsequent multifurcation of Proto-Nuclear-Turkic into eight branches. The Siberian Turkic group is a purely areal unity, that is, Yakut-Dolgan, Tofa-Tuvinian, Khakas-Mrassu, Sarygh Yugur and Altai do not form a clade. Altai is grouped together with the Kipchak languages as a separate taxon; it does not show a particularly close relationship with Kirghiz, which belongs to another Kipchak subgroup. Karluk is a low-level taxon inside the Kipchak clade.
Citations: 0
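The lexicostatistical approach described in the Turkic entry above starts from cognate-coded wordlists. The following is a minimal sketch of the basic distance computation only, not the authors' pipeline: the languages `A`, `B`, `C`, the concepts, and the cognate class IDs are all hypothetical, and the distance is assumed to be one minus the share of matching cognate classes over the concepts attested in both lists.

```python
from itertools import combinations

# Hypothetical cognate-coded wordlists: concept -> cognate class ID.
# All IDs are invented for illustration; None marks a missing entry.
WORDLISTS = {
    "A": {"hand": 1, "water": 1, "stone": 1, "fire": 1},
    "B": {"hand": 1, "water": 1, "stone": 2, "fire": 1},
    "C": {"hand": 2, "water": 1, "stone": 3, "fire": None},
}


def lexicostat_distance(list_a, list_b):
    """Return 1 - (share of comparable concepts with matching cognate classes)."""
    shared = [
        concept
        for concept in list_a
        if concept in list_b
        and list_a[concept] is not None
        and list_b[concept] is not None
    ]
    if not shared:
        raise ValueError("no comparable concepts")
    matches = sum(list_a[c] == list_b[c] for c in shared)
    return 1 - matches / len(shared)


def distance_matrix(wordlists):
    """Pairwise distances keyed by alphabetically ordered language pairs."""
    return {
        (a, b): lexicostat_distance(wordlists[a], wordlists[b])
        for a, b in combinations(sorted(wordlists), 2)
    }


if __name__ == "__main__":
    for pair, dist in distance_matrix(WORDLISTS).items():
        print(pair, round(dist, 3))
```

A matrix of this kind is the usual input to distance-based tree building such as neighbor-joining. The post-processing stages named in the abstract (root cognacy, derivational drift, homoplasy) are not modelled here.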
Bayesian methods for ancestral state reconstruction in morphosyntax: Exploring the history of argument marking strategies in a large language family
IF 2.6 Pub Date : 2022-05-28 DOI: 10.1093/jole/lzac002
Joshua L. Phillips, Claire Bowern
Bayesian phylogenetic methods have been gaining traction and currency in historical linguistics, as their potential for uncovering elements of language change is increasingly understood. Here, we demonstrate a proof of concept for using ancestral state reconstruction methods to reconstruct changes in morphology. We use a simple Brownian motion model of character evolution to test how splits in ergative marking evolve across Pama-Nyungan, a large family of Australian languages. We are able to recover linguistically plausible paths of change and to reject implausible ones. The results of these analyses elucidate constraints on changes that have led to extensive synchronic variation in an interlocking morphological system. They further provide evidence of an ergative–accusative split traceable to Proto-Pama-Nyungan.
Citations: 2
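The Brownian motion model mentioned in the ancestral state reconstruction entry above can be illustrated with a small worked example. This is a sketch under stated assumptions, not the authors' Bayesian analysis: it computes the maximum-likelihood root value of a continuous trait on a fixed binary tree using the standard precision-weighted averaging step, and the tree shape, branch lengths, and trait values are hypothetical.

```python
def root_estimate(node):
    """Return (estimate, extra_variance) for a subtree under Brownian motion.

    A leaf is a number (the observed trait value). An internal node is a pair
    ((child, branch_length), (child, branch_length)) with positive lengths.
    """
    if isinstance(node, (int, float)):
        return float(node), 0.0
    (left, t_left), (right, t_right) = node
    x1, extra1 = root_estimate(left)
    x2, extra2 = root_estimate(right)
    # Effective branch lengths: the real branch plus the uncertainty
    # already accumulated in the child's own estimate.
    v1 = t_left + extra1
    v2 = t_right + extra2
    # Precision-weighted mean of the two child estimates.
    estimate = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Extra variance this node carries up to its parent.
    return estimate, v1 * v2 / (v1 + v2)


# Hypothetical tree: two sister tips (values 0.0 and 1.0) plus one outlier tip.
CHERRY = ((0.0, 1.0), (1.0, 1.0))
TREE = ((CHERRY, 0.5), (2.0, 1.0))

if __name__ == "__main__":
    value, _ = root_estimate(TREE)
    print(f"estimated root value: {value:.3f}")
```

The averaging step shows why long branches contribute less to the reconstructed ancestor: each child is weighted by the inverse of its effective branch length. A Bayesian treatment would additionally place priors on the rate and integrate over tree uncertainty.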
BayesVarbrul: a unified multidimensional analysis of language change in a speaker community
IF 2.6 Pub Date : 2022-04-19 DOI: 10.1093/jole/lzac004
Xia Hua
Exchange of ideas between language evolution and biological evolution has a long history, owing to a theoretical foundation shared by language and biology as two evolving systems. Both systems evolve in terms of the frequency of a variant in a population for each of a large number of variables, that is, how often a particular variant of a language variable is used in a speaker community and how many individuals in a biological population carry a particular variant of a gene. The way these frequencies change has been modelled under a similar mathematical framework. Here, I show how we can use concepts from genome-wide association studies, which identify the source of natural selection and the genes under selection in a biological population, to study how social factors affect the usage of language variables in a speaker community or how some social groups use some language variables differently from other groups. Using the Gurindji Kriol language as a case study, I show how this approach unifies existing mathematical and statistical tools in studying language evolution over a large number of speakers and a large number of language variables, which provides a promising link between micro- and macro-evolution in language. The approach is named BayesVarbrul and is ready to apply to datasets other than the Gurindji Kriol dataset, including existing corpus data. The code and the instructions are available at https://github.com/huaxia1985/BayesVarbrul.
Citations: 2
The evolution of color naming reflects pressure for efficiency: Evidence from the recent past
IF 2.6 Pub Date : 2022-04-11 DOI: 10.1093/jole/lzac001
Noga Zaslavsky, Karee Garvin, Charles Kemp, Naftali Tishby, Terry Regier
It has been proposed that semantic systems evolve under pressure for efficiency. This hypothesis has so far been supported largely indirectly, by synchronic cross-language comparison, rather than directly by diachronic data. Here, we directly test this hypothesis in the domain of color naming, by analyzing recent diachronic data from Nafaanra, a language of Ghana and Côte d’Ivoire, and comparing it with quantitative predictions derived from the mathematical theory of efficient data compression. We show that color naming in Nafaanra has changed over the past four decades while remaining near-optimally efficient, and that this outcome would be unlikely under a random drift process that maintains structured color categories without pressure for efficiency. To our knowledge, this finding provides the first direct evidence that color naming evolves under pressure for efficiency, supporting the hypothesis that efficiency shapes the evolution of the lexicon.
Citations: 0
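The color naming entry above appeals to a theory of efficient data compression without naming a specific measure. As a hedged illustration, the sketch below assumes that the complexity of a naming system is measured as the mutual information between meanings and words, which is a common choice in information-theoretic accounts but is an assumption here. The four-chip color space and the three naming systems are hypothetical.

```python
import math


def mutual_information(p_meaning, naming):
    """Return I(M;W) in bits for a prior p(m) and naming distribution q(w|m)."""
    # Marginal probability of each word, p(w) = sum_m p(m) q(w|m).
    p_word = {}
    for meaning, pm in p_meaning.items():
        for word, q in naming[meaning].items():
            p_word[word] = p_word.get(word, 0.0) + pm * q
    total = 0.0
    for meaning, pm in p_meaning.items():
        for word, q in naming[meaning].items():
            if pm > 0 and q > 0:
                total += pm * q * math.log2(q / p_word[word])
    return total


# Hypothetical uniform prior over four color chips.
PRIOR = {"c1": 0.25, "c2": 0.25, "c3": 0.25, "c4": 0.25}

# Three hypothetical naming systems with one, two, and four color terms.
ONE_TERM = {c: {"w": 1.0} for c in PRIOR}
TWO_TERMS = {"c1": {"dark": 1.0}, "c2": {"dark": 1.0},
             "c3": {"light": 1.0}, "c4": {"light": 1.0}}
FOUR_TERMS = {c: {f"w_{c}": 1.0} for c in PRIOR}

if __name__ == "__main__":
    for name, system in [("one", ONE_TERM), ("two", TWO_TERMS), ("four", FOUR_TERMS)]:
        print(name, mutual_information(PRIOR, system))
```

More terms buy more precision at the cost of higher complexity. An efficiency analysis would compare this cost against a measure of communicative accuracy, which is not modelled in this sketch.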
Methodological Problems in Quantitative Research on Environmental Effects in Phonology
IF 2.6 Pub Date : 2022-04-09 DOI: 10.1093/jole/lzac003
F. Hartmann
This paper engages with the quantitative methodology underlying studies proposing a link between environment and phonology by replicating three prominent studies on ejectives and altitude, vowels and humidity, and sonority and ambient temperature. It argues that there are several issues regarding the methodological footing of such correlational studies. Further, the paper finds that the problems of statistically analyzing environmental datasets in phonology run deeper than the focus on individual phonetic features suggests: there are several overarching patterns of correlations to be found in these datasets that, if not understood and accounted for, render mistaking spurious correlations for real effects inevitable. This paper further makes concrete suggestions for what is needed to move beyond pairwise correlational studies between environmental and phonological variables in future investigations.
Citations: 0
A simulation on coevolution between language and multiple cognitive abilities
IF 2.6 Pub Date : 2022-01-29 DOI: 10.1093/jole/lzab006
T. Gong, L. Shuai, Xiaolong Yang
We propose a coevolution scenario between language and two cognitive abilities, namely shared intentionality and lexical memory, under a conceptual framework that integrates biological evolution of language learners and cultural evolution of communal language among language users. Piggybacking on a well-attested agent-based model on the origin of simple lexicon and constituent word order out of holistic utterances, we demonstrate: (1) once adopted by early hominins to handle preliminary linguistic materials, along with the origin of an evolving communal language having a high mutual understandability among language users, the initially low levels of the two cognitive abilities are boosted and ratcheted up to levels sufficiently high for proficient language learning and use; (2) the socio-cultural environment is indispensable for the coevolution, and natural selection (selecting highly understandable adults to produce offspring), not cultural selection (choosing highly understandable adults to teach offspring), drives the coevolution. This work modifies existing models and theories of coevolution between language and human cognition and clarifies theoretical controversies regarding the roles of natural and cultural selection in language evolution.
Citations: 0
Correcting a bias in TIGER rates resulting from high amounts of invariant and singleton cognate sets
IF 2.6 Pub Date : 2022-01-19 DOI: 10.1093/jole/lzab007
Johann-Mattis List
In a recent issue of the Journal of Language Evolution, Syrjänen et al. (2021) investigate the suitability of computing Cummins and McInerney’s (2011) TIGER rates for estimating the tree-likeness of linguistic datasets compiled for phylogenetic reconstruction. The authors test the TIGER rates on a diverse sample of simulated data, which by and large confirms the usefulness of TIGER rates as an analytic tool for investigating linguistic data, but they test them on only one real-world dataset of Uralic languages, which turns out to behave quite differently from the simulated data. When testing the TIGER rates on additional datasets, I detected a bias in the computation which leads to an unnatural increase in those cases where a dataset contains many characters with invariant or singleton states. To overcome this problem, I suggest a modified variant of TIGER rates, which is provided in the form of a freely available Python package. Testing the modified TIGER scores on the simulated data of Syrjänen et al. shows that the corrected TIGER rates still readily distinguish between different degrees of tree-likeness. Testing them on a dataset in which the number of singletons and invariants was artificially increased further shows that the corrected TIGER rates are not influenced by the bias. Final tests on seven linguistic datasets show the usefulness of the corrected TIGER rates on a larger variety of linguistic datasets and illustrate the importance of taking specific aspects of linguistic data into account when using biological methods in the domain of language evolution.
Citations: 1
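The bias described in the TIGER rates entry above can be reproduced on a toy dataset. The sketch below is not the corrected measure from the article or its Python package. It implements the uncorrected rate, assuming the partition-agreement formulation usually attributed to Cummins and McInerney (2011): the agreement of partition A with partition B is the share of sets in B contained in some set of A, and a character's rate is its mean agreement with all other characters. The taxa and character states are hypothetical.

```python
def partition(character):
    """Group taxa by state; taxa with missing data (None) are skipped."""
    groups = {}
    for taxon, state in character.items():
        if state is not None:
            groups.setdefault(state, set()).add(taxon)
    return [frozenset(group) for group in groups.values()]


def agreement(part_a, part_b):
    """Share of sets in part_b that are a subset of some set in part_a."""
    hits = sum(any(b <= a for a in part_a) for b in part_b)
    return hits / len(part_b)


def tiger_rates(characters):
    """Uncorrected TIGER rate for each character (needs at least two)."""
    parts = [partition(c) for c in characters]
    rates = []
    for i, p_i in enumerate(parts):
        scores = [agreement(p_i, p_j) for j, p_j in enumerate(parts) if j != i]
        rates.append(sum(scores) / len(scores))
    return rates


# Hypothetical characters over four taxa.
SPLIT_AB_CD = {"A": 1, "B": 1, "C": 2, "D": 2}
SPLIT_AC_BD = {"A": 1, "B": 2, "C": 1, "D": 2}   # conflicts with the first
INVARIANT = {"A": 1, "B": 1, "C": 1, "D": 1}     # one state everywhere
SINGLETONS = {"A": 1, "B": 2, "C": 3, "D": 4}    # every taxon unique

if __name__ == "__main__":
    base = tiger_rates([SPLIT_AB_CD, SPLIT_AC_BD])
    padded = tiger_rates([SPLIT_AB_CD, SPLIT_AC_BD, INVARIANT, SINGLETONS])
    print("mean rate, conflicting characters only:", sum(base) / len(base))
    print("mean rate, with uninformative characters:", sum(padded) / len(padded))
```

The two conflicting characters alone yield a mean rate of zero. Adding one invariant and one all-singleton character raises the mean although neither carries any subgrouping signal, which is the kind of inflation the entry describes.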
Crouching TIGER, hidden structure: Exploring the nature of linguistic data using TIGER values
IF 2.6 Pub Date : 2021-11-15 DOI: 10.1093/jole/lzab004
K. Syrjänen, L. Maurits, Unni Leino, T. Honkola, J. Rota, O. Vesakoski
In recent years, techniques such as Bayesian inference of phylogeny have become a standard part of the quantitative linguistic toolkit. While these tools successfully model the tree-like component of a linguistic dataset, real-world datasets generally include a combination of tree-like and nontree-like signals. Alongside developing techniques for modeling nontree-like data, an important requirement for future quantitative work is to build a principled understanding of this structural complexity of linguistic datasets. Some techniques exist for exploring the general structure of a linguistic dataset, such as NeighborNets, δ scores, and Q-residuals; however, these methods are not without limitations or drawbacks. In general, the question of what kinds of historical structure a linguistic dataset can contain and how these might be detected or measured remains critically underexplored from an objective, quantitative perspective. In this article, we propose TIGER values, a metric that estimates the internal consistency of a genetic dataset, as an additional metric for assessing how tree-like a linguistic dataset is. We use TIGER values to explore simulated language data ranging from very tree-like to completely unstructured, and also use them to analyze a cognate-coded basic vocabulary dataset of Uralic languages. As a point of comparison for the TIGER values, we also explore the same data using δ scores, Q-residuals, and NeighborNets. Our results suggest that TIGER values are capable both of ranking tree-like datasets according to their degree of treelikeness and of distinguishing datasets with tree-like structure from datasets with a nontree-like structure. Consequently, we argue that TIGER values serve as a useful metric for measuring the historical heterogeneity of datasets. Our results also highlight the complexities in measuring treelikeness from linguistic data, and how the metrics approach this question from different perspectives.
Citations: 6
The philosophical interpretation of language game theory
IF 2.6 Pub Date : 2021-11-03 DOI: 10.1093/jole/lzab003
Nick Zangwill
I give an informal presentation of the evolutionary game theoretic approach to the conventions that constitute linguistic meaning. The aim is to give a philosophical interpretation of the project, which accounts for the role of game theoretic mathematics in explaining linguistic phenomena. I articulate the main virtue of this sort of account, which is its psychological economy, and I point to the causal mechanisms that are the ground of the application of evolutionary game theory to linguistic phenomena. Lastly, I consider the objection that the account cannot explain predication, logic, and compositionality.
Citations: 1
Bayesian phylogenetic analysis of linguistic data using BEAST
IF 2.6 Pub Date : 2021-09-23 DOI: 10.1093/jole/lzab005
Konstantin Hoffmann, R. Bouckaert, Simon J. Greenhill, D. Kühnert
Bayesian phylogenetic methods provide a set of tools to efficiently evaluate large linguistic datasets by reconstructing phylogenies—family trees—that represent the history of language families. These methods provide a powerful way to test hypotheses about prehistory, regarding the subgrouping, origins, expansion, and timing of the languages and their speakers. Through phylogenetics, we gain insights into the process of language evolution in general and into how fast individual features change in particular. This article introduces Bayesian phylogenetics as applied to languages. We describe substitution models for cognate evolution, molecular clock models for the evolutionary rate along the branches of a tree, and tree generating processes suitable for linguistic data. We explain how to find the best-suited model using path sampling or nested sampling. The theoretical background of these models is supplemented by a practical tutorial describing how to set up a Bayesian phylogenetic analysis using the software tool BEAST2.
Citations: 10