Dynamics of Language Change: The Case of Polish barzo > bardzo

Studies in Polish Linguistics (Q2, Arts and Humanities). Pub Date: 2021-01-01. DOI: 10.4467/23005920spl.21.007.14261

R. L. Górski


Citations: 0

Abstract



The paper discusses the benefits and shortcomings of modelling a language change with logistic regression, an approach often called the Piotrowski-Altmann law. The approach is illustrated with an isolated change that occurred in Middle Polish, namely barzo > bardzo. The study is based on a historical corpus of Polish consisting of several hundred texts with over 12 million running words. Logistic regression on the entire dataset shows a relatively high goodness of fit, yet some data points, especially those close to the end of the process, lie quite far from the idealised trajectory. The author seeks to answer the question of to what extent the quality of the corpus affects the model. An experiment was conducted in which texts were randomly removed to create smaller corpora containing 90%, 75% and 50% of the texts of the entire set. Since this procedure was repeated 200 times, it is possible to compare the distributions of the scores indicating the goodness of fit of the model. It turns out that the smaller the corpus, the more variable the goodness of fit, and in some rare cases it is even better than its counterpart for a larger corpus. Still, the larger the corpus, the higher the goodness-of-fit scores tend to be.
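The subsampling experiment described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's code: the corpus is replaced by synthetic data, and the function names (`fit_logistic`, `goodness_of_fit`, `subsample_scores`), the 25-year bins, and the use of R² as the goodness-of-fit score are all assumptions, since the abstract does not specify them.

```python
import numpy as np

def fit_logistic(years, new, total, iters=25):
    """Binomial logistic regression p(t) = 1 / (1 + exp(-(a + b*t))),
    fitted by Newton's method. Returns the fitted proportions."""
    t = (years - years.mean()) / years.std()      # standardise time for stability
    X = np.column_stack([np.ones_like(t), t])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        grad = X.T @ (new - total * p)
        hess = (X * (total * p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess + 1e-9 * np.eye(2), grad)
    return 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))

def goodness_of_fit(years, new, total, width=25):
    """R^2 between observed and fitted share of the new form per time bin
    (bin width and the R^2 score are assumptions of this sketch)."""
    bins = years // width
    ids = np.unique(bins)                          # skips bins left empty by sampling
    mid = np.array([years[bins == i].mean() for i in ids])
    k = np.array([new[bins == i].sum() for i in ids], dtype=float)
    n = np.array([total[bins == i].sum() for i in ids], dtype=float)
    obs = k / n
    fitted = fit_logistic(mid, k, n)
    return 1.0 - np.sum((obs - fitted) ** 2) / np.sum((obs - obs.mean()) ** 2)

def subsample_scores(years, new, total, fraction, repeats=200, seed=0):
    """Randomly keep `fraction` of the texts, refit, and collect the scores."""
    rng = np.random.default_rng(seed)
    size = int(round(fraction * len(years)))
    scores = []
    for _ in range(repeats):
        keep = rng.choice(len(years), size=size, replace=False)
        scores.append(goodness_of_fit(years[keep], new[keep], total[keep]))
    return np.array(scores)

# Synthetic stand-in for the corpus: one row per text (NOT the paper's data).
rng = np.random.default_rng(42)
n_texts = 300
years = rng.integers(1500, 1800, size=n_texts)     # hypothetical date of each text
total = rng.integers(5, 60, size=n_texts)          # tokens of barzo + bardzo per text
logit = (years - 1650) / 40 + rng.normal(0, 1.0, size=n_texts)  # per-text noise
new = rng.binomial(total, 1 / (1 + np.exp(-logit)))  # tokens of the new form

for fraction in (0.9, 0.75, 0.5):
    s = subsample_scores(years, new, total, fraction)
    iqr = np.percentile(s, 75) - np.percentile(s, 25)
    print(f"{fraction:.0%} of texts: median R^2 = {np.median(s):.3f}, IQR = {iqr:.3f}")
```

Comparing the median and the spread of the scores across the three fractions mirrors the comparison of score distributions that the abstract reports. Sampling whole texts rather than individual tokens follows the abstract's description of removing texts.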


Source journal

Studies in Polish Linguistics (Arts and Humanities: Language and Linguistics)

CiteScore

0.50

Self-citation rate

0.00%

Articles published