Journal of Quantitative Linguistics: Latest Articles

Word Length Distribution in German Texts during the 17th-19th Century

IF 1.4, Zone 2 (Literature), LANGUAGE & LINGUISTICS

Journal of Quantitative Linguistics

Pub Date : 2019-09-15 DOI: 10.1080/09296174.2019.1662536

Fei Lian, Y. Li

ABSTRACT Word length in German texts has been a frequently discussed issue in quantitative linguistics. Most existing research, however, focuses on literary texts and private letters, and the corpus used in each study is relatively small. This paper provides a time- and genre-based analysis of word length distribution in German using 360 texts originating between the 17th and 19th centuries. It aims to find a probability distribution that captures the German word length distribution well from a diachronic perspective, and to reveal the relationship between the word length distribution and boundary conditions such as the genre and the creation time of a text. Results indicate that the word length distribution in German texts written in different eras follows the 1-displaced hyper-Poisson distribution, whose parameters (a, b) are linked to the boundary conditions. This study corroborates that the word length distribution of a given language is consistent, owing to the constraints of the cognitive mechanism. In addition, the parameters of the probability distribution can be good indicators of the writing style as well as the creation time of a text.
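The abstract names the 1-displaced hyper-Poisson distribution but does not write it out. The following is a minimal sketch, assuming the form commonly used in quantitative linguistics, P(x) = a^(x-1) / (b^((x-1)) · 1F1(1; b; a)) for x = 1, 2, ..., where b^((k)) = b(b+1)...(b+k-1) is the rising factorial. The function name, the truncation length `terms`, and the parameter values in the usage note are illustrative assumptions, not values from the paper.

```python
import math


def displaced_hyper_poisson(a, b, max_len=30, terms=500):
    """Return {x: P(X = x)} for x = 1..max_len (assumes a > 0 and b > 0)."""
    # weights[k] = a**k / (b * (b + 1) * ... * (b + k - 1)), built iteratively
    weights = []
    w = 1.0
    for k in range(terms):
        weights.append(w)
        w *= a / (b + k)
    # Truncated series for the normalising constant 1F1(1; b; a)
    norm = math.fsum(weights)
    # Displacement by 1: word length x uses the weight with index x - 1
    return {x: weights[x - 1] / norm for x in range(1, max_len + 1)}
```

For example, `displaced_hyper_poisson(2.0, 1.0)` reduces to a Poisson distribution shifted by one, because with b = 1 the rising factorial becomes an ordinary factorial. Fitting (a, b) to observed word length frequencies would need a separate estimation step that is not shown here.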

Journal article, vol. 28, pp. 117-137. https://doi.org/10.1080/09296174.2019.1662536

Citations: 5

Statistics in Corpus Linguistics: A Practical Guide

Pub Date : 2019-07-29 DOI: 10.1080/09296174.2019.1646069

Cunxin Han

Corpus linguistics is a powerful quantitative methodology, which heavily relies on frequency data and statistical procedures. It is difficult to talk about corpus linguistics without mentioning sta...

Citations: 74

Optimal Coding and the Origins of Zipfian Laws

Pub Date : 2019-06-04 DOI: 10.1080/09296174.2020.1778387

R. Ferrer-i-Cancho, C. Bentz

ABSTRACT The problem of compression in standard information theory consists of assigning codes as short as possible to numbers. Here we consider the problem of optimal coding, under an arbitrary coding scheme, and show that it predicts Zipf’s law of abbreviation, namely a tendency in natural languages for more frequent words to be shorter. We also apply this result to investigate optimal coding under so-called non-singular coding, a scheme where unique segmentation is not warranted but each code stands for a distinct number. Optimal non-singular coding predicts that the length of a word should grow approximately as the logarithm of its frequency rank, which is again consistent with Zipf’s law of abbreviation. Optimal non-singular coding in combination with the maximum entropy principle also predicts Zipf’s rank-frequency distribution. Furthermore, our findings on optimal non-singular coding challenge common beliefs about random typing. It turns out that random typing is in fact an optimal coding process, in stark contrast with the common assumption that it is detached from cost-cutting considerations. Finally, we discuss the implications of optimal coding for the construction of a compact theory of Zipfian laws more generally as well as other linguistic laws.
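The logarithmic growth of word length with frequency rank can be illustrated with a small sketch. It assumes the simplest reading of optimal non-singular coding: the type with frequency rank r receives the r-th shortest distinct string over the alphabet, with no requirement that concatenated codes can be segmented uniquely. The function names and the two-letter alphabet are illustrative assumptions, not taken from the paper.

```python
from itertools import count, islice, product


def shortest_codes(alphabet):
    """Yield every non-empty string over `alphabet`, shortest first."""
    for length in count(1):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)


def code_length(rank, n):
    """Length of the rank-th shortest string over an n-symbol alphabet."""
    length, capacity = 1, n
    # capacity = number of strings of length <= `length`
    while rank > capacity:
        length += 1
        capacity += n ** length
    return length


# Codes for frequency ranks 1..14 over a two-symbol alphabet
codes = list(islice(shortest_codes("ab"), 14))
lengths = [len(code) for code in codes]
```

Here `lengths` is `[1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3]`: each additional symbol of length roughly doubles the number of ranks covered, so length increases by one each time the rank roughly doubles, which is the logarithmic growth described above.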

Journal article, vol. 29, pp. 165-194. https://doi.org/10.1080/09296174.2020.1778387

Citations: 35

Correlations and Potential Cross-Linguistic Indicators of Writing Style

Pub Date : 2019-04-03 DOI: 10.1080/09296174.2018.1458395

P. Juola, George K. Mikros, Sean Vinsick

Abstract In this paper, we present preliminary results on how an individual’s writing style persists even across languages. In other words, what aspects of an individual’s writing will persist irrespective of the language in which he or she writes? We argue that cognitive and social traits are likely to persist and demonstrate this by two separate analyses of bilingual corpora using the same individuals. We show that for various measures of linguistic complexity (which we consider to be a cognitive variable) and participation in specific social conventions (a social one), the correlation between scores on the two languages studied is significantly higher than would be expected by chance. We argue that this type of correlation may permit cross-linguistic authorship attribution.

Citations: 17

Calculation of Semantic Distances Between Words: From Synonymy to Antonymy

Pub Date : 2019-04-03 DOI: 10.1080/09296174.2018.1452524

M. Vakulenko

Abstract A new approach to numerically measuring the semantic distances between lexical units (words and collocations), based on geometric analogies and analytical calculation, is put forward. Having considered the cases of equal and different weights of semes, we obtained exact algebraic formulas describing different levels of proximity of meaning, ranging from absolute synonymy to full antonymy. It was emphasized that absolute synonymy arises when the compared units contain equal numbers of semes that fully coincide and have equal weights in the corresponding pairs. Calculating the seme weights helps to locate the unit more precisely on the semantic sphere. It was shown that the level of synonymy and antonymy decreases if different semes are accentuated, while the semantic distance between units without identical semes cannot be influenced by seme boosting. It was observed that, depending on the context, a word may wander over this sphere, thus modifying its lexical semantic relations with other units. As the proposed approach contributes to formalizing the procedure for comparing units, it is suitable for incorporation into relevant automatic tools, particularly WordNet and FrameNet. The obtained results may be useful for various linguistic and associated studies, including automatic text analysis and processing, computer lexicography, information search and retrieval, machine translation and other NLP applications related to the artificial intelligence problem.
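The abstract does not reproduce the paper's algebraic formulas, so the sketch below does not implement them. As a stand-in, it treats each lexical unit as a vector of seme weights and measures the angle between two such vectors, which is one simple geometric analogy that reproduces two of the properties described above. The seme names and weights are hypothetical.

```python
import math


def angular_distance(u, v):
    """Angle in radians between two seme-weight dicts (0 means same direction)."""
    semes = set(u) | set(v)
    dot = sum(u.get(s, 0.0) * v.get(s, 0.0) for s in semes)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    # Clamp to [-1, 1] to guard against floating-point rounding
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Identical semes with equal weights: distance (numerically) zero
same = angular_distance({"size": 1.0, "animate": 1.0},
                        {"size": 1.0, "animate": 1.0})

# No shared semes: boosting one weight leaves the distance unchanged
disjoint = angular_distance({"size": 1.0}, {"colour": 1.0})
boosted = angular_distance({"size": 5.0}, {"colour": 1.0})

# Shared semes accentuated differently: the units drift apart
shifted = angular_distance({"size": 2.0, "animate": 1.0},
                           {"size": 1.0, "animate": 2.0})
```

Under this stand-in measure, `disjoint` and `boosted` both equal a right angle, matching the claim that seme boosting cannot change the distance between units without identical semes, while `shifted` is larger than `same`.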

Journal article, vol. 26, pp. 116-128. https://doi.org/10.1080/09296174.2018.1452524

Citations: 9

Frequency Effect and Neutralization of Tones in Mandarin Chinese

Pub Date : 2019-04-03 DOI: 10.1080/09296174.2018.1452140

Huifang Kong, Shengyi Wu

Abstract Tonal neutralization in Mandarin has long been thought to be connected with lexical frequency, but this has never been investigated quantitatively because of the methodological challenge. In this study, a production experiment was run with speakers reading disyllabic words in neutral tones, with frequency estimates derived from a Frequency Dictionary. The dependent measures were three acoustic correlates: duration, F0 contour and intensity. Independent measures included lexical frequency at three levels (low, middle and high). Regression analysis showed that neutralization of tones is directly correlated with lexical frequency, independent of other factors. A regularity governs the neutralization of tones in reduced syllables: the more frequent the word, the shorter its duration, the lower its pitch and the weaker its intensity. However, the exact shape of this effect differs across frequency ranges. Only high frequency words display a significant difference from low frequency words. Finally, an exemplar representation is proposed to express a neutral tone’s observed frequency effect naturally.

Journal article, vol. 26, pp. 95-115. https://doi.org/10.1080/09296174.2018.1452140

Citations: 2

A Comprehensive Study of the Parameters in the Creation and Comparison of Feature Vectors in Distributional Semantic Models

Pub Date : 2019-03-12 DOI: 10.1080/09296174.2019.1570897

A. Dobó, J. Csirik

ABSTRACT Measuring the semantic similarity and relatedness of words can play a vital role in many natural language processing tasks. Distributional semantic models computing these measures can have many different parameters, such as different weighting schemes, vector similarity measures, feature transformation functions and dimensionality reduction techniques. Despite their importance, there is no truly comprehensive study that simultaneously evaluates the numerous parameters of such models while also considering how these parameters interact with each other. We would like to address this gap with our systematic study. Taking the necessary distributional information extracted from the chosen dataset as given, we evaluate all important aspects of the creation and comparison of feature vectors in distributional semantic models. Testing 10 parameters simultaneously, we try to find the best combination of parameter settings, with a large number of settings examined for some of the parameters. Besides evaluating the conventionally used settings for the parameters, we also propose numerous novel variants, as well as novel combinations of parameter settings, some of which significantly outperform the combinations of settings in general use, thus achieving state-of-the-art results.
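Two of the parameter types listed above, the weighting scheme and the vector similarity measure, can be made concrete with a short sketch. It uses positive pointwise mutual information (PPMI) and cosine similarity, which are common conventional choices; the abstract does not say which settings the study found best. The co-occurrence counts are a hypothetical toy example.

```python
import math


def ppmi(counts):
    """Apply PPMI weighting to a word-by-context co-occurrence count matrix."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    weighted = []
    for i, row in enumerate(counts):
        new_row = []
        for j, c in enumerate(row):
            if c == 0:
                new_row.append(0.0)  # unseen pair: PMI is undefined, use 0
            else:
                pmi = math.log2(c * total / (row_sums[i] * col_sums[j]))
                new_row.append(max(pmi, 0.0))  # keep positive associations only
        weighted.append(new_row)
    return weighted


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical counts: rows are target words, columns are context features
counts = [
    [8, 2, 0],  # "cat"
    [6, 3, 1],  # "dog"
    [0, 1, 9],  # "car"
]
cat, dog, car = ppmi(counts)
```

With these toy counts, `cosine(cat, dog)` is clearly positive while `cosine(cat, car)` is zero. Swapping in a different weighting function or similarity measure changes only one function, which is the kind of parameter variation the study evaluates.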

Journal article, vol. 27, pp. 244-271. https://doi.org/10.1080/09296174.2019.1570897

Citations: 5

A Systemic Dynamics Model of Text Production

Pub Date : 2019-02-11 DOI: 10.1080/09296174.2019.1567301

Giacomo P. Figueredo, G. Figueredo

ABSTRACT This paper introduces a quantitative model of text as it unfolds in time. The model conceptualizes text as a functional unit of language. This organization can be difficult to identify because it forms complex patterns of linguistic laws, probability and dynamics. These patterns are covert configurations and need complex methods to be investigated. One such method is to draw from qualitative frameworks derived from the quantitative properties of language. Previous studies have shown that covert configurations can be obtained through qualitative frameworks. When dynamics is considered, however, a model of text production including the variable time is needed. This paper therefore aims at addressing this research gap by proposing a dynamics model of text unfolding. It draws from systemic theory and models its categories quantitatively. Time is introduced as variation of choice. The model is applied to a sample of text. Results show how individual choices contribute to text unfolding – describing the amount of meanings at any given moment in text time. In addition, the dynamic accumulation indicates core characteristics of a text, which can be further explored in text behaviour and typology.

Journal article, vol. 27, pp. 291-320. https://doi.org/10.1080/09296174.2019.1567301

Citations: 3

Quantitative Approaches to the Russian Language

Pub Date : 2019-01-10 DOI: 10.1080/09296174.2018.1558834

E. Kelih

The omnibus volume under review comprises 10 individual chapters by 22 authors, thus most of the chapters are co-authored. This seems to reflect the overall interdisciplinary approach focus of the ...

Citations: 14

Statistical Analysis of the Tables in Mahadevan’s Concordance of the Indus Valley Script

Pub Date : 2019-01-02 DOI: 10.1080/09296174.2017.1406294

M. Oakes

Abstract The Indus Script originates from the culture known as the Indus Valley Civilization, which flourished from approximately 2600 to 1900 BC. Several thousand objects bearing these signs have been found over a wide area of Northern India and Pakistan. In 1977, Iravatham Mahadevan published a concordance of all of the scripts that had been discovered so far. Accompanying the concordance is a set of nine tables showing the distribution of individual signs by position, archaeological site, object type, field symbol (accompanying image), and direction of writing. Analysis of the frequencies of the signs found so far using Large Numbers of Rare Events (LNRE) models estimated the total vocabulary of the language, including signs not yet found, to be about 857. All the tables were analysed using Pearson’s residuals, and it was found that the signs were not randomly distributed, but some showed statistically significant associations with position, object, field symbol or direction of writing. A more detailed analysis of the relation between signs and field symbols was made using correspondence analysis, which showed that certain signs were associated with the unicorn symbol, while others were associated with the gharial and dotted circle symbols.
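Pearson's residuals for a contingency table can be computed as in the sketch below, which uses the textbook definition (observed minus expected count, divided by the square root of the expected count, with expected counts taken from the row and column totals). The table values are hypothetical and are not drawn from Mahadevan's concordance; the abstract does not state the significance threshold the study applied.

```python
import math


def pearson_residuals(table):
    """Return (observed - expected) / sqrt(expected) for each cell.

    Expected counts assume independence of rows and columns; every row
    and column total is assumed to be positive.
    """
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return residuals


# Hypothetical table: rows are two signs, columns are two field symbols
table = [
    [30, 10],
    [10, 30],
]
residuals = pearson_residuals(table)
# The squared residuals sum to the Pearson chi-square statistic
chi_square = sum(r * r for row in residuals for r in row)
```

A cell with a large positive residual occurs more often than independence would predict, and a large negative one less often. Residuals well beyond about 2 in absolute value are commonly read as notable departures, although that is a rule of thumb rather than a formal test.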

Journal article, vol. 26, pp. 22-47. https://doi.org/10.1080/09296174.2017.1406294

Citations: 0
