
Journal of Quantitative Linguistics: Latest Articles

Word Length Distribution in German Texts during the 17th-19th Century
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2019-09-15 DOI: 10.1080/09296174.2019.1662536
Fei Lian, Y. Li
ABSTRACT Word length in German texts has been a frequently discussed issue in quantitative linguistics. Most existing research, however, focuses on literary texts and private letters, and the corpus used in each study is relatively small. This paper provides a time- and genre-based analysis of word length distribution in German using 360 texts originating between the 17th and 19th centuries. It aims to find a probability distribution that captures the German word length distribution well from a diachronic perspective, and to reveal the relationship between the word length distribution and boundary conditions such as the genre and the creation time of a text. Results indicate that the word length distribution in German texts written in different eras follows the 1-displaced hyper-Poisson distribution, whose parameters (a, b) are interconnected with the boundary conditions. This study corroborates that the word length distribution of a given language is consistent, owing to the constraints of the cognitive mechanism. In addition, the parameters of the probability distribution can be good indicators of the writing style as well as the creation time of a text.
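For orientation, the 1-displaced hyper-Poisson distribution named in this abstract is commonly written as below. This is a sketch in the usual quantitative-linguistics notation, not a formula taken from the paper: the abstract gives only the parameter names (a, b), and the symbols x (word length, in whatever unit the study counts), P_x and the rising factorial b^{(k)} are assumptions introduced here.

```latex
% 1-displaced hyper-Poisson: support starts at x = 1 (no zero-length words)
P_x = \frac{a^{\,x-1}}{b^{(x-1)}\;{}_1F_1(1;\,b;\,a)}, \qquad x = 1, 2, 3, \dots

% rising factorial
b^{(k)} = b\,(b+1)\cdots(b+k-1), \qquad b^{(0)} = 1

% normalising constant (confluent hypergeometric function)
{}_1F_1(1;\,b;\,a) = \sum_{j=0}^{\infty} \frac{a^{\,j}}{b^{(j)}}
```

With b = 1 the rising factorial becomes (x-1)! and the normalising constant becomes e^a, so the expression reduces to a 1-displaced Poisson distribution.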
Journal of Quantitative Linguistics, volume 28, pages 117-137.
Citations: 5
Statistics in Corpus Linguistics: A Practical Guide
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2019-07-29 DOI: 10.1080/09296174.2019.1646069
Cunxin Han
Corpus linguistics is a powerful quantitative methodology, which heavily relies on frequency data and statistical procedures. It is difficult to talk about corpus linguistics without mentioning sta...
Journal of Quantitative Linguistics, volume 27, pages 379-383.
Citations: 74
Optimal Coding and the Origins of Zipfian Laws
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2019-06-04 DOI: 10.1080/09296174.2020.1778387
R. Ferrer-i-Cancho, C. Bentz
ABSTRACT The problem of compression in standard information theory consists of assigning codes as short as possible to numbers. Here we consider the problem of optimal coding – under an arbitrary coding scheme – and show that it predicts Zipf’s law of abbreviation, namely a tendency in natural languages for more frequent words to be shorter. We apply this result to investigate optimal coding also under so-called non-singular coding, a scheme where unique segmentation is not warranted but codes stand for a distinct number. Optimal non-singular coding predicts that the length of a word should grow approximately as the logarithm of its frequency rank, which is again consistent with Zipf’s law of abbreviation. Optimal non-singular coding in combination with the maximum entropy principle also predicts Zipf’s rank-frequency distribution. Furthermore, our findings on optimal non-singular coding challenge common beliefs about random typing. It turns out that random typing is in fact an optimal coding process, in stark contrast with the common assumption that it is detached from cost cutting considerations. Finally, we discuss the implications of optimal coding for the construction of a compact theory of Zipfian laws more generally as well as other linguistic laws.
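The prediction that word length grows roughly as the logarithm of frequency rank can be illustrated with a small sketch. It is not the authors' derivation: it simply hands out the shortest unused strings to the most frequent ranks, assuming a finite alphabet, no empty code, and arbitrary tie-breaking among strings of equal length. The function name `optimal_nonsingular_lengths` and its parameters are hypothetical.

```python
from itertools import count


def optimal_nonsingular_lengths(num_words: int, alphabet_size: int) -> list[int]:
    """Return the code length given to each frequency rank (rank 1 first).

    Assumption: codes only need to be distinct (non-singular), so the
    shortest available strings go to the most frequent words.
    """
    if alphabet_size < 1:
        raise ValueError("alphabet_size must be at least 1")
    lengths: list[int] = []
    for length in count(1):
        # There are alphabet_size ** length distinct strings of this length.
        for _ in range(alphabet_size ** length):
            if len(lengths) == num_words:
                return lengths
            lengths.append(length)
    return lengths  # unreachable; keeps type checkers satisfied


if __name__ == "__main__":
    # Binary alphabet: 2 codes of length 1, 4 of length 2, 8 of length 3, ...
    print(optimal_nonsingular_lengths(7, 2))  # [1, 1, 2, 2, 2, 2, 3]
```

Because the number of available strings multiplies by the alphabet size with every extra symbol, the length assigned to a rank increases by one only when the rank grows by roughly that factor, which is the logarithmic growth the abstract describes.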
Journal of Quantitative Linguistics, volume 29, pages 165-194.
Citations: 35
Correlations and Potential Cross-Linguistic Indicators of Writing Style
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2019-04-03 DOI: 10.1080/09296174.2018.1458395
P. Juola, George K. Mikros, Sean Vinsick
Abstract In this paper, we present preliminary results on how an individual’s writing style persists even across languages. In other words, what aspects of an individual’s writing will persist irrespective of the language in which he or she writes? We argue that cognitive and social traits are likely to persist and demonstrate this by two separate analyses of bilingual corpora using the same individuals. We show that for various measures of linguistic complexity (which we consider to be a cognitive variable) and participation in specific social conventions (a social one), the correlation between scores on the two languages studied is significantly higher than would be expected by chance. We argue that this type of correlation may permit cross-linguistic authorship attribution.
Journal of Quantitative Linguistics, volume 26, pages 146-171.
Citations: 17
Frequency Effect and Neutralization of Tones in Mandarin Chinese
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2019-04-03 DOI: 10.1080/09296174.2018.1452140
Huifang Kong, Shengyi Wu
Abstract Tonal neutralization in Mandarin has long been thought to be connected with lexical frequency, but this has never been investigated quantitatively because of the methodological challenge. In this study, a production experiment was run in which speakers read disyllabic words with neutral tones, with frequency estimates derived from a Frequency Dictionary. The dependent measures were three acoustic correlates: duration, F0 contour and intensity. Independent measures included lexical frequency at three levels (low, middle and high). Regression analysis showed that the neutralization of tones is directly correlated with lexical frequency, independent of other factors. A regularity governs the neutralization of tones in reduced syllables: the more frequent the word, the shorter its duration, the lower its pitch and the weaker its intensity. However, the exact shape of this effect differs across frequency ranges: only high-frequency words differ significantly from low-frequency words. Last but not least, an exemplar representation is proposed to express a neutral tone’s observed frequency effect naturally.
Journal of Quantitative Linguistics, volume 26, pages 95-115.
Citations: 2
Calculation of Semantic Distances Between Words: From Synonymy to Antonymy
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2019-04-03 DOI: 10.1080/09296174.2018.1452524
M. Vakulenko
Abstract A new approach to numerically measuring the semantic distances between lexical units (words and collocations), based on geometric analogies and analytical calculation, is put forward. Having considered the cases of equal and different weights of semes, we obtained exact algebraic formulas describing different levels of proximity of meaning, ranging from absolute synonymy to full antonymy. It was emphasized that absolute synonymy arises when the compared units contain equal numbers of semes that fully coincide and have equal weights in the corresponding pairs. Calculating the seme weights helps to locate a unit more precisely on the semantic sphere. It was shown that the level of synonymy and antonymy decreases if different semes are accentuated, while the semantic distance between units without identical semes cannot be influenced by seme boosting. It was observed that, depending on the context, a word may wander over this sphere, thus modifying its lexical semantic relations with other units. As the proposed approach contributes to formalizing the procedure for comparing units, it is suitable for incorporation into relevant automatic tools, particularly WordNet and FrameNet. The obtained results may be useful for various linguistic and associated studies, including automatic text analysis and processing, computer lexicography, information search and retrieval, machine translation and other NLP applications related to the artificial intelligence problem.
Journal of Quantitative Linguistics, volume 26, pages 116-128.
Citations: 9
A Comprehensive Study of the Parameters in the Creation and Comparison of Feature Vectors in Distributional Semantic Models
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2019-03-12 DOI: 10.1080/09296174.2019.1570897
A. Dobó, J. Csirik
ABSTRACT Measuring the semantic similarity and relatedness of words can play a vital role in many natural language processing tasks. Distributional semantic models computing these measures can have many different parameters, such as different weighting schemes, vector similarity measures, feature transformation functions and dimensionality reduction techniques. Despite their importance, there is no truly comprehensive study that simultaneously evaluates the numerous parameters of such models while also considering the interaction of these parameters with each other. We would like to address this gap with our systematic study. Taking the necessary distributional information extracted from the chosen dataset as given, we evaluate all important aspects of the creation and comparison of feature vectors in distributional semantic models. Testing altogether 10 parameters simultaneously, we try to find the best combination of parameter settings, examining a large number of settings for some of the parameters. Besides evaluating the conventionally used settings for the parameters, we also propose numerous novel variants, as well as novel combinations of parameter settings, some of which significantly outperform the combinations of settings in general use, thus achieving state-of-the-art results.
Journal of Quantitative Linguistics, volume 27, pages 244-271.
Citations: 5
A Systemic Dynamics Model of Text Production
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2019-02-11 DOI: 10.1080/09296174.2019.1567301
Giacomo P. Figueredo, G. Figueredo
ABSTRACT This paper introduces a quantitative model of text as it unfolds in time. The model conceptualizes text as a functional unit of language. This organization can be difficult to identify because it forms complex patterns of linguistic laws, probability and dynamics. These patterns are covert configurations and need complex methods to be investigated. One such method is to draw from qualitative frameworks derived from the quantitative properties of language. Previous studies have shown that covert configurations can be obtained through qualitative frameworks. When dynamics is considered, however, a model of text production including the variable time is needed. This paper therefore aims at addressing this research gap by proposing a dynamics model of text unfolding. It draws from systemic theory and models its categories quantitatively. Time is introduced as variation of choice. The model is applied to a sample of text. Results show how individual choices contribute to text unfolding – describing the amount of meanings at any given moment in text time. In addition, the dynamic accumulation indicates core characteristics of a text, which can be further explored in text behaviour and typology.
Journal of Quantitative Linguistics, volume 27, pages 291-320.
Citations: 3
Quantitative Approaches to the Russian Language
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2019-01-10 DOI: 10.1080/09296174.2018.1558834
E. Kelih
The omnibus volume under review comprises 10 individual chapters by 22 authors, thus most of the chapters are co-authored. This seems to reflect the overall interdisciplinary approach focus of the ...
Journal of Quantitative Linguistics, volume 27, pages 80-83.
Citations: 14
On the ‘Stickiness’ of Words. A Comparative Language Study Screening the Internet for English, German, French and Latin Phrases
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2019-01-02 DOI: 10.1080/09296174.2018.1451206
M. Berger
Abstract Language, one of the defining attributes of Homo sapiens, does not unfold merely as a chain of words. Rather, words group together in a non-random way to form phrases. Here, the world-wide web was searched for idiomatic expressions in three living languages and one extinct language: 1102 English, 1183 German, 1138 French and 1128 Latin phrases, distributed into three categories of high, middle and low frequency. High-frequency phrases such as ‘in addition to’ and ‘as a matter of fact’ constituted 49.5% of all English phrases, but only 9.0% of the French and 2.5% of the German ones. The middle-frequency category, with classical idioms such as ‘a bitter pill’ or ‘carved in stone’, comprised 34.9% of the English, 33.0% of the French, and 24.9% of the German phrases. Most French and German phrases were of low frequency. Latin phrases were found as often as French and more often than German ones in the world-wide web, and exhibited a frequency distribution similar to those of French and German. Frequency distributions yielded three main categories around similar maxima for all four languages, with differing relative proportions. The internet may prove useful for the quantitative comparison of languages.
Journal of Quantitative Linguistics, volume 26, pages 81-94.