The meaning distributions of certain linguistic forms generally follow a Zipfian distribution. However, since meanings can be observed and classified at different levels of granularity, it is worth asking whether their distributions at different levels can be fitted by the same model and whether the parameters are the same. In this study, we investigate three quasi-prepositions in Shanghainese, a dialect of Wu Chinese, and test whether the meaning distributions at two levels of granularity can be fitted by the same model and whether the parameters are close. The results first show that the three models proposed by modern quantitative linguists all achieve a good fit in all cases, while the exponential (EXP) model and the right-truncated negative binomial (RTBN) model perform better than the modified right-truncated Zipf-Alekseev distribution (MRTZA) in terms of consistency of the goodness of fit, parameter change, rationality, and simplicity. Second, the parameters of the distributions at the two levels, and the corresponding curves, are neither identical nor even close to each other. This supports a weak view of the concept of ‘scaling’ in the complexity sciences. Finally, differences are found between the distributions at the two levels: the fine-grained meaning distributions are more right-skewed and more non-linear. This is attributed to the openness of the systems’ categories: finer semantic differentiation behaves like a system with an open set of categories, while the coarse-grained meaning distribution resembles systems with a closed set of few categories.
{"title":"The meaning distributions on different levels of granularity","authors":"T. Yih, Haitao Liu","doi":"10.53482/2023_54_405","DOIUrl":"https://doi.org/10.53482/2023_54_405","url":null,"abstract":"The meaning distributions of certain linguistic forms generally follow a Zipfian distribution. However, since the meanings can be observed and classified on different levels of granularity, it is thus interesting to ask whether their distributions on different levels can be fitted by the same model and whether the parameters are the same. In this study, we investigate three quasi-prepositions in Shanghainese, a dialect of Wu Chinese, and test whether the meaning distributions on two levels of granularity can be fitted by the same model and whether the parameters are close. The results first show that the three models proposed by modern quantitative linguists can both achieve a good fit for all cases, while both the exponential (EXP) model and the right-truncated negative binomial (RTBN) models behave better than the modified right-truncated Zipf-Alekseev distribution (MRTZA), in terms of the consistency of the goodness of fit, parameter change, rationality, and simplicity. Second, the parameters of the distributions on the two levels and the curves are not exactly the same or even close to each other. This has supported a weak view of the concept of ‘scaling’ in complex sciences. Finally, differences are found to lie between the distributions on the two levels. The fine-grained meaning distributions are more right-skewed and more non-linear. This is attributed to the openness of the categories of systems. 
The finer semantic differentiation behaves like systems with open set of categories, while the coarse-grained meaning distribution resembles those having a close set of few categories.","PeriodicalId":51918,"journal":{"name":"Glottometrics","volume":"93 1","pages":"13-38"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74544443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
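As a rough illustration of the kind of model fitting the abstract describes, the exponential (EXP) rank-frequency model f(r) = a·exp(−b·r) can be fitted by ordinary least squares on log frequencies, which is linear in the rank. This is only a minimal sketch on synthetic data; the paper's actual estimation procedure and the RTBN and MRTZA models are not reproduced here.

```python
import math

def fit_exponential(freqs):
    """Fit f(r) = a * exp(-b * r) to rank-frequency data by
    least squares on log f(r), which is linear in the rank r."""
    n = len(freqs)
    xs = range(1, n + 1)                 # ranks 1..n
    ys = [math.log(f) for f in freqs]    # log frequencies
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx                    # equals -b
    intercept = my - slope * mx          # equals ln a
    return math.exp(intercept), -slope   # (a, b)

# Synthetic, perfectly exponential data: f(r) = 100 * exp(-0.5 * r)
data = [100 * math.exp(-0.5 * r) for r in range(1, 11)]
a, b = fit_exponential(data)
```

On exactly exponential data the log-linear fit recovers the parameters; on real rank-frequency data the residuals would drive the goodness-of-fit comparison between models.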
Thematic concentration, a quantitative linguistic measure, can reflect the speech style of a particular person. It may, to some degree, reflect the strength of a speaker’s intention to communicate certain themes. There has been limited empirical research on the similarity between Trump and Putin with respect to their linguistic features. The present study therefore compares Putin’s and Trump’s stylometric features and political themes on the basis of thematic concentration, using a corpus of Putin’s, Medvedev’s, Trump’s, and Obama’s speeches. Results show that 1) the thematic concentration values of both Putin’s and Trump’s speeches differ significantly, or marginally significantly, from those of their predecessors; 2) both leaders pay great attention to the concept of nationalism; 3) the thematic words of their speeches differ slightly across periods, reflecting the influence of external factors on the choice of language. The results may shed light on stylometric studies of Putin and Trump.
{"title":"Fellow or foe? A quantitative thematic exploration into Putin's and Trump's stylometric features","authors":"Yaqin Wang, Ting Zeng","doi":"10.53482/2023_54_406","DOIUrl":"https://doi.org/10.53482/2023_54_406","url":null,"abstract":"Thematic concentration, a quantitative linguistic method, can reflect the speech style of a particular person. It may, to some degree, reflect the degree of a speaker’s intention to communicate certain themes. There has been limited empirical research on the similarity between Trump and Putin with respect to their linguistic features. Thus, the present study aims to compare Putin’s and Trump’s stylometric features and political themes based on thematic concentration with a corpus of Putin’s, Medvedev’s, Trump’s, and Obama’s speeches. Results show that 1) Both Putin’s and Trump’s speeches’ thematic concentration values are significantly or marginally significantly different from their precedents’. 2) Two leaders pay great attention to the concept of nationalism. 3) Thematic words of their speeches were slightly different across periods, reflecting the influence of external factors on the choice of language. The results of the present study may shed light on the stylometric studies of Putin and Trump.","PeriodicalId":51918,"journal":{"name":"Glottometrics","volume":"1 1","pages":"39-57"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86828865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
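Thematic concentration builds on the h-point of a rank-frequency list, the point where rank and frequency coincide; thematic words are sought among the content words above it. A minimal sketch of the h-point on a made-up toy text might look like this (the full thematic concentration formula, which weights autosemantic words above the h-point, is not reproduced):

```python
from collections import Counter

def h_point(ranked_freqs):
    """h-point of a descending rank-frequency list: here taken as the
    largest rank r with freq(r) >= r, i.e. where rank meets frequency."""
    h = 0
    for r, f in enumerate(ranked_freqs, start=1):
        if f >= r:
            h = r
        else:
            break
    return h

# Invented toy text, not from the study's corpus
text = "the war the war the nation of war and of the nation rises"
freqs = [f for _, f in Counter(text.split()).most_common()]
h = h_point(freqs)
```

Words ranked above the h-point (here the two most frequent) are the candidates from which thematic words are selected once function words are excluded.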
The article compares the performance of two term specificity measures, Cohen’s d and the Z-score, in analyzing political and media discourses on Russia’s war in Ukraine in four languages and five countries. In addition to being linguistically and stylistically heterogeneous, the 3,347 texts included in the corpus vary in length. The two measures display convergent validity, as confirmed by various performance metrics. It is argued that, beyond their usefulness for text mining and content analysis, the measures can be adapted to a broader range of tasks in information retrieval and the digital humanities.
{"title":"A comparison of two text specificity measures analyzing a heterogenous text corpus","authors":"A. Oleinik","doi":"10.53482/2023_54_404","DOIUrl":"https://doi.org/10.53482/2023_54_404","url":null,"abstract":"The article compares the performance of two term specificity measures, Cohen’s d and Z-score, when analyzing political and media discourses on Russia’s war in Ukraine in four languages and five countries. In addition to linguistic and stylistic heterogeneity, 3,347 texts included in the corpus have variable length. The two measures display convergent validity, as confirmed by various performance metrics. It is argued that the measures can be adapted to a broader range of tasks in information retrieval and digital humanities, in addition to their usefulness for text mining and content analysis.","PeriodicalId":51918,"journal":{"name":"Glottometrics","volume":"275 1","pages":"1-12"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78868378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
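A minimal sketch of how Cohen’s d can serve as a term specificity measure, assuming per-text relative frequencies of a term in a target versus a reference subcorpus; the figures below are invented and the article’s exact operationalization is not reproduced:

```python
import math

def cohens_d(a, b):
    """Cohen's d for two samples of per-text term rates:
    difference of means over the pooled standard deviation."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled

# Hypothetical per-1000-word rates of one term in two discourse samples
target = [5.0, 6.0, 7.0, 6.0]
reference = [1.0, 2.0, 1.0, 2.0]
d = cohens_d(target, reference)
```

Because d is standardized by within-group variance, it is less sensitive to the variable text lengths the article highlights than raw frequency differences would be.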
This article deals with the history of quantitative linguistics. Its focus is the journal SMIL – Statistical Methods in Linguistics, which was published by Hans Karlgren in Stockholm from 1962 to 1976 (with a short interruption between 1966 and 1969). SMIL is a representative example of the process of differentiation in quantitative linguistics during the 1970s and can be seen as an early major “Scandinavian” contribution to statistical and quantitative linguistics.
{"title":"The journal SMIL - Statistical Methods in Linguistics (1962-1976) - some notes about the history of quantitative linguistics in Scandinavia and beyond","authors":"E. Kelih","doi":"10.53482/2023_54_408","DOIUrl":"https://doi.org/10.53482/2023_54_408","url":null,"abstract":"This article deals with the history of quantitative linguistics. The focus of this paper is the journal SMIL – Statistical Methods in Linguistics, which was published by Hans Karlgren in Stockholm from 1962 to 1976 (with a short interruption between 1966 and 1969). SMIL is a representative example of the process of differentiation in quantitative linguistics during the seventies and can be seen as one early major “Scandinavian” contribution to statistical and quantitative linguistics.","PeriodicalId":51918,"journal":{"name":"Glottometrics","volume":"22 1","pages":"88-98"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78264957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-11-13. DOI: 10.48550/arXiv.2211.07005
Peng Liu, Tinghao Feng, Rui Liu
We introduce a graph polynomial that distinguishes tree structures to represent dependency grammar, and a measure based on the polynomial representation to quantify syntax similarity. The polynomial encodes accurate and comprehensive information about the dependency structure and dependency relations of the words in a sentence, which enables in-depth analysis of dependency trees with data-analysis tools. We apply the polynomial-based methods to analyze sentences in the Parallel Universal Dependencies treebanks. Specifically, we compare the syntax of sentences and their translations in different languages, and we perform a syntactic typology study of the languages available in the Parallel Universal Dependencies treebanks. We also demonstrate and discuss the potential of the methods for measuring the syntactic diversity of corpora.
{"title":"Quantifying syntax similarity with a polynomial representation of dependency trees","authors":"Peng Liu, Tinghao Feng, Rui Liu","doi":"10.48550/arXiv.2211.07005","DOIUrl":"https://doi.org/10.48550/arXiv.2211.07005","url":null,"abstract":"We introduce a graph polynomial that distinguishes tree structures to represent dependency grammar and a measure based on the polynomial representation to quantify syntax similarity. The polynomial encodes accurate and comprehensive information about the dependency structure and dependency relations of words in a sentence, which enables in-depth analysis of dependency trees with data analysis tools. We apply the polynomial-based methods to analyze sentences in the ParallelUniversal Dependencies treebanks. Specifically, we compare the syntax of sentences and their translations in different languages, and we perform a syntactic typology study of available languages in the Parallel Universal Dependencies treebanks. We also demonstrate and discuss the potential of the methods in measuring syntax diversity of corpora.","PeriodicalId":51918,"journal":{"name":"Glottometrics","volume":"34 1","pages":"59-79"},"PeriodicalIF":0.0,"publicationDate":"2022-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84534693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
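The paper’s polynomial itself is not reproduced here. As a simpler stand-in that likewise distinguishes unlabeled rooted tree structures, the classical AHU canonical form can be sketched as follows; unlike the paper’s polynomial, it ignores the dependency-relation labels:

```python
def canonical(tree):
    """AHU canonical form of a rooted tree given as nested lists:
    two trees get the same string iff they are isomorphic as
    unlabeled rooted trees. (A simplified stand-in for the paper's
    distinguishing polynomial; it drops dependency-relation labels.)"""
    return "(" + "".join(sorted(canonical(c) for c in tree)) + ")"

# Same shape with children listed in different orders
t1 = [[[], []], []]
t2 = [[], [[], []]]
# A genuinely different shape
t3 = [[[[]]], []]
```

Comparing canonical forms (or, in the paper, polynomial coefficients) turns structural comparison of dependency trees into comparison of plain values.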
The aim of this study is to find parameters that can be used for the classification of texts that are not very long, for example by author or genre. We go through various known parameters and analyze to what extent they are useful for this purpose. We also suggest some improvements that need further checking. We calculate the values of the parameters at various points of a text comprising N tokens (running words), counted from the beginning of the text. As parameters with prospects for author and/or language attribution we identify, in particular, the h-point scaling coefficient, Yule’s K, the relative repeat rate, and the fraction of dis legomena. These parameters demonstrate quite stable behavior in N. Another set comprises the scaling exponents of parameters with respect to N. Certain modifications are suggested for Lambda and entropy, introducing logarithmic corrections that are powers of ln N. The results are applicable to texts of thousands to tens of thousands of words.
{"title":"Attempting at parametrization of moderate-length poetic texts: Moses, a poem by Ivan Franko","authors":"S. Buk, Andrij Rovenchak","doi":"10.53482/2022_53_399","DOIUrl":"https://doi.org/10.53482/2022_53_399","url":null,"abstract":"The aim of this study is to find parameters that can be used for classification of not very long texts, for example, by author, genre, etc. We go through various known parameters and analyze to what extent they are useful for the intended purposes. We also suggest some improvements that need to be checked further. We calculate the values of parameters at various points of text comprising N tokens (running words) counted from the beginning of text. As parameters with prospects of author and/or language attribution we identify, in particular, the h-point scaling coefficient, Yule’s K, relative repeat rate, and the fraction of dis legomena. These parameters demonstrate quite stable behavior in N. Another set includes scaling exponents of parameters with respect to N. Certain modifications are suggested for Lambda and entropy introducing logarithmic corrections being powers of ln N. The results are applicable for texts of thousands to tens of thousand words.","PeriodicalId":51918,"journal":{"name":"Glottometrics","volume":"6 1","pages":"1-23"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78954329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
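Of the parameters listed, Yule’s K has a compact standard definition, K = 10⁴ · (Σₘ m²V(m) − N) / N², where V(m) is the number of word types occurring m times among N tokens; its near-constancy in N is what makes it attractive for attribution. A minimal sketch on a toy token list (the toy data are invented, not from the poem studied):

```python
from collections import Counter

def yules_k(tokens):
    """Yule's K = 10^4 * (sum_m m^2 * V(m) - N) / N^2, with V(m) the
    number of word types occurring m times among N running tokens."""
    n = len(tokens)
    spectrum = Counter(Counter(tokens).values())  # m -> V(m)
    s2 = sum(m * m * vm for m, vm in spectrum.items())
    return 1e4 * (s2 - n) / (n * n)

tokens = "a b a c a b d".split()
k = yules_k(tokens)
```

In the study's setup, K would be evaluated at successive cutoffs N from the start of the text to check how stable it stays.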
Drawing on word-embedding techniques and tracking the frequency and semantic change of hot words on Sina Weibo during the COVID-19 pandemic, this study investigates how language and discourse change during a crisis. More specifically, correlation tests were conducted between word frequency ranks, pandemic data, and word meaning change ratios. Results indicate that the frequency of some hot words changed with both the pandemic data and the frequency of other hot words, and was significantly correlated with the American pandemic data rather than with that of China. Moreover, February 2020 saw the most distinctive semantic changes, marked by a large share of the nearest neighbors for WAR metaphors. The correlations between changes in the frequency and in the nearest neighbors of COVID-19-related hot words exhibited some acceptable peculiarities. This study demonstrates the feasibility of studying discourse through language change by observing minor semantic change at the connotation level in social media data, which adds a new perspective on the impact of the COVID-19 pandemic.
{"title":"Dynamics of language in social emergency: investigating COVID-19 hot words on Weibo","authors":"Yi Zhou, Rui Li, Guangfeng Chen, Haitao Liu","doi":"10.53482/2022_52_395","DOIUrl":"https://doi.org/10.53482/2022_52_395","url":null,"abstract":"Drawing on word embeddings techniques and tracking the frequency and semantic change of hot words on Sina Weibo during the COVID-19 pandemic, this study investigates how language and discourse change during crisis. More specifically, correlation tests were conducted between word frequency ranks, pandemic data, and word meaning change ratio. Results indicated that the frequency of some hot words changed with both pandemic data and the frequency of other hot words, which were significantly correlated with the American pandemic data rather than that of China. Moreover, February of 2020 saw the most distinctive semantic changes marked by a large part of the nearest neighbors for WAR metaphors. The correlations between changes in the frequency and nearest neighbors of COVID-19 related hot words exhibited some acceptable peculiarities. This study proves the availability of studying discourse through language change by observing minor semantic change on connotation level from social media, which adds a new perspective to the impact of the COVID-19 pandemic.","PeriodicalId":51918,"journal":{"name":"Glottometrics","volume":"875 1","pages":"1-20"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76981092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
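Since the abstract correlates frequency ranks with pandemic data, one plausible form of such a test is a Spearman rank correlation, sketched here in plain Python; the weekly figures are invented and the study’s exact test choice is not specified here:

```python
def rank(xs):
    """Average ranks (1-based), ties sharing the mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        r = (i + j) / 2 + 1          # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = r
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical weekly frequency of one hot word vs. weekly case counts
word_freq = [12, 30, 45, 80, 60]
cases = [100, 250, 400, 900, 500]
rho = spearman(word_freq, cases)
```

Because both toy series rise and fall together, their rankings coincide and rho is 1; real hot-word series would give the intermediate values the study tests for significance.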
Berhane Abebe, M. Chebunin, A. Kovalevskii, N. Zakrevskaya
The growth processes of the number of distinct words in a text, when reading in the forward and backward directions, are studied in this article. Based on a statistic obtained from the difference between these two processes, we construct a statistical test, which is used for checking text homogeneity. The elementary model states that the words of a text are selected from some dictionary, independently of each other, according to the Zipf–Mandelbrot law. P-values of the test are calculated from this elementary probabilistic model using the asymptotic normality of the corresponding statistics. Finally, the test is applied to the analysis of the homogeneity of sequences of sonnets.
{"title":"Statistical tests for text homogeneity: using forward and backward processes of numbers of different words","authors":"Berhane Abebe, M. Chebunin, A. Kovalevskii, N. Zakrevskaya","doi":"10.53482/2022_53_401","DOIUrl":"https://doi.org/10.53482/2022_53_401","url":null,"abstract":"The processes of growth in the number of diverse words in a text, when reading in the forward and backward directions, are studied in this article. Based upon the statistics achieved from the difference between these two processes, we construct a statistical test. This statistical test is used for text homogeneity checks. The elementary model states that words in a text are selected from some dictionary independent of each other according to the Zipf–Mandelbrot law. P-values of the statistical test are calculated based on the elementary probabilistic model using the asymptotic normality of corresponding statistics. At last but not least, this statistical test is applied for the analysis of homogeneity of sequences of sonnets.","PeriodicalId":51918,"journal":{"name":"Glottometrics","volume":"37 1","pages":"42-58"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74519626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
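The two processes the test is built from, the running counts of distinct words when reading forward and backward, can be sketched directly; the statistic’s normalization and the p-value computation under the Zipf–Mandelbrot model are not reproduced here, and the toy tokens are invented:

```python
def distinct_growth(tokens):
    """Number of distinct words seen after each position."""
    seen, out = set(), []
    for w in tokens:
        seen.add(w)
        out.append(len(seen))
    return out

tokens = "a b a c b a d".split()
forward = distinct_growth(tokens)
# backward[i] = number of distinct words in tokens[i:], aligned to positions
backward = distinct_growth(tokens[::-1])[::-1]
diff = [f - b for f, b in zip(forward, backward)]
```

Under the homogeneity hypothesis the difference process has no systematic drift; a concentration of new vocabulary in one half of the text pulls `diff` away from zero, which is what the test detects.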
The novel The Grapes of Wrath is distinctive in its arrangement of intercalary chapters and narrative chapters. Existing studies of the narratological distinctiveness of this novel are primarily qualitative. This article conducts a corpus-driven study of the stylistic variation in the novel from the perspectives of word clusters, type-token ratio, descriptivity and activity, keyness, and sentiment. The cluster analysis shows that the choice of words in the narrative chapters is more consistent than in the intercalary chapters. The type-token ratio analysis testifies to the heterogeneity of the intercalary chapters in terms of lexical richness. The descriptivity-activity analysis and the keyness analysis reveal that the narrative chapters are more active than the intercalary chapters. The sentiment analysis finds that the novel is pervaded by negative sentiments, and that these are more prevalent in the narrative chapters than in the intercalary chapters. The study concludes that a corpus-driven approach can provide insights into the narrative structure and the stylistic variation of the novel.
{"title":"A Corpus-Driven Study of the Style Variation in The Grapes of Wrath","authors":"Yiyang Hu, Qingshun He","doi":"10.53482/2022_52_396","DOIUrl":"https://doi.org/10.53482/2022_52_396","url":null,"abstract":"The novel The Grapes of Wrath is distinctive in the arrangement of intercalary chapters and narrative chapters. Existing studies of the narratological distinction of this novel are primarily qualitative. This article conducted a corpus-driven study of the variation of styles in this novel from the perspectives of word cluster, type-token ratio, descriptivity and activity, keyness, and sentiment. The cluster analysis shows that the choice of words in the narrative chapters is more consistent than that in the intercalary chapters. The type-token ratio analysis testifies to the heterogeneity of the intercalary chapters in terms of lexical richness. The descriptivity and activity analysis and the keyness analysis reveal that the narrative chapters are more active than the intercalary chapters. The sentiment analysis finds that the novel is pervaded by negative sentiments and that negative sentiments are more prevalent in the narrative chapters than in the intercalary chapters. The research concludes that the corpus-driven study can provide insights into the narrative structure and the stylistic variation of the novel.","PeriodicalId":51918,"journal":{"name":"Glottometrics","volume":"10 1","pages":"21-38"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78928455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
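A minimal sketch of the type-token ratio underlying the lexical-richness comparison, on invented equal-length word samples (not quotations from the novel); since TTR is length-sensitive, real chapter comparisons should use samples of equal size:

```python
def type_token_ratio(tokens):
    """Type-token ratio: distinct word forms over running words."""
    return len(set(tokens)) / len(tokens)

# Hypothetical 11-token samples standing in for the two chapter types
narrative = "the truck rolled on and the men watched the road ahead".split()
intercalary = "dust wind banks tractors land owners silence anger dusk fear roads".split()
r1 = type_token_ratio(narrative)
r2 = type_token_ratio(intercalary)
```

A higher TTR in one sample indicates greater lexical richness there, which is the pattern the study reports for the intercalary chapters.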
{"title":"Book review - On Invisible Language in Modern English: A Corpus-based Approach to Ellipsis. By Evelyn Gandón-Chapela. London: Bloomsbury Academic. 2020","authors":"Zheyuan Dai","doi":"10.53482/2022_52_398","DOIUrl":"https://doi.org/10.53482/2022_52_398","url":null,"abstract":"","PeriodicalId":51918,"journal":{"name":"Glottometrics","volume":"97 1","pages":"65-69"},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80534655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}