Journal of Quantitative Linguistics: Latest Articles

Latent-Variable Modelling of Ordinal Outcomes in Language Data Analysis
IF 1.4 Zone 2 (Literature) LANGUAGE & LINGUISTICS Pub Date: 2024-04-08 DOI: 10.1080/09296174.2024.2329448
Lukas Sönning, Manfred Krug, Fabian Vetter, Timo Schmid, Anne Leucht, Paul Messer
In empirical work, ordinal variables are typically analysed using means based on numeric scores assigned to categories. While this strategy has met with justified criticism in the methodological li...
Citations: 0
Corrections to Nelson (2023): DPnorm and DKLnorm are Not Wrong on Pi at All
IF 1.4 Zone 2 (Literature) LANGUAGE & LINGUISTICS Pub Date: 2024-03-24 DOI: 10.1080/09296174.2024.2324616
Stefan Th Gries
This paper mainly discusses two computational errors in Nelson (2023), which demonstrate that part of his conclusions regarding two dispersion measures are flawed.
Citations: 0
Multifractal Analysis of the Distribution of Three Grammatical Constructions in English Texts
IF 1.4 Zone 2 (Literature) LANGUAGE & LINGUISTICS Pub Date: 2024-02-08 DOI: 10.1080/09296174.2024.2302674
Rosmawati, Wander Lowie
Both the Menzerath-Altmann law and the Zipf-Mandelbrot law note that language is a fractal structure and, like any other fractals, follows power laws. Studies on fractal linguistics demonstrated th...
Citations: 0
Quantitative Approaches to Universality and Individuality in Language
IF 1.4 Zone 2 (Literature) LANGUAGE & LINGUISTICS Pub Date: 2023-12-17 DOI: 10.1080/09296174.2023.2294786
Wei Huang, Tenghao Ji
Published in Journal of Quantitative Linguistics (Ahead of Print, 2023)
Citations: 0
The Current State and Prominent Features of Quantitative Linguistics Through the Lens of QUALICO 2023: A Conference Report
IF 1.4 Zone 2 (Literature) LANGUAGE & LINGUISTICS Pub Date: 2023-11-28 DOI: 10.1080/09296174.2023.2283932
Jianwei Yan
Quantitative Linguistics (QL) is an academic field that employs quantitative and statistical methods to explore language patterns and linguistic laws. From June 28th to 30th, 2023, the Internationa...
Citations: 0
Text Segmentation Via Processes that Count the Number of Different Words Forward and Backward
Zone 2 (Literature) LANGUAGE & LINGUISTICS Pub Date: 2023-11-12 DOI: 10.1080/09296174.2023.2275342
Berhane Abebe, Mikhail Chebunin, Artyom Kovalevskii
Abstract: The paper develops a new statistical approach to the automatic partitioning of texts into parts belonging to different authors. It is based on the analysis of processes that count the number of different words forward and backward. The theoretical study of these processes rests on the assumptions of an elementary probability model with a change point. We prove the consistency of our statistical estimate of the concatenation point in the case where the concatenated texts have different Zipf exponents. The method is tested on the Brown corpus and on newspaper texts in different languages. Testing shows a good estimate of the concatenation point. The method can be used in parallel with other text segmentation methods.
Acknowledgments: The authors would like to thank the anonymous referees for their helpful and constructive comments and suggestions.
Disclosure statement: No potential conflict of interest was reported by the author(s).
Data availability statement: We used texts from open sources.
Funding: The work was supported by the Siberian Branch, Russian Academy of Sciences [FWNF-2022-0010].
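To make the two counting processes named in the abstract concrete, here is a minimal Python sketch. It only illustrates what "counting the number of different words forward and backward" could look like on a token list; it does not reproduce the paper's change-point estimate. The function name `distinct_word_counts` and the toy token sequence are assumptions introduced for illustration.

```python
def distinct_word_counts(tokens):
    """Return (forward, backward) running counts of different words.

    forward[k]  = number of different words among the first k + 1 tokens
    backward[k] = number of different words among the last k + 1 tokens
    (Hypothetical helper; not taken from the paper.)
    """
    def running_distinct(seq):
        seen = set()
        counts = []
        for tok in seq:
            seen.add(tok)            # a set ignores repeated words
            counts.append(len(seen))
        return counts

    forward = running_distinct(tokens)
    backward = running_distinct(reversed(tokens))
    return forward, backward


# Toy input, invented for this sketch.
tokens = "a b a c d d e".split()
forward, backward = distinct_word_counts(tokens)
print(forward)   # [1, 2, 2, 3, 4, 4, 5]
print(backward)  # [1, 2, 2, 3, 4, 5, 5]
```

Both sequences end at the same value (the vocabulary size of the whole text), but they can grow at different rates when the beginning and the end of the text differ in vocabulary richness.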
Citations: 0
Word Length in Chinese: The Menzerath-Altmann Law is Valid After All
Zone 2 (Literature) LANGUAGE & LINGUISTICS Pub Date: 2023-11-06 DOI: 10.1080/09296174.2023.2259937
Tereza Motalová, Ján Mačutek, Radek Čech
Abstract: According to the Menzerath-Altmann law, longer language constructs consist, on average, of shorter constituents. It is most often studied at the level of words and syllables (the mean syllable length gets shorter with increasing word length). Its validity at this level has been corroborated in several languages. However, it has been claimed that Chinese is an exception with respect to the validity of the Menzerath-Altmann law. We show that the law is valid if word types are considered, while the behaviour of word tokens is different. This difference can be explained by the fact that the Zipf law of abbreviation is valid not only for words but also for syllables (shorter syllables are used more frequently).
Keywords: word length; Menzerath-Altmann law; Chinese; syllable; Chinese characters
Acknowledgments: The work was supported by the European Regional Development Fund project “Sinophone Borderlands – Interaction at the Edges”, CZ.02.1.01/0.0/0.0/16_019/0000791 (T. Motalová), VEGA 2/0096/21 (J. Mačutek), APVV-21-0216 (J. Mačutek), and the Operational Programme Integrated Infrastructure (OPII) for the project 313011BWH2: “InoCHF – Research and development in the field of innovative technologies in the management of patients with CHF”, co-financed by the European Regional Development Fund (J. Mačutek).
Disclosure statement: No potential conflict of interest was reported by the author(s).
Notes:
1. A more general formula with an additional parameter c, $y(x) = a x^{b} e^{cx}$, is sometimes used; see e.g. Mačutek et al. (2019).
2. The MAL has also found its place in research areas outside of human language, such as music (Boroda & Altmann, 1991), animal communication (Gustison et al., 2016), and genome structure (Ferrer-I-Cancho et al., 2014). The ‘common denominator’ of these branches of science is that they study information flow (in a very general sense).
3. Syllable length was measured in moras, not in phonemes.
4. In some of the papers cited in this paragraph, the mean syllable length is expressed in the number of graphemes rather than phonemes. The mean syllable length is quite similar for both choices in languages with shallow orthographies (Coulmas, 2002).
5. Erization is the addition of the r-suffix (儿) to a syllable, e.g. 花 huā becomes 花儿 huār (‘flower’). Moreover, there are a few singular exceptions of polysyllabic characters in Chinese. Qiu (2000, p. 26, 406) mentions 瓩 qiānwǎ ‘kilowatt’, 浬 hǎilǐ ‘nautical mile’, and 哩 yīnglǐ ‘English mile’ (none of these words occurs in our language material).
6. Xin Han-Da cidian – Das neue Chinesisch-Deutsche Wörterbuch, 1985. Commercial Press, Beijing.
7. In fact, one can speak about phonological words here, see e.g. Hall (1999) or Zsiga (2013, pp. 342–346). Thus, this approach can be considered a study of the MAL on the level of words, albeit from a slightly different perspective.
8. Lengths of stress units ranged between 1 and 18 syllables, while in the case of rhythmic segments they ranged between 1 and 7 syllables (Ščigulinská & Schusterová, 2014, pp. 70–72, p. 77). Kovaľová and Schusterová (2016, pp. 122–133) report stress-unit lengths between 1 and 21 syllables, similar to the utterance lengths between 2 and 29 syllables reported by Rothe-Neves et al. (2017, p. 6). On the other hand, Geršić and Altmann (1980, pp. 115–123) tested the law for word lengths of at most 5 syllables.
The remaining notes follow in their original order (their numbers are not preserved in this listing):
https://www.fon.hum.uva.nl/praat/ (accessed 1 June 2023).
Recall that Stave et al. (2021) studied the relation between word length in morphemes and mean morpheme length in graphemes.
https://www.wordproject.org/ (accessed 1 June 2023).
International Bible Society. Wordproject®: Shèngjīng: Xīnyuē Quánshū [The Bible. New Testament]. Available at https://www.wordproject.org/bibles/pn/index.htm (accessed 1 June 2023).
International Bible Society. Wordproject®: 圣经 [The Bible. New Testament]. Available at https://www.wordproject.org/bibles/gb_cat/index.htm (accessed 1 June 2023).
Available at https://github.com/tsroten/pynlpir (accessed 1 June 2023).
Available at https://github.com/NLPIR-team/NLPIR (accessed 1 June 2023).
Available at http://bcc.blcu.edu.cn/downloads/resources/%E6%B1%89%E5%AD%97%E4%BF%A1%E6%81%AF%E8%AF%8D%E5%85%B8.zip (accessed 1 June 2023).
Available at https://github.com/mozillazg/python-pinyin (accessed 23 July 2023).
http://www.nlreg.com (accessed June 2023).
Admittedly, this requirement is another rule of thumb. See Mačutek and Rovenchak (2011) and Mačutek et al. (2021) for similar, albeit slightly different, approaches to the problem of word-length classes with too low frequencies.
For example, if we measure word length in syllables, lengths 1 to 5 each occur more than 10 times, length 6 has a frequency of 12, and length 7 a frequency of 1, we pool the latter two lengths into one class. The weighted mean word length of this class is $(12 \times 6 + 1 \times 7)/(12 + 1) = 6.08$; for the data, see Table 1.
We also obtained comparable results for the relation between word length and mean syllable length for a diary written by 张启龄 (http://www.pinyin.info/readings/pinyin_riji_duanwen.html, accessed 1 June 2023), and for samples from the Lancaster Corpus of Mandarin Chinese containing press reportage (text category A) and scientific academic prose (text category J) (McEnery et al., 2003). As in Table 1 and Figure 1, the mean syllable length shows a decreasing tendency, with a slight increase for the longest words.
We also obtained similar results for the relation between word length in characters and mean character size in components and strokes, respectively, for the short story 我为什么要结婚 [Why I want to get married] from the short-story collection 黄昏里的男孩 [Boy in the twilight] written by Yu Hua (2012), as well as for samples from the Lancaster Corpus of Mandarin Chinese containing press reportage (text category A) and scientific academic prose (text category J) (McEnery et al., 2003).
Words consisting of one, two, and three syllables make up 99.7% of all word tokens in the Chinese translation of the New Testament, see Table 1.
Given the wide applicability of the principle of least effort (see Zipf, 1949), tones that are easier to pronounce may occur more frequently (see Zhang, 2002). Tonal properties may also interact with other word properties; e.g., longer words have a higher proportion of simple tones than shorter ones.
According to Berdicevskis (2021, p. 27), clauses do not recur often enough in language for their frequencies to be estimated.
Funding: The work was supported by Agentúra na Podporu Výskumu a Vývoja [APVV-21-0216]; European Regional Development Fund [CZ.02.1.
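The notes to this entry mention two small computations: the generalized Menzerath-Altmann formula with an extra parameter c, and the weighted mean word length of a pooled length class (length 6 with frequency 12 and length 7 with frequency 1). The Python sketch below works both through. The function names and the parameter values passed to `mal` are hypothetical and chosen only for illustration; they are not fitted values from the paper.

```python
import math


def mal(x, a, b, c=0.0):
    """Generalized Menzerath-Altmann curve y(x) = a * x**b * exp(c * x).

    With c = 0 this reduces to the plain power law a * x**b.
    """
    return a * x**b * math.exp(c * x)


def pooled_mean_length(freq_by_length):
    """Frequency-weighted mean length of several length classes merged into one.

    freq_by_length maps a word length to its frequency.
    """
    total = sum(freq_by_length.values())
    return sum(length * freq for length, freq in freq_by_length.items()) / total


# Pooled class from the note: (12 * 6 + 1 * 7) / (12 + 1) = 79 / 13
pooled = pooled_mean_length({6: 12, 7: 1})
print(round(pooled, 2))  # 6.08

# Hypothetical parameters: a negative b gives shorter constituents
# for longer constructs.
print(mal(1, a=2.5, b=-0.3) > mal(4, a=2.5, b=-0.3))  # True
```

Pooling sparse length classes in this way keeps every point used for fitting above a minimum frequency, at the cost of a non-integer word length for the merged class.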
Citations: 0
Structural Factor Analysis of Lexical Complexity Constructs and Measures: A Quantitative Measure-Testing Process on Specialised Academic Texts
Zone 2 (Literature) LANGUAGE & LINGUISTICS Pub Date: 2023-11-02 DOI: 10.1080/09296174.2023.2258782
Maryam Nasseri, Philip McCarthy
Abstract: This study evaluates 22 lexical complexity measures that represent the three constructs of density, diversity and sophistication. The selection of these measures stems from an extensive review of the SLA linguistics literature. All measures were subjected to qualitative screening for indicators/predictors of lexical proficiency/development and criterion validity based on the body of scholarship. This study’s measure-testing process begins by dividing the selected measures into two groups, similarly calculated and dissimilarly calculated, based on their quantification methods and the results of correlation tests. Using a specialized corpus of postgraduate academic texts, a Structural Factor Analysis (SFA) comprising a Confirmatory Factor Analysis (CFA) and an Exploratory Factor Analysis (EFA) is then conducted. The purpose of SFA is to 1) verify and examine the lexical classifications proposed in the literature, 2) evaluate the relationship between various lexical constructs and their representative measures, 3) identify the indices that best represent each construct and 4) detect possible new structures/dimensions. Based on the analysis of the corpus, the study discusses the construct-distinctiveness of lexical complexity constructs, as well as strong indicators of each conceptual/mathematical group among the measures. Finally, a unique and smaller set of measures representative of each construct is suggested for future studies that require measure selection.
Acknowledgments: We would like to thank the two anonymous reviewers for their valuable suggestions and comments.
Disclosure statement: No potential conflict of interest was reported by the author(s).
Credit authorship contribution statement: Maryam Nasseri: Conceptualization, Data curation, Methodology, Data analysis and evaluation of findings, Project administration, Visualization, Writing: original draft, Writing: critical review & editing, Funding acquisition. Philip McCarthy: Measure-selection, Writing: critical review & editing, Funding acquisition.
Notes:
1. The lexical sophistication measures in LCA-AW are filtered through the BAWE (British Academic Written English) corpus and its most-frequently-used academic writing words used in linguistics and language studies as well as the general English frequency word lists based on the BNC (the British National Corpus) or ANC (American National Corpus).
2. LCA-AW and TAALED calculate the indices based on lemma forms while Coh-Metrix calculates the vocd-D index based on word forms. In the latter case, lemmatized files can be used as the input to Coh-Metrix.
3. The R packages used in this study include psych (version 1.8.12, Revelle, 2018), lavaan (version 0.5–18, Rosseel, 2012) and corrplot (version 0.84, Wei & Simko, 2017).
Funding: This study is part of the “Lexical Proficiency Grading for Academic Writing (FRG23-C-S66)” comprehensive research project funded by the American University of Sharjah (AUS).
Notes on contributors:
Maryam Nasseri received her doctorate from the University of Birmingham (UK), where she worked on the application of statistical modelling, natural language processing, and corpus linguistics methods to lexical and syntactic complexity. She has received multiple awards and grants, including the ISLE 2020 grant for syntactic complexity in academic texts and the AUS 2023–26 research grant for statistical modelling and software design for lexical proficiency grading of academic writing. She has published in journals such as System, Journal of English for Academic Purposes (JEAP), and Assessing Writing, and has reviewed multiple articles and books for Taylor & Francis, Assessing Writing, and Journal of Language and Education (JLE).
Philip McCarthy is an associate professor and discourse scientist specialising in software design and corpus analysis. His principal interest is the analysis of student writing in English. His articles have appeared in journals such as Discourse Processes, The Modern Language Journal, Written Communication, and Applied Psycholinguistics. McCarthy has been a teacher for 30 years, working in places such as Turkey, Japan, Britain, the United States, and the UAE. He is currently the principal investigator of the Lexical Proficiency Grading for Academic Writing project funded by the American University of Sharjah (AUS).
Citations: 0
Words and Numbers. In Memory of Peter Grzybek (1957-2019), edited by Emmerich Kelih and Reinhard Köhler, Lüdenscheid, RAM-Verlag, 2020, 248 pp., ISBN 978-3-942303-89-7, 55,00 EUR for the paperback version
Zone 2 (Literature) LANGUAGE & LINGUISTICS Pub Date: 2023-10-30 DOI: 10.1080/09296174.2023.2262696
Mengge Wang
Citations: 0
Lexical Features and Psychological States: A Quantitative Linguistic Approach
Zone 2 (Literature) LANGUAGE & LINGUISTICS Pub Date: 2023-10-19 DOI: 10.1080/09296174.2023.2256211
Xiaowei Du
Abstract: In recent decades, there has been increasing interest in the relation between lexical features and texts of psychological states. Previous studies demonstrated that some lexical features vary significantly among texts of different psychological states. However, lexical features at the textual level have received little attention. This paper extends this work by examining the performance of quantitative linguistic indices in classifying texts of psychological issues. A large dataset of forum posts, including texts of anxiety, depression, suicide ideation, and normal states, was analysed with machine learning algorithms. The results revealed that the quantitative linguistic indices combined with machine learning algorithms achieved a high level of success in identifying psychological states. Meanwhile, some quantitative linguistic indices, namely ALT and Writer’s view, may extract adequate lexical features for classifying texts of different psychological states. The study is probably the first attempt to use quantitative linguistic indices as lexical features to detect texts of psychological states, and the findings may contribute to our understanding of how accuracy may be enhanced in the identification of various psychological states. Finally, the implications of these findings are discussed.
Publisher’s note: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Acknowledgments: We thank the JQL referees and the editors for their insightful comments. Their suggestions have significantly enhanced the quality of the initial manuscripts.
Disclosure statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Data availability statement: Publicly available datasets were analysed in this study. We used AlMosaiwi and Johnstone’s (2018) dataset, which can be accessed at https://doi.org/10.6084/m9.figshare.4743547.v1.
Supplemental data: Supplemental data for this article can be accessed online at https://doi.org/10.1080/09296174.2023.2256211.
Notes:
1. The dataset can be accessed at https://doi.org/10.6084/m9.figshare.4743547.
Funding: This study was supported by “the Fundamental Research Funds for the Central Universities” (Grant No. 3132023331).
Citations: 0