Pub Date : 2024-04-08 DOI: 10.1080/09296174.2024.2329448
Lukas Sönning, Manfred Krug, Fabian Vetter, Timo Schmid, Anne Leucht, Paul Messer
In empirical work, ordinal variables are typically analysed using means based on numeric scores assigned to categories. While this strategy has met with justified criticism in the methodological li...
{"title":"Latent-Variable Modelling of Ordinal Outcomes in Language Data Analysis","authors":"Lukas Sönning, Manfred Krug, Fabian Vetter, Timo Schmid, Anne Leucht, Paul Messer","doi":"10.1080/09296174.2024.2329448","DOIUrl":"https://doi.org/10.1080/09296174.2024.2329448","url":null,"abstract":"In empirical work, ordinal variables are typically analysed using means based on numeric scores assigned to categories. While this strategy has met with justified criticism in the methodological li...","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"63 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140583169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-03-24 DOI: 10.1080/09296174.2024.2324616
Stefan Th Gries
This paper mainly discusses two computational errors in Nelson (2023), which demonstrate that part of his conclusions regarding two dispersion measures are flawed.
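For orientation, the kind of dispersion measure at issue can be sketched as follows. This is a minimal illustration, assuming the commonly cited definitions: DP is half the summed absolute difference between observed and expected proportions across corpus parts, DPnorm divides DP by 1 minus the smallest expected proportion, and DKL is the Kullback-Leibler divergence (in bits) of the observed from the expected proportions. The function names and toy counts are hypothetical and are not taken from either paper; the step that normalises DKL into DKLnorm is deliberately left out.

```python
import math

def _proportions(counts, part_sizes):
    """Return observed and expected proportions per corpus part."""
    f = sum(counts)          # total frequency of the word
    n = sum(part_sizes)      # total corpus size
    if f == 0 or n == 0:
        raise ValueError("word frequency and corpus size must be positive")
    observed = [v / f for v in counts]
    expected = [s / n for s in part_sizes]
    return observed, expected

def dp(counts, part_sizes):
    """Deviation of proportions: 0 = perfectly even, close to 1 = very uneven."""
    observed, expected = _proportions(counts, part_sizes)
    return 0.5 * sum(abs(o - e) for o, e in zip(observed, expected))

def dp_norm(counts, part_sizes):
    """DP rescaled by its maximum, 1 - (smallest expected proportion)."""
    _, expected = _proportions(counts, part_sizes)
    return dp(counts, part_sizes) / (1 - min(expected))

def dkl(counts, part_sizes):
    """Kullback-Leibler divergence in bits; parts with zero count contribute 0."""
    observed, expected = _proportions(counts, part_sizes)
    return sum(o * math.log2(o / e) for o, e in zip(observed, expected) if o > 0)

# Hypothetical corpus with four equally sized parts.
sizes = [100, 100, 100, 100]
print(dp([10, 10, 10, 10], sizes), dkl([10, 10, 10, 10], sizes))  # evenly spread word
print(dp_norm([40, 0, 0, 0], sizes), dkl([40, 0, 0, 0], sizes))   # word confined to one part
```

An evenly spread word yields 0 on both measures, while a word confined to one of four equal parts reaches the maximum of DPnorm.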
{"title":"Corrections to Nelson (2023): DPnorm and DKLnorm are Not Wrong on Pi at All","authors":"Stefan Th Gries","doi":"10.1080/09296174.2024.2324616","DOIUrl":"https://doi.org/10.1080/09296174.2024.2324616","url":null,"abstract":"This paper mainly discusses two computational errors in Nelson (2023), which demonstrate that part of his conclusions regarding two dispersion measures are flawed.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"52 11 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140297981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-02-08 DOI: 10.1080/09296174.2024.2302674
Rosmawati, Wander Lowie
Both the Menzerath-Altmann law and the Zipf-Mandelbrot law note that language is a fractal structure and, like any other fractals, follows power laws. Studies on fractal linguistics demonstrated th...
{"title":"Multifractal Analysis of the Distribution of Three Grammatical Constructions in English Texts","authors":"Rosmawati, Wander Lowie","doi":"10.1080/09296174.2024.2302674","DOIUrl":"https://doi.org/10.1080/09296174.2024.2302674","url":null,"abstract":"Both the Menzerath-Altmann law and the Zipf-Mandelbrot law note that language is a fractal structure and, like any other fractals, follows power laws. Studies on fractal linguistics demonstrated th...","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"113 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-12-17 DOI: 10.1080/09296174.2023.2294786
Wei Huang, Tenghao Ji
Published in Journal of Quantitative Linguistics (Ahead of Print, 2023)
{"title":"Quantitative Approaches to Universality and Individuality in Language","authors":"Wei Huang, Tenghao Ji","doi":"10.1080/09296174.2023.2294786","DOIUrl":"https://doi.org/10.1080/09296174.2023.2294786","url":null,"abstract":"Published in Journal of Quantitative Linguistics (Ahead of Print, 2023)","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"98 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138826658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-11-28 DOI: 10.1080/09296174.2023.2283932
Jianwei Yan
Quantitative Linguistics (QL) is an academic field that employs quantitative and statistical methods to explore language patterns and linguistic laws. From June 28th to 30th, 2023, the Internationa...
{"title":"The Current State and Prominent Features of Quantitative Linguistics Through the Lens of QUALICO 2023: A Conference Report","authors":"Jianwei Yan","doi":"10.1080/09296174.2023.2283932","DOIUrl":"https://doi.org/10.1080/09296174.2023.2283932","url":null,"abstract":"Quantitative Linguistics (QL) is an academic field that employs quantitative and statistical methods to explore language patterns and linguistic laws. From June 28th to 30th, 2023, the Internationa...","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"51 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138529561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-11-12 DOI: 10.1080/09296174.2023.2275342
Berhane Abebe, Mikhail Chebunin, Artyom Kovalevskii
ABSTRACT: The paper develops a new statistical approach to the automatic partitioning of texts into parts belonging to different authors. It is based on the analysis of processes that count the number of different words forward and backward. The theoretical study of these processes rests on the assumptions of an elementary probability model with a change point. We prove the consistency of our statistical estimate of the point of concatenation in the case when the concatenated texts have different Zipf exponents. The method is tested on the Brown corpus and on newspaper texts in different languages, and the tests show a good estimate of the concatenation point. The method can be used in parallel with other text segmentation methods.
Acknowledgments: The authors would like to thank the anonymous referees for their helpful and constructive comments and suggestions.
Disclosure statement: No potential conflict of interest was reported by the author(s).
Data availability statement: We used texts from open sources.
Funding: The work was supported by the Siberian Branch, Russian Academy of Sciences [FWNF-2022-0010].
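The two counting processes named in the abstract can be illustrated as follows. This is a minimal sketch, assuming whitespace tokenisation of a lower-cased toy text; the function names are hypothetical, and the paper's change-point estimator itself is not reproduced because the abstract does not specify it.

```python
def distinct_counts_forward(tokens):
    """counts[k-1] = number of different words among the first k tokens."""
    seen = set()
    counts = []
    for token in tokens:
        seen.add(token)
        counts.append(len(seen))
    return counts

def distinct_counts_backward(tokens):
    """counts[k-1] = number of different words among the last k tokens."""
    return distinct_counts_forward(tokens[::-1])

# Hypothetical toy text: the vocabulary changes partway through.
tokens = "a b a c c d".split()
print(distinct_counts_forward(tokens))   # growth of vocabulary read from the start
print(distinct_counts_backward(tokens))  # growth of vocabulary read from the end
```

Comparing how the two curves grow from opposite ends of a concatenated text is what a change-point analysis of this kind would build on; how the paper combines them is not stated in the abstract.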
{"title":"Text Segmentation Via Processes that Count the Number of Different Words Forward and Backward","authors":"Berhane Abebe, Mikhail Chebunin, Artyom Kovalevskii","doi":"10.1080/09296174.2023.2275342","DOIUrl":"https://doi.org/10.1080/09296174.2023.2275342","url":null,"abstract":"ABSTRACTThe paper is developing a new statistical approach to automatic partitioning of texts into parts belonging to different authors. It is based on the analysis of processes that counts the number of different words forward and backward. The theoretical study of the processes is based on the assumptions of an elementary probability model with a change point. We prove consistence of our statistical estimate of the point of concatenation in the case when the concatenated texts have different Zipf exponents. This method is being tested on the Brown corpus and also on newspaper texts in different languages. Testing shows a good estimate of the concatenation point. This method can be used in parallel with other text segmentation methods. 
AcknowledgmentsThe authors like to thank anonymous referees for their helpful and constructive comments and suggestions.Disclosure statementNo potential conflict of interest was reported by the author(s).Data availability statementWe used texts from open sources.Additional informationFundingThe work was supported by the Siberian Branch, Russian Academy of Sciences [FWNF-2022-0010].","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"79 19","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135037122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-11-06 DOI: 10.1080/09296174.2023.2259937
Tereza Motalová, Ján Mačutek, Radek Čech
ABSTRACT: According to the Menzerath-Altmann law, longer language constructs consist, on average, of shorter constituents. It is most often studied at the level of words and syllables (the mean syllable length gets shorter with increasing word length). Its validity at this level was corroborated in several languages. However, it was claimed that Chinese is an exception with respect to the validity of the Menzerath-Altmann law. We show that the law is valid if word types are considered, while the behaviour of word tokens is different. This difference can be explained by the fact that the Zipf law of abbreviation is valid not only for words but also for syllables (shorter syllables are used more frequently).
KEYWORDS: word length; Menzerath-Altmann law; Chinese; syllable; Chinese characters
Acknowledgments: The work was supported by the European Regional Development Fund project “Sinophone Borderlands – Interaction at the Edges”, CZ.02.1.01/0.0/0.0/16_019/0000791 (T. Motalová), VEGA 2/0096/21 (J. Mačutek), APVV-21-0216 (J. Mačutek), and the Operational Programme Integrated Infrastructure (OPII) for the project 313011BWH2: “InoCHF – Research and development in the field of innovative technologies in the management of patients with CHF”, co-financed by the European Regional Development Fund (J. Mačutek).
Disclosure statement: No potential conflict of interest was reported by the author(s).
Notes:
1. A more general formula with an additional parameter c, $y(x) = a x^{b} e^{cx}$, is sometimes used; see e.g. Mačutek et al. (2019).
2. The MAL has also found its place in research areas outside of human language, such as music (Boroda & Altmann, 1991), animal communication (Gustison et al., 2016), and genome structure (Ferrer-I-Cancho et al., 2014). The ‘common denominator’ of these branches of science is that they study information flow (in a very general sense).
3. Syllable length was measured in moras, not in phonemes.
4. In some of the papers cited in this paragraph, the mean syllable length is expressed in the number of graphemes rather than phonemes. The mean syllable length is quite similar for both choices in languages with shallow orthographies (Coulmas, 2002).
5. Erization is the addition of the r-suffix (儿) to a syllable, e.g. 花 huā becomes 花儿 huār (‘flower’). Moreover, there are a few singular exceptions of polysyllabic characters in Chinese. Qiu (2000, p. 26, 406) mentions 瓩 qiānwǎ ‘kilowatt’, 浬 hǎilǐ ‘nautical mile’, and 哩 yīnglǐ ‘English mile’ (none of these words occurs in our language material).
6. Xin Han-Da cidian – Das neue Chinesisch-Deutsche Wörterbuch, 1985. Commercial Press, Beijing.
7. In fact, one can speak about phonological words here, see e.g. Hall (1999) or Zsiga (2013, pp. 342–346). Thus, this approach can be considered a study of the MAL on the level of words, albeit from a slightly different perspective.
8. Lengths of stress units ranged between 1 and 18 syllables, while rhythmic segments ranged between 1 and 7 syllables (Ščigulinská & Schusterová, 2014, pp. 70–72, p. 77). Kovaľová and Schusterová (2016, pp. 122–133) report stress unit lengths between 1 and 21 syllables, similar to the utterance lengths between 2 and 29 syllables reported by Rothe-Neves et al. (2017, p. 6). On the other hand, Geršić and Altmann (1980, pp. 115–123) tested the law on words no longer than 5 syllables.
9. https://www.fon.hum.uva.nl/praat/ (accessed 1 June 2023).
10. Recall that Stave et al. (2021) studied the relation between word length in morphemes and mean morpheme length in graphemes.
11. https://www.wordproject.org/ (accessed 1 June 2023).
12. International Bible Society. Word Project®: [Bible. New Testament]. Available at https://www.wordproject.org/bibles/pn/index.htm (accessed 1 June 2023).
13. International Bible Society. Word Project®: [Bible. New Testament]. Available at https://www.wordproject.org/bibles/gb_cat/index.htm (accessed 1 June 2023).
14. Available at https://github.com/tsroten/pynlpir (accessed 1 June 2023).
15. Available at https://github.com/NLPIR-team/NLPIR (accessed 1 June 2023).
16. Available at http://bcc.blcu.edu.cn/downloads/resources/%E6%B1%89%E5%AD%97%E4%BF%A1%E6%81%AF%E8%AF%8D%E5%85%B8.zip (accessed 1 June 2023).
17. Available at https://github.com/mozillazg/python-pinyin (accessed 23 July 2023).
18. http://www.nlreg.com (accessed June 2023).
19. Admittedly, this requirement is another rule of thumb.
20. See Mačutek and Rovenchak (2011) and Mačutek et al. (2021) for a similar but slightly different approach to the problem of word-length classes with too low frequencies.
21. For example, if we measure word length in syllables, lengths 1 to 5 each occur more than 10 times, length 6 has a frequency of 12, and length 7 has a frequency of 1, we pool the latter two lengths into one class. The weighted mean word length for this class is $\frac{12 \times 6 + 1 \times 7}{12 + 1} = 6.08$; see the data in Table 1.
22. We also obtained comparable results for the relation between word length and mean syllable length in a diary (http://www.pinyin.info/readings/pinyin_riji_duanwen.html, accessed 1 June 2023) and in samples of news reportage (text category A) and academic scientific prose (text category J) from the Lancaster Corpus of Mandarin Chinese (McEnery et al., 2003). As in Table 1 and Figure 1, the mean syllable length shows a decreasing trend, with a slight increase for the longest words.
23. We also obtained similar results for the relation between word length in Chinese characters and mean character size in components and in strokes, respectively, for the short story 我为什么要结婚 (‘Why I Want to Get Married’) from the collection 黄昏里的男孩 [Boy in the Twilight] by Yu Hua (2012), and for samples of news reportage (text category A) and academic scientific prose (text category J) from the Lancaster Corpus of Mandarin Chinese (McEnery et al., 2003).
24. Words consisting of one, two, and three syllables make up 99.7% of all word tokens in the Chinese translation of the New Testament, see Table 1.
25. Given the wide scope of the principle of least effort (see Zipf, 1949), tones that are easier to pronounce may occur more frequently (see Zhang, 2002). Tonal features may also interact with other word properties; for example, longer words may have a higher proportion of simple tones than shorter ones.
26. According to Berdicevskis (2021, p. 27), clauses do not recur often enough in language for their frequencies to be estimated.
Funding: The work was supported by the Agentúra na Podporu Výskumu a Vývoja [APVV-21-0216]; European Regional Development Fund [CZ.02.1.
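The pooling rule illustrated in the notes (lengths 6 and 7, with frequencies 12 and 1, merged into one class whose length is the frequency-weighted mean) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the threshold of 10 occurrences is read off the example, only sparse classes at the long end are pooled, the function name is hypothetical, and the frequencies for lengths 1 to 5 are invented for illustration.

```python
def pool_sparse_tail(freq_by_length, min_freq=10):
    """Merge the longest word-length classes until the last class has
    at least `min_freq` observations.

    Returns a list of (weighted mean length, frequency) pairs.
    """
    classes = [(float(length), freq) for length, freq in sorted(freq_by_length.items())]
    while len(classes) > 1 and classes[-1][1] < min_freq:
        length_b, freq_b = classes.pop()   # sparse last class
        length_a, freq_a = classes.pop()   # its shorter neighbour
        total = freq_a + freq_b
        # Frequency-weighted mean length of the merged class.
        classes.append(((length_a * freq_a + length_b * freq_b) / total, total))
    return classes

# Frequencies for lengths 6 and 7 follow the example in the notes;
# those for lengths 1 to 5 are hypothetical.
frequencies = {1: 500, 2: 300, 3: 120, 4: 40, 5: 15, 6: 12, 7: 1}
for mean_length, freq in pool_sparse_tail(frequencies):
    print(round(mean_length, 2), freq)
```

For the last class this gives (12 × 6 + 1 × 7) / (12 + 1) ≈ 6.08 with a pooled frequency of 13, matching the arithmetic in the notes.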
{"title":"Word Length in Chinese: The Menzerath-Altmann Law is Valid After All","authors":"Tereza Motalová, Ján Mačutek, Radek Čech","doi":"10.1080/09296174.2023.2259937","DOIUrl":"https://doi.org/10.1080/09296174.2023.2259937","url":null,"abstract":"ABSTRACTAccording to the Menzerath-Altmann law, longer language constructs consist, on average, of shorter constituents. It is most often studied at the level of words and syllables (the mean syllable length gets shorter with the increasing word length). Its validity at this level was corroborated in several languages. However, it was claimed that Chinese is an exception with respect to the validity of the Menzerath-Altmann law. We show that the law is valid if word types are considered, while the behaviour of word tokens is different. This difference can be explained by the fact that the Zipf law of abbreviation is valid not only for words but also for syllables (shorter syllables are used more frequently).KEYWORDS: word lengthMenzerath-Altmann lawChinesesyllableChinese characters AcknowledgmentsThe work was supported from European Regional Development Fund Project “Sinophone Borderlands – Interaction at the Edges”, CZ.02.1.01/0.0/0.0/16_019/0000791 (T. Motalová), VEGA 2/0096/21 (J. Mačutek), APVV-21-0216 (J. Mačutek), and Operational Programme Integrated Infrastructure (OPII) for the project 313011BWH2: “InoCHF – Research and development in the field of innovative technologies in the management of patients with CHF”, co-financed by the European Regional Development Fund (J. Mačutek).Disclosure statementNo potential conflict of interest was reported by the author(s).Notes1. A more general formula with an additional parameter c, yx=axbecx, is sometimes used, see e.g. Mačutek et al. (Citation2019).2. The MAL has found its place also in research areas outside of human language, such as e.g. 
music (Boroda & Altmann, Citation1991), animal communication (Gustison et al., Citation2016), and genome structure (Ferrer-I-Cancho et al., Citation2014). The ‘common denominator’ of these branches of science is that they study information flow (in a very general sense).3. Syllable length was measured in moras, not in phonemes.4. In some of the papers cited in this paragraph, the mean syllable length is expressed in the number of graphemes rather than phonemes. The mean syllable length is quite similar for both choices in languages with shallow orthographies (Coulmas, Citation2002).5. Erization is an addition of the r-suffix (儿) to a syllable, e.g. 花 huā becomes 花儿 huār (‘flower’). Moreover, there are a few singular exceptions of polysyllabic characters in Chinese. Qiu (Citation2000, p. 26, 406) mentions 瓩 qiānwǎ ‘kilowatt’, 浬 hǎilǐ ‘nautical mile’, and 哩 yīnglǐ ‘English mile’ (none of these words occurs in our language material).6. Xin Han-Da cidian – Das neue Chinesisch-Deutsche Wörterbuch, 1985. Commercial Press, Beijing.7. In fact, one can speak about phonological words here, see e.g. Hall (Citation1999) or Zsiga (Citation2013, pp. 342–346). Thus, this approach can be considered a study of the MAL on the level of words, albeit from a slightly different perspective.8. Lengths of stress units ranged between 1 and 18 syllables while in the case of rhythmic segm","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"6 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135584918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-11-02 DOI: 10.1080/09296174.2023.2258782
Maryam Nasseri, Philip McCarthy
ABSTRACT: This study evaluates 22 lexical complexity measures that represent the three constructs of density, diversity and sophistication. The selection of these measures stems from an extensive review of the SLA linguistics literature. All measures were subjected to qualitative screening for indicators/predictors of lexical proficiency/development and criterion validity based on the body of scholarship. This study’s measure-testing process begins by dividing the selected measures into two groups, similarly calculated and dissimilarly calculated, based on their quantification methods and the results of correlation tests. Using a specialized corpus of postgraduate academic texts, a Structural Factor Analysis (SFA) comprising a Confirmatory Factor Analysis (CFA) and Exploratory Factor Analysis (EFA) is then conducted. The purpose of SFA is to 1) verify and examine the lexical classifications proposed in the literature, 2) evaluate the relationship between various lexical constructs and their representative measures, 3) identify the indices that best represent each construct and 4) detect possible new structures/dimensions. Based on the analysis of the corpus, the study discusses the construct-distinctiveness of lexical complexity constructs, as well as strong indicators of each conceptual/mathematical group among the measures. Finally, a unique and smaller set of measures representative of each construct is suggested for future studies that require measure selection.
Acknowledgments: We would like to thank the two anonymous reviewers for their valuable suggestions and comments.
Disclosure statement: No potential conflict of interest was reported by the author(s).
Credit authorship contribution statement: Maryam Nasseri: Conceptualization, Data curation, Methodology, Data analysis and evaluation of findings, Project administration, Visualization, Writing: original draft, Writing: critical review & editing, Funding acquisition. Philip McCarthy: Measure-selection, Writing: critical review & editing, Funding acquisition.
Notes:
1. The lexical sophistication measures in LCA-AW are filtered through the BAWE (British Academic Written English) corpus and its most-frequently-used academic writing words used in linguistics and language studies as well as the general English frequency word lists based on the BNC (the British National Corpus) or ANC (American National Corpus).
2. LCA-AW and TAALED calculate the indices based on lemma forms while Coh-Metrix calculates the vocd-D index based on word forms. In the latter case, lemmatized files can be used as the input to Coh-Metrix.
3. The R packages used in this study include psych (version 1.8.12, Revelle, 2018), lavaan (version 0.5–18, Rosseel, 2012) and corrplot (version 0.84, Wei & Simko, 2017).
Funding: This study is part of the “Lexical Proficiency Grading for Academic Writing (FRG23-C-S66)” comprehensive research granted by the American University of Sharjah (AUS).
Notes on contributors: Maryam Nasseri received her PhD from the University of Birmingham (UK), where she worked on the application of statistical modelling, natural language processing, and corpus linguistics methods to lexical and syntactic complexity. She has received several awards and grants, including the ISLE 2020 grant for syntactic complexity in academic texts and the AUS 2023-26 research grant for statistical modelling and software design for lexical proficiency grading of academic writing. She has published in journals such as System, Journal of English for Academic Purposes (JEAP) and Assessing Writing, and has reviewed articles and books for Taylor & Francis, Assessing Writing, and Journal of Language and Education (JLE). Philip McCarthy is an associate professor and discourse scientist specializing in software design and corpus analysis. His main interest is the analysis of students' English writing. His articles have appeared in journals such as Discourse Processes, The Modern Language Journal, Written Communication, and Applied Psycholinguistics. McCarthy has been a teacher for 30 years, working in Turkey, Japan, Britain, the United States, and the UAE. He is currently the principal investigator of the lexical proficiency grading project for academic writing funded by the American University of Sharjah (AUS).
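The first step of the measure-testing process, grouping measures by how strongly they correlate, can be sketched roughly as follows. This is an illustration only: the measure names, the toy scores, and the 0.9 cut-off are all hypothetical assumptions, and the study itself reports using R packages (psych, lavaan, corrplot) rather than this code.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def highly_correlated_pairs(scores, threshold=0.9):
    """List measure pairs whose absolute correlation reaches the threshold."""
    pairs = []
    for a, b in combinations(sorted(scores), 2):
        r = pearson(scores[a], scores[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs

# Hypothetical scores of three measures on four texts.
scores = {
    "ttr": [0.5, 0.6, 0.7, 0.8],
    "root_ttr": [5.0, 6.1, 6.9, 8.0],
    "density": [0.55, 0.40, 0.60, 0.45],
}
print(highly_correlated_pairs(scores))
```

Pairs flagged this way would be candidates for the "similarly calculated" group; the remaining measures would then go forward to the factor analyses described in the abstract.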
{"title":"Structural Factor Analysis of Lexical Complexity Constructs and Measures: A Quantitative Measure-Testing Process on Specialised Academic Texts","authors":"Maryam Nasseri, Philip McCarthy","doi":"10.1080/09296174.2023.2258782","DOIUrl":"https://doi.org/10.1080/09296174.2023.2258782","url":null,"abstract":"ABSTRACTThis study evaluates 22 lexical complexity measures that represent the three constructs of density, diversity and sophistication. The selection of these measures stems from an extensive review of the SLA linguistics literature. All measures were subjected to qualitative screening for indicators/predictors of lexical proficiency/development and criterion validity based on the body of scholarship. This study’s measure-testing process begins by dividing the selected measures into two groups, similarly calculated and dissimilarly calculated, based on their quantification methods and the results of correlation tests. Using a specialized corpus of postgraduate academic texts, a Structural Factor Analysis (SFA) comprising a Confirmatory Factor Analysis (CFA) and Exploratory Factor Analysis (EFA) is then conducted. The purpose of SFA is to 1) verify and examine the lexical classifications proposed in the literature, 2) evaluate the relationship between various lexical constructs and their representative measures, 3) identify the indices that best represent each construct and 4) detect possible new structures/dimensions. Based on the analysis of the corpus, the study discusses the construct-distinctiveness of lexical complexity constructs, as well as strong indicators of each conceptual/mathematical group among the measures. Finally, a unique and smaller set of measures representative of each construct is suggested for future studies that require measure selection. 
AcknowledgmentsWe would like to thank the two anonymous reviewers for their valuable suggestions and comments.Disclosure statementNo potential conflict of interest was reported by the author(s).Credit authorship contribution statementMaryam Nasseri: Conceptualization, Data curation, Methodology, Data analysis and evaluation of findings, Project administration, Visualization, Writing: original draft, Writing: critical review & editing, Funding acquisition.Philip McCarthy: Measure-selection, Writing: critical review & editing, Funding acquisition.Notes1. The lexical sophistication measures in LCA-AW are filtered through the BAWE (British Academic Written English) corpus and its most-frequently-used academic writing words used in linguistics and language studies as well as the general English frequency word lists based on the BNC (the British National Corpus) or ANC (American National Corpus).2. LCA-AW and TAALED calculate the indices based on lemma forms while Coh-Metrix calculates the vocd-D index based on word forms. In the latter case, lemmatized files can be used as the input to Coh-Metrix.3. 
The R packages used in this study include psych (version 1.8.12, Revelle, Citation2018), lavaan (version 0.5–18, Rosseel, Citation2012) and corrplot (version 0.84, Wei & Simko, Citation2017).Additional informationFundingThis study is part of the “Lexical Proficiency Grading for Academic Writing (FRG23-C-S66)” comprehensive research granted by the American University of Sharjah (AUS).Notes on cont","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"24 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135935652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-10-30 DOI: 10.1080/09296174.2023.2262696
Mengge Wang
{"title":"Words and Numbers. In Memory of Peter Grzybek (1957-2019), edited by Emmerich Kelih and Reinhard Köhler, Lüdenscheid, RAM-Verlag, 2020, 248 pp., ISBN 978-3-942303-89-7, 55,00 EUR for the paperback version","authors":"Mengge Wang","doi":"10.1080/09296174.2023.2262696","DOIUrl":"https://doi.org/10.1080/09296174.2023.2262696","url":null,"abstract":"","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136102652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-10-19 DOI: 10.1080/09296174.2023.2256211
Xiaowei Du
ABSTRACT: In recent decades, there has been an increasing interest in the relation between lexical features and texts of psychological states. Previous studies demonstrated that some lexical features varied significantly among the texts of psychological states. However, the lexical features at the textual level have received little attention. This paper extends this work by examining the performance of quantitative linguistic indices in classifying texts of psychological issues. A large dataset of forum posts, including texts of anxiety, depression, suicide ideation, and normal states, was used in experiments with machine learning algorithms. The results revealed that the quantitative linguistic indices combined with machine learning algorithms achieved a high level of success in identifying psychological states. Meanwhile, some quantitative linguistic indices, namely ALT and Writer’s view, may extract adequate lexical features for classifying texts of different psychological states. The study is probably the first attempt to use quantitative linguistic indices as lexical features to detect texts of psychological states, and the findings may contribute to our understanding of how accuracy may be enhanced in the identification of various psychological states. Finally, the implications of these findings are discussed.
Publisher’s Note: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Acknowledgments: We thank the JQL referees and the editors for their insightful comments. Their suggestions have significantly enhanced the quality of the initial manuscripts.
Disclosure Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Data Availability Statement: Publicly available datasets were analysed in this study. We used AlMosaiwi and Johnstone’s (2018) dataset, which can be accessed at https://doi.org/10.6084/m9.figshare.4743547.v1.
Supplemental data: Supplemental data for this article can be accessed online at https://doi.org/10.1080/09296174.2023.2256211.
Notes:
1. The dataset can be accessed at https://doi.org/10.6084/m9.figshare.4743547.
Funding: This study was supported by “the Fundamental Research Funds for the Central Universities” (Grant No. 3132023331).
{"title":"Lexical Features and Psychological States: A Quantitative Linguistic Approach","authors":"Xiaowei Du","doi":"10.1080/09296174.2023.2256211","DOIUrl":"https://doi.org/10.1080/09296174.2023.2256211","url":null,"abstract":"ABSTRACTIn recent decades, there has been an increasing interest in the relation between lexical features and texts of psychological states. Previous studies demonstrated that some lexical features varied significantly among the texts of psychological states. However, the lexical features at the textual level have received little attention. This paper extends this work by examining the performance of quantitative linguistic indices in classifying texts of psychological issues. A large dataset of forum posts including texts of anxiety, depression, suicide ideation, and normal states were experimented with Machine Learning algorithms. The results revealed that the quantitative linguistic indices with Machine Learning algorithms achieved a high level of success in identifying psychological states. Meanwhile, some quantitative linguistic indices, namely, ALT and Writer’s view, may extract adequate lexical features for classifying texts of different psychological states. The study is probably the first attempt that uses quantitative linguistic indices as lexical features to detect texts of psychological states, and the findings may contribute to our understanding of how accuracy may be enhanced in the identification of various psychological states. Finally, the implications of these findings are discussed. Publisher’s NoteAll claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. 
Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.AcknowledgmentsWe thank the JQL referees and the editors for their insightful comments. Their suggestions have significantly enhanced the quality of the initial manuscripts.Disclosure StatementThe authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.Data Availability StatementPublicly available datasets were analysed in this study. This data can be found here: We used AlMosaiwi and Johnstone’s (2018) dataset which can be accessed at https://doi.org/10.6084/m9.figshare.474 3547.v1.Supplemental dataSupplemental data for this article can be accessed online at https://doi.org/10.1080/09296174.2023.2256211.Notes1. The dataset can be accessed at https://doi.org/10.6084/m9.figshare.4743547.Additional informationFundingThis study was Supported by “the Fundamental Research Funds for the Central Universities” (Grant No. 3132023331).","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135729302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}