
Journal of Quantitative Linguistics: Latest Articles

Authorship Attribution via Occupancy-problem-type Indices
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2022-02-14 DOI: 10.1080/09296174.2022.2037276
Lukun Zheng, Huiqiang Zheng, Chandra Kundu
ABSTRACT In this paper, we propose a new methodology for authorship attribution based on a profile of indices related to the occupancy problem, called occupancy-problem indices. The occupancy problem has a long history and is an important example in standard textbooks like Feller (1971). We base our methodology on function words. We establish a testing procedure by constructing a confidence band of the occupancy-problem indices using the sampling distribution of the number of distinct function words. We validate our proposed methodology using controlled and constructed writing samples whose authorship is known. We then apply this methodology to explore the question of who wrote the 15th Oz book, whose authorship is disputed between Lyman Frank Baum (1856–1919) and Ruth Plumly Thompson (1891–1976), his successor on the Oz series.
Journal of Quantitative Linguistics, Volume 30, Issue 1, pp. 27–41. Journal Article.
Citations: 0
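The abstract above relies on the sampling distribution of the number of distinct function words, which is a classical occupancy quantity. As a minimal sketch (not the authors' procedure), the mean and variance of that count under independent multinomial draws can be computed as follows. The function names and the word probabilities are hypothetical, chosen only for illustration.

```python
def expected_distinct(probs, n):
    """Expected number of distinct categories observed in n independent draws."""
    return sum(1.0 - (1.0 - p) ** n for p in probs)


def variance_distinct(probs, n):
    """Variance of the number of distinct categories observed in n draws."""
    # q[i] is the probability that category i is never drawn.
    q = [(1.0 - p) ** n for p in probs]
    var = sum(qi * (1.0 - qi) for qi in q)
    for i, pi in enumerate(probs):
        for j, pj in enumerate(probs):
            if i != j:
                # Probability that neither category i nor category j is drawn.
                both_missing = max(0.0, 1.0 - pi - pj) ** n
                var += both_missing - q[i] * q[j]
    return var


# Hypothetical relative frequencies of four function words (illustrative only).
probs = [0.4, 0.3, 0.2, 0.1]
print(expected_distinct(probs, 10), variance_distinct(probs, 10))
```

A band around the expected count could then be built from the mean and variance, but how the paper constructs its confidence band is not stated in the abstract, so that step is left out here.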
To Move or Not to Move: An Entropy-based Approach to the Informativeness of Research Article Abstracts across Disciplines
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2022-02-10 DOI: 10.1080/09296174.2022.2037275
Wei Xiao, Li Li, Jin Liu
ABSTRACT Research article (RA) abstracts succinctly and skilfully epitomize the core information of the full text and have thus attracted the attention of a number of scholars. While previous studies mainly focused on the rhetorical structures, meta-discursive features and lexico-grammatical features, few have made explorations from the perspective of information theory. To bridge this gap, the present study conducted an entropy-based analysis to explore the distribution pattern of information content across moves and the variations across disciplines. 318 RA abstracts across the natural sciences, social sciences and humanities (106 abstracts per discipline) were selected and three indices, i.e. the 1-/ 2-/ 3-gram entropies, were used to examine whether different indices yielded different features. The results show that in an RA abstract, the information content is unevenly distributed across moves; different entropy indices may reflect different linguistic properties; and both similarities and variations exist in information content across disciplines. These phenomena can be attributed to the functions of moves, the linguistic meanings of indices and disciplinary features. This study has implications for RA abstract writing instruction and practice, as well as for broadening the applications of quantitative linguistic methods into less touched fields.
Journal of Quantitative Linguistics, Volume 30, Issue 1, pp. 1–26. Journal Article.
Citations: 3
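The 1-/2-/3-gram entropies mentioned in the abstract above can be illustrated with a plain Shannon entropy over n-gram relative frequencies. This is a sketch under the assumption that the indices are ordinary maximum-likelihood entropies in bits; the abstract does not give the exact estimator, and the function name and toy tokens are hypothetical.

```python
import math
from collections import Counter


def ngram_entropy(tokens, n):
    """Shannon entropy (in bits) of the n-gram distribution of a token list."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0  # text shorter than n tokens: no n-grams to measure
    counts = Counter(grams)
    total = len(grams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy "move" of an abstract, tokenized by whitespace (illustrative only).
move = "this study explores the entropy of the moves of the abstract".split()
for n in (1, 2, 3):
    print(n, round(ngram_entropy(move, n), 3))
```

Computing the value separately for each move and each discipline would give the kind of comparison the abstract describes.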
Menzerath-Altmann Law in Consecutive and Simultaneous Interpreting: Insights into Varied Cognitive Processes and Load
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2022-01-16 DOI: 10.1080/09296174.2022.2027657
Xinlei Jiang, Yue Jiang
ABSTRACT Notwithstanding theoretical simulations of the distinctive cognitive processes and load of consecutive (CI) and simultaneous interpreting (SI), quantitative linguistic inquiry into their outputs is needed for solid empirical evidence. As a fundamental law of quantitative linguistics, the Menzerath–Altmann Law (MAL) mirrors the economic processing of linguistic information and the complex dynamic language system. Given its extensive validation at various linguistic levels and the predictive power of its parameters in register, language and authorship differentiation, MAL is worth applying to interpreting studies. We endeavour to investigate whether interpreted languages follow the MAL and to reveal the varied cognitive load of CI versus SI, as manifested by different MAL fitting models. Results show that (1) both CI and SI outputs follow the MAL; (2) SI processing involves more diversified structural information and shows a greater tendency than CI processing to shorten the clauses of a sentence as sentence length increases, expressed by a significantly higher a and lower b in SI models than in CI models. Our findings suggest that the disparate language representations are shaped by cognitive capacity limitations and interpreting modalities, and reveal how the language system dynamically re-regulates and reorganizes linguistic information to accommodate environmental settings from the perspective of synergetic linguistics.
Journal of Quantitative Linguistics, Volume 29, Issue 1, pp. 541–559. Journal Article.
Citations: 0
Markov Models for Multi-state Language Change
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2022-01-01 DOI: 10.1080/09296174.2021.1877004
F. Velde, Isabeau De Smet
Journal of Quantitative Linguistics, Volume 29, Issue 1, pp. 314–338. Journal Article.
Citations: 0
Syntactic Complexity of Different Text Types: From the Perspective of Dependency Distance Both Linearly and Hierarchically
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2021-12-09 DOI: 10.1080/09296174.2021.2005960
Ruina Chen, Sirui Deng, Haitao Liu
ABSTRACT Dependency distance (DD) is a well-established measure of syntactic complexity. Previous studies largely focused on the linear dimension, mostly by means of mean dependency distance (MDD). In the present study, a new quantitative indicator, mean hierarchical dependency distance (MHDD), is proposed to discuss DD-related issues. Combining MHDD and MDD, the study investigates the syntactic complexity of different texts, using strictly length-controlled sentences of 12 text types from the Freiburg-Brown corpus of American English. Correlations between MHDD and MDD have been identified, and possible reasons are discussed from the mathematical and theoretical perspectives. Mathematically, one reason is that the numerator of MHDD overlaps with the denominator of MDD, both being (n-1), where n is the number of words in the sentence. The other is that the denominator of MHDD (maximum hierarchical layer: MAXHL) and the numerator of MDD (sum of DD: SOD) are positively correlated. We believe that it is the positive correlation of SOD and MAXHL that ensures the change of MDD and MHDD in the same direction. It is also worth noting that both MAXHL and SOD seem to be minimized at their respective data spectrum, which foreshadows the dependency distance minimization (DDM) tendency on the hierarchical dimension.
Journal of Quantitative Linguistics, Volume 29, Issue 1, pp. 510–540. Journal Article.
Citations: 0
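The two ratios described in the abstract above, MDD = SOD / (n-1) and MHDD = (n-1) / MAXHL, can be sketched for a single sentence as follows. The input encoding (a list of 1-based head positions with 0 for the root) and the convention that the root sits at layer 0 are assumptions made for this illustration; the abstract does not specify how layers are counted, and the function name is hypothetical.

```python
def dependency_metrics(heads):
    """Return (MDD, MHDD) for one sentence.

    heads[i] is the 1-based position of the head of word i+1; 0 marks the root.
    Assumes a single-rooted tree with at least two words.
    """
    n = len(heads)
    # SOD: sum of linear distances over the n-1 non-root dependencies.
    sod = sum(abs(h - (i + 1)) for i, h in enumerate(heads) if h != 0)

    def layer(pos):
        # Number of arcs between word `pos` and the root (root = layer 0).
        depth = 0
        while heads[pos - 1] != 0:
            pos = heads[pos - 1]
            depth += 1
            if depth > n:
                raise ValueError("cycle in dependency structure")
        return depth

    maxhl = max(layer(pos) for pos in range(1, n + 1))
    mdd = sod / (n - 1)
    mhdd = (n - 1) / maxhl  # ratio as described in the abstract
    return mdd, mhdd


# "The dog chased the cat": The->dog, dog->chased, chased=root, the->cat, cat->chased
print(dependency_metrics([2, 3, 0, 5, 3]))
```

Because (n-1) appears in both ratios, fixing sentence length leaves SOD and MAXHL as the only moving parts, which is why the abstract attributes the MDD-MHDD correlation to the correlation between those two quantities.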
Dependency Distance and Its Probability Distribution: Are They the Universals for Measuring Second Language Learners’ Language Proficiency?
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2021-11-17 DOI: 10.1080/09296174.2021.1991684
Yuxin Hao, Xuelin Wang, Yanni Lin
ABSTRACT Previous studies have shown that dependency distance and its probability distribution can be applied as syntactic indicators of English as interlanguage. However, the universal application of these indicators has not been verified from the perspective of language typology. The issues are addressed in the present study based on a treebank of Chinese interlanguage of English and Japanese native speakers. The findings are as follows: (1) with the improvement of L2 proficiency, the MDDs of learners with different native language backgrounds gradually approach that of the target language in different patterns, and dependency distance is of universal significance as a metric to measure the development of interlanguage’s syntactic complexity; (2) Chinese interlanguage also follows the principle of least effort, and its probability distribution of dependency distance, like those of natural languages, presents a power–law distribution, which can successfully fit the Zipf-Alekseev distribution; (3) the right truncated modified Zipf-Alekseev distribution can be used to measure Chinese interlanguage proficiency, and the fitting parameters of the probability distribution of dependency distance as a metric of interlanguage proficiency are also of universal value.
Journal of Quantitative Linguistics, Volume 29, Issue 1, pp. 485–509. Journal Article.
Citations: 1
A Zipfian Approach to Words in Contexts: The Cases of Modern English and Chinese
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2021-05-19 DOI: 10.1080/09296174.2021.1926110
Jinzhou Cong
ABSTRACT The system-level complexity of language has been thoroughly investigated in terms of Zipf’s law, whose quantitative features have proved to reflect text/language typology. This study extends the scope of Zipf’s law from the macroscopic scale of language to specific words in contexts, with the aim of examining its potential as an indicator of word typology. The focus is confined to the high-frequency words in English and Chinese as found in the FLOB and LCMC corpora. It has been found that the log–log rank-frequency distributions of the contextual words of the words in question generally abide by the linear function y = ax+b. Moreover, it has been shown that an adjusted version of parameter a can help to distinguish the classes of the words in question. The contextual information reflected by this Zipf-based index might be more important to the emergence of word classes in Chinese, which has no real inflection as a word-class indicator. From a Zipfian perspective, the findings lend preliminary support to Saussure’s systems thinking regarding linguistic signs. Meanwhile, they may also contribute to such fields as usage-based linguistics.
Journal of Quantitative Linguistics, Volume 29, Issue 1, pp. 465–484. Journal Article.
Citations: 0
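The linear function y = ax+b in the abstract above is fitted to log rank against log frequency of the words that occur around a target word. The sketch below assumes a symmetric context window and ordinary least squares; the window size, function names and toy sentence are hypothetical, and the "adjusted version of parameter a" mentioned in the abstract is not reproduced because its definition is not given there.

```python
import math
from collections import Counter


def context_counts(tokens, target, window=2):
    """Count the words occurring within `window` positions of each target token."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def loglog_fit(counts):
    """Least-squares fit of log(frequency) = a * log(rank) + b; returns (a, b)."""
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx  # requires at least two distinct context words
    return a, my - a * mx


tokens = "the cat sat on the mat and the dog sat on the cat".split()
print(loglog_fit(context_counts(tokens, "the", window=2)))
```

A slope a close to -1 corresponds to the classic Zipfian rank-frequency relation; comparing slopes across target words is the kind of contrast the abstract uses to separate word classes.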
The Indicative/subjunctive Mood Alternation with Adverbs of Doubt in Spanish
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2021-04-27 DOI: 10.1080/09296174.2021.1919376
Harunobu Hirota
ABSTRACT This study aims to analyse the indicative/subjunctive mood alternation in Spanish sentences with adverbs of doubt (acaso, posiblemente, probablemente, quizá, quizás, tal vez, seguramente, a lo mejor, igual). To this end, this study statistically analysed the linguistic and social factors conditioning the mood alternation in sentences with adverbs of doubt. A total of 1278 tokens were analysed. Each datum was annotated with verb type, verb aspect, verb person, distance between the adverb and the verb, sex, age, region, and education level. To exclude confounding factors, multivariable logistic regression was conducted, and the analysis yielded significant odds ratios (ORs) for 10 items, including sex, region, education level, adverbs (posiblemente, probablemente, quizá, quizás, tal vez), aspect, and distance between the verb and the adverb. These results show that these adverbs can be divided into two groups, where posiblemente, probablemente, quizá, quizás, and tal vez are more likely to co-occur with the subjunctive than the adverbs acaso, seguramente, a lo mejor, and igual. Furthermore, this study has shown that each adverb differs in the likelihood of co-occurring with the subjunctive, and that social factors of speakers affect the mood selection. Thus, an analysis of mood alternations should include social and linguistic factors.
Journal of Quantitative Linguistics, Volume 29, Issue 1, pp. 450–464. Journal Article.
Citations: 1
The Entropy of Morphological Systems in Natural Languages Is Modulated by Functional and Semantic Properties
IF 1.4 Zone 2 (Literature) 0 LANGUAGE & LINGUISTICS Pub Date: 2021-04-26 DOI: 10.1080/09296174.2022.2063501
Francesca Franzon, Chiara Zanini
ABSTRACT In most natural languages, grammatical gender and number features encode semantic attributes concerning animacy, sex, and numerosity. Despite the likely advantage of promptly communicating about such salient attributes, inflectional systems rarely display consistently bijective correspondences between the semantic attributes and the grammatical feature values. In a study on Italian, we explored how this apparently noisy encoding depends on a trade-off between the semantic and the functional aspects of grammatical features. Using entropy metrics, we assessed the primarily functional purpose of gender and number features in the lexicon, observing a distribution of nouns that can optimally serve agreement-based parsing and prediction of words in sentences. A novel context entropy measure, introduced in this study to assess meaning specificity, revealed a semantic underspecification in masculine and singular nouns denoting animate referents. We argue that underspecification is the hallmark of the particular type of information compression occurring in inflectional systems. In binary inflectional systems, one value specifically encodes a semantic attribute, while the other value does not encode any semantic information, and surfaces as a default for functional purposes. By providing an information-theoretical account of the role of grammatical features, we set the basis for a scientifically informed pursuit of language inclusiveness.
在大多数自然语言中,语法上的性别和数字特征编码了有关动物性、性别和数量的语义属性。尽管快速交流这些显著属性可能具有优势,但屈折变化系统很少在语义属性和语法特征值之间显示一致的双客观对应。在一项关于意大利语的研究中,我们探讨了这种明显嘈杂的编码如何取决于语法特征的语义和功能方面之间的权衡。使用熵度量,我们评估了词汇中性别和数字特征的主要功能目的,观察了名词的分布,这些名词可以最优地服务于基于协议的解析和句子中的单词预测。本研究引入了一种新的语境熵测度来评估意义特异性,揭示了在表示有生命的指称物的阳性和单数名词中存在语义欠规范。我们认为,规格不足是发生在屈折系统的特定类型的信息压缩的标志。在二元屈折系统中,一个值专门编码语义属性,而另一个值不编码任何语义信息,并且作为功能目的的默认值出现。通过对语法特征作用的信息理论解释,我们为科学地追求语言包容性奠定了基础。
Citations: 2
Modelling the Dynamics of Language Change: Logistic Regression, Piotrowski’s Law, and a Handful of Examples in Polish
IF 1.4 Zone 2 Literature 0 LANGUAGE & LINGUISTICS Pub Date : 2021-04-13 DOI: 10.1080/09296174.2022.2151208
Rafal L. Górski, Maciej Eder
ABSTRACT The study discusses modelling diachronic processes by logistic regression. The phenomenon of nonlinear changes in language was first observed by Raimund Piotrowski (hence labelled as Piotrowski’s law), even if actual linguistic evidence often speaks against using the notion of a ‘law’ in this context. In our study, we apply logistic regression models to changes which occurred between the 15th and 18th centuries in the Polish language. The attested course of the majority of these changes closely follows the expected values, which proves that language change might indeed resemble a nonlinear phase change scenario. We also extend Piotrowski’s original approach by proposing polynomial logistic regression for those cases that can hardly be described by its standard version. Also, we propose to consider individual language change cases jointly, in order to inspect their possible collinearity or, more likely, their different dynamics as a function of time. Last but not least, we evaluate our results by testing the influence of the subcorpus size on the model’s goodness-of-fit.
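To make the idea of a logistic course of change concrete, here is a minimal sketch, assuming the common formulation of Piotrowski’s law as p(t) = 1 / (1 + a·e^(−bt)), where p is the share of the incoming form at time t. The fit below is a simple least-squares line through the logit-transformed proportions, not the authors' procedure and not a maximum-likelihood logistic regression; the function names and the synthetic data points are hypothetical.

```python
import math


def piotrowski(t, a, b):
    """Logistic curve p(t) = 1 / (1 + a * exp(-b * t))."""
    return 1.0 / (1.0 + a * math.exp(-b * t))


def fit_logit_line(times, proportions):
    """Least-squares fit of logit(p) = intercept + slope * t.

    Since logit(p) = b * t - ln(a) for the curve above, the slope
    estimates b and exp(-intercept) estimates a.
    Proportions must lie strictly between 0 and 1.
    """
    ys = [math.log(p / (1.0 - p)) for p in proportions]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(-intercept), slope


# Synthetic, noise-free example: time in centuries from an arbitrary start.
times = [0, 1, 2, 3]
shares = [piotrowski(t, 20.0, 2.0) for t in times]
a_hat, b_hat = fit_logit_line(times, shares)
print(a_hat, b_hat)
```

With real corpus counts the proportions are noisy and may hit 0 or 1, in which case the logit transform is undefined and a proper logistic regression on the raw counts would be the safer choice.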
Citations: 0