Journal of Quantitative Linguistics: Latest Articles

Prose, Verse and Authorship in Dream of the Red Chamber: A Stylometric Analysis

IF 1.4 | Zone 2 (Literature) | LANGUAGE & LINGUISTICS

Journal of Quantitative Linguistics

Pub Date : 2020-02-09 DOI: 10.1080/09296174.2020.1724677

Haoran Zhu, L. Lei, Hugh Craig

ABSTRACT In this study, we provide a quantitative analysis of prose and verse in the classical Chinese novel, Dream of the Red Chamber (DRC), and discuss the implications for the disputed authorship of the novel. Firstly, we examine the amount of verse across the chapters of DRC, and compare the style of the verse and prose portions of DRC. Secondly, a Principal Component Analysis (PCA) of DRC is performed based on the prose portions of the novel. Lastly, we discuss the implications of our experimental results for authorship attribution as well as descriptive stylistic analysis of DRC. Our authorial analysis largely confirms the findings of some previous studies that the novel has two authors. Meanwhile, stylistic analyses of the prose portions of the novel yield new and interesting results, which demonstrates that stylometric tools can be used to facilitate descriptive studies of classical Chinese literature.
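The PCA step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which features were used, so the example assumes, purely for illustration, a small table of function-word rates per block of chapters; the numbers and the function names `standardize` and `first_principal_component` are hypothetical. The first component is found by power iteration on the correlation matrix, which is one simple way to do PCA and not necessarily the authors' procedure.

```python
import math

def standardize(rows):
    """Convert each column to z-scores (assumes no column is constant)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(d)]
    return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in rows]

def first_principal_component(rows, iterations=500):
    """Return (loading vector, explained-variance ratio, scores) for PC1."""
    z = standardize(rows)
    n, d = len(z), len(z[0])
    # Covariance of z-scores, i.e. the correlation matrix of the raw data.
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Asymmetric start vector, so it is unlikely to be orthogonal to PC1.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    explained = eigenvalue / sum(cov[i][i] for i in range(d))
    scores = [sum(r[j] * v[j] for j in range(d)) for r in z]
    return v, explained, scores

# Hypothetical rates (per 1,000 characters) of three function words
# in six blocks of chapters; the last two blocks are made to differ.
blocks = [
    [12.0, 5.0, 3.0],
    [11.5, 5.2, 3.1],
    [12.3, 4.9, 2.8],
    [11.8, 5.1, 3.2],
    [8.0, 7.5, 4.9],
    [8.4, 7.2, 5.1],
]
loadings, explained, scores = first_principal_component(blocks)
print(loadings, explained, scores)
```

If the two groups of blocks were written in different styles, their scores on the first component would be expected to fall on opposite sides of zero, which is the kind of separation an authorship study looks for.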

Citations: 7

Syntactic Impairments of Chinese Alzheimer’s Disease Patients from a Language Dependency Network Perspective

Pub Date : 2020-01-07 DOI: 10.1080/09296174.2019.1703485

Jianpeng Liu, Junhai Zhao, Xiaohui Bai

ABSTRACT This study examined the syntactic impairments of Chinese Alzheimer’s disease patients with a dependency network approach. The dependency treebanks and dependency networks are constructed from the discourses of both the patient group and its healthy peers. By analysing the contrasts in the dependency networks of the two groups, we found that 1) the mean dependency distance (MDD) of the AD group is shorter than that of the HP group; furthermore, the MDDs of both AD and HP groups are far below the standard Chinese MDD; 2) the content words like remember, forget, know, etc. and the negative forms of the verbs like don’t know, can’t remember, can’t say, etc. show highly repetitive uncertain and negative expressions that are typical of the predicates of the clauses of AD patients; 3) the function word vertices in the AD dependency network have distinctive network parameters such as higher ‘betweenness’ centrality, closeness centrality, and clustering coefficients, etc., indicating that the syntax of AD is impaired and features more simplified stereotypes. These results indicate that the syntax of the AD group has been impaired from parts of speech to the whole syntactic structure.
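Mean dependency distance, the first measure reported in this abstract, can be sketched as follows. The sketch assumes each sentence is given as a list of head positions in the style of the CoNLL-U HEAD column (1-based, with 0 marking the root); the example sentences and function names are illustrative and not taken from the study's treebanks.

```python
def dependency_distances(heads):
    """Absolute linear distances between each non-root word and its head.

    heads[i] is the 1-based position of the head of word i+1; 0 marks the root.
    """
    return [abs(head - position)
            for position, head in enumerate(heads, start=1)
            if head != 0]

def mean_dependency_distance(heads):
    """MDD of a single sentence."""
    distances = dependency_distances(heads)
    if not distances:
        raise ValueError("sentence has no dependencies")
    return sum(distances) / len(distances)

def treebank_mdd(sentences):
    """MDD over a whole sample: all distances pooled, then averaged."""
    pooled = [d for heads in sentences for d in dependency_distances(heads)]
    return sum(pooled) / len(pooled)

# "The boy ate apples": The->boy, boy->ate, ate=root, apples->ate
short = [2, 3, 0, 3]
# "She quickly ate the red apples"
longer = [3, 3, 0, 6, 6, 3]

print(mean_dependency_distance(short))   # 1.0
print(mean_dependency_distance(longer))  # 1.8
print(treebank_mdd([short, longer]))     # 1.5
```

A group-level comparison like the one in the abstract would apply `treebank_mdd` to each group's parsed discourse and compare the two values.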

Citations: 3

Functional Role of Frequency in Word-Formation Processes: A System Theoretical Approach

Pub Date : 2020-01-02 DOI: 10.1080/09296174.2018.1496990

Inna Uglanova

ABSTRACT Three experimental models are reported in which verb-formation processes are used to investigate the effect of frequency on language structure. The first model examined the impact of frequency on length of a language unit. Only the smoothed data for the prototypical verb formation confirmed the hypothesis. The second model tested the dependency of frequency on the depth of a word-formation structure. Good-fitting results were found for all main verb-formation structures. The third model aimed to study the influence of frequency on productivity (number of derivatives). The results of smoothing data showed that the more frequently a unit is used, the more derivatives it has. The outcomes allow clarifying some aspects of functioning of frequency in the synergetic mechanisms of language. In particular, it was shown that the observed frequency oscillation could be considered as a dialogue between the system and its environment.

Citations: 0

Normalized Dependency Distance: Proposing a New Measure

Pub Date : 2020-01-02 DOI: 10.1080/09296174.2018.1504615

L. Lei, Matthew L. Jockers

ABSTRACT Previous studies of dependency distance as a measure of, or a proxy for, syntactic complexity do not consider factors such as sentence length and root distance. In the present study, we propose a new algorithm, i.e. Normalized Dependency Distance (NDD), that takes sentence length and root distance into consideration. Our analysis showed that exponential distribution fit well the distribution model of NDD as it did with Mean Dependency Distance (MDD), the algorithm used in previous studies. Findings indicated that NDD is significantly less dependent on sentence length than MDD is, which suggests that the new algorithm may have, to some extent, addressed the issue of MDD’s dependency on sentence length. It is argued that NDD may serve as a measure of syntactic complexity, which is a kind of universality limited by the capacity of human working memory.
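The abstract names the ingredients of NDD (MDD, sentence length and root distance) but does not give the formula. The sketch below assumes the form NDD = |ln(MDD / sqrt(root distance × sentence length))|, with root distance taken as the 1-based position of the root word. That form is an assumption introduced for this example and should be checked against the published article before any real use.

```python
import math

def mean_dependency_distance(heads):
    """MDD of one sentence; heads are 1-based head positions, 0 = root."""
    distances = [abs(head - position)
                 for position, head in enumerate(heads, start=1)
                 if head != 0]
    return sum(distances) / len(distances)

def normalized_dependency_distance(heads):
    """NDD under the ASSUMED formula |ln(MDD / sqrt(root_pos * length))|."""
    sentence_length = len(heads)
    root_position = heads.index(0) + 1  # 1-based position of the root word
    mdd = mean_dependency_distance(heads)
    return abs(math.log(mdd / math.sqrt(root_position * sentence_length)))

# "She quickly ate the red apples": root "ate" is the third word.
heads = [3, 3, 0, 6, 6, 3]
print(mean_dependency_distance(heads))
print(normalized_dependency_distance(heads))
```

Dividing by a quantity that grows with sentence length is what would make such a measure less sensitive to length than raw MDD, which is the property the abstract reports.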

Citations: 11

Numerical Assessment of Orthographic Neighbourhood Size Fluctuation in Writing Using Fractal Dimension Analysis

Pub Date : 2019-11-25 DOI: 10.1080/09296174.2019.1694360

R. Taibu, E. Cheung, Weier Ye, S. Dehipawala, V. Shekoyan, G. Tremberger, T. Cheung

ABSTRACT The orthographic size of a targeted word, the number of new words that can be generated from a targeted word by exchanging a single letter, offers a research window where words can be transformed into numerical values. The CLEARPOND technology from Northwestern University was used for the transformation. A writing can then be modelled as a time series where the fluctuation can be further described using fractal dimension analysis. This project used the Higuchi fractal method for the computation of the fractal dimensions of time series. The proof of concept was conducted using writing examples which include Astronomy writing and English writing, the responses of Trump and Clinton in a Presidential election debate, and song lyrics. The results suggested that a high fractal dimension has an association with a high-demand cognitive task. The use of fractal dimension analysis as a writing assessment tool is discussed with relationship to the current lexical diversity computation technology.
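The two computational steps described here, turning words into neighbourhood sizes and estimating a fractal dimension of the resulting series, can be sketched as follows. The toy lexicon is hypothetical and stands in for CLEARPOND, which is not queried here. The Higuchi routine follows the standard textbook formulation, and the `k_max` default is an arbitrary choice for the example rather than the paper's setting.

```python
import math
import random

def neighbourhood_size(word, lexicon):
    """Number of lexicon words of the same length differing in exactly one letter."""
    return sum(1 for other in lexicon
               if len(other) == len(word)
               and sum(a != b for a, b in zip(word, other)) == 1)

def higuchi_fd(series, k_max=8):
    """Higuchi fractal dimension of a non-constant numeric series."""
    n = len(series)
    xs, ys = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # one sub-series per starting offset m
            steps = (n - 1 - m) // k
            if steps < 1:
                continue
            total = sum(abs(series[m + i * k] - series[m + (i - 1) * k])
                        for i in range(1, steps + 1))
            # Normalise for the number of steps and the step size k.
            lengths.append(total * (n - 1) / (steps * k) / k)
        xs.append(math.log(1.0 / k))
        ys.append(math.log(sum(lengths) / len(lengths)))
    # The dimension is the least-squares slope of ln L(k) against ln(1/k).
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            / sum((x - mean_x) ** 2 for x in xs))

toy_lexicon = {"cat", "bat", "hat", "cot", "dog", "cart"}  # hypothetical
print(neighbourhood_size("cat", toy_lexicon))  # 3 (bat, hat, cot)

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
line = [float(i) for i in range(200)]
print(higuchi_fd(line), higuchi_fd(noise))
```

A smooth series such as a straight line gives a dimension of about 1, while an irregular series such as white noise gives a value close to 2. A text would be analysed by mapping its word sequence to neighbourhood sizes and passing that series to `higuchi_fd`.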

Citations: 1

Word Length Distribution in Zhuang Language

Pub Date : 2019-10-30 DOI: 10.1080/09296174.2019.1678225

Aiyun Wei, Qian Lu, Haitao Liu

ABSTRACT The present study focuses on the word length distribution (WLD) of Zhuang language. The results show that the WLDs of all texts investigated can be described by the Positive Cohen-Poisson model when the word length is measured by the syllable numbers. However, when the word length is measured by the letter numbers, they do not follow any model from the Poisson or Binomial distribution families widely observed in other languages. Nevertheless, the WLDs of all the Zhuang texts investigated follow the Zipf-Alekseev function either in terms of syllable or letter numbers. Moreover, the research on the WLDs of different Zhuang genres indicates that WLD may not be a sensitive index in distinguishing different Zhuang genres but an effective one in distinguishing different Zhuang styles (spoken or written). Then, the study of the relationship between the parameters a and b in the Zipf-Alekseev function shows that the self-organizing regularity observed in other languages also exists in Zhuang. Finally, the study of the word length-frequency relationship of Zhuang indicates that Zhuang word length is influenced by its frequency, which can be explained by Zipf’s ‘Principle of Least Effort’ and thus follows the law of lexical synergetic subsystem in synergetic linguistics.
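A fit of the Zipf-Alekseev function can be sketched as below. The function is written here as y = c·x^(a + b·ln x); some sources put a minus sign in the exponent, which only flips the signs of a and b. The sketch fits the parameters by ordinary least squares in log space, a simplification chosen for brevity that may differ from the fitting procedure used in the paper. The data are synthetic.

```python
import math

def solve_linear(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (m[r][n] - tail) / m[r][r]
    return solution

def fit_zipf_alekseev(lengths, freqs):
    """Fit y = c * x**(a + b*ln x) via ln y = ln c + a*u + b*u**2, with u = ln x."""
    u = [math.log(x) for x in lengths]
    v = [math.log(y) for y in freqs]
    basis = [[1.0, ui, ui * ui] for ui in u]
    normal = [[sum(row[i] * row[j] for row in basis) for j in range(3)]
              for i in range(3)]
    moments = [sum(row[i] * vi for row, vi in zip(basis, v)) for i in range(3)]
    ln_c, a, b = solve_linear(normal, moments)
    return math.exp(ln_c), a, b

# Synthetic frequencies for word lengths 1..6, generated from known parameters.
lengths = [1, 2, 3, 4, 5, 6]
freqs = [100.0 * x ** (1.2 - 0.9 * math.log(x)) for x in lengths]
c, a, b = fit_zipf_alekseev(lengths, freqs)
print(c, a, b)
```

Fitting each text in this way yields one (a, b) pair per text, which is the kind of data needed to examine the relationship between the two parameters mentioned in the abstract.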

Citations: 4

Calculation of Phonetic Distances between Speech Sounds

Pub Date : 2019-10-23 DOI: 10.1080/09296174.2019.1678709

M. Vakulenko

ABSTRACT A new formalism to numerically measure phonetic differences between speech sounds treating feature values of the compared phones as independent parameters that give rise to corresponding Euclidean distances is put forward. The articulatory and acoustic methods within this formalism were compared, where the corresponding results display good agreement. The more reliable and more universal character of the acoustic approach is provided by robust and precise acoustic parameters used therein. The theoretical model and the findings of this article comply also with the experimental phonetic results. The proposed approach contributes to formalization of the procedure of phone comparison and mapping needed for automatic text and speech processing.
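The core idea, treating feature values as independent coordinates and taking the Euclidean distance between phones, can be sketched as follows. The feature names and the 0 to 1 values assigned to the vowels are hypothetical placeholders for illustration, not the parameters used in the article.

```python
import math

def phone_distance(features_a, features_b):
    """Euclidean distance between two phones described by the same features."""
    if features_a.keys() != features_b.keys():
        raise ValueError("phones must be described by the same feature set")
    return math.sqrt(sum((features_a[name] - features_b[name]) ** 2
                         for name in features_a))

# Hypothetical articulatory coordinates on a 0..1 scale.
phones = {
    "i": {"height": 1.0, "backness": 0.0, "rounding": 0.0},
    "u": {"height": 1.0, "backness": 1.0, "rounding": 1.0},
    "a": {"height": 0.0, "backness": 0.5, "rounding": 0.0},
}

for first, second in [("i", "u"), ("i", "a"), ("u", "a")]:
    print(first, second, phone_distance(phones[first], phones[second]))
```

An acoustic variant would use the same function with measured acoustic parameters in place of the articulatory coordinates, after scaling them so that no single feature dominates the sum.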

Citations: 3

Comparing χ2 Tables for Separability of Distribution and Effect: Meta-Tests for Comparing Homogeneity and Goodness of Fit Contingency Test Outcomes

Pub Date : 2019-10-02 DOI: 10.1080/09296174.2018.1496537

S. Wallis

ABSTRACT This paper describes a series of statistical meta-tests for comparing independent contingency tables for different types of significant difference. Recognizing when an experiment obtains a significantly different result and when it does not is frequently overlooked in research publication. Papers are frequently published citing ‘p values’ or test scores suggesting a ‘stronger effect’ substituting for sound statistical reasoning. This paper sets out a series of tests that together illustrate the correct approach to this question. These meta-tests permit us to evaluate whether experiments have failed to replicate on new data; whether a particular data source or subcorpus obtains a significantly different result than another; or whether changing experimental parameters obtains a stronger effect. The meta-tests are derived mathematically from the χ2 test and the Wilson score interval, and consist of pairwise ‘point’ tests, ‘homogeneity’ tests and ‘goodness of fit’ tests. Meta-tests for comparing tests with one degree of freedom (e.g. ‘2 × 1’ and ‘2 × 2’ tests) are generalized to those of arbitrary size. Finally, we compare our approach with a competing approach offered by Zar, which, while straightforward to calculate, turns out to be both less powerful and less robust. (Note: A spreadsheet including all the tests in this paper is publicly available at www.ucl.ac.uk/english-usage/statspapers/2x2-x2-separability.xls.)
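The two building blocks named in the abstract, the Wilson score interval and the χ2 contingency test, can be sketched in a few lines. This sketch covers only those standard formulas; it does not reproduce the paper's meta-tests, and the function names and example counts are chosen for illustration.

```python
import math

def wilson_interval(p, n, z=1.959964):
    """Wilson score interval for an observed proportion p out of n cases.

    z defaults to the two-tailed 95% critical value of the normal distribution.
    """
    denominator = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denominator
    return centre - half_width, centre + half_width

def chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

lower, upper = wilson_interval(0.5, 100)
print(lower, upper)

# Illustrative 2 x 2 table of counts.
print(chi_square([[10, 20], [20, 10]]))
```

Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and is asymmetric near the extremes, which makes it a sounder basis for comparing observed proportions.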

Citations: 2

Peter Grzybek (1957 – 2019)

Pub Date : 2019-10-02 DOI: 10.1080/09296174.2019.1651514

Citations: 0

The Discriminativeness of Internal Syntactic Representations in Automatic Genre Classification

Pub Date : 2019-09-26 DOI: 10.1080/09296174.2019.1663655

Mingyu Wan, A. Fang, Chu-Ren Huang

ABSTRACT Genre characterizes a document differently from a subject that has been the focus of most document retrieval and classification applications. This work hypothesizes a close interaction between syntactic variation and genre differentiation by introspecting stylistic cues in functional and structural aspects beyond word level. It has engineered 14 syntactic feature sets of internal representations for genre classification through Machine Learning devices. Experiment results show significant superiority of fusing structural and lexical features for genre classification (F∆max. = 9.2%, sig. = 0.001), suggesting the effectiveness of incorporating syntactic cues for genre discrimination. In addition, the PCA analysis reports the noun phrases (NP) as the most principal component (66%) for genre variation and prepositional phrases (PP) the second. Particularly, noun phrases with dominant structures of prepositional complements and pronouns functioning as a subject are most effective for identifying printed texts of high formality, while prepositional phrases are useful for identifying speeches of low formality. Error analysis suggests that the phrasal features are particularly useful for classifying four groups of genre classes, i.e. unscripted speech, fiction, news reports, and academic writing, all distributed with distinct structural characteristics, and they demonstrate an incremental degree of formality in the continuum of language complexity.

Citations: 6
