Pub Date : 2020-02-09 DOI: 10.1080/09296174.2020.1724677
Haoran Zhu, L. Lei, Hugh Craig
ABSTRACT In this study, we provide a quantitative analysis of prose and verse in the classical Chinese novel Dream of the Red Chamber (DRC) and discuss the implications for the disputed authorship of the novel. Firstly, we examine the amount of verse across the chapters of DRC and compare the style of the verse and prose portions. Secondly, we perform a Principal Component Analysis (PCA) of DRC based on the prose portions of the novel. Lastly, we discuss the implications of our experimental results for authorship attribution as well as for descriptive stylistic analysis of DRC. Our authorial analysis largely confirms the findings of some previous studies that the novel has two authors. Meanwhile, stylistic analyses of the prose portions yield new and interesting results, demonstrating that stylometric tools can be used to facilitate descriptive studies of classical Chinese literature.
{"title":"Prose, Verse and Authorship in Dream of the Red Chamber: A Stylometric Analysis","authors":"Haoran Zhu, L. Lei, Hugh Craig","doi":"10.1080/09296174.2020.1724677","DOIUrl":"https://doi.org/10.1080/09296174.2020.1724677","journal":{"name":"Journal of Quantitative Linguistics","volume":"28 1","pages":"289 - 305"},"publicationDate":"2020-02-09","publicationTypes":"Journal Article"}
Pub Date : 2020-01-07 DOI: 10.1080/09296174.2019.1703485
Jianpeng Liu, Junhai Zhao, Xiaohui Bai
ABSTRACT This study examined the syntactic impairments of Chinese Alzheimer’s disease (AD) patients with a dependency network approach. Dependency treebanks and dependency networks were constructed from the discourses of both the patient (AD) group and its healthy peers (HP). By analysing the contrasts in the dependency networks of the two groups, we found that 1) the mean dependency distance (MDD) of the AD group is shorter than that of the HP group; furthermore, the MDDs of both groups are far below the standard Chinese MDD; 2) content words such as remember, forget and know, and negative verb forms such as don’t know, can’t remember and can’t say, form highly repetitive expressions of uncertainty and negation that are typical of the predicates in AD patients’ clauses; 3) the function word vertices in the AD dependency network have distinctive network parameters, such as higher ‘betweenness’ centrality, closeness centrality and clustering coefficients, indicating that the syntax of AD is impaired and features more simplified stereotypes. These results indicate that the syntax of the AD group has been impaired from parts of speech to the whole syntactic structure.
{"title":"Syntactic Impairments of Chinese Alzheimer’s Disease Patients from a Language Dependency Network Perspective","authors":"Jianpeng Liu, Junhai Zhao, Xiaohui Bai","doi":"10.1080/09296174.2019.1703485","DOIUrl":"https://doi.org/10.1080/09296174.2019.1703485","journal":{"name":"Journal of Quantitative Linguistics","volume":"28 1","pages":"253 - 281"},"publicationDate":"2020-01-07","publicationTypes":"Journal Article"}
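The abstract above relies on mean dependency distance (MDD) without defining it. A minimal sketch, assuming the definition commonly used in dependency-distance research (the average absolute difference between the linear position of each dependent and that of its head, with the root excluded), might look like this. The function name, the `heads` encoding and the example sentence are illustrative assumptions and are not taken from the article.

```python
def mean_dependency_distance(heads):
    """Return the MDD of one sentence.

    heads[i] is the 1-based position of the head of word i + 1;
    0 marks the root, which has no head and is skipped.
    (This encoding is an assumption made for the sketch.)
    """
    distances = [abs(head - (i + 1)) for i, head in enumerate(heads) if head != 0]
    if not distances:
        return 0.0
    return sum(distances) / len(distances)


# Hypothetical parse of "The boy ate apples":
# The -> boy (2), boy -> ate (3), ate = root (0), apples -> ate (3)
example_heads = [2, 3, 0, 3]
mdd = mean_dependency_distance(example_heads)  # every dependency spans 1 word, so 1.0
```

Averaging this value over all sentences of a treebank gives a corpus-level figure of the kind compared between the AD and HP groups; how the authors aggregated their data is not stated in the abstract.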
Pub Date : 2020-01-02 DOI: 10.1080/09296174.2018.1504615
L. Lei, Matthew L. Jockers
ABSTRACT Previous studies of dependency distance as a measure of, or a proxy for, syntactic complexity do not consider factors such as sentence length and root distance. In the present study, we propose a new measure, Normalized Dependency Distance (NDD), that takes sentence length and root distance into consideration. Our analysis showed that an exponential distribution fits the distribution of NDD well, as it does that of Mean Dependency Distance (MDD), the measure used in previous studies. Findings indicated that NDD is significantly less dependent on sentence length than MDD is, which suggests that the new measure may have, to some extent, addressed the issue of MDD’s dependency on sentence length. It is argued that NDD may serve as a measure of syntactic complexity, which is a kind of universality limited by the capacity of human working memory.
{"title":"Normalized Dependency Distance: Proposing a New Measure","authors":"L. Lei, Matthew L. Jockers","doi":"10.1080/09296174.2018.1504615","DOIUrl":"https://doi.org/10.1080/09296174.2018.1504615","journal":{"name":"Journal of Quantitative Linguistics","volume":"27 1","pages":"62 - 79"},"publicationDate":"2020-01-02","publicationTypes":"Journal Article"}
Pub Date : 2020-01-02 DOI: 10.1080/09296174.2018.1496990
Inna Uglanova
ABSTRACT Three experimental models are reported in which verb-formation processes are used to investigate the effect of frequency on language structure. The first model examined the impact of frequency on the length of a language unit. Only the smoothed data for the prototypical verb formation confirmed the hypothesis. The second model tested the dependency of frequency on the depth of a word-formation structure. Good fits were found for all main verb-formation structures. The third model studied the influence of frequency on productivity (number of derivatives). The smoothed data showed that the more frequently a unit is used, the more derivatives it has. The outcomes clarify some aspects of how frequency functions in the synergetic mechanisms of language. In particular, the observed frequency oscillation could be considered a dialogue between the system and its environment.
{"title":"Functional Role of Frequency in Word-Formation Processes: A System Theoretical Approach","authors":"Inna Uglanova","doi":"10.1080/09296174.2018.1496990","DOIUrl":"https://doi.org/10.1080/09296174.2018.1496990","journal":{"name":"Journal of Quantitative Linguistics","volume":"27 1","pages":"1 - 31"},"publicationDate":"2020-01-02","publicationTypes":"Journal Article"}
Pub Date : 2019-11-25 DOI: 10.1080/09296174.2019.1694360
R. Taibu, E. Cheung, Weier Ye, S. Dehipawala, V. Shekoyan, G. Tremberger, T. Cheung
ABSTRACT The orthographic neighbourhood size of a targeted word, that is, the number of new words that can be generated from it by exchanging a single letter, offers a research window where words can be transformed into numerical values. The CLEARPOND technology from Northwestern University was used for the transformation. A piece of writing can then be modelled as a time series whose fluctuation can be further described using fractal dimension analysis. This project used the Higuchi fractal method to compute the fractal dimensions of the time series. The proof of concept was conducted using writing examples that include Astronomy writing and English writing, the responses of Trump and Clinton in a Presidential election debate, and song lyrics. The results suggested that a high fractal dimension is associated with a high-demand cognitive task. The use of fractal dimension analysis as a writing assessment tool is discussed in relation to current lexical diversity computation technology.
{"title":"Numerical Assessment of Orthographic Neighbourhood Size Fluctuation in Writing Using Fractal Dimension Analysis","authors":"R. Taibu, E. Cheung, Weier Ye, S. Dehipawala, V. Shekoyan, G. Tremberger, T. Cheung","doi":"10.1080/09296174.2019.1694360","DOIUrl":"https://doi.org/10.1080/09296174.2019.1694360","journal":{"name":"Journal of Quantitative Linguistics","volume":"28 1","pages":"237 - 252"},"publicationDate":"2019-11-25","publicationTypes":"Journal Article"}
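The two steps described in the abstract above (turning each word into a neighbourhood count, then measuring the roughness of the resulting series) can be sketched as follows. This is only an illustration under stated assumptions: the toy lexicon stands in for the CLEARPOND database, which the sketch does not query, and `higuchi_fd` follows the usual textbook form of Higuchi's method rather than the authors' actual code. All function and variable names are hypothetical.

```python
import math


def neighbourhood_size(word, lexicon):
    """Count lexicon words of equal length that differ from `word` in exactly one letter."""
    return sum(
        1
        for other in lexicon
        if len(other) == len(word)
        and sum(a != b for a, b in zip(word, other)) == 1
    )


def higuchi_fd(series, k_max):
    """Estimate the Higuchi fractal dimension of a non-constant numeric series.

    Requires 2 <= k_max < len(series). The estimate is the least-squares
    slope of ln L(k) against ln(1/k), where L(k) is the mean normalised
    curve length at lag k.
    """
    n = len(series)
    xs, ys = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # 0-based starting offset of each sub-series
            steps = (n - 1 - m) // k
            if steps < 1:
                continue
            total = sum(
                abs(series[m + i * k] - series[m + (i - 1) * k])
                for i in range(1, steps + 1)
            )
            # Higuchi's normalisation factor, then division by the lag k
            lengths.append(total * (n - 1) / (steps * k) / k)
        xs.append(math.log(1.0 / k))
        ys.append(math.log(sum(lengths) / len(lengths)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Toy lexicon (an assumption; a real study would use a full lexical database).
toy_lexicon = {"cat", "cot", "cut", "bat", "car", "dog", "cart"}
cat_neighbours = neighbourhood_size("cat", toy_lexicon)  # cot, cut, bat, car

# Sanity check: a straight line is a smooth curve, so its dimension is 1.
line_fd = higuchi_fd([float(i) for i in range(50)], k_max=5)
```

In use, a text would be mapped word by word to `neighbourhood_size` values and the resulting list passed to `higuchi_fd`; values closer to 2 indicate a more irregular series.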
Pub Date : 2019-10-30 DOI: 10.1080/09296174.2019.1678225
Aiyun Wei, Qian Lu, Haitao Liu
ABSTRACT The present study focuses on the word length distribution (WLD) of the Zhuang language. The results show that the WLDs of all texts investigated can be described by the Positive Cohen-Poisson model when word length is measured in syllables. When word length is measured in letters, however, they do not follow any model from the Poisson or Binomial distribution families widely observed in other languages. Nevertheless, the WLDs of all the Zhuang texts investigated follow the Zipf-Alekseev function whether word length is measured in syllables or in letters. Moreover, the research on the WLDs of different Zhuang genres indicates that WLD may not be a sensitive index for distinguishing Zhuang genres but is an effective one for distinguishing Zhuang styles (spoken or written). The study of the relationship between the parameters a and b in the Zipf-Alekseev function then shows that the self-organizing regularity observed in other languages also exists in Zhuang. Finally, the study of the word length-frequency relationship indicates that Zhuang word length is influenced by word frequency, which can be explained by Zipf’s ‘Principle of Least Effort’ and thus follows the law of the lexical synergetic subsystem in synergetic linguistics.
{"title":"Word Length Distribution in Zhuang Language","authors":"Aiyun Wei, Qian Lu, Haitao Liu","doi":"10.1080/09296174.2019.1678225","DOIUrl":"https://doi.org/10.1080/09296174.2019.1678225","journal":{"name":"Journal of Quantitative Linguistics","volume":"28 1","pages":"195 - 222"},"publicationDate":"2019-10-30","publicationTypes":"Journal Article"}
Pub Date : 2019-10-23 DOI: 10.1080/09296174.2019.1678709
M. Vakulenko
ABSTRACT This article puts forward a new formalism for numerically measuring phonetic differences between speech sounds; it treats the feature values of the compared phones as independent parameters that give rise to corresponding Euclidean distances. The articulatory and acoustic methods within this formalism were compared, and their results display good agreement. The acoustic approach is more reliable and more universal because it uses robust and precise acoustic parameters. The theoretical model and the findings of this article also comply with experimental phonetic results. The proposed approach contributes to formalizing the procedure of phone comparison and mapping needed for automatic text and speech processing.
{"title":"Calculation of Phonetic Distances between Speech Sounds","authors":"M. Vakulenko","doi":"10.1080/09296174.2019.1678709","DOIUrl":"https://doi.org/10.1080/09296174.2019.1678709","journal":{"name":"Journal of Quantitative Linguistics","volume":"28 1","pages":"223 - 236"},"publicationDate":"2019-10-23","publicationTypes":"Journal Article"}
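The core idea in the abstract above, treating the feature values of two phones as independent coordinates and taking the Euclidean distance between them, can be illustrated with a short sketch. The feature names and the 0 to 1 values below are hypothetical placeholders chosen for the example; the article's actual articulatory and acoustic parameters are not given in the abstract.

```python
import math


def phonetic_distance(phone_a, phone_b):
    """Euclidean distance between two phones described by the same named features."""
    if phone_a.keys() != phone_b.keys():
        raise ValueError("both phones must use the same feature set")
    return math.sqrt(sum((phone_a[f] - phone_b[f]) ** 2 for f in phone_a))


# Hypothetical feature values on an arbitrary 0-1 scale (not from the article).
vowel_i = {"height": 1.0, "backness": 0.0, "rounding": 0.0}
vowel_u = {"height": 1.0, "backness": 1.0, "rounding": 1.0}
vowel_a = {"height": 0.0, "backness": 0.5, "rounding": 0.0}

d_iu = phonetic_distance(vowel_i, vowel_u)  # differs in backness and rounding
d_ia = phonetic_distance(vowel_i, vowel_a)  # differs in height and backness
```

Because each feature contributes its squared difference independently, the result depends on how the features are scaled, so a real application would need a principled normalisation of the parameters.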
Pub Date : 2019-10-02 DOI: 10.1080/09296174.2018.1496537
S. Wallis
ABSTRACT This paper describes a series of statistical meta-tests for comparing independent contingency tables for different types of significant difference. The question of when an experiment obtains a significantly different result and when it does not is frequently overlooked in research publication. Papers are frequently published citing ‘p values’ or test scores suggesting a ‘stronger effect’ in place of sound statistical reasoning. This paper sets out a series of tests that together illustrate the correct approach to this question. These meta-tests permit us to evaluate whether experiments have failed to replicate on new data; whether a particular data source or subcorpus obtains a significantly different result than another; or whether changing experimental parameters obtains a stronger effect. The meta-tests are derived mathematically from the χ² test and the Wilson score interval, and consist of pairwise ‘point’ tests, ‘homogeneity’ tests and ‘goodness of fit’ tests. Meta-tests for comparing tests with one degree of freedom (e.g. ‘2 × 1’ and ‘2 × 2’ tests) are generalized to those of arbitrary size. Finally, we compare our approach with a competing approach offered by Zar, which, while straightforward to calculate, turns out to be both less powerful and less robust. (Note: A spreadsheet including all the tests in this paper is publicly available at www.ucl.ac.uk/english-usage/statspapers/2x2-x2-separability.xls.)
{"title":"Comparing χ² Tables for Separability of Distribution and Effect: Meta-Tests for Comparing Homogeneity and Goodness of Fit Contingency Test Outcomes","authors":"S. Wallis","doi":"10.1080/09296174.2018.1496537","DOIUrl":"https://doi.org/10.1080/09296174.2018.1496537","journal":{"name":"Journal of Quantitative Linguistics","volume":"26 1","pages":"330 - 355"},"publicationDate":"2019-10-02","publicationTypes":"Journal Article"}
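The abstract above names the Wilson score interval as one of the two building blocks of the meta-tests. As a hedged illustration of that building block only (the meta-tests themselves are not reproduced here), the standard Wilson interval for an observed proportion can be computed as below. The function name and the default `z = 1.96`, an approximation to the two-tailed 95% critical value, are choices made for this sketch.

```python
import math


def wilson_interval(p, n, z=1.96):
    """Wilson score interval (lower, upper) for an observed proportion p out of n cases."""
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Example: 50 hits out of 100 cases.
lower, upper = wilson_interval(0.5, 100)
```

Unlike the simple Gaussian interval, the Wilson interval stays inside the range from 0 to 1 and is asymmetric for proportions near the extremes, which is one reason it is a common basis for comparing observed proportions.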
Pub Date : 2019-09-26 DOI: 10.1080/09296174.2019.1663655
Mingyu Wan, A. Fang, Chu-Ren Huang
ABSTRACT Genre characterizes a document differently from subject, which has been the focus of most document retrieval and classification applications. This work hypothesizes a close interaction between syntactic variation and genre differentiation by inspecting stylistic cues in functional and structural aspects beyond the word level. It engineers 14 syntactic feature sets of internal representations for genre classification with machine learning methods. Experimental results show a significant advantage of fusing structural and lexical features for genre classification (F∆max. = 9.2%, sig. = 0.001), suggesting the effectiveness of incorporating syntactic cues for genre discrimination. In addition, the PCA reports noun phrases (NP) as the foremost principal component (66%) for genre variation and prepositional phrases (PP) as the second. In particular, noun phrases with dominant structures of prepositional complements and pronouns functioning as a subject are most effective for identifying printed texts of high formality, while prepositional phrases are useful for identifying speeches of low formality. Error analysis suggests that the phrasal features are particularly useful for classifying four groups of genre classes, i.e. unscripted speech, fiction, news reports, and academic writing, all of which have distinct structural characteristics and demonstrate an increasing degree of formality along the continuum of language complexity.
{"title":"The Discriminativeness of Internal Syntactic Representations in Automatic Genre Classification","authors":"Mingyu Wan, A. Fang, Chu-Ren Huang","doi":"10.1080/09296174.2019.1663655","DOIUrl":"https://doi.org/10.1080/09296174.2019.1663655","journal":{"name":"Journal of Quantitative Linguistics","volume":"28 1","pages":"138 - 171"},"publicationDate":"2019-09-26","publicationTypes":"Journal Article"}