Jie Song, Yunhua Qu, Xiaonan Zhu, Xiaoying Wang, Yifan Zhang
Multi-dimensional Analysis (MD) is a quantitative, corpus-based approach that describes and interprets patterns of register variation through factor analysis of a set of linguistic features across text varieties, and reveals their systematic relationships with communicative purposes. The model has been employed to explore language variation in many languages (e.g., English, Somali, Nukulaelae Tuvaluan, Korean, and Spanish), yet little full-scale research has been carried out on register variation in Mandarin Chinese. In this study, 88 linguistic features are tagged in a balanced corpus of 20 spoken and written Mandarin Chinese registers. Through factor analysis, five dimensions comprising 65 linguistic features are identified and interpreted from linguistic and functional perspectives. The first two dimensions, interactive vs. informational discourse and narrative vs. non-narrative concern, are similar to dimensions that previous MD studies have claimed to be universal parameters of register variation. The existence of two potentially universal dimensions suggests that the basic communicative purposes and functions underlying different languages are markedly similar, despite the existing social, cultural, and linguistic dissimilarities. Dimension 4, casual real-time speech with stance, is identified as a dimension distinctive to Mandarin Chinese. Dimension 3, explicitness in cohesion and reasoning, and Dimension 5, abstract information, are found to be associated with foreign influence, and their register variation patterns illustrate in quantitative terms how foreign contact affects Chinese register variation.
Jie Song, Yunhua Qu, Xiaonan Zhu, Xiaoying Wang, Yifan Zhang. "A Multi-dimensional Approach to Register Variations in Mandarin Chinese." Glottometrics. Publication date: 2021-01-01. https://doi.org/10.53482/2021_51_393
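The abstract above does not say how texts were scored on each dimension. As a minimal sketch, assuming the convention common in MD studies (standardize each feature's frequency across texts, then add the z-scores of features with positive loadings and subtract those with negative loadings), a dimension score could be computed as follows. The feature names, frequencies, and loading signs are hypothetical toy values, not data from the study.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of frequencies to mean 0 and (population) SD 1."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s if s else 0.0 for v in values]


def dimension_scores(rates, positive, negative):
    """Score each text on one dimension.

    rates    -- dict: feature name -> list of normalized frequencies, one per text
    positive -- features assumed to load positively on the dimension
    negative -- features assumed to load negatively on the dimension
    """
    n_texts = len(next(iter(rates.values())))
    z = {feature: z_scores(values) for feature, values in rates.items()}
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(n_texts)
    ]


# Hypothetical frequencies per 1,000 words for three texts
# (conversation, news report, academic prose); invented for illustration.
toy_rates = {
    "second_person_pronoun": [60.0, 30.0, 10.0],
    "noun": [150.0, 220.0, 300.0],
}
scores = dimension_scores(
    toy_rates,
    positive=["second_person_pronoun"],
    negative=["noun"],
)
print(scores)
```

With these toy values the conversation text scores highest and the academic text lowest, which is the kind of ordering an interactive vs. informational dimension is meant to capture.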
This article studies the stability and variability of part-of-speech structures in the collections of lyrical poems by B. Pasternak, winner of the Nobel Prize in Literature. The analysis follows the methodology proposed by Gabriel Altmann. The database comprises 7 collections of Pasternak's lyrics, published over a period of more than 40 years. The study was carried out both on individual poems and on entire collections. The results show that the nominality of Pasternak's lyrics is very high. Within each collection, the overall part-of-speech structure is highly stable. Dynamic description prevails over static description, and the two types of description tend to compensate for each other: growth in one causes a decrease in the other. The distribution of parts of speech within each collection is very well fitted by the Zipf-Alekseev function. Euclidean distances between collections published in different periods of the author's creative work are used to propose possible stages in the evolution of his style.
S. Andreev. "Pasternak lyrics: part of speech structure." Glottometrics. Publication date: 2021-01-01. https://doi.org/10.53482/2021_51_391
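The comparison of collections by Euclidean distance mentioned in the abstract above can be illustrated with a short sketch. It assumes each collection is represented by its vector of part-of-speech proportions. The tag names and counts below are hypothetical, and the article may have used a different feature representation.

```python
import math


def pos_profile(counts):
    """Convert raw part-of-speech counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def euclidean_distance(p, q):
    """Euclidean distance between two profiles; a missing tag counts as 0."""
    tags = set(p) | set(q)
    return math.sqrt(sum((p.get(t, 0.0) - q.get(t, 0.0)) ** 2 for t in tags))


# Hypothetical counts for two collections; invented for illustration only.
early = pos_profile({"NOUN": 50, "VERB": 30, "ADJ": 20})
late = pos_profile({"NOUN": 40, "VERB": 40, "ADJ": 20})

distance = euclidean_distance(early, late)
print(round(distance, 4))
```

Computing this distance for every pair of collections gives a matrix in which small values indicate similar part-of-speech structure. Clusters of chronologically adjacent collections would then be candidates for stylistic periods.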
Carlos Gómez-Rodríguez, Morten H. Christiansen, R. Ferrer-i-Cancho
The ability to produce and understand an unlimited number of different sentences is a hallmark of human language. Linguists have sought to define the essence of this generative capacity using formal grammars that describe the syntactic dependencies between constituents, independent of the computational limitations of the human brain. Here, we evaluate this independence assumption by sampling sentences uniformly from the space of possible syntactic structures. We find that the average dependency distance between syntactically related words, a proxy for memory limitations, is less than expected by chance in a collection of state-of-the-art classes of dependency grammars. Our findings indicate that memory limitations have permeated grammatical descriptions, suggesting that it may be impossible to build a parsimonious theory of human linguistic productivity independent of non-linguistic cognitive constraints.
Carlos Gómez-Rodríguez, Morten H. Christiansen, R. Ferrer-i-Cancho. "Memory limitations are hidden in grammar." Glottometrics. Publication date: 2019-08-19. https://doi.org/10.53482/2022_52_397
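To make the quantity in the abstract above concrete, the sketch below computes the mean dependency distance of one sentence and compares it with a chance baseline. The baseline used here is a deliberately simpler one than the authors': instead of sampling uniformly from classes of dependency grammars, it averages over all possible word orders of the same tree. The example sentence and its head assignments are illustrative assumptions.

```python
from itertools import permutations


def mean_dependency_distance(heads, positions):
    """Average |position(dependent) - position(head)| over all dependencies.

    heads     -- heads[i] is the index of word i's head, or None for the root
    positions -- positions[i] is the linear position of word i in the sentence
    """
    distances = [
        abs(positions[dep] - positions[head])
        for dep, head in enumerate(heads)
        if head is not None
    ]
    return sum(distances) / len(distances)


def random_order_baseline(heads):
    """Mean dependency distance averaged over every ordering of the words.

    Exhaustive enumeration, so only practical for short sentences.
    """
    n = len(heads)
    values = [mean_dependency_distance(heads, perm) for perm in permutations(range(n))]
    return sum(values) / len(values)


# "the dog chased a cat": the->dog, dog->chased, chased=root, a->cat, cat->chased
heads = [1, 2, None, 4, 2]
observed = mean_dependency_distance(heads, list(range(len(heads))))
baseline = random_order_baseline(heads)
print(observed, baseline)
```

For this sentence the observed value is 1.25, while the random-order baseline is (n + 1) / 3 = 2 for n = 5 words. The attested order therefore keeps syntactically related words closer together than chance would.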