Abstract For centuries, investigations of disputed authorship have shown that people have unique styles of writing. Given sufficient data, it is generally possible to distinguish between the writings of a small group of authors, for example, through the multivariate analysis of the relative frequencies of common function words. There is, however, no accepted explanation for why this type of stylometric analysis is successful. Authorship analysts often argue that authors write in subtly different dialects, but the analysis of individual words is not licensed by standard theories of sociolinguistic variation. Alternatively, stylometric analysis is consistent with standard theories of register variation. In this paper, I argue that stylometric methods work because authors write in subtly different registers. To support this claim, I present the results of parallel stylometric and multidimensional register analyses of a corpus of newspaper articles written by two columnists. I demonstrate that both analyses not only distinguish between these authors but identify the same underlying patterns of linguistic variation. I therefore propose that register variation, as opposed to dialect variation, provides a basis for explaining these differences and for explaining stylometric analyses of authorship more generally.
{"title":"Register variation explains stylometric authorship analysis","authors":"J. Grieve","doi":"10.1515/cllt-2022-0040","DOIUrl":"https://doi.org/10.1515/cllt-2022-0040","PeriodicalId":45605,"journal":{"name":"Corpus Linguistics and Linguistic Theory"},"PeriodicalIF":1.6,"publicationDate":"2023-01-02","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"41269648","RegionNum":2,"RegionCategory":"Literature","Total":0}
One way to resolve the actuation problem of metaphorical language change is to provide a statistical profile of metaphorical constructions and generative rules with antecedent conditions. Based on arguments from the view of language as a complex system and the dynamic view of metaphor, this paper argues that metaphorical language change qualifies as a Self-Organized Criticality state and that the linguistic expressions of a metaphor can be profiled as a fractal with spatio-temporal correlations. Synchronically, these metaphorical expressions self-organize into a self-similar, scale-invariant fractal that follows a power-law distribution; temporally, long-range interdependence constrains the self-organization process by way of transformation rules that are intrinsic to a language system. This argument is verified in the paper with statistical analyses of twelve randomly selected Chinese verb metaphors in a large-scale diachronic corpus.
{"title":"Metaphorical language change is Self-Organized Criticality","authors":"Xuri Tang, Huifang Ye","doi":"10.1515/cllt-2022-0016","DOIUrl":"https://doi.org/10.1515/cllt-2022-0016","PeriodicalId":45605,"journal":{"name":"Corpus Linguistics and Linguistic Theory"},"PeriodicalIF":1.6,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"138513488","RegionNum":2,"RegionCategory":"Literature","Total":0}
{"title":"Register variation and corpus linguistics: empirical findings and emerging theories. Special issue introduction of Corpus Linguistics and Linguistic Theory in honor of Douglas Biber","authors":"Jesse Egbert, Bethany Gray, Tove Larsson","doi":"10.1515/cllt-2022-0093","DOIUrl":"https://doi.org/10.1515/cllt-2022-0093","PeriodicalId":45605,"journal":{"name":"Corpus Linguistics and Linguistic Theory"},"PeriodicalIF":1.6,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"42403128","RegionNum":2,"RegionCategory":"Literature","Total":0}
Abstract Several studies have shown that there is considerable cross-genre variation as regards what linguistic units tend to be coordinated by "and". While literate, expository writing favors coordination of phrasal units such as noun phrases, coordinated units are more often clausal (e.g., main or subordinate clauses) in speech-related texts. This difference has been attested in studies that focus exclusively on coordination as well as in macro-level studies of co-variation among a large number of linguistic features. However, this register differentiation has increased over time: studies of Early and Late Modern English point to less pronounced differences among registers than those attested in the present-day language. This study fills a gap in research by considering data on coordination by "and" from the middle of the 20th century, a period that does not belong fully to either Late Modern or Present-Day English, and from the late 20th and early 21st centuries, and thus ties together diachronic and synchronic research on register variation in coordination. We also examine language from films and television in order to complement historical findings for speech-related language with data on registers that arose in the 20th century.
{"title":"Clausal and phrasal coordination in recent American English","authors":"Merja Kytö, Erik Smitterberg","doi":"10.1515/cllt-2022-0035","DOIUrl":"https://doi.org/10.1515/cllt-2022-0035","PeriodicalId":45605,"journal":{"name":"Corpus Linguistics and Linguistic Theory"},"PeriodicalIF":1.6,"publicationDate":"2022-11-25","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"42014549","RegionNum":2,"RegionCategory":"Literature","Total":0}
Abstract This article provides an overview of Douglas Biber’s work on register and his central role in establishing register as both an empirical focus and a theoretical construct in corpus linguistics. I identify four general phases of his work. Each has a slightly different emphasis, but each also advances intertwined threads of research that lead to an increased understanding of register variation. Biber’s work has made major contributions to distinct areas within the study of registers, from cross-linguistic speech-writing differences to English grammar, but he has advanced the field especially by integrating the findings from different areas. He has offered conceptualizations of register that account for findings from multiple areas of study, and he continues to refine the conceptualization as he engages in new lines of inquiry today.
{"title":"Register in corpus linguistics: the role and legacy of Douglas Biber","authors":"Susan Conrad","doi":"10.1515/cllt-2022-0032","DOIUrl":"https://doi.org/10.1515/cllt-2022-0032","PeriodicalId":45605,"journal":{"name":"Corpus Linguistics and Linguistic Theory"},"PeriodicalIF":1.6,"publicationDate":"2022-11-25","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"48474675","RegionNum":2,"RegionCategory":"Literature","Total":0}
Pub Date: 2022-11-19 DOI: 10.48550/arXiv.2211.10709
Xuri Tang, Huifang Ye
Abstract One way to resolve the actuation problem of metaphorical language change is to provide a statistical profile of metaphorical constructions and generative rules with antecedent conditions. Based on arguments from the view of language as a complex system and the dynamic view of metaphor, this paper argues that metaphorical language change qualifies as a Self-Organized Criticality state and that the linguistic expressions of a metaphor can be profiled as a fractal with spatio-temporal correlations. Synchronically, these metaphorical expressions self-organize into a self-similar, scale-invariant fractal that follows a power-law distribution; temporally, long-range interdependence constrains the self-organization process by way of transformation rules that are intrinsic to a language system. This argument is verified in the paper with statistical analyses of twelve randomly selected Chinese verb metaphors in a large-scale diachronic corpus.
{"title":"Metaphorical language change is Self-Organized Criticality","authors":"Xuri Tang, Huifang Ye","doi":"10.48550/arXiv.2211.10709","DOIUrl":"https://doi.org/10.48550/arXiv.2211.10709","PeriodicalId":45605,"journal":{"name":"Corpus Linguistics and Linguistic Theory"},"PeriodicalIF":1.6,"publicationDate":"2022-11-19","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"46217476","RegionNum":2,"RegionCategory":"Literature","Total":0}
Abstract The term negative emotive word refers to those words that, on their own, i.e. without context, have a semantic content that may be associated with negative emotion, but sometimes they lose it partly or wholly. In the literature negative emotive words are mainly discussed within the group of intensifiers, e.g. awfully good. In the present paper, we call this phenomenon polarity loss. At the same time, there is another use of negative emotive words that is rarely discussed in the literature, namely the case where the examined word, despite its negative semantic content, expresses a positive evaluation of the speaker, e.g. brutális alaplap (lit. ‘brutal motherboard’ – ‘high quality motherboard’). We call this phenomenon polarity shift. The aim here is to thoroughly examine the two different phenomena on the basis of the data of a Hungarian speech corpus HuTongue. After an in-depth analysis of the qualitative and quantitative features of negative emotive words, we propose corresponding ways of their meaning representations, using a lexical pragmatic approach and the concept of enantiosemy.
{"title":"“Thank you for the terrific party!” – An analysis of Hungarian negative emotive words","authors":"Martina Katalin Szabó, V. Vincze, Károly Bibok","doi":"10.1515/cllt-2022-0013","DOIUrl":"https://doi.org/10.1515/cllt-2022-0013","PeriodicalId":45605,"journal":{"name":"Corpus Linguistics and Linguistic Theory"},"PeriodicalIF":1.6,"publicationDate":"2022-11-11","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"41505625","RegionNum":2,"RegionCategory":"Literature","Total":0}
Abstract In this paper, we operationalize register differences at the intersection of formality and mode, and distinguish four broad register categories: spoken informal (conversations), spoken formal (parliamentary debates), written informal (blogs), and written formal (newspaper articles). We are specifically interested in the comparative probabilistic/variationist complexity of these registers – when speakers have grammatical choices, are the probabilistic grammars regulating these choices more or less complex in particular registers than in others? Based on multivariate modeling of richly annotated datasets covering three grammatical alternations in two languages (English and Dutch), we assess the complexity of probabilistic grammars by drawing on three criteria: (a) the number of constraints on variant choice, (b) the number of interactions between constraints, and (c) the relative importance of lexical conditioning. Analysis shows that contrary to theorizing in variationist sociolinguistics, probabilistic complexity differences between registers are not quantitatively simple: formal registers are consistently the most complex ones, while spoken registers are the least complex ones. The most complex register under study is written-formal quality newspaper writing. We submit that the complexity differentials we uncover are a function of acquisitional difficulty, of on-line processing limitations, and of normative pressures.
{"title":"A variationist perspective on the comparative complexity of four registers at the intersection of mode and formality","authors":"Benedikt Szmrecsanyi, Alexandra Engel","doi":"10.1515/cllt-2022-0031","DOIUrl":"https://doi.org/10.1515/cllt-2022-0031","PeriodicalId":45605,"journal":{"name":"Corpus Linguistics and Linguistic Theory"},"PeriodicalIF":1.6,"publicationDate":"2022-11-03","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"42585882","RegionNum":2,"RegionCategory":"Literature","Total":0}
Abstract Register studies have focused on accounting for linguistic variation between culturally recognized register categories. This comparative approach to register has consistently demonstrated that culturally recognized register categories can predict language variation at all linguistic levels. Nevertheless, it has also been shown by previous research that even the most well-established register categories have substantial internal linguistic variation. We propose that at least some of this unexplained variance could be the result of how a text is defined, as well as whether and how researchers account for situational variables within registers. We present four case studies that explore the extent to which linguistic variation within registers is influenced by the definition of the textual unit and the situational parameters. We show that the functional correspondence between situation and language use exists even within register categories and discuss the theoretical and methodological implications of these findings for register research.
{"title":"Linguistic variation within registers: granularity in textual units and situational parameters","authors":"Jesse Egbert, M. Gracheva","doi":"10.1515/cllt-2022-0034","DOIUrl":"https://doi.org/10.1515/cllt-2022-0034","PeriodicalId":45605,"journal":{"name":"Corpus Linguistics and Linguistic Theory"},"PeriodicalIF":1.6,"publicationDate":"2022-10-18","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"43830842","RegionNum":2,"RegionCategory":"Literature","Total":0}
Abstract This study applies a corpus-based quantitative approach to word order typology and linguistic theories about word order in several genetically unrelated language varieties in northwestern Iran, such as Mukri Kurdish, Northeastern Kurdish and Armenian (Indo-European), Jewish Neo-Aramaic (Semitic), and Azeri Turkic (Turkic). Despite the difference in the default position of the direct object, the existing corpora of published and personal field data of narrative free speech demonstrate that these languages predominantly share the clause-final position of Targets (e.g., physical and metaphorical goals, recipients, addressees, and resultant-states) in their word order. Yet, Targets are more flexible in Mukri Kurdish, Northeastern Neo-Aramaic, and Azeri Turkic, whereas they are less flexible in Armenian and Northeastern Kurdish. Among various factors relevant to the placement of Targets, morphosyntactic features such as parts of speech exhibit constraints and clear preferences in the pre- and postverbal placement of Targets.
{"title":"Parts of speech and the placement of Targets in the corpus of languages in northwestern Iran","authors":"H. Asadpour","doi":"10.1515/cllt-2022-0001","DOIUrl":"https://doi.org/10.1515/cllt-2022-0001","PeriodicalId":45605,"journal":{"name":"Corpus Linguistics and Linguistic Theory"},"PeriodicalIF":1.6,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"49501305","RegionNum":2,"RegionCategory":"Literature","Total":0}