Register variation explains stylometric authorship analysis

Corpus Linguistics and Linguistic Theory, pp. 47-77 · Pub Date: 2023-01-02 · DOI: 10.1515/cllt-2022-0040
Impact factor: 1.7 · Region 2 (Literature) · JCR category: Language & Linguistics

J. Grieve

Citations: 3

Abstract

For centuries, investigations of disputed authorship have shown that people have unique styles of writing. Given sufficient data, it is generally possible to distinguish between the writings of a small group of authors, for example, through the multivariate analysis of the relative frequencies of common function words. There is, however, no accepted explanation for why this type of stylometric analysis is successful. Authorship analysts often argue that authors write in subtly different dialects, but the analysis of individual words is not licensed by standard theories of sociolinguistic variation. Alternatively, stylometric analysis is consistent with standard theories of register variation. In this paper, I argue that stylometric methods work because authors write in subtly different registers. To support this claim, I present the results of parallel stylometric and multidimensional register analyses of a corpus of newspaper articles written by two columnists. I demonstrate that both analyses not only distinguish between these authors but identify the same underlying patterns of linguistic variation. I therefore propose that register variation, as opposed to dialect variation, provides a basis for explaining these differences and for explaining stylometric analyses of authorship more generally.
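As a rough illustration of what "analysis of the relative frequencies of common function words" can look like in practice, the sketch below computes a relative-frequency profile for each text and compares two profiles with Burrows's Delta, a widely used stylometric distance. This is not necessarily the procedure used in the paper. The word list, the function names, and the tokenization rule are assumptions made for illustration only.

```python
import re
from statistics import mean, pstdev

# Hypothetical, deliberately tiny word list; real studies use far larger ones.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "i"]


def relative_frequencies(text, words=FUNCTION_WORDS):
    """Return each word's share of all tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return [0.0] * len(words)
    return [tokens.count(w) / total for w in words]


def z_scores(profiles):
    """Standardize each word's frequency across all texts (column-wise)."""
    columns = list(zip(*profiles))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    return [
        # A word with no variation across texts carries no signal, so use 0.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in profiles
    ]


def delta(z_a, z_b):
    """Burrows's Delta: mean absolute difference between z-score profiles."""
    return mean(abs(a - b) for a, b in zip(z_a, z_b))


if __name__ == "__main__":
    # Toy texts; a real analysis needs many long texts per author.
    texts = [
        "It is the view of the board that the plan is sound.",
        "I think that I know what to do, and I want to do it.",
        "The cost of the plan is in the range of the budget.",
    ]
    z = z_scores([relative_frequencies(t) for t in texts])
    print("delta(text 0, text 1):", round(delta(z[0], z[1]), 3))
    print("delta(text 0, text 2):", round(delta(z[0], z[2]), 3))
```

A smaller Delta means two texts use the chosen function words in more similar proportions. With realistic corpora, the same frequency table would typically be passed to a multivariate method such as principal component analysis or clustering.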


Source journal

Corpus Linguistics and Linguistic Theory

CiteScore: 4.20
Self-citation rate: 12.50%

Journal description: Corpus Linguistics and Linguistic Theory (CLLT) is a peer-reviewed journal publishing high-quality original corpus-based research focusing on theoretically relevant issues in all core areas of linguistic research, or other recognized topic areas. It provides a forum for researchers from different theoretical backgrounds and different areas of interest who share a commitment to the systematic and exhaustive analysis of naturally occurring language. Contributions from all theoretical frameworks are welcome, but they should be addressed to a general audience and thus be explicit about their assumptions and discovery procedures, and provide sufficient theoretical background to be accessible to researchers from different frameworks.

Topics: Corpus Linguistics, Quantitative Linguistics, Phonology, Morphology, Semantics, Syntax, Pragmatics.