Register variation explains stylometric authorship analysis

Corpus Linguistics and Linguistic Theory | IF: 1 | Zone 2 (Literature) | Category: LANGUAGE & LINGUISTICS | Pub Date: 2023-01-02 | DOI: 10.1515/cllt-2022-0040
J. Grieve
Publication type: Journal Article. Volume (as listed): 38 1. Pages: 47-77. Open access: no.
Citations: 3

Abstract

For centuries, investigations of disputed authorship have shown that people have unique styles of writing. Given sufficient data, it is generally possible to distinguish between the writings of a small group of authors, for example, through the multivariate analysis of the relative frequencies of common function words. There is, however, no accepted explanation for why this type of stylometric analysis is successful. Authorship analysts often argue that authors write in subtly different dialects, but the analysis of individual words is not licensed by standard theories of sociolinguistic variation. Alternatively, stylometric analysis is consistent with standard theories of register variation. In this paper, I argue that stylometric methods work because authors write in subtly different registers. To support this claim, I present the results of parallel stylometric and multidimensional register analyses of a corpus of newspaper articles written by two columnists. I demonstrate that both analyses not only distinguish between these authors but identify the same underlying patterns of linguistic variation. I therefore propose that register variation, as opposed to dialect variation, provides a basis for explaining these differences and for explaining stylometric analyses of authorship more generally.
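As a rough illustration of what "relative frequencies of common function words" means in practice, the sketch below builds a frequency profile for each text and attributes a disputed text to the author with the nearest profile. It is not the method used in the paper, which relies on multivariate analysis of a real newspaper corpus; the word list, the distance measure, and the names `FUNCTION_WORDS`, `relative_frequencies`, `mean_abs_difference`, and `closest_author` are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list; real studies use far more.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it"]


def relative_frequencies(text, words=FUNCTION_WORDS):
    """Return each word's share of all tokens in `text` (0.0 for empty text)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total if total else 0.0 for w in words]


def mean_abs_difference(profile_a, profile_b):
    """Average absolute gap between two frequency profiles of equal length."""
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b)) / len(profile_a)


def closest_author(disputed_text, known_texts):
    """Name the author whose profile is nearest to the disputed text's.

    `known_texts` maps an author label to a sample of that author's writing.
    This nearest-profile rule is a simplification chosen for illustration.
    """
    target = relative_frequencies(disputed_text)
    return min(
        known_texts,
        key=lambda author: mean_abs_difference(
            target, relative_frequencies(known_texts[author])
        ),
    )
```

Calling `closest_author(disputed, {"A": sample_a, "B": sample_b})` returns the label of the nearer profile. With samples as short as a sentence the result is not meaningful; the abstract's caveat "given sufficient data" applies here too.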
Source journal: Corpus Linguistics and Linguistic Theory
CiteScore: 4.20
Self-citation rate: 12.50%
Articles published: 15
Journal description: Corpus Linguistics and Linguistic Theory (CLLT) is a peer-reviewed journal publishing high-quality original corpus-based research focusing on theoretically relevant issues in all core areas of linguistic research, or other recognized topic areas. It provides a forum for researchers from different theoretical backgrounds and different areas of interest who share a commitment to the systematic and exhaustive analysis of naturally occurring language. Contributions from all theoretical frameworks are welcome, but they should be addressed to a general audience and thus be explicit about their assumptions and discovery procedures and provide sufficient theoretical background to be accessible to researchers from different frameworks.
Topics: Corpus Linguistics, Quantitative Linguistics, Phonology, Morphology, Semantics, Syntax, Pragmatics.