Principal components analysis in stylometry

Digital Scholarship in the Humanities (IF 0.7; Zone 3, Literature; Humanities, Multidisciplinary) Pub Date: 2023-11-29 DOI: 10.1093/llc/fqad083
Hugh Craig

Abstract

Principal components analysis (PCA) has been one of the staple methods used in stylometry. In a 2021 article, Pervez Rizvi casts doubt on this method and argues that some widely cited results based on it should be set aside. In the current article, I show that none of Rizvi’s theoretical claims or experimental results stand up to examination. Rizvi argues that discarding the principal components beyond the first two makes the method unreliable, but permutation testing of PCAs shows that the top components in these trials are significant and robust, and the results across many experiments show the combination of the first and second components to be effective in classification. Rizvi argues that PCA components must be treated separately, and much of his critique of the PCA method is based on this standpoint, but this is not the practice in the work presented in the publications he cites or in the wider literature. Rizvi is unable to replicate a chart in an article by Craig, but his replication, unlike the original, does not account for the widely varying sizes of samples in his data. The current article shows that Rizvi’s claims are misguided and that using PCA in the Burrows tradition to find and formalize authorial discriminations in text samples from plays of the Shakespearean era is efficacious and robust.
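
For readers unfamiliar with the idea of permutation testing a PCA, the following is a minimal sketch of one common variant, not the procedure used in the article, whose details are not given in the abstract. It assumes a samples-by-variables matrix (for example, word frequencies per text sample), shuffles each column independently to break the correlations between variables, and compares the share of variance explained by the top components with the shares obtained from the shuffled data. The function names, the synthetic data, and the choice of 200 permutations are all illustrative assumptions.

```python
import numpy as np


def pca_explained_variance(X):
    """Share of total variance explained by each principal component.

    X is a (samples x variables) array. Columns are standardized, so this
    corresponds to a PCA on the correlation matrix.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    singular_values = np.linalg.svd(Z, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


def permutation_test_pca(X, n_components=2, n_permutations=200, seed=0):
    """Permutation p-values for the top components (illustrative variant).

    Each column is shuffled independently, which keeps every variable's
    distribution but destroys the correlations between variables.
    """
    rng = np.random.default_rng(seed)
    observed = pca_explained_variance(X)[:n_components]
    exceed = np.zeros(n_components)
    for _ in range(n_permutations):
        X_perm = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(X.shape[1])]
        )
        null = pca_explained_variance(X_perm)[:n_components]
        exceed += null >= observed
    # Add-one correction so a p-value is never exactly zero.
    p_values = (exceed + 1) / (n_permutations + 1)
    return observed, p_values


# Synthetic stand-in data (an assumption, not data from the article):
# 40 samples and 10 variables that share one strong latent factor.
rng = np.random.default_rng(42)
latent = rng.normal(size=(40, 1))
X = latent @ np.ones((1, 10)) + 0.5 * rng.normal(size=(40, 10))

observed, p_values = permutation_test_pca(X)
print("explained variance (PC1, PC2):", observed)
print("permutation p-values:", p_values)
```

Comparing the second component against the second component of the shuffled data is a simplification; more careful schemes test each later component after removing the earlier ones.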
Source journal

CiteScore: 1.80
Self-citation rate: 25.00%
Articles published: 78

About the journal: DSH, or Digital Scholarship in the Humanities, is an international, peer-reviewed journal that publishes original contributions on all aspects of digital scholarship in the Humanities including, but not limited to, the field of what is currently called the Digital Humanities. Long and short papers report on theoretical, methodological, experimental, and applied research and include results of research projects, descriptions and evaluations of tools, techniques, and methodologies, and reports on work in progress. DSH also publishes reviews of books and resources. Digital Scholarship in the Humanities was previously known as Literary and Linguistic Computing.