Quantification and identification of authorial writing style through higher-order text network modeling and analysis

IF 3.4 | Zone 2 (Management Science) | JCR Q2: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Journal of Informetrics | Pub Date: 2024-11-19 | DOI: 10.1016/j.joi.2024.101603
Hongzhong Deng, Chengxing Wu, Bingfeng Ge, Hongqian Wu
Journal of Informetrics, Volume 19, Issue 1, Article 101603. Full text: https://www.sciencedirect.com/science/article/pii/S1751157724001159
Citations: 0

Abstract

Determining the true author of anonymized texts has important applications ranging from text classification and information extraction to forensic investigations. Despite substantial progress, current authorship identification solutions are limited to extracting straightforward semantic relationships in writing styles and do not consider higher-order features among multiple words, phrases, or sentences in language structure. Here, we propose a novel approach based on hypernetwork theory to encode higher-order text features into a unified text hyper-network and investigate whether the hyper-order topological features of the text hyper-network help reveal the author's stylistic preferences. Our results indicate that metrics of the text hyper-network, such as hyperdegree, average shortest path length, and intermittency, can capture more information about an author's writing style. More importantly, in an author identification task on 170 novels, our method correctly identified the authorship of 81% of the novels, surpassing the accuracy of a method based on pairwise word relationships. This further highlights the importance of higher-order features in text analysis, beyond mere pairwise interactions of words.
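To make two of the metrics named in the abstract concrete, here is a minimal sketch in Python. It is not the authors' construction: it assumes, purely for illustration, that each sentence forms one hyperedge over the distinct words it contains, that the hyperdegree of a word is the number of hyperedges containing it, and that two words are adjacent when they share at least one hyperedge. The function names are hypothetical, and intermittency is not covered.

```python
import re
from collections import defaultdict, deque


def build_hyperedges(text):
    """Assumption: one hyperedge per sentence, holding its distinct lowercase words."""
    edges = []
    for sentence in re.split(r"[.!?]+", text):
        words = frozenset(re.findall(r"[a-z']+", sentence.lower()))
        if words:
            edges.append(words)
    return edges


def hyperdegree(edges):
    """Number of hyperedges that contain each word."""
    degree = defaultdict(int)
    for edge in edges:
        for word in edge:
            degree[word] += 1
    return dict(degree)


def average_shortest_path_length(edges):
    """Mean BFS distance over all reachable ordered word pairs.

    Assumption: two words are neighbours if they co-occur in a hyperedge.
    Pairs in different connected components are ignored.
    """
    neighbours = defaultdict(set)
    for edge in edges:
        for word in edge:
            neighbours[word] |= edge - {word}

    total, pairs = 0, 0
    for source in neighbours:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    sample = "The cat sat. The dog sat. A bird flew."
    edges = build_hyperedges(sample)
    print(hyperdegree(edges))
    print(average_shortest_path_length(edges))
```

Per-text values such as these could then serve as stylistic features for a classifier. Which features and which classifier the paper actually uses is not stated in the abstract.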
Source journal
Journal of Informetrics (Social Sciences: Library and Information Sciences)
CiteScore: 6.40
Self-citation rate: 16.20%
Articles published: 95
About the journal: Journal of Informetrics (JOI) publishes rigorous high-quality research on quantitative aspects of information science. The main focus of the journal is on topics in bibliometrics, scientometrics, webometrics, patentometrics, altmetrics and research evaluation. Contributions studying informetric problems using methods from other quantitative fields, such as mathematics, statistics, computer science, economics and econometrics, and network science, are especially encouraged. JOI publishes both theoretical and empirical work. In general, case studies, for instance a bibliometric analysis focusing on a specific research field or a specific country, are not considered suitable for publication in JOI, unless they contain innovative methodological elements.