AuthorNet: Leveraging attention-based early fusion of transformers for low-resource authorship attribution

IF 7.5 | CAS Tier 1 (Computer Science) | Q1 Computer Science, Artificial Intelligence | Expert Systems with Applications | Pub Date: 2024-11-04 | DOI: 10.1016/j.eswa.2024.125643
Md. Rajib Hossain, Mohammed Moshiul Hoque, M. Ali Akber Dewan, Enamul Hoque, Nazmul Siddique
{"title":"AuthorNet: Leveraging attention-based early fusion of transformers for low-resource authorship attribution","authors":"Md. Rajib Hossain ,&nbsp;Mohammed Moshiul Hoque ,&nbsp;M. Ali Akber Dewan ,&nbsp;Enamul Hoque ,&nbsp;Nazmul Siddique","doi":"10.1016/j.eswa.2024.125643","DOIUrl":null,"url":null,"abstract":"<div><div>Authorship Attribution (AA) is crucial for identifying the author of a given text from a pool of suspects, especially with the widespread use of the internet and electronic devices. However, most AA research has primarily focused on high-resource languages like English, leaving low-resource languages such as Bengali relatively unexplored. Challenges faced in this domain include the absence of benchmark corpora, a lack of context-aware feature extractors, limited availability of tuned hyperparameters, and OOV issues. To address these challenges, this study introduces AuthorNet for authorship attribution using attention-based early fusion of transformer-based language models, i.e., concatenation of an embeddings output of two existing models that were fine-tuned. AuthorNet consists of three key modules: Feature extraction, Fine-tuning and selection of best-performing models, and Attention-based early fusion. To evaluate the performance of AuthorNet, a number of experiments using four benchmark corpora have been conducted. The results demonstrated exceptional accuracy: 98.86 ± 0.01%, 99.49 ± 0.01%, 97.91 ± 0.01%, and 99.87 ± 0.01% for four corpora. Notably, AuthorNet outperformed all foundation models, achieving accuracy improvements ranging from 0.24% to 2.92% across the four corpora.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"262 ","pages":"Article 125643"},"PeriodicalIF":7.5000,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems with Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0957417424025107","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Authorship Attribution (AA) is crucial for identifying the author of a given text from a pool of suspects, especially with the widespread use of the internet and electronic devices. However, most AA research has focused on high-resource languages like English, leaving low-resource languages such as Bengali relatively unexplored. Challenges in this domain include the absence of benchmark corpora, a lack of context-aware feature extractors, limited availability of tuned hyperparameters, and out-of-vocabulary (OOV) issues. To address these challenges, this study introduces AuthorNet, which performs authorship attribution using attention-based early fusion of transformer-based language models, i.e., concatenation of the embedding outputs of two fine-tuned existing models. AuthorNet consists of three key modules: feature extraction, fine-tuning and selection of the best-performing models, and attention-based early fusion. To evaluate the performance of AuthorNet, a number of experiments were conducted on four benchmark corpora. The results demonstrated exceptional accuracy: 98.86 ± 0.01%, 99.49 ± 0.01%, 97.91 ± 0.01%, and 99.87 ± 0.01% on the four corpora. Notably, AuthorNet outperformed all foundation models, achieving accuracy improvements ranging from 0.24% to 2.92% across the four corpora.
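As a rough illustration of the fusion idea described above, the sketch below concatenates token-level embeddings from two transformer encoders and pools the fused sequence with a learned attention layer before classifying over candidate authors. This is a minimal sketch under stated assumptions: the checkpoint names, pooling scheme, and classifier head are illustrative choices, not the authors' exact implementation.

```python
# Minimal sketch (PyTorch + Hugging Face transformers) of attention-based
# early fusion of two fine-tuned transformer encoders. Checkpoints, the
# pooling scheme, and the head are assumptions, not the paper's exact design.
import torch
import torch.nn as nn
from transformers import AutoModel

class EarlyFusionAttributor(nn.Module):
    def __init__(self, ckpt_a: str, ckpt_b: str, num_authors: int):
        super().__init__()
        # Two independently fine-tuned encoders (hypothetical checkpoints).
        self.enc_a = AutoModel.from_pretrained(ckpt_a)
        self.enc_b = AutoModel.from_pretrained(ckpt_b)
        fused_dim = self.enc_a.config.hidden_size + self.enc_b.config.hidden_size
        self.attn = nn.Linear(fused_dim, 1)      # token-level attention scores
        self.classifier = nn.Linear(fused_dim, num_authors)

    def forward(self, ids_a, mask_a, ids_b, mask_b):
        # Token embeddings from each encoder; both inputs are assumed to be
        # padded/truncated to the same sequence length so they can be fused.
        h_a = self.enc_a(input_ids=ids_a, attention_mask=mask_a).last_hidden_state
        h_b = self.enc_b(input_ids=ids_b, attention_mask=mask_b).last_hidden_state
        # Early fusion: concatenate the two embedding outputs feature-wise.
        fused = torch.cat([h_a, h_b], dim=-1)    # (batch, seq, fused_dim)
        # Attention pooling: score each position, mask padding, weighted sum.
        scores = self.attn(fused).squeeze(-1)    # (batch, seq)
        scores = scores.masked_fill(mask_a == 0, torch.finfo(scores.dtype).min)
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        pooled = (weights * fused).sum(dim=1)    # (batch, fused_dim)
        return self.classifier(pooled)           # logits over candidate authors
```

Feature-wise concatenation is what makes this "early" fusion: the two representations are merged before any pooling or classification, so the attention layer can weight evidence from both encoders jointly rather than combining their separate predictions late.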
Source journal: Expert Systems with Applications
Category: Engineering & Technology (Electrical & Electronic Engineering)
CiteScore: 13.80
Self-citation rate: 10.60%
Articles published: 2045
Review time: 8.7 months
Journal introduction: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.