Identifying Author in Bengali Literature by Bi-LSTM with Attention Mechanism

Ibrahim Al Azhar, Sohel Ahmed, Md Saiful Islam, Aisha Khatun
{"title":"Identifying Author in Bengali Literature by Bi-LSTM with Attention Mechanism","authors":"Ibrahim Al Azhar, Sohel Ahmed, Md Saiful Islam, Aisha Khatun","doi":"10.1109/ICCIT54785.2021.9689840","DOIUrl":null,"url":null,"abstract":"Authorship Attribution is the task of determining the author of an unknown text using one’s writing patterns. It is a well-established task for high-resource languages like English, but it is challenging for low-resource languages like Bengali. In this paper, we propose a Bi-directional Long Short Term Memory(Bi-LSTM) model with self-attention mechanism to address this problem. GloVe embedding vectors encode the semantic and syntactic knowledge of words, which are then fed into the Bi-LSTM models. Moreover, attention mechanism enhances the model’s ability to learn the complex linguistics patterns through learnable parameters, which gives lower weights to common words and higher weights to keywords that capture an author’s stylistic components. It improves performance extract contextual features. We evaluate our model on multiple datasets and experiment with various architectures. Our proposed model outperforms the state-of-the-art model by 12.14%-20.24% in the BAAD6 author dataset, 1.05% - 7.34% in the BAAD16 author dataset, with best performance accuracy of 97.99%. The experimental results demonstrate that the Bi-LSTM model’s attention mechanism notably boosts performance. (The source code are shared as free tools at https://github.com/IbrahimAlAzhar/AuthorshipAttribution)","PeriodicalId":166450,"journal":{"name":"2021 24th International Conference on Computer and Information Technology (ICCIT)","volume":"182 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 24th International Conference on Computer and Information Technology (ICCIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCIT54785.2021.9689840","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Authorship Attribution is the task of determining the author of an unknown text from one's writing patterns. It is a well-established task for high-resource languages like English, but it is challenging for low-resource languages like Bengali. In this paper, we propose a Bi-directional Long Short-Term Memory (Bi-LSTM) model with a self-attention mechanism to address this problem. GloVe embedding vectors encode the semantic and syntactic knowledge of words, which are then fed into the Bi-LSTM models. Moreover, the attention mechanism enhances the model's ability to learn complex linguistic patterns through learnable parameters, assigning lower weights to common words and higher weights to keywords that capture an author's stylistic components. This improves the model's ability to extract contextual features. We evaluate our model on multiple datasets and experiment with various architectures. Our proposed model outperforms the state-of-the-art model by 12.14%-20.24% on the BAAD6 author dataset and by 1.05%-7.34% on the BAAD16 author dataset, with a best accuracy of 97.99%. The experimental results demonstrate that the attention mechanism notably boosts the Bi-LSTM model's performance. (The source code is shared as a free tool at https://github.com/IbrahimAlAzhar/AuthorshipAttribution)
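The following is a minimal sketch of the architecture the abstract describes (pretrained word embeddings, a Bi-LSTM encoder, additive self-attention pooling over time steps, and a softmax over candidate authors). It is not the authors' released code; the dimensions, hyperparameters, and class names are illustrative assumptions, and in the paper the embedding layer would be initialized with GloVe vectors rather than trained from scratch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMAttentionClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_authors=16):
        super().__init__()
        # Embedding layer; in the paper this would be loaded with GloVe vectors.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # Learnable attention parameters: score each time step, then softmax.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.classifier = nn.Linear(2 * hidden_dim, num_authors)

    def forward(self, token_ids, mask):
        # token_ids: (batch, seq_len); mask: (batch, seq_len), 1 for real tokens
        x = self.embedding(token_ids)                  # (batch, seq_len, embed_dim)
        h, _ = self.bilstm(x)                          # (batch, seq_len, 2*hidden_dim)
        scores = self.attn(h).squeeze(-1)              # (batch, seq_len)
        scores = scores.masked_fill(mask == 0, float("-inf"))
        alpha = F.softmax(scores, dim=-1)              # per-token attention weights
        context = torch.bmm(alpha.unsqueeze(1), h).squeeze(1)  # weighted sum of states
        return self.classifier(context)                # author logits

# Example: score a dummy batch of two 50-token documents over 16 authors.
model = BiLSTMAttentionClassifier(vocab_size=20000)
ids = torch.randint(1, 20000, (2, 50))
mask = torch.ones(2, 50, dtype=torch.long)
logits = model(ids, mask)
print(logits.shape)  # torch.Size([2, 16])
```

Under this reading, the attention weights alpha are what the abstract refers to as learnable parameters that down-weight common words and up-weight stylistically informative ones, since the weighted sum lets tokens contribute unequally to the document representation used for classification.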