以阿塞拜疆语文本为例,机器学习方法与文本特征在文本作者身份识别中的比较研究

IF 1.8 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Algorithms Pub Date : 2024-06-05 DOI:10.3390/a17060242
Rustam Azimov, Efthimios Providas
{"title":"以阿塞拜疆语文本为例,机器学习方法与文本特征在文本作者身份识别中的比较研究","authors":"Rustam Azimov, Efthimios Providas","doi":"10.3390/a17060242","DOIUrl":null,"url":null,"abstract":"This paper presents various machine learning methods with different text features that are explored and evaluated to determine the authorship of the texts in the example of the Azerbaijani language. We consider techniques like artificial neural network, convolutional neural network, random forest, and support vector machine. These techniques are used with different text features like word length, sentence length, combined word length and sentence length, n-grams, and word frequencies. The models were trained and tested on the works of many famous Azerbaijani writers. The results of computer experiments obtained by utilizing a comparison of various techniques and text features were analyzed. The cases where the usage of text features allowed better results were determined.","PeriodicalId":7636,"journal":{"name":"Algorithms","volume":null,"pages":null},"PeriodicalIF":1.8000,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Comparative Study of Machine Learning Methods and Text Features for Text Authorship Recognition in the Example of Azerbaijani Language Texts\",\"authors\":\"Rustam Azimov, Efthimios Providas\",\"doi\":\"10.3390/a17060242\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents various machine learning methods with different text features that are explored and evaluated to determine the authorship of the texts in the example of the Azerbaijani language. We consider techniques like artificial neural network, convolutional neural network, random forest, and support vector machine. These techniques are used with different text features like word length, sentence length, combined word length and sentence length, n-grams, and word frequencies. The models were trained and tested on the works of many famous Azerbaijani writers. The results of computer experiments obtained by utilizing a comparison of various techniques and text features were analyzed. The cases where the usage of text features allowed better results were determined.\",\"PeriodicalId\":7636,\"journal\":{\"name\":\"Algorithms\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2024-06-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Algorithms\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/a17060242\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/a17060242","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

本文以阿塞拜疆语为例,介绍了各种具有不同文本特征的机器学习方法,并对这些方法进行了探讨和评估,以确定文本的作者身份。我们考虑了人工神经网络、卷积神经网络、随机森林和支持向量机等技术。这些技术使用了不同的文本特征,如单词长度、句子长度、单词长度和句子长度的组合、n-grams 和词频。这些模型在许多阿塞拜疆著名作家的作品上进行了训练和测试。通过比较各种技术和文本特征,对计算机实验结果进行了分析。确定了在哪些情况下使用文本特征可以获得更好的结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
A Comparative Study of Machine Learning Methods and Text Features for Text Authorship Recognition in the Example of Azerbaijani Language Texts
This paper presents various machine learning methods with different text features that are explored and evaluated to determine the authorship of the texts in the example of the Azerbaijani language. We consider techniques like artificial neural network, convolutional neural network, random forest, and support vector machine. These techniques are used with different text features like word length, sentence length, combined word length and sentence length, n-grams, and word frequencies. The models were trained and tested on the works of many famous Azerbaijani writers. The results of computer experiments obtained by utilizing a comparison of various techniques and text features were analyzed. The cases where the usage of text features allowed better results were determined.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Algorithms
Algorithms Mathematics-Numerical Analysis
CiteScore
4.10
自引率
4.30%
发文量
394
审稿时长
11 weeks
期刊最新文献
EEG Channel Selection for Stroke Patient Rehabilitation Using BAT Optimizer Classification and Regression of Pinhole Corrosions on Pipelines Based on Magnetic Flux Leakage Signals Using Convolutional Neural Networks The Parallel Machine Scheduling Problem with Different Speeds and Release Times in the Ore Hauling Operation A Novel Hybrid Crow Search Arithmetic Optimization Algorithm for Solving Weighted Combined Economic Emission Dispatch with Load-Shifting Practice Normalization of Web of Science Institution Names Based on Deep Learning
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1