以阿塞拜疆语文本为例，机器学习方法与文本特征在文本作者身份识别中的比较研究

IF 17.7 1区化学 Q1 CHEMISTRY, MULTIDISCIPLINARY Accounts of Chemical Research Pub Date : 2024-06-05 DOI:10.3390/a17060242

Rustam Azimov, Efthimios Providas

{"title":"以阿塞拜疆语文本为例，机器学习方法与文本特征在文本作者身份识别中的比较研究","authors":"Rustam Azimov, Efthimios Providas","doi":"10.3390/a17060242","DOIUrl":null,"url":null,"abstract":"This paper presents various machine learning methods with different text features that are explored and evaluated to determine the authorship of the texts in the example of the Azerbaijani language. We consider techniques like artificial neural network, convolutional neural network, random forest, and support vector machine. These techniques are used with different text features like word length, sentence length, combined word length and sentence length, n-grams, and word frequencies. The models were trained and tested on the works of many famous Azerbaijani writers. The results of computer experiments obtained by utilizing a comparison of various techniques and text features were analyzed. The cases where the usage of text features allowed better results were determined.","PeriodicalId":1,"journal":{"name":"Accounts of Chemical Research","volume":"9 11","pages":""},"PeriodicalIF":17.7000,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Comparative Study of Machine Learning Methods and Text Features for Text Authorship Recognition in the Example of Azerbaijani Language Texts\",\"authors\":\"Rustam Azimov, Efthimios Providas\",\"doi\":\"10.3390/a17060242\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents various machine learning methods with different text features that are explored and evaluated to determine the authorship of the texts in the example of the Azerbaijani language. We consider techniques like artificial neural network, convolutional neural network, random forest, and support vector machine. These techniques are used with different text features like word length, sentence length, combined word length and sentence length, n-grams, and word frequencies. The models were trained and tested on the works of many famous Azerbaijani writers. The results of computer experiments obtained by utilizing a comparison of various techniques and text features were analyzed. The cases where the usage of text features allowed better results were determined.\",\"PeriodicalId\":1,\"journal\":{\"name\":\"Accounts of Chemical Research\",\"volume\":\"9 11\",\"pages\":\"\"},\"PeriodicalIF\":17.7000,\"publicationDate\":\"2024-06-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Accounts of Chemical Research\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/a17060242\",\"RegionNum\":1,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Accounts of Chemical Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/a17060242","RegionNum":1,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}

引用次数: 0

摘要

本文以阿塞拜疆语为例，介绍了各种具有不同文本特征的机器学习方法，并对这些方法进行了探讨和评估，以确定文本的作者身份。我们考虑了人工神经网络、卷积神经网络、随机森林和支持向量机等技术。这些技术使用了不同的文本特征，如单词长度、句子长度、单词长度和句子长度的组合、n-grams 和词频。这些模型在许多阿塞拜疆著名作家的作品上进行了训练和测试。通过比较各种技术和文本特征，对计算机实验结果进行了分析。确定了在哪些情况下使用文本特征可以获得更好的结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A Comparative Study of Machine Learning Methods and Text Features for Text Authorship Recognition in the Example of Azerbaijani Language Texts

This paper presents various machine learning methods with different text features that are explored and evaluated to determine the authorship of the texts in the example of the Azerbaijani language. We consider techniques like artificial neural network, convolutional neural network, random forest, and support vector machine. These techniques are used with different text features like word length, sentence length, combined word length and sentence length, n-grams, and word frequencies. The models were trained and tested on the works of many famous Azerbaijani writers. The results of computer experiments obtained by utilizing a comparison of various techniques and text features were analyzed. The cases where the usage of text features allowed better results were determined.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Accounts of Chemical Research 化学-化学综合

CiteScore

31.40

自引率

1.10%

发文量

312

审稿时长

2 months

期刊介绍： Accounts of Chemical Research presents short, concise and critical articles offering easy-to-read overviews of basic research and applications in all areas of chemistry and biochemistry. These short reviews focus on research from the author’s own laboratory and are designed to teach the reader about a research project. In addition, Accounts of Chemical Research publishes commentaries that give an informed opinion on a current research problem. Special Issues online are devoted to a single topic of unusual activity and significance. Accounts of Chemical Research replaces the traditional article abstract with an article "Conspectus." These entries synopsize the research affording the reader a closer look at the content and significance of an article. Through this provision of a more detailed description of the article contents, the Conspectus enhances the article's discoverability by search engines and the exposure for the research.