A Comprehensive Study of Learning Approaches for Author Gender Identification
Authors: Tuǧba Dalyan, H. Ayral, Özgür Özdemir
Journal: Information Technology and Control
Volume: 28 1
Pages: 429-445
Publication date: 2022-09-23
Publication type: Journal Article
DOI: 10.5755/j01.itc.51.3.29907 (https://doi.org/10.5755/j01.itc.51.3.29907)
Impact factor: 2.0
JCR: Q3 (Automation & Control Systems)
Category: Computer Science (region 4)
Platform: Semanticscholar
Citations: 2
Abstract
In recent years, author gender identification has become an important yet challenging task in the fields of information retrieval and computational linguistics. In this paper, different learning approaches are presented to address the problem of author gender identification for Turkish articles. First, several classification algorithms are applied to a set of representations based on different paradigms: fixed-length vector representations such as Stylometric Features (SF) and Bag-of-Words (BoW), and distributed word/document embeddings such as Word2vec, fastText and Doc2vec. Second, deep learning architectures are designed and their performance is compared: Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), special kinds of RNN such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), C-RNN, Bidirectional LSTM (bi-LSTM), Bidirectional GRU (bi-GRU), Hierarchical Attention Networks and Multi-head Attention (MHA). We conducted a variety of experiments and obtained strong empirical results. In conclusion, ML algorithms with BoW give promising results, and fastText appears to be a suitable choice among the embedding models. This comprehensive study contributes to the literature by applying different learning approaches to several kinds of representations. It is also the first significant attempt to identify author gender by applying SF, fastText and DNN architectures to the Turkish language.
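The abstract's pairing of a Bag-of-Words representation with a classical ML classifier can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the classifiers or the preprocessing used, so multinomial Naive Bayes is swapped in as one common choice, the whitespace tokenizer stands in for a Turkish-aware one, and the corpus and the labels "A" and "B" are hypothetical placeholders rather than data from the study.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (assumption: a real pipeline
    would use a Turkish-aware tokenizer instead)."""
    return text.lower().split()


def bag_of_words(text):
    """Map a document to its token counts, i.e. its BoW representation."""
    return Counter(tokenize(text))


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Chosen here only as an illustrative classifier; the source does not
    say which algorithms were paired with BoW.
    """

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(bag_of_words(doc))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def log_scores(self, doc):
        total_docs = sum(self.class_counts.values())
        bow = bag_of_words(doc)
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token, freq in bow.items():
                if token in self.vocab:  # skip tokens unseen in training
                    score += freq * math.log((counts[token] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Hypothetical placeholder corpus: tokens and labels carry no real meaning.
train_docs = ["alpha beta alpha", "alpha gamma", "delta epsilon delta", "epsilon zeta"]
train_labels = ["A", "A", "B", "B"]

model = MultinomialNB().fit(train_docs, train_labels)
prediction = model.predict("alpha alpha zeta")
```

On this toy corpus `prediction` is "A", because "alpha" occurs only in class-A training documents and outweighs the single class-B token "zeta". The same `fit`/`predict` shape would apply if the BoW counts were replaced by stylometric features or averaged word embeddings, which are the other fixed-length representations the abstract lists.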
About the journal:
The journal covers a wide range of problems in computer science and control systems, including:
-Software and hardware engineering;
-Management systems engineering;
-Information systems and databases;
-Embedded systems;
-Physical systems modelling and application;
-Computer networks and cloud computing;
-Data visualization;
-Human-computer interface;
-Computer graphics, visual analytics, and multimedia systems.