Gender Inference for Arabic Language in Social Media

A. I. Al-Ghadir, Abdullatif M. AlAbdullatif, Aqil M. Azmi
Int. J. Knowl. Soc. Res. Published 2014-10-01. DOI: 10.4018/IJKSR.2014100101 (https://doi.org/10.4018/IJKSR.2014100101)
Citations: 4

Abstract

The widespread use of social media has attracted a new group of researchers seeking information on who, what, and where the users are. Some information retrieval researchers are interested in identifying the gender, age group, and educational level of users. The objective of this work is to identify the gender of the authors of Arabic posts in social media. Most work on gender classification has targeted English-language social media content; work on other languages, such as Arabic, is almost nonexistent. People typically express themselves on social media in colloquial language, so this study is geared towards gender identification in the Saudi dialect of Arabic. To solve the gender identification problem, the authors introduce a novel method called k-Top Vector (k-TV), which is based on the top k words as determined by word occurrences and stem frequencies. Part of this work required compiling a dataset of Saudi dialect words, for which the authors relied on a well-known, widely used social site. To test the system, the authors compiled 1200 samples split equally between the two genders. They trained Support Vector Machine (SVM) and k-NN classifiers using different numbers of samples for training and testing. SVM performed better, achieving an accuracy of 95% for gender classification.
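The abstract does not spell out how a k-TV vector is constructed, so the following is only a minimal sketch of one plausible reading: take the k most frequent tokens per gender, use their counts as features, and classify with k-NN. The function names, the toy tokens, and the feature construction are all assumptions made for illustration, not the authors' implementation. Stemming is assumed to have been done already, and the SVM classifier is omitted to keep the example dependency-free.

```python
from collections import Counter
import math


def top_k_words(docs, k):
    """Return the k most frequent tokens across a list of token lists."""
    counts = Counter(tok for doc in docs for tok in doc)
    return [word for word, _ in counts.most_common(k)]


def build_vocabulary(docs, labels, k):
    """Union of each class's k most frequent tokens (assumed k-TV reading)."""
    vocab = []
    for label in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == label]
        for word in top_k_words(class_docs, k):
            if word not in vocab:
                vocab.append(word)
    return vocab


def vectorize(doc, vocab):
    """Count how often each vocabulary token occurs in one document."""
    counts = Counter(doc)
    return [counts[word] for word in vocab]


def knn_predict(train_vecs, train_labels, vec, n_neighbors=3):
    """Majority vote among the n_neighbors closest training vectors."""
    ranked = sorted(
        (math.dist(v, vec), y) for v, y in zip(train_vecs, train_labels)
    )
    votes = Counter(y for _, y in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical, already-stemmed toy posts; "F" and "M" are placeholder labels.
train_docs = [["a", "b", "a"], ["a", "c"], ["x", "y", "x"], ["x", "z"]]
train_labels = ["F", "F", "M", "M"]

vocab = build_vocabulary(train_docs, train_labels, k=2)
train_vecs = [vectorize(doc, vocab) for doc in train_docs]

prediction = knn_predict(train_vecs, train_labels, vectorize(["a", "a", "q"], vocab))
print(vocab, prediction)
```

In a real experiment the token lists would come from tokenized and stemmed Saudi-dialect posts, and the same feature vectors could be passed to an SVM implementation for the comparison the abstract reports.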