Predicting age and gender in online social networks

SMUC '11, Pub Date: 2011-10-28, DOI: 10.1145/2065023.2065035
Claudia Peersman, Walter Daelemans, L. V. Vaerenbergh
Citations: 302

Abstract

A common characteristic of communication on online social networks is that it happens via short messages, often using non-standard language variations. These characteristics make this type of text a challenging genre for natural language processing. Moreover, in these digital communities it is easy to provide a false name, age, gender and location in order to hide one's true identity, providing criminals such as pedophiles with new possibilities to groom their victims. It would therefore be useful if user profiles could be checked on the basis of text analysis, and false profiles flagged for monitoring. This paper presents an exploratory study in which we apply a text categorization approach to the prediction of age and gender on a corpus of chat texts, which we collected from the Belgian social networking site Netlog. We examine which types of features are most informative for a reliable prediction of age and gender on this difficult text type, and we perform experiments with different data set sizes in order to gain more insight into the minimum data size requirements for this task.
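As a rough illustration of what a text categorization approach to short chat messages can look like, the sketch below trains a small classifier on character n-gram counts. The abstract does not name the features or the learning algorithm used in the paper, so everything here is an assumption made for illustration: the character trigram features, the multinomial Naive Bayes learner, the class names `younger` and `older`, and the four toy messages, which are invented and not taken from the Netlog corpus.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded message."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one smoothing (an assumed learner)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.feature_counts[label].update(grams)
            self.vocabulary.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(doc_count / total_docs)
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, invented for this sketch only.
texts = [
    "omg lol c u l8r xx",
    "haha lol gr8 pic xx",
    "Good evening, how was the meeting today?",
    "Thank you for the invitation to the meeting.",
]
labels = ["younger", "younger", "older", "older"]

model = NaiveBayesTextClassifier().fit(texts, labels)
print(model.predict("lol xx"))
print(model.predict("the meeting today"))
```

Character n-grams are used here because they tolerate the non-standard spellings typical of chat text better than whole-word features, but that is a design choice of this sketch rather than a finding reported in the abstract. Repeating the fit on progressively smaller subsets of the training data would be one simple way to probe minimum data size requirements.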