Deep Contextualized Word Embedding for Text-based Online User Profiling to Detect Social Bots on Twitter

Maryam Heidari, James H. Jones, Özlem Uzuner
Published in: 2020 International Conference on Data Mining Workshops (ICDMW)
Publication date: 2020-11-01
DOI: 10.1109/ICDMW51313.2020.00071 (https://doi.org/10.1109/ICDMW51313.2020.00071)
Citations: 59

Abstract

Social media platforms can expose influential trends in many aspects of everyday life. However, the trends they represent can be contaminated by disinformation. Social bots are a significant source of disinformation on social media and can pose serious cyber threats to society and public opinion. This research aims to develop machine learning models that detect bots based on user profiles extracted from the text of tweets. An online user profile shows the user's personal information, such as age, gender, education, and personality. In this work, the profile is constructed from the user's online posts. The contribution of this work is threefold. First, we aim to improve bot detection through machine learning models based on personal information derived from the user's online comments. The similarity of personal information between two online posts makes it difficult to differentiate a bot from a human user; in this research, however, we leverage that similarity as an advantage for the new bot detection model. The proposed model creates user profiles from personal information such as age, personality, gender, and education derived from the user's online posts, and introduces a machine learning model that detects social bots with high prediction accuracy based on that information. Second, we create a new public data set that provides user profiles for more than 6900 Twitter accounts in the Cresci 2017 [1] data set. All user profiles are extracted from the users' posts on Twitter. Third, for the first time, this paper uses a deep contextualized word embedding model, ELMo [2], for a social media bot detection task.
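The abstract does not say how the similarity of personal information between two posts is measured, so the following is only an illustrative sketch and not the authors' method. It assumes each post has already been labeled with the four profile attributes the abstract names (by some upstream text classifier that is not shown), and it computes a simple agreement score that a downstream bot classifier could use as one input feature. The class `PostProfile`, the functions `pair_agreement` and `account_consistency`, and all label values are hypothetical. The sketch takes no position on whether high or low consistency indicates a bot.

```python
from __future__ import annotations

from dataclasses import dataclass, astuple
from itertools import combinations


@dataclass(frozen=True)
class PostProfile:
    """Hypothetical per-post profile labels (assumed output of upstream classifiers)."""
    age_group: str
    gender: str
    education: str
    personality: str


def pair_agreement(a: PostProfile, b: PostProfile) -> float:
    """Fraction of profile attributes on which two posts carry the same label."""
    fields_a, fields_b = astuple(a), astuple(b)
    matches = sum(x == y for x, y in zip(fields_a, fields_b))
    return matches / len(fields_a)


def account_consistency(posts: list[PostProfile]) -> float:
    """Mean pairwise agreement over all pairs of posts from one account."""
    pairs = list(combinations(posts, 2))
    if not pairs:
        return 1.0  # zero or one post: no pair can disagree
    return sum(pair_agreement(a, b) for a, b in pairs) / len(pairs)


# Made-up labels, for illustration only.
steady = [PostProfile("18-24", "female", "college", "extravert")] * 3
mixed = [
    PostProfile("18-24", "female", "college", "extravert"),
    PostProfile("35-49", "male", "college", "introvert"),
]

steady_score = account_consistency(steady)  # every attribute matches in every pair
mixed_score = account_consistency(mixed)    # only the education label matches
```

A score like this would be one feature among others; the profile labels themselves, and any embedding of the tweet text, would come from separate models.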