使用机器学习和有效的预处理来检测垃圾推文

Berk Kardaş, İsmail Erdem Bayar, Tansel Özyer, R. Alhajj
{"title":"使用机器学习和有效的预处理来检测垃圾推文","authors":"Berk Kardaş, İsmail Erdem Bayar, Tansel Özyer, R. Alhajj","doi":"10.1145/3487351.3490968","DOIUrl":null,"url":null,"abstract":"Nowadays, with the rapid increase in popularity of online social networks (OSNs), these platforms are realized as ideal places for spammers. Unfortunately, these spammers can easily publish malicious content, advertise phishing scams by taking advantage of OSNs. Therefore, effective identification and filtering of spam tweets will be beneficial to both OSNs and users. However, it is becoming increasingly difficult to check and eliminate spam tweets due to this great flow of posts. Motivated by these observations, in this paper we propose an approach for the detection of spam tweets using machine learning and effective preprocessing techniques. The approach proposes the advantages of the preprocessing and which of these preprocessing techniques are the most effective. To compare these techniques UtkML Twitter spam dataset is used in testing. After the most effective methods determined, the detection accuracy of the spam tweets will be better optimized by combining them. We have evaluated our solution with four different machine learning algorithms namely - Naïve Bayes Classifier, Neural Network, Logistic Regression and Support Vector Machine. With SVM Classifier, we are able to achieve an accuracy of 93.02%. Experimental results show that our approach can improve the performance of spam tweet classification effectively.","PeriodicalId":320904,"journal":{"name":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining","volume":"63 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Detecting spam tweets using machine learning and effective preprocessing\",\"authors\":\"Berk Kardaş, İsmail Erdem Bayar, Tansel Özyer, R. Alhajj\",\"doi\":\"10.1145/3487351.3490968\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Nowadays, with the rapid increase in popularity of online social networks (OSNs), these platforms are realized as ideal places for spammers. Unfortunately, these spammers can easily publish malicious content, advertise phishing scams by taking advantage of OSNs. Therefore, effective identification and filtering of spam tweets will be beneficial to both OSNs and users. However, it is becoming increasingly difficult to check and eliminate spam tweets due to this great flow of posts. Motivated by these observations, in this paper we propose an approach for the detection of spam tweets using machine learning and effective preprocessing techniques. The approach proposes the advantages of the preprocessing and which of these preprocessing techniques are the most effective. To compare these techniques UtkML Twitter spam dataset is used in testing. After the most effective methods determined, the detection accuracy of the spam tweets will be better optimized by combining them. We have evaluated our solution with four different machine learning algorithms namely - Naïve Bayes Classifier, Neural Network, Logistic Regression and Support Vector Machine. With SVM Classifier, we are able to achieve an accuracy of 93.02%. Experimental results show that our approach can improve the performance of spam tweet classification effectively.\",\"PeriodicalId\":320904,\"journal\":{\"name\":\"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining\",\"volume\":\"63 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3487351.3490968\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3487351.3490968","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

摘要

如今,随着在线社交网络(OSNs)的迅速普及,这些平台成为垃圾邮件发送者的理想场所。不幸的是,这些垃圾邮件发送者可以很容易地利用osn发布恶意内容,宣传网络钓鱼骗局。因此,对垃圾推文进行有效的识别和过滤,对osn和用户都是有利的。然而,由于这种巨大的帖子流量,检查和消除垃圾推文变得越来越困难。基于这些观察结果,在本文中,我们提出了一种使用机器学习和有效预处理技术检测垃圾推文的方法。该方法提出了各种预处理技术的优点以及哪种预处理技术是最有效的。为了比较这些技术,在测试中使用了UtkML Twitter垃圾邮件数据集。在确定了最有效的方法后,将它们结合起来,可以更好地优化垃圾推文的检测精度。我们用四种不同的机器学习算法来评估我们的解决方案,即Naïve贝叶斯分类器、神经网络、逻辑回归和支持向量机。使用SVM分类器,我们能够达到93.02%的准确率。实验结果表明,该方法可以有效地提高垃圾推文分类的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Detecting spam tweets using machine learning and effective preprocessing
Nowadays, with the rapid increase in popularity of online social networks (OSNs), these platforms are realized as ideal places for spammers. Unfortunately, these spammers can easily publish malicious content, advertise phishing scams by taking advantage of OSNs. Therefore, effective identification and filtering of spam tweets will be beneficial to both OSNs and users. However, it is becoming increasingly difficult to check and eliminate spam tweets due to this great flow of posts. Motivated by these observations, in this paper we propose an approach for the detection of spam tweets using machine learning and effective preprocessing techniques. The approach proposes the advantages of the preprocessing and which of these preprocessing techniques are the most effective. To compare these techniques UtkML Twitter spam dataset is used in testing. After the most effective methods determined, the detection accuracy of the spam tweets will be better optimized by combining them. We have evaluated our solution with four different machine learning algorithms namely - Naïve Bayes Classifier, Neural Network, Logistic Regression and Support Vector Machine. With SVM Classifier, we are able to achieve an accuracy of 93.02%. Experimental results show that our approach can improve the performance of spam tweet classification effectively.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Predicting COVID-19 with AI techniques: current research and future directions Predictions of drug metabolism pathways through CYP 3A4 enzyme by analysing drug-target interactions network graph An insight into network structure measures and number of driver nodes Temporal dynamics of posts and user engagement of influencers on Facebook and Instagram Vibe check: social resonance learning for enhanced recommendation
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1