在线漂移Twitter垃圾邮件的统计检测:特邀论文

Shigang Liu, Jun Zhang, Yang Xiang
{"title":"在线漂移Twitter垃圾邮件的统计检测:特邀论文","authors":"Shigang Liu, Jun Zhang, Yang Xiang","doi":"10.1145/2897845.2897928","DOIUrl":null,"url":null,"abstract":"Spam has become a critical problem in online social networks. This paper focuses on Twitter spam detection. Recent research works focus on applying machine learning techniques for Twitter spam detection, which make use of the statistical features of tweets. We observe existing machine learning based detection methods suffer from the problem of Twitter spam drift, i.e., the statistical properties of spam tweets vary over time. To avoid this problem, an effective solution is to train one twitter spam classifier every day. However, it faces a challenge of the small number of imbalanced training data because labelling spam samples is time-consuming. This paper proposes a new method to address this challenge. The new method employs two new techniques, fuzzy-based redistribution and asymmetric sampling. We develop a fuzzy-based information decomposition technique to re-distribute the spam class and generate more spam samples. Moreover, an asymmetric sampling technique is proposed to re-balance the sizes of spam samples and non-spam samples in the training data. Finally, we apply the ensemble technique to combine the spam classifiers over two different training sets. A number of experiments are performed on a real-world 10-day ground-truth dataset to evaluate the new method. Experiments results show that the new method can significantly improve the detection performance for drifting Twitter spam.","PeriodicalId":166633,"journal":{"name":"Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"35","resultStr":"{\"title\":\"Statistical Detection of Online Drifting Twitter Spam: Invited Paper\",\"authors\":\"Shigang Liu, Jun Zhang, Yang Xiang\",\"doi\":\"10.1145/2897845.2897928\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Spam has become a critical problem in online social networks. This paper focuses on Twitter spam detection. Recent research works focus on applying machine learning techniques for Twitter spam detection, which make use of the statistical features of tweets. We observe existing machine learning based detection methods suffer from the problem of Twitter spam drift, i.e., the statistical properties of spam tweets vary over time. To avoid this problem, an effective solution is to train one twitter spam classifier every day. However, it faces a challenge of the small number of imbalanced training data because labelling spam samples is time-consuming. This paper proposes a new method to address this challenge. The new method employs two new techniques, fuzzy-based redistribution and asymmetric sampling. We develop a fuzzy-based information decomposition technique to re-distribute the spam class and generate more spam samples. Moreover, an asymmetric sampling technique is proposed to re-balance the sizes of spam samples and non-spam samples in the training data. Finally, we apply the ensemble technique to combine the spam classifiers over two different training sets. A number of experiments are performed on a real-world 10-day ground-truth dataset to evaluate the new method. Experiments results show that the new method can significantly improve the detection performance for drifting Twitter spam.\",\"PeriodicalId\":166633,\"journal\":{\"name\":\"Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-05-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"35\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2897845.2897928\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2897845.2897928","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 35

摘要

垃圾邮件已经成为在线社交网络中的一个关键问题。本文主要研究Twitter垃圾邮件检测。最近的研究工作集中在将机器学习技术应用于Twitter垃圾邮件检测上,该技术利用了tweet的统计特征。我们观察到现有的基于机器学习的检测方法存在推特垃圾邮件漂移的问题,即垃圾推文的统计属性随着时间的推移而变化。为了避免这个问题,一个有效的解决方案是每天训练一个twitter垃圾邮件分类器。然而,由于标记垃圾邮件样本耗时长,它面临着训练数据数量少的挑战。本文提出了一种新的方法来解决这一挑战。该方法采用了基于模糊的再分配和非对称采样两种新技术。我们开发了一种基于模糊的信息分解技术来重新分配垃圾邮件类并生成更多的垃圾邮件样本。此外,提出了一种非对称采样技术来重新平衡训练数据中垃圾邮件样本和非垃圾邮件样本的大小。最后,我们应用集成技术在两个不同的训练集上组合垃圾邮件分类器。在真实世界的10天真实数据集上进行了许多实验来评估新方法。实验结果表明,新方法可以显著提高对漂移Twitter垃圾邮件的检测性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Statistical Detection of Online Drifting Twitter Spam: Invited Paper
Spam has become a critical problem in online social networks. This paper focuses on Twitter spam detection. Recent research works focus on applying machine learning techniques for Twitter spam detection, which make use of the statistical features of tweets. We observe existing machine learning based detection methods suffer from the problem of Twitter spam drift, i.e., the statistical properties of spam tweets vary over time. To avoid this problem, an effective solution is to train one twitter spam classifier every day. However, it faces a challenge of the small number of imbalanced training data because labelling spam samples is time-consuming. This paper proposes a new method to address this challenge. The new method employs two new techniques, fuzzy-based redistribution and asymmetric sampling. We develop a fuzzy-based information decomposition technique to re-distribute the spam class and generate more spam samples. Moreover, an asymmetric sampling technique is proposed to re-balance the sizes of spam samples and non-spam samples in the training data. Finally, we apply the ensemble technique to combine the spam classifiers over two different training sets. A number of experiments are performed on a real-world 10-day ground-truth dataset to evaluate the new method. Experiments results show that the new method can significantly improve the detection performance for drifting Twitter spam.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Generally Hybrid Proxy Re-Encryption: A Secure Data Sharing among Cryptographic Clouds Hardening OpenStack Cloud Platforms against Compute Node Compromises Data Exfiltration in the Face of CSP Anonymous Identity-Based Broadcast Encryption with Constant Decryption Complexity and Strong Security FLEX: A Flexible Code Authentication Framework for Delegating Mobile App Customization
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1