{"title":"在线漂移Twitter垃圾邮件的统计检测:特邀论文","authors":"Shigang Liu, Jun Zhang, Yang Xiang","doi":"10.1145/2897845.2897928","DOIUrl":null,"url":null,"abstract":"Spam has become a critical problem in online social networks. This paper focuses on Twitter spam detection. Recent research works focus on applying machine learning techniques for Twitter spam detection, which make use of the statistical features of tweets. We observe existing machine learning based detection methods suffer from the problem of Twitter spam drift, i.e., the statistical properties of spam tweets vary over time. To avoid this problem, an effective solution is to train one twitter spam classifier every day. However, it faces a challenge of the small number of imbalanced training data because labelling spam samples is time-consuming. This paper proposes a new method to address this challenge. The new method employs two new techniques, fuzzy-based redistribution and asymmetric sampling. We develop a fuzzy-based information decomposition technique to re-distribute the spam class and generate more spam samples. Moreover, an asymmetric sampling technique is proposed to re-balance the sizes of spam samples and non-spam samples in the training data. Finally, we apply the ensemble technique to combine the spam classifiers over two different training sets. A number of experiments are performed on a real-world 10-day ground-truth dataset to evaluate the new method. Experiments results show that the new method can significantly improve the detection performance for drifting Twitter spam.","PeriodicalId":166633,"journal":{"name":"Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"35","resultStr":"{\"title\":\"Statistical Detection of Online Drifting Twitter Spam: Invited Paper\",\"authors\":\"Shigang Liu, Jun Zhang, Yang Xiang\",\"doi\":\"10.1145/2897845.2897928\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Spam has become a critical problem in online social networks. This paper focuses on Twitter spam detection. Recent research works focus on applying machine learning techniques for Twitter spam detection, which make use of the statistical features of tweets. We observe existing machine learning based detection methods suffer from the problem of Twitter spam drift, i.e., the statistical properties of spam tweets vary over time. To avoid this problem, an effective solution is to train one twitter spam classifier every day. However, it faces a challenge of the small number of imbalanced training data because labelling spam samples is time-consuming. This paper proposes a new method to address this challenge. The new method employs two new techniques, fuzzy-based redistribution and asymmetric sampling. We develop a fuzzy-based information decomposition technique to re-distribute the spam class and generate more spam samples. Moreover, an asymmetric sampling technique is proposed to re-balance the sizes of spam samples and non-spam samples in the training data. Finally, we apply the ensemble technique to combine the spam classifiers over two different training sets. A number of experiments are performed on a real-world 10-day ground-truth dataset to evaluate the new method. Experiments results show that the new method can significantly improve the detection performance for drifting Twitter spam.\",\"PeriodicalId\":166633,\"journal\":{\"name\":\"Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-05-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"35\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2897845.2897928\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2897845.2897928","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Statistical Detection of Online Drifting Twitter Spam: Invited Paper
Spam has become a critical problem in online social networks. This paper focuses on Twitter spam detection. Recent research works focus on applying machine learning techniques for Twitter spam detection, which make use of the statistical features of tweets. We observe existing machine learning based detection methods suffer from the problem of Twitter spam drift, i.e., the statistical properties of spam tweets vary over time. To avoid this problem, an effective solution is to train one twitter spam classifier every day. However, it faces a challenge of the small number of imbalanced training data because labelling spam samples is time-consuming. This paper proposes a new method to address this challenge. The new method employs two new techniques, fuzzy-based redistribution and asymmetric sampling. We develop a fuzzy-based information decomposition technique to re-distribute the spam class and generate more spam samples. Moreover, an asymmetric sampling technique is proposed to re-balance the sizes of spam samples and non-spam samples in the training data. Finally, we apply the ensemble technique to combine the spam classifiers over two different training sets. A number of experiments are performed on a real-world 10-day ground-truth dataset to evaluate the new method. Experiments results show that the new method can significantly improve the detection performance for drifting Twitter spam.