基于多视图数据融合的半监督深度社会垃圾邮件发送者检测

2018 IEEE International Conference on Data Mining (ICDM) Pub Date : 2018-11-01 DOI:10.1109/ICDM.2018.00040

Chaozhuo Li, Senzhang Wang, Lifang He, Philip S. Yu, Yanbo Liang, Zhoujun Li

{"title":"基于多视图数据融合的半监督深度社会垃圾邮件发送者检测","authors":"Chaozhuo Li, Senzhang Wang, Lifang He, Philip S. Yu, Yanbo Liang, Zhoujun Li","doi":"10.1109/ICDM.2018.00040","DOIUrl":null,"url":null,"abstract":"The explosive use of social media makes it a popular platform for malicious users, known as social spammers, to overwhelm legitimate users with unwanted content. Most existing social spammer detection approaches are supervised and need a large number of manually labeled data for training, which is infeasible in practice. To address this issue, some semi-supervised models are proposed by incorporating side information such as user profiles and posted tweets. However, these shallow models are not effective to deeply learn the desirable user representations for spammer detection, and the multi-view data are usually loosely coupled without considering their correlations. In this paper, we propose a Semi-Supervised Deep social spammer detection model by Multi-View data fusion (SSDMV). The insight is that we aim to extensively learn the task-relevant discriminative representations for users to address the challenge of annotation scarcity. Under a unified semi-supervised learning framework, we first design a deep multi-view feature learning module which fuses information from different views, and then propose a label inference module to predict labels for users. The mutual refinement between the two modules ensures SSDMV to be able to both generate high quality features and make accurate predictions.Empirically, we evaluate SSDMV over two real social network datasets on three tasks, and the results demonstrate that SSDMV significantly outperforms the state-of-the-art methods.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"27","resultStr":"{\"title\":\"SSDMV: Semi-Supervised Deep Social Spammer Detection by Multi-view Data Fusion\",\"authors\":\"Chaozhuo Li, Senzhang Wang, Lifang He, Philip S. Yu, Yanbo Liang, Zhoujun Li\",\"doi\":\"10.1109/ICDM.2018.00040\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The explosive use of social media makes it a popular platform for malicious users, known as social spammers, to overwhelm legitimate users with unwanted content. Most existing social spammer detection approaches are supervised and need a large number of manually labeled data for training, which is infeasible in practice. To address this issue, some semi-supervised models are proposed by incorporating side information such as user profiles and posted tweets. However, these shallow models are not effective to deeply learn the desirable user representations for spammer detection, and the multi-view data are usually loosely coupled without considering their correlations. In this paper, we propose a Semi-Supervised Deep social spammer detection model by Multi-View data fusion (SSDMV). The insight is that we aim to extensively learn the task-relevant discriminative representations for users to address the challenge of annotation scarcity. Under a unified semi-supervised learning framework, we first design a deep multi-view feature learning module which fuses information from different views, and then propose a label inference module to predict labels for users. The mutual refinement between the two modules ensures SSDMV to be able to both generate high quality features and make accurate predictions.Empirically, we evaluate SSDMV over two real social network datasets on three tasks, and the results demonstrate that SSDMV significantly outperforms the state-of-the-art methods.\",\"PeriodicalId\":286444,\"journal\":{\"name\":\"2018 IEEE International Conference on Data Mining (ICDM)\",\"volume\":\"42 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"27\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Conference on Data Mining (ICDM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM.2018.00040\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Data Mining (ICDM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2018.00040","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 27

摘要

社交媒体的爆炸性使用使其成为恶意用户的流行平台，这些恶意用户被称为社交垃圾邮件发送者，他们用不想要的内容淹没合法用户。现有的大多数社交垃圾邮件检测方法都是有监督的，并且需要大量人工标记的数据进行训练，这在实践中是不可行的。为了解决这个问题，提出了一些半监督模型，通过合并用户个人资料和发布的tweet等侧信息。然而，这些浅层模型不能有效地深入学习垃圾邮件发送者检测所需的用户表示，并且多视图数据通常是松散耦合的，而不考虑它们之间的相关性。本文提出了一种基于多视图数据融合(SSDMV)的半监督深度社交垃圾邮件检测模型。我们的目标是为用户广泛学习与任务相关的判别表示，以解决注释稀缺性的挑战。在统一的半监督学习框架下，我们首先设计了融合不同视图信息的深度多视图特征学习模块，然后提出了标签推理模块，为用户预测标签。两个模块之间的相互改进确保了SSDMV能够生成高质量的特征并做出准确的预测。在经验上，我们在两个真实的社会网络数据集上对三个任务进行了评估，结果表明SSDMV显著优于最先进的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

SSDMV: Semi-Supervised Deep Social Spammer Detection by Multi-view Data Fusion

The explosive use of social media makes it a popular platform for malicious users, known as social spammers, to overwhelm legitimate users with unwanted content. Most existing social spammer detection approaches are supervised and need a large number of manually labeled data for training, which is infeasible in practice. To address this issue, some semi-supervised models are proposed by incorporating side information such as user profiles and posted tweets. However, these shallow models are not effective to deeply learn the desirable user representations for spammer detection, and the multi-view data are usually loosely coupled without considering their correlations. In this paper, we propose a Semi-Supervised Deep social spammer detection model by Multi-View data fusion (SSDMV). The insight is that we aim to extensively learn the task-relevant discriminative representations for users to address the challenge of annotation scarcity. Under a unified semi-supervised learning framework, we first design a deep multi-view feature learning module which fuses information from different views, and then propose a label inference module to predict labels for users. The mutual refinement between the two modules ensures SSDMV to be able to both generate high quality features and make accurate predictions.Empirically, we evaluate SSDMV over two real social network datasets on three tasks, and the results demonstrate that SSDMV significantly outperforms the state-of-the-art methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助