SPGD_HIN: Spammer Group Detection based on Heterogeneous Information Network
Alireza Bitarafan, Chitra Dadkhah
2019 5th International Conference on Web Research (ICWR), pp. 228-233, published 2019-04-01
DOI: 10.1109/ICWR.2019.8765274 (https://doi.org/10.1109/ICWR.2019.8765274)
Citations: 2
Abstract
Online stores and e-commerce platforms have become increasingly popular in recent years, and a reasonable way to compare the available products is to read the comments or feedback that other users have written about each product. These platforms therefore give spammers an opportunity to promote or demote target products with fake reviews. Many studies have aimed to distinguish spam reviews or spammers from genuine ones, but spammers often work in collusion so that their control over a product's rating score looks more natural. Hence, this article focuses on the latter aspect, i.e., review spammer group detection. Most previous works apply Frequent Itemset Mining (FIM) in an early stage to find candidate groups and then rank them with an unsupervised procedure based on predefined features. However, FIM methods are sensitive to the support threshold: low support values make mining inefficient, while high support values miss some useful patterns. Furthermore, semi-supervised methods, which need only a small amount of labeled data, can greatly improve detection accuracy over unsupervised ones. In this article, we tackle these challenges by taking advantage of some labeled instances in a Heterogeneous Information Network (HIN). A HIN preserves the semantics between the different kinds of nodes in the network. We also extract candidate groups using spammer behaviors and their relations, which keeps the approach robust when spammers behave more intelligently. Experiments on a real-life Yelp dataset show the efficiency of our approach.
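The support-threshold trade-off that the abstract attributes to FIM-based candidate generation can be illustrated with a small sketch. This is not the authors' method. It is a brute-force stand-in for the FIM baseline they criticize, and the function name `candidate_groups`, the `max_size` cap, and the toy review data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def candidate_groups(reviews, min_support, max_size=3):
    """Return reviewer groups that co-reviewed at least `min_support` products.

    `reviews` maps a product id to the set of reviewer ids who reviewed it.
    This is a brute-force illustration, not an optimized FIM algorithm.
    """
    counts = Counter()
    for reviewers in reviews.values():
        ordered = sorted(reviewers)  # canonical order so each group has one key
        for size in range(2, max_size + 1):
            for group in combinations(ordered, size):
                counts[group] += 1
    return {group: n for group, n in counts.items() if n >= min_support}


# Hypothetical toy data: product -> reviewers.
reviews = {
    "p1": {"a", "b", "c"},
    "p2": {"a", "b", "c"},
    "p3": {"a", "b", "d"},
    "p4": {"c", "d"},
}

low = candidate_groups(reviews, min_support=1)   # many candidates, mostly noise
mid = candidate_groups(reviews, min_support=2)
high = candidate_groups(reviews, min_support=3)  # few candidates, patterns lost

print(len(low), len(mid), len(high))
```

With a low threshold every co-occurring group becomes a candidate, which is the inefficiency case. With a high threshold the group ("a", "b", "c"), which co-reviewed two products, is dropped, which is the missed-pattern case.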