A Supervised Framework for Review Spam Detection in the Persian Language

Mohammad Ehsan Basiri, Neshat Safarian, Hadi Khosravi Farsani
Published in: 2019 5th International Conference on Web Research (ICWR), pp. 203-207, 2019-04-01. DOI: https://doi.org/10.1109/ICWR.2019.8765275
Citations: 6

Abstract

Sentiment analysis of online reviews has attracted increasing attention from both academia and industry. Although online reviews are valuable sources of information for detecting public opinion towards different aspects of products, they may be written by spammers with different purposes. Several methods have been proposed to detect such spam reviews in English, but no study has so far been reported on Persian spam detection. In the current study, Persian reviews of cell phones are investigated to find type 1 and type 2 spam, which are fake reviews and reviews written only about brands, respectively. In the proposed framework, a labeled dataset, SpamPer, is first created by majority voting on the answers to 11 questions previously designed for spam detection by human annotators. Then several preprocessing steps for the Persian language are performed to refine the training data. Finally, review-based and metadata features are extracted. The results obtained on 3000 reviews of SpamPer show that the highest accuracy is obtained using the decision tree, with an F1-measure of 0.78. Moreover, the results reveal that SVM for unbalanced data and decision tree for balanced data achieve better performance when they are trained on the combination of metadata and review-based features.
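
The abstract does not say exactly how the answers to the 11 questions are aggregated or how the F1-measure is averaged. As a minimal sketch only, the following assumes that each question yields a yes/no indication of spam, that a review is labeled spam when more than half of the answers are yes, and that F1 is computed for the spam class. The function names `majority_vote` and `f1_measure` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def majority_vote(answers: Sequence[bool]) -> bool:
    """Label a review as spam when more than half of the answers say so.

    Assumption of this sketch: one boolean per question (11 in the paper),
    so an odd number of answers can never tie.
    """
    if not answers:
        raise ValueError("at least one answer is required")
    return sum(answers) > len(answers) / 2


def f1_measure(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """F1 for the positive (spam) class: harmonic mean of precision and recall."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Six of eleven hypothetical answers indicate spam, so the label is spam.
    print(majority_vote([True] * 6 + [False] * 5))
    # Toy predictions, not data from SpamPer.
    print(f1_measure([True, True, False, False], [True, False, True, False]))
```

Requiring a strict majority means an evenly split set of answers would be labeled non-spam; with 11 questions that case cannot occur.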