Mohammad Ehsan Basiri, Neshat Safarian, Hadi Khosravi Farsani
2019 5th International Conference on Web Research (ICWR), pp. 203-207, April 2019. DOI: 10.1109/ICWR.2019.8765275
A Supervised Framework for Review Spam Detection in the Persian Language
Sentiment analysis of online reviews has attracted increasing attention from both academia and industry. Although online reviews are valuable sources of information about public opinion on different aspects of products, some of them may be written by spammers with various purposes. Several methods have been proposed to detect such spam reviews in English, but no study on Persian spam detection has been reported so far. In this study, Persian reviews of cell phones are investigated to detect type 1 and type 2 spam, that is, fake reviews and reviews written only about brands, respectively. In the proposed framework, a labeled dataset, SpamPer, is first created by majority voting over human annotators' answers to 11 questions previously designed for spam detection. Next, several Persian-specific preprocessing steps are applied to refine the training data. Finally, review-based and metadata features are extracted. Results on 3,000 SpamPer reviews show that the decision tree achieves the best performance, with an F1-measure of 0.78. Moreover, the results reveal that SVM on unbalanced data and the decision tree on balanced data perform better when trained on the combination of metadata and review-based features.
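The labeling, feature-combination, and evaluation steps summarized in the abstract can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each of the 11 questions yields a yes/no spam indication for a review and that a review is labeled spam when a strict majority of those answers flag it. The function names and the feature keys (`length`, `rating`) are hypothetical. The F1-measure is the standard harmonic mean of precision and recall.

```python
from typing import Dict, Sequence


def label_review(answers: Sequence[bool]) -> bool:
    """Majority vote: True (spam) if a strict majority of answers flag the review.

    Assumption: each of the 11 questions is answered yes/no; with an odd
    number of binary answers, ties cannot occur.
    """
    if not answers:
        raise ValueError("need at least one answer")
    flagged = sum(answers)  # True counts as 1, False as 0
    return flagged * 2 > len(answers)


def combine_features(review_based: Dict[str, float],
                     metadata: Dict[str, float]) -> Dict[str, float]:
    """Merge review-based and metadata features into one feature vector.

    Prefixes keep the two feature groups from colliding on key names.
    """
    combined = {f"review:{k}": v for k, v in review_based.items()}
    combined.update({f"meta:{k}": v for k, v in metadata.items()})
    return combined


def f1_measure(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """F1-measure for the spam (True) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    answers = [True] * 6 + [False] * 5  # 6 of 11 questions flag the review
    print(label_review(answers))  # True
    print(combine_features({"length": 42.0}, {"rating": 5.0}))
    print(f1_measure([True, True, False, False], [True, False, True, False]))  # 0.5
```

In a full pipeline, the combined feature dictionaries would be vectorized and passed to classifiers such as a decision tree or an SVM; that step is omitted here to keep the sketch dependency-free.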