Opinion Spam Detection Using Feature Selection

2014 International Conference on Computational Intelligence and Communication Networks Pub Date : 2014-11-14 DOI:10.1109/CICN.2014.127

Rinki Patel, Priyank Thakkar

Pages: 560-564

Citations: 18

Abstract




E-commerce businesses now find it essential to let their customers write reviews of the products and services they have used. These reviews are a vital source of information: prospective customers consult them before deciding on a purchase, and marketers use them to find the drawbacks of their own products or services and to learn about those of their competitors, which in turn makes it possible to identify the strengths and weaknesses of products. Unfortunately, the usefulness of opinions has also given rise to opinion spam: forged positive or spiteful negative opinions. This paper focuses on the detection of deceptive opinion spam. A recently proposed opinion spam detection method based on n-gram techniques is extended by means of feature selection and different representations of the opinions. The problem is modelled as a classification problem, and a Naïve Bayes (NB) classifier and a Least Squares Support Vector Machine (LS-SVM) are applied to three different representations of the opinions: Boolean, bag-of-words, and term frequency-inverse document frequency (TF-IDF). All experiments are carried out on a widely used gold-standard dataset.
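The three representations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the restriction to unigrams, the unsmoothed `tf * log(N / df)` weighting, and all function names are assumptions made for illustration, and the two reviews are made-up toy data rather than the dataset used in the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocabulary(docs):
    """Return the sorted list of distinct unigrams across all tokenized docs."""
    return sorted({tok for doc in docs for tok in doc})


def boolean_vector(doc, vocab):
    """1 if the term occurs in the document, else 0."""
    present = set(doc)
    return [1 if term in present else 0 for term in vocab]


def bow_vector(doc, vocab):
    """Raw term counts (bag-of-words)."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]


def tfidf_vectors(docs, vocab):
    """Term count times log(N / df); one common TF-IDF variant (assumed here)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] * idf[term] for term in vocab])
    return vectors


# Toy reviews, invented for this example only.
reviews = ["great hotel great staff", "terrible hotel"]
docs = [tokenize(r) for r in reviews]
vocab = build_vocabulary(docs)

print(vocab)                           # ['great', 'hotel', 'staff', 'terrible']
print(boolean_vector(docs[0], vocab))  # [1, 1, 1, 0]
print(bow_vector(docs[0], vocab))      # [2, 1, 1, 0]
print(tfidf_vectors(docs, vocab))
```

Under this weighting a term that appears in every document (here `hotel`) gets a TF-IDF weight of zero, which is the main way TF-IDF differs from the Boolean and bag-of-words views. Any of the three vector sets could then be passed to a classifier after a feature selection step; the abstract does not say which selection criterion the paper uses, so none is sketched here.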
