Opinion Spam Detection Using Feature Selection

Rinki Patel, Priyank Thakkar
DOI: 10.1109/CICN.2014.127 (https://doi.org/10.1109/CICN.2014.127)
Published in: 2014 International Conference on Computational Intelligence and Communication Networks, pp. 560-564
Publication date: 2014-11-14
Citations: 18

Abstract

E-commerce businesses now routinely let their customers write reviews of the products and services they have used. Such reviews are a vital source of information: prospective customers consult them before deciding on a purchase, and marketers use them to find the drawbacks of their own products or services and to learn about those of their competitors, which in turn helps identify a product's strengths and weaknesses. Unfortunately, this usefulness has also given rise to opinion spam: forged positive or malicious negative opinions. This paper focuses on the detection of deceptive opinion spam. A recently proposed opinion spam detection method based on n-gram techniques is extended by means of feature selection and different representations of the opinions. The problem is modelled as a classification problem, and a Naïve Bayes (NB) classifier and a Least Squares Support Vector Machine (LS-SVM) are applied to three representations of the opinions: Boolean, bag-of-words, and term frequency-inverse document frequency (TF-IDF). All experiments are carried out on a widely used gold-standard dataset.
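The three opinion representations named in the abstract can be illustrated with a small sketch. This is not the authors' code: the toy reviews, the whitespace tokenizer, the helper names (`tokenize`, `ngrams`, `build_vocabulary`, `represent`), and the `tf * log(N / df)` weighting are all assumptions made for illustration. The abstract does not state which feature selection criterion is used, so a simple minimum document-frequency cutoff stands in for it here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return the word n-grams of one document, each joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(docs_terms, min_df=1):
    """Keep terms that occur in at least min_df documents.

    Assumption: this cutoff is only a stand-in for the paper's
    (unspecified here) feature selection step.
    """
    df = Counter()
    for terms in docs_terms:
        df.update(set(terms))  # count each term once per document
    vocab = sorted(t for t, c in df.items() if c >= min_df)
    return vocab, df


def represent(docs_terms, vocab, df, scheme):
    """Turn each document into a vector over vocab under the given scheme."""
    n_docs = len(docs_terms)
    rows = []
    for terms in docs_terms:
        counts = Counter(terms)
        if scheme == "boolean":      # 1 if the term is present, else 0
            row = [1 if counts[t] > 0 else 0 for t in vocab]
        elif scheme == "bow":        # raw term counts
            row = [counts[t] for t in vocab]
        elif scheme == "tfidf":      # count weighted by log(N / df)
            row = [counts[t] * math.log(n_docs / df[t]) for t in vocab]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        rows.append(row)
    return rows


# Hypothetical toy reviews, not taken from the paper's dataset.
reviews = ["great hotel great staff", "terrible hotel", "great location"]
docs_terms = [ngrams(tokenize(r), 1) for r in reviews]  # unigram features

vocab, df = build_vocabulary(docs_terms, min_df=2)
boolean_matrix = represent(docs_terms, vocab, df, "boolean")
bow_matrix = represent(docs_terms, vocab, df, "bow")
tfidf_matrix = represent(docs_terms, vocab, df, "tfidf")
```

Passing `n=2` to `ngrams` yields bigram features instead of unigrams. Any of the three matrices could then be fed to a classifier such as Naïve Bayes; the classifier itself is left out of this sketch.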