Opinion Spam Detection Using Feature Selection
Rinki Patel, Priyank Thakkar
2014 International Conference on Computational Intelligence and Communication Networks, pp. 560-564
Published 2014-11-14. DOI: 10.1109/CICN.2014.127 (https://doi.org/10.1109/CICN.2014.127)
It has become essential for e-commerce businesses to let their customers write reviews of the products and services they have used. Such reviews are a vital source of information about these products and services, and potential customers consult them before deciding on a purchase. Marketers also use these opinions to find the drawbacks of their own products or services and to gather information about those of their competitors, which in turn helps identify the strengths and weaknesses of products. Unfortunately, the usefulness of opinions has also given rise to the problem of opinion spam: forged positive or spiteful negative opinions. This paper focuses on the detection of deceptive opinion spam. A recently proposed opinion spam detection method based on n-gram techniques is extended by means of feature selection and different representations of the opinions. The problem is modelled as a classification problem, and a Naïve Bayes (NB) classifier and a Least Squares Support Vector Machine (LS-SVM) are applied to three different representations of the opinions: Boolean, bag-of-words, and term frequency-inverse document frequency (TF-IDF). All experiments are carried out on a widely used gold-standard dataset.
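The three representations named above can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the function names (`ngrams`, `build_vocabulary`, `vectorize`), the unsmoothed `tf * log(N / df)` weighting, and the toy reviews are all assumptions made for illustration, and the abstract does not state which TF-IDF variant or n-gram range the paper uses.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Split on whitespace, lowercase, and return all 1..n_max-grams."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def build_vocabulary(docs, n_max=2):
    """Map every n-gram seen in docs to a column index (sorted order)."""
    terms = sorted({g for d in docs for g in ngrams(d, n_max)})
    return {term: idx for idx, term in enumerate(terms)}


def vectorize(docs, vocab, scheme="boolean", n_max=2):
    """Encode docs as Boolean, bag-of-words, or TF-IDF vectors.

    TF-IDF here is the plain tf * log(N / df) form (an assumption).
    """
    if scheme not in ("boolean", "bow", "tfidf"):
        raise ValueError(f"unknown scheme: {scheme}")
    counts = [Counter(ngrams(d, n_max)) for d in docs]
    # Document frequency: number of docs containing each term.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    n_docs = len(docs)
    vectors = []
    for c in counts:
        row = [0.0] * len(vocab)
        for term, tf in c.items():
            if term not in vocab:
                continue
            j = vocab[term]
            if scheme == "boolean":
                row[j] = 1.0            # presence / absence
            elif scheme == "bow":
                row[j] = float(tf)      # raw term count
            else:
                row[j] = tf * math.log(n_docs / df[term])
        vectors.append(row)
    return vectors


# Toy reviews, invented for illustration only.
reviews = ["great hotel great staff", "terrible hotel"]
vocab = build_vocabulary(reviews, n_max=1)
boolean_vecs = vectorize(reviews, vocab, "boolean", n_max=1)
bow_vecs = vectorize(reviews, vocab, "bow", n_max=1)
tfidf_vecs = vectorize(reviews, vocab, "tfidf", n_max=1)
```

In this sketch a term that occurs in every document (here "hotel") receives a TF-IDF weight of zero, which is one reason the three representations can lead a classifier to different decisions. A feature selection step, as described in the abstract, would then keep only a subset of the vocabulary columns before training the NB or LS-SVM classifier.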