Classification of Spam Mail using different machine learning algorithms

A. Shrivastava, R. Dubey
2018 International Conference on Advanced Computation and Telecommunication (ICACAT), pp. 1-10, December 2018. DOI: 10.1109/ICACAT.2018.8933787

Abstract

Email is an essential means of communication in everyday life, and as the number of internet users grows, so does reliance on email. Spam is a major problem that researchers aim to analyze and reduce: spam emails arrive in bulk and may carry trojans, viruses, and malware or enable phishing attacks. The difficulty arises when large numbers of unwanted emails come from unknown sources and each received email must be classified as either spam or ham (legitimate mail). This paper classifies incoming emails as spam or ham using several classification techniques, in order to identify spam and remove it. A Naive Bayes classifier, which is based on posterior probability, is applied together with four decision tree algorithms: Random Tree, REPTree, Random Forest, and J48. The UCI Spambase dataset, a benchmark dataset containing 58 attributes and 4601 instances, is used to identify spam. The Weka software is used to implement the classifiers and analyze the results, with the classification algorithms run in Weka's classification phase. Such spam-classification techniques play an important role in filtering out viruses, trojans, malware, and websites involved in phishing attacks and other fraudulent attempts delivered by email.

Feature selection is applied to the dataset for both the training-set and cross-validation tests, using the CfsSubsetEval (correlation-based feature subset) evaluator with the BestFirst search method. Spam classification is evaluated with two test options under Weka's classifier settings: training set and cross-validation. With the training-set option, the same data are used for both training and testing; with cross-validation, the data are divided into a number of folds, and each fold is held out once for testing while the remaining folds are used for training. Finally, under the training-set option, Random Tree gives the best result for the classification of spam mail.
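The Naive Bayes classifier rests on Bayes' rule: P(spam | x) is proportional to P(spam) multiplied by the product of P(x_j | spam) over the features x_j, assuming the features are independent given the class. The minimal sketch below assumes Gaussian class-conditional likelihoods for numeric features (a common choice for numeric attributes such as Spambase's word frequencies) and uses a hypothetical two-feature toy dataset with invented values. The helper names `fit_gaussian_nb` and `posterior` are illustrative only and are not Weka's API.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate class priors and per-feature Gaussian mean/variance (hypothetical helper)."""
    by_class = defaultdict(list)
    for x, y in zip(rows, labels):
        by_class[y].append(x)
    model = {}
    for y, xs in by_class.items():
        prior = len(xs) / len(rows)
        stats = []
        for j in range(len(xs[0])):
            col = [x[j] for x in xs]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col)
            stats.append((mean, max(var, 1e-6)))  # floor the variance to avoid division by zero
        model[y] = (prior, stats)
    return model


def log_gaussian(v, mean, var):
    """Log of the normal density N(v; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)


def posterior(model, x):
    """P(class | x) via Bayes' rule, assuming features are independent given the class."""
    log_joint = {
        y: math.log(prior) + sum(log_gaussian(v, m, s) for v, (m, s) in zip(x, stats))
        for y, (prior, stats) in model.items()
    }
    top = max(log_joint.values())  # subtract the max before exp() for numerical stability
    unnorm = {y: math.exp(lj - top) for y, lj in log_joint.items()}
    total = sum(unnorm.values())
    return {y: p / total for y, p in unnorm.items()}


# Hypothetical toy data: [frequency of "free", average capital-run length].
rows = [[0.9, 5.0], [1.2, 6.5], [0.8, 4.0], [0.0, 1.2], [0.1, 1.5], [0.05, 1.1]]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
model = fit_gaussian_nb(rows, labels)
post = posterior(model, [1.0, 5.5])      # resembles the spam examples
post_ham = posterior(model, [0.05, 1.3])  # resembles the ham examples
print(post, post_ham)
```

Working in log space and subtracting the largest log-joint before exponentiating keeps the computation stable even when one class's likelihood is vanishingly small.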
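Correlation-based feature selection (CFS) scores a feature subset with Hall's merit heuristic, Merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation. The heuristic rewards features that predict the class without being redundant with one another. The sketch below is only an approximation of the Weka setup: it assumes absolute Pearson correlation (Weka's CfsSubsetEval may measure correlation differently), and it replaces Weka's BestFirst search, which can backtrack, with simple greedy forward selection. The function names and toy columns are hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def cfs_merit(subset, columns, target):
    """Hall's CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[i], target)) for i in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_forward_cfs(columns, target):
    """Add the feature that most improves merit; stop when no addition helps.
    A simplification of Weka's BestFirst search, which can also backtrack."""
    selected, best = [], 0.0
    remaining = set(range(len(columns)))
    while remaining:
        cand, merit = max(
            ((i, cfs_merit(selected + [i], columns, target)) for i in remaining),
            key=lambda pair: pair[1],
        )
        if merit <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = merit
    return selected, best


# Hypothetical toy data: column 0 tracks the class exactly, column 1 only weakly.
target = [1, 1, 1, 0, 0, 0]
columns = [[1, 1, 1, 0, 0, 0], [1, 0, 1, 0, 1, 0]]
selected, best = greedy_forward_cfs(columns, target)
print(selected, best)
```

In this toy case column 0 is selected first. Adding column 1 lowers the merit, because the weak feature-class correlation dilutes the average, so the search stops with a single feature.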
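The two Weka test options differ in what the reported accuracy measures. The sketch below contrasts them on hypothetical random-label data and is not the paper's Weka setup. A 1-nearest-neighbour classifier stands in for any model that can memorise its training data, and the folds are plain rather than stratified. Weka stratifies the folds for a nominal class.

```python
import random


def nn1_predict(train, x):
    """1-nearest-neighbour: a stand-in for any model that can memorise its training data."""
    nearest = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return nearest[1]


def accuracy(train, test):
    return sum(nn1_predict(train, x) == y for x, y in test) / len(test)


def training_set_eval(data):
    # Like Weka's "Use training set" option: the same instances train and test the model.
    return accuracy(data, data)


def cross_validation_eval(data, k=10, seed=1):
    # Plain k-fold CV: every instance is tested once by a model trained without it.
    # (Weka stratifies folds for a nominal class; this sketch does not.)
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    correct = 0
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        correct += sum(nn1_predict(train, x) == y for x, y in test)
    return correct / len(rows)


# Hypothetical data: random features with random labels, so there is nothing real to learn.
rng = random.Random(0)
data = [([rng.random(), rng.random()], rng.choice(["spam", "ham"])) for _ in range(200)]
train_acc = training_set_eval(data)
cv_acc = cross_validation_eval(data)
print(train_acc, cv_acc)
```

Each training instance is its own nearest neighbour, so the training-set score is perfect even though the labels are random. The cross-validated score should fall far below it. A training-set score therefore mainly reflects how closely a model fits its training data, while cross-validation estimates performance on unseen mail.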