Title: Spam email detection using a novel multilayer classification-based decision technique
Authors: Subhajit Das, Sourav Mandal, Rohini Basak
Journal: International Journal of Computers and Applications (JCR: Q2, Computer Science)
Publication date: 2023-09-19
Publication type: Journal Article
DOI: 10.1080/1206212x.2023.2258328
URL: https://doi.org/10.1080/1206212x.2023.2258328
Citations: 0
Abstract
Because of the rapid advancement of technology over the last several years, the number of internet users is growing at an exponential rate, and as a result, email communication has become popular as a means of exchanging information over the internet. Sending data and communicating with peers via email is the most cost-effective method. These email services also cause problems for users by sending electronic junk mail, often known as spam mail. Spam email is a privacy concern that is linked to a slew of commercial and dangerous websites, causing phishing, virus distribution, and a slew of other problems. This study examines several aspects that have been used for email spam classification and offers an overview of a handful of classifiers or algorithms that have been successfully evaluated, along with exploratory data analysis. The proposed email spam classifier uses three parallel layers of machine learning and deep learning techniques, followed by a decision function to determine whether or not the emails are spam. During testing, it was found that the proposed classifier beats similar systems on the standard dataset with an accuracy of 98.4%.

Keywords: Content-based spam classification; email spam classification; text classification; machine learning; deep learning

Disclosure statement
No potential conflict of interest was reported by the author(s).

Notes
1 https://github.com/tensorflow/estimator
2 https://nlp.stanford.edu/projects/glove/
3 http://nlp.cs.aueb.gr/software_and_datasets/Enron-Spam/index.html
4 https://www.tensorflow.org/

Notes on contributors
Subhajit Das is an Information Technology Engineer with more than 11 years of experience in software development. He completed a Master of Engineering in Software Engineering at Jadavpur University, Kolkata, India, and received a bachelor's degree in Computer Science and Engineering from West Bengal University of Technology, India. He presently holds the position of Senior Software Engineer at Cognizant Technology Solutions. He is also interested in building the architecture of contemporary systems using cloud and GenAI solutions, addressing difficult problems, migrating technologies, and optimizing algorithms.

Sourav Mandal has been an Assistant Professor at XIM University's School of Computer Science and Engineering (SCSE) in Bhubaneswar, Odisha, India, since October 2020. Prior to that, he had been employed since 2006 as an Assistant Professor in the Department of Computer Science and Engineering at the Haldia Institute of Technology in Haldia, India. His research interests in natural language processing (NLP) and artificial intelligence (AI) include natural language understanding, information extraction, text classification, and text summarization, together with data science, machine learning, and deep learning. He earned a bachelor's degree in Computer Science & Engineering from The University of Burdwan in Burdwan, India, in 2003, a master's degree in Multimedia Development from Jadavpur University in Kolkata, India, in 2005, and a Ph.D. in engineering from Jadavpur University in 2020.

Rohini Basak has been working as an Assistant Professor in the Department of Information Technology at Jadavpur University since 2018. She received her Ph.D. degree in Computer Science and Engineering from the same university in 2020. Her areas of research interest include Natural Language Processing, Computational Linguistics, Sentiment Analysis, and Deep Learning. She has supervised 10 master's degree students to date. Her teaching focuses mainly on Object Oriented Programming using C++, Object Oriented Systems using Java, Data Structures and Algorithms, and Computer Organization and Networking.
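The abstract describes three parallel classification layers whose outputs feed a decision function, but it does not name the individual models or the decision rule. The following is a minimal illustrative sketch of that general pattern only, not the authors' implementation: the three rule-based layers (`keyword_layer`, `link_layer`, `shouting_layer`) are hypothetical stand-ins for trained machine learning or deep learning models, and simple majority voting is an assumed decision function.

```python
from collections import Counter
from typing import Callable, List

# A classifier layer is any function mapping email text to "spam" or "ham".
Classifier = Callable[[str], str]


def keyword_layer(text: str) -> str:
    """Hypothetical layer 1: flags emails containing common spam phrases."""
    spam_terms = ("free money", "winner", "click here")
    lowered = text.lower()
    return "spam" if any(term in lowered for term in spam_terms) else "ham"


def link_layer(text: str) -> str:
    """Hypothetical layer 2: flags emails containing two or more links."""
    return "spam" if text.lower().count("http") >= 2 else "ham"


def shouting_layer(text: str) -> str:
    """Hypothetical layer 3: flags emails written mostly in capital letters."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return "ham"
    upper_ratio = sum(c.isupper() for c in letters) / len(letters)
    return "spam" if upper_ratio > 0.5 else "ham"


def decide(votes: List[str]) -> str:
    """Assumed decision function: majority vote over the layer outputs."""
    return Counter(votes).most_common(1)[0][0]


def classify(text: str, layers: List[Classifier]) -> str:
    """Run every layer independently on the same text, then combine the votes."""
    votes = [layer(text) for layer in layers]
    return decide(votes)


# An odd number of layers avoids ties with two labels.
LAYERS = [keyword_layer, link_layer, shouting_layer]
```

In a real system each layer would be a trained model, and the decision function could weight layers by validation accuracy instead of counting votes equally; both choices here are assumptions made for illustration.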
About the journal:
The International Journal of Computers and Applications (IJCA) is a unique platform for publishing novel ideas, research outcomes and fundamental advances in all aspects of Computer Science, Computer Engineering, and Computer Applications. This is a peer-reviewed international journal with a vision to provide the academic and industrial community a platform for presenting original research ideas and applications. IJCA welcomes four special types of papers in addition to the regular research papers within its scope:
(a) Papers for which all results are easily reproducible. For such papers, the authors will be asked to upload "instructions for reproduction", possibly with the source code or stable URLs from which the code can be downloaded.
(b) Papers with negative results. For such papers, the experimental setting and negative results must be presented in detail, and the importance of the negative results to the research community must be explained clearly. The rationale behind this kind of paper is that it helps researchers choose the correct approaches to solve problems and avoid failed approaches that have already been worked out.
(c) Detailed reports, case studies and literature review articles about innovative software or hardware, new technology, high-impact computer applications and future development, with sufficient background and subject coverage.
(d) Special issue papers focussing on a particular theme of significant importance, or papers selected from a relevant conference with sufficient improvement and new material to differentiate them from the papers published in the conference proceedings.