Spam email detection using a novel multilayer classification-based decision technique
Subhajit Das, Sourav Mandal, Rohini Basak
International Journal of Computers and Applications (journal article), published 2023-09-19
DOI: 10.1080/1206212x.2023.2258328 (https://doi.org/10.1080/1206212x.2023.2258328)
Abstract
Because of the rapid advancement of technology over the last several years, the number of internet users has been growing at an exponential rate, and as a result email has become a popular means of exchanging information over the internet. Email is the most cost-effective way to send data and communicate with peers. However, email services also cause problems for users through electronic junk mail, commonly known as spam. Spam email is a privacy concern linked to many commercial and malicious websites, and it leads to phishing, virus distribution, and a range of other problems. This study examines several aspects that have been used for email spam classification, gives an overview of a handful of classifiers or algorithms that have been successfully evaluated, and presents an exploratory data analysis. The proposed email spam classifier uses three parallel layers of machine learning and deep learning techniques, followed by a decision function that determines whether or not an email is spam. In testing, the proposed classifier outperformed similar systems on the standard dataset, with an accuracy of 98.4%.

Keywords: content-based spam classification; email spam classification; text classification; machine learning; deep learning

Disclosure statement
No potential conflict of interest was reported by the author(s).

Notes
1. https://github.com/tensorflow/estimator
2. https://nlp.stanford.edu/projects/glove/
3. http://nlp.cs.aueb.gr/software_and_datasets/Enron-Spam/index.html
4. https://www.tensorflow.org/

Additional information

Notes on contributors

Subhajit Das
Subhajit Das is an Information Technology Engineer with more than 11 years of experience in software development. He earned a Master of Engineering in Software Engineering from Jadavpur University, Kolkata, India, and a bachelor's degree in Computer Science and Engineering from West Bengal University of Technology, India. He is currently a Senior Software Engineer at Cognizant Technology Solutions. His interests include architecting contemporary systems with cloud and GenAI solutions, solving difficult problems, migrating technologies, and optimizing algorithms.

Sourav Mandal
Sourav Mandal has been an Assistant Professor at the School of Computer Science and Engineering (SCSE), XIM University, Bhubaneswar, Odisha, India, since October 2020. Before that, he had been an Assistant Professor in the Department of Computer Science and Engineering at the Haldia Institute of Technology, Haldia, India, since 2006. His research interests in natural language processing (NLP) and artificial intelligence (AI) include natural language understanding, information extraction, text classification, and text summarization, combined with data science, machine learning, and deep learning. He earned a bachelor's degree in Computer Science & Engineering from The University of Burdwan, Burdwan, India, in 2003, a master's degree in Multimedia Development from Jadavpur University, Kolkata, India, in 2005, and a Ph.D. in engineering from Jadavpur University, Kolkata, India, in 2020.

Rohini Basak
Rohini Basak has been an Assistant Professor in the Department of Information Technology at Jadavpur University since 2018. She received her Ph.D. in Computer Science and Engineering from the same university in 2020. Her research interests include natural language processing, computational linguistics, sentiment analysis, and deep learning. She has supervised 10 master's students to date. Her teaching focuses mainly on object-oriented programming using C++, object-oriented systems using Java, data structures and algorithms, and computer organization and networking.
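The abstract describes a classifier built from three parallel layers of machine learning and deep learning techniques, followed by a decision function that labels each email as spam or not. The abstract does not say which models form the layers or how the decision function combines them, so the sketch below is hypothetical. Three toy heuristic scorers (keyword_layer, link_layer, shouting_layer, all invented for illustration) stand in for the trained models, the layers run concurrently on the same email, and the decision function is assumed to be a simple majority vote over thresholded scores. The accuracy helper computes the metric the abstract reports: correct predictions divided by the total number of emails.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# A "layer" maps raw email text to a spam score in [0, 1].
Layer = Callable[[str], float]

# Hypothetical keyword list; a real layer would be a trained ML/DL model.
SPAM_WORDS = {"free", "winner", "prize", "urgent", "click"}


def keyword_layer(email: str) -> float:
    """Hypothetical stand-in for a bag-of-words model: density of spammy words."""
    words = [w.strip(".,!?:;").lower() for w in email.split()]
    if not words:
        return 0.0
    hits = sum(w in SPAM_WORDS for w in words)
    return min(1.0, 5 * hits / len(words))


def link_layer(email: str) -> float:
    """Hypothetical stand-in for a second model: flags emails containing URLs."""
    text = email.lower()
    return 1.0 if ("http://" in text or "https://" in text) else 0.0


def shouting_layer(email: str) -> float:
    """Hypothetical stand-in for a third model: scaled share of uppercase letters."""
    letters = [c for c in email if c.isalpha()]
    if not letters:
        return 0.0
    upper = sum(c.isupper() for c in letters)
    return min(1.0, 2 * upper / len(letters))


LAYERS: tuple[Layer, ...] = (keyword_layer, link_layer, shouting_layer)


def decide(email: str, layers: Sequence[Layer] = LAYERS, threshold: float = 0.5) -> bool:
    """Run every layer on the same email concurrently, then take a majority vote
    (the voting rule is an assumption, not the paper's stated decision function)."""
    with ThreadPoolExecutor(max_workers=len(layers)) as pool:
        scores = list(pool.map(lambda layer: layer(email), layers))
    votes = sum(score >= threshold for score in scores)
    return votes > len(layers) / 2


def accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of emails whose predicted label matches the true label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    emails = [
        "URGENT! You are a WINNER. Click http://example.com to claim your FREE prize",
        "Hi team, the meeting notes are attached. See you tomorrow.",
    ]
    labels = [True, False]
    preds = [decide(e) for e in emails]
    print(preds, accuracy(preds, labels))  # prints: [True, False] 1.0
```

The thread pool only mirrors the "parallel layers" structure. For CPU-heavy models in Python, a process pool or batched inference on each model would be the more realistic choice, and a weighted average of scores is an equally plausible alternative to majority voting.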
About the journal:
The International Journal of Computers and Applications (IJCA) is a unique platform for publishing novel ideas, research outcomes and fundamental advances in all aspects of Computer Science, Computer Engineering, and Computer Applications. It is a peer-reviewed international journal whose vision is to provide the academic and industrial community with a platform for presenting original research ideas and applications. In addition to regular research papers within its scope, IJCA welcomes four special types of papers:
(a) Papers whose results can all be easily reproduced. For such papers, the authors will be asked to upload "instructions for reproduction", possibly with the source code or stable URLs from which the code can be downloaded.
(b) Papers with negative results. For such papers, the experimental setting and negative results must be presented in detail, and the authors must clearly explain why the negative results are important for the research community. The rationale is that such papers help researchers choose the correct approaches to problems and avoid approaches that have already been shown to fail.
(c) Detailed reports, case studies and literature reviews about innovative software/hardware, new technology, high-impact computer applications and future developments, with sufficient background and subject coverage.
(d) Special issue papers focusing on a particular theme of significant importance, or papers selected from a relevant conference with sufficient improvement and new material to differentiate them from the versions published in the conference proceedings.