Improving Phishing Email Detection Using the Hybrid Machine Learning Approach

Naveen Palanichamy, Yoga Shri Murti
{"title":"Improving Phishing Email Detection Using the Hybrid Machine Learning Approach","authors":"Naveen Palanichamy, Yoga Shri Murti","doi":"10.18080/jtde.v11n3.778","DOIUrl":null,"url":null,"abstract":"Phishing emails pose a severe risk to online users, necessitating effective identification methods to safeguard digital communication. Detection techniques are continuously researched to address the evolution of phishing strategies. Machine learning (ML) is a powerful tool for automated phishing email detection, but existing techniques like support vector machines and Naive Bayes have proven slow or ineffective in handling spam filtering. This study attempts to provide a phishing email detector and reliable classifier using a hybrid machine classifier with term frequency-inverse document frequency (TF-IDF) and an effective feature extraction technique (FET) on a real-world dataset from Kaggle. Exploratory data analysis is conducted to enhance understanding of the dataset and identify any conspicuous errors and outliers to facilitate the detection process. The FET converts the data text into a numerical representation that can be used for ML algorithms. The model’s performance is evaluated using accuracy, precision, recall, F1 score, receiver operating characteristic (ROC) curve and area under the ROC curve metrics. The research findings indicate that the hybrid model utilising TF-IDF achieved superior performance, with an accuracy of 87.5%. The paper offers valuable knowledge on using ML to identify phishing emails and highlights the importance of combining various models.","PeriodicalId":37752,"journal":{"name":"Australian Journal of Telecommunications and the Digital Economy","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Australian Journal of Telecommunications and the Digital Economy","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18080/jtde.v11n3.778","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Social Sciences","Score":null,"Total":0}
引用次数: 0

Abstract

Phishing emails pose a severe risk to online users, necessitating effective identification methods to safeguard digital communication. Detection techniques are continuously researched to address the evolution of phishing strategies. Machine learning (ML) is a powerful tool for automated phishing email detection, but existing techniques like support vector machines and Naive Bayes have proven slow or ineffective in handling spam filtering. This study attempts to provide a phishing email detector and reliable classifier using a hybrid machine classifier with term frequency-inverse document frequency (TF-IDF) and an effective feature extraction technique (FET) on a real-world dataset from Kaggle. Exploratory data analysis is conducted to enhance understanding of the dataset and identify any conspicuous errors and outliers to facilitate the detection process. The FET converts the data text into a numerical representation that can be used for ML algorithms. The model’s performance is evaluated using accuracy, precision, recall, F1 score, receiver operating characteristic (ROC) curve and area under the ROC curve metrics. The research findings indicate that the hybrid model utilising TF-IDF achieved superior performance, with an accuracy of 87.5%. The paper offers valuable knowledge on using ML to identify phishing emails and highlights the importance of combining various models.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
使用混合机器学习方法改进网络钓鱼电子邮件检测
网络钓鱼邮件对在线用户构成严重威胁,需要有效的识别方法来保护数字通信。检测技术的不断研究,以解决网络钓鱼策略的演变。机器学习(ML)是自动检测网络钓鱼电子邮件的强大工具,但现有的技术,如支持向量机和朴素贝叶斯,在处理垃圾邮件过滤方面已经被证明是缓慢或无效的。本研究试图在来自Kaggle的真实数据集上使用具有词频-逆文档频率(TF-IDF)和有效特征提取技术(FET)的混合机器分类器提供一个网络钓鱼邮件检测器和可靠的分类器。探索性数据分析是为了加强对数据集的理解,并识别任何明显的错误和异常值,以促进检测过程。FET将数据文本转换为可用于ML算法的数字表示形式。采用准确率、精密度、召回率、F1评分、受试者工作特征(ROC)曲线和ROC曲线下面积等指标评价模型的性能。研究结果表明,利用TF-IDF的混合模型取得了优异的性能,准确率达到87.5%。本文提供了使用机器学习识别网络钓鱼电子邮件的宝贵知识,并强调了组合各种模型的重要性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
1.60
自引率
0.00%
发文量
37
期刊介绍: The Journal of Telecommunications and the Digital Economy (JTDE) is an international, open-access, high quality, peer reviewed journal, indexed by Scopus and Google Scholar, covering innovative research and practice in Telecommunications, Digital Economy and Applications. The mission of JTDE is to further through publication the objective of advancing learning, knowledge and research worldwide. The JTDE publishes peer reviewed papers that may take the following form: *Research Paper - a paper making an original contribution to engineering knowledge. *Special Interest Paper – a report on significant aspects of a major or notable project. *Review Paper for specialists – an overview of a relevant area intended for specialists in the field covered. *Review Paper for non-specialists – an overview of a relevant area suitable for a reader with an electrical/electronics background. *Public Policy Discussion - a paper that identifies or discusses public policy and includes investigation of legislation, regulation and what is happening around the world including best practice *Tutorial Paper – a paper that explains an important subject or clarifies the approach to an area of design or investigation. *Technical Note – a technical note or letter to the Editors that is not sufficiently developed or extensive in scope to constitute a full paper. *Industry Case Study - a paper that provides details of industry practices utilising a case study to provide an understanding of what is occurring and how the outcomes have been achieved. *Discussion – a contribution to discuss a published paper to which the original author''s response will be sought. Historical - a paper covering a historical topic related to telecommunications or the digital economy.
期刊最新文献
Blockchain Technology for Tourism Post COVID-19 ICT-driven Transparency: Empirical Evidence from Selected Asian Countries Big Data Analytics in Tracking COVID-19 Spread Utilizing Google Location Data Harry S. Wragge AM (1929-2023) Phishing Message Detection Based on Keyword Matching
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1