Machine Learning-Based Detection of Phishing in COVID-19 Theme-Related Emails and Web Links

International Journal of Scientific Research in Computer Science, Engineering and Information Technology. Pub Date: 2023-10-01. DOI: 10.32628/cseit2390563

Usman Ali, Dr. Isma Farah Siddiqui

Abstract

During the COVID-19 epidemic, phishing attacks increased in frequency. Most malicious links claimed to provide current updates about COVID-19, which made it easy to trick victims. Many studies have proposed solutions to prevent these attacks, yet phishing continues to rise. Web links are not the only attack vector: attackers also use email. This study proposes a model based on ensemble classifiers to predict phishing in COVID-19-themed emails and web links. The study uses two datasets: Dataset 1 for web links and Dataset 2 for emails. Dataset 1 contains textual data, while Dataset 2 contains images downloaded from different sources. We select five ensemble classifiers: Random Forest (RF), AdaBoost, Bagging, Extra Trees (ET), and Gradient Boosting (GB). In our analysis, Dataset 1 achieves a higher accuracy than Dataset 2, reaching 88.91%. The ET classifier achieves an accuracy of 88.91%, a precision of 89%, a recall of 89%, and an F1 score of 89%, which is better than the other classifiers on both datasets. Interesting concepts were also found during the study.
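
The four scores reported in the abstract (accuracy, precision, recall, F1) can all be derived from a binary confusion matrix. The sketch below is illustrative only and is not the authors' code: the counts are hypothetical, the names `ConfusionCounts` and `classification_metrics` are invented for this example, and it assumes phishing is the positive class. The abstract does not say whether its precision, recall, and F1 values are per-class or averaged, so the example should not be read as a reproduction of the reported figures.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with phishing as the positive class."""
    tp: int  # phishing samples correctly flagged
    fp: int  # legitimate samples wrongly flagged as phishing
    fn: int  # phishing samples that were missed
    tn: int  # legitimate samples correctly passed


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = c.tp + c.fp + c.fn + c.tn
    predicted_pos = c.tp + c.fp
    actual_pos = c.tp + c.fn

    # Guard each ratio against an empty denominator.
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / predicted_pos if predicted_pos else 0.0
    recall = c.tp / actual_pos if actual_pos else 0.0

    # F1 is the harmonic mean of precision and recall.
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
counts = ConfusionCounts(tp=90, fp=10, fn=30, tn=70)
metrics = classification_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these made-up counts, accuracy and F1 differ even though both summarize the same predictions, which is why a study such as this one reports all four scores rather than accuracy alone.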
