Deep learning based phishing website identification system using CNN-LSTM classifier

JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES | IF 1.1 | Q3, INFORMATION SCIENCE & LIBRARY SCIENCE | Pub Date: 2023-01-01 | DOI: 10.47974/jios-1343
Vinod Sapkal, Praveen Gupta, Aboo Bakar Khan

Abstract

Phishing is an attack in which a fraudulent website impersonates that of a large organization, typically one that handles money, such as a bank, another financial institution, or an online retailer. Its primary objective is to obtain sensitive personal information from users, such as social security numbers, credit card details, and passwords. As phishing attacks have increased, various techniques have been developed to counter them. Among these are deep learning algorithms, which can learn from and analyze massive datasets and are therefore well suited to identifying and preventing phishing attacks. Because phishing websites are complex, many detection systems have been developed. However, these systems do not achieve the desired output and have a number of other flaws. This paper proposes a hybrid deep learning-based phishing detection system that is easy to put into practice. The input dataset is first preprocessed to improve its quality. Clustering and feature selection are then carried out to improve accuracy and reduce processing time. The resulting features are fed into the CNN_LSTM classifier, which labels websites as phishing or legitimate. The proposed hybrid deep learning models combine natural language processing (NLP) features with character embedding, which allows them to reveal high-level connections between characters. On the evaluation metric used, the proposed models perform better than the other models.
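
The abstract mentions character embedding but gives no details of how text is turned into model input. As a rough illustration only, the sketch below shows one common way to prepare input for a character-embedding layer: each character is mapped to an integer index and the sequence is padded or truncated to a fixed length. The choice of URLs as the input text, the vocabulary, the reserved indices, the default length, and the name `encode_url` are all assumptions made for this example and are not taken from the paper.

```python
import string
from typing import List

# Hypothetical vocabulary (an assumption, not from the paper):
# lowercase letters, digits, and punctuation commonly found in URLs.
# Index 0 is reserved for padding and index 1 for unknown characters.
PAD, UNK = 0, 1
_CHARS = string.ascii_lowercase + string.digits + "-._~:/?#[]@!$&'()*+,;=%"
VOCAB = {ch: i + 2 for i, ch in enumerate(_CHARS)}


def encode_url(url: str, max_len: int = 32) -> List[int]:
    """Map a URL to a fixed-length list of character indices.

    Characters outside the vocabulary map to UNK; URLs longer than
    max_len are truncated and shorter ones are right-padded with PAD.
    """
    ids = [VOCAB.get(ch, UNK) for ch in url.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    # Each URL becomes a list of max_len integers suitable for an
    # embedding layer placed in front of a CNN-LSTM classifier.
    print(encode_url("http://example.com/login"))
```

Integer sequences of this kind are what an embedding layer would typically consume before convolutional and recurrent layers; the actual preprocessing, clustering, and feature selection steps used by the authors may differ.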