Uniform Resource Locator Classification Using Classical Machine Learning & Deep Learning Techniques

Cloud Computing and Data Science. Pub Date: 2022-10-31. DOI: 10.37256/ccds.4120231847

Aws Rayyan, Mohammad Ghassan Aburas, Amjed Al-mousa

Citations: 2

Abstract

The Internet has made it possible to communicate with anyone around the world. That said, some people misuse this technology for malicious purposes. Many things can be exploited to that end, but this work focuses on exploitation methods that use the uniform resource locator (URL). This paper presents a means of extracting features from a raw URL; these features are used to predict whether a URL is safe for a user to visit. The whole process of extracting the data and preparing it for a model is discussed thoroughly. Several machine learning (ML) models were trained using different algorithms, including CatBoost, random forest, and decision trees, and several feedforward deep neural network models were also explored. The best model, a deep learning model, achieved an accuracy of 95.61% on a test set.
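The abstract does not list the features the authors extract, so the following is only a minimal sketch of what turning a raw URL into model-ready features might look like. The function name `extract_url_features` and every feature in it (lengths, character counts, an IP-literal check) are illustrative assumptions, not the paper's feature set.

```python
from urllib.parse import urlparse
import ipaddress


def extract_url_features(url: str) -> dict:
    """Turn a raw URL string into a flat dictionary of numeric features.

    Hypothetical feature set for illustration; the paper's own features
    are not given in the abstract.
    """
    # urlparse only recognises the host when a scheme or "//" is present,
    # so prepend "//" for bare inputs such as "example.com/login".
    parsed = urlparse(url if "://" in url else "//" + url)
    host = parsed.hostname or ""

    # Check whether the host is a literal IP address rather than a domain name.
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0

    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "num_query_params": len(parsed.query.split("&")) if parsed.query else 0,
        "uses_https": int(parsed.scheme == "https"),
        "has_at_symbol": int("@" in url),
        "host_is_ip": host_is_ip,
    }
```

Each URL becomes one row of numbers, so a list of these dictionaries could be loaded into a table and passed to a tree-based classifier or a feedforward network of the kind the abstract mentions.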



