Uniform Resource Locator Classification Using Classical Machine Learning & Deep Learning Techniques
Aws Rayyan, Mohammad Ghassan Aburas, Amjed Al-mousa
Cloud Computing and Data Science, published 2022-10-31. DOI: 10.37256/ccds.4120231847
The Internet has helped us in many ways, not least by providing a means to communicate with anyone around the world. That said, some people misuse this technology for malicious purposes. Many mechanisms can be exploited for such acts; this work focuses on exploitation methods that use the uniform resource locator (URL). This paper presents a means of extracting features from a raw URL, which are then used to predict whether the URL is safe for a user to visit. The entire process of extracting the data and preparing it for a model is discussed in detail. Several machine learning (ML) models were trained using different algorithms, including CatBoost, Random Forest, and Decision Tree, and several feedforward deep neural network models were also explored. The best model, a deep learning model, achieved an accuracy of 95.61% on a test set.
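To make the feature-extraction step more concrete, the following is a minimal Python sketch of lexical features computed from a raw URL string. The abstract does not list the paper's actual features, so the feature set and the names `extract_url_features`, `to_vector`, and `FEATURE_NAMES` are hypothetical illustrations based on commonly used lexical cues (URL length, dot and hyphen counts, a literal IP address as host, an `@` symbol), not the authors' implementation.

```python
import ipaddress
from urllib.parse import urlparse


def _is_ip(host: str) -> bool:
    """Return True if the hostname is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_url_features(url: str) -> dict:
    """Compute a few hand-crafted lexical features from a raw URL.

    Hypothetical feature set for illustration only; it is an assumption,
    not the feature list used in the paper.
    """
    # urlparse only fills in netloc/hostname when a scheme is present,
    # so assume "http://" for scheme-less inputs.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    host_is_ip = _is_ip(host)
    return {
        "url_length": len(url),                       # length of the original string
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": int(host_is_ip),
        # Labels left of the registered domain (e.g. "www" in www.example.com);
        # meaningless for IP hosts, so report 0 there.
        "num_subdomains": 0 if host_is_ip else max(host.count(".") - 1, 0),
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
        "num_query_params": len([p for p in parsed.query.split("&") if p]),
    }


# Fixed feature ordering so every URL maps to a vector of the same layout
# (dicts preserve insertion order in Python 3.7+).
FEATURE_NAMES = tuple(extract_url_features("http://example.com").keys())


def to_vector(url: str) -> list:
    """Return the features of `url` as a list ordered by FEATURE_NAMES."""
    feats = extract_url_features(url)
    return [feats[name] for name in FEATURE_NAMES]


if __name__ == "__main__":
    for u in ("https://www.example.com/a/b?x=1&y=2", "http://192.168.0.1/login"):
        print(u, to_vector(u))
```

A fixed-length numeric vector like this is the kind of input that tree-based classifiers such as CatBoost, Random Forest, or a Decision Tree, or a feedforward network, can consume directly; a real pipeline would still need labeled benign and malicious URLs and a train/test split.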