Web architecture for URL-based phishing detection based on Random Forest, Classification Trees, and Support Vector Machine
Authors: Julio Lamas Piñeiro, Lenis Wong Portillo
Journal: Inteligencia Artificial-Iberoamerican Journal of Artificial Intelligence (JCR Q2, Computer Science, Artificial Intelligence; impact factor 3.4)
Published: 2022-05-09
DOI: 10.4114/intartif.vol25iss69pp107-121 (https://doi.org/10.4114/intartif.vol25iss69pp107-121)
Citations: 3
Abstract
Phishing is a serious and persistent problem, and it has intensified considerably during the coronavirus pandemic, a time when people rely on the Internet more than ever, even for daily payments. Tools have been developed to detect phishing, but many are computationally complex and not easy for an ordinary user to operate. In this work, we therefore propose a web architecture that predicts whether a web address is a phishing address, based on three machine learning techniques: Random Forest, Classification Trees, and Support Vector Machine. Three models are developed, one with each technique, along with two further models built on those three; all are applied to web addresses previously processed by a feature retrieval module. Everything is deployed in an API consumed by a frontend, so that any user can use it and choose which model to predict with. The results show that the best-performing model when predicting both outcomes (phishing or not) is the Classification Trees model, which achieves a precision and accuracy of 80%.
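The pipeline described in the abstract (a feature retrieval module feeding several classifiers, with further models built on top of them) can be illustrated with a minimal sketch. The abstract does not list the features the authors extract or say how the two derived models combine the three base models, so everything here is an assumption made for illustration: the function names `extract_url_features` and `majority_vote`, the particular lexical features, and the use of majority voting.

```python
from collections import Counter
from urllib.parse import urlparse
import ipaddress


def extract_url_features(url: str) -> dict:
    """Hypothetical feature retrieval step: turn a URL into simple lexical features.

    The feature set below is an illustrative assumption, not the paper's list.
    """
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""

    # A raw IP address in place of a domain name is a commonly cited phishing cue.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "has_at_symbol": "@" in url,
        "has_hyphen_in_host": "-" in host,
        "host_is_ip": host_is_ip,
        "uses_https": parsed.scheme == "https",
    }


def majority_vote(predictions: list) -> str:
    """Assumed way of combining base-model outputs: return the most common label."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    features = extract_url_features("http://192.168.0.1/login@secure")
    print(features)
    # Stand-in labels for the Random Forest, Classification Tree, and SVM outputs.
    print(majority_vote(["phishing", "legitimate", "phishing"]))
```

In a real deployment the feature dictionary would be converted to a numeric vector and passed to trained classifiers; the hard-coded labels above stand in for those model outputs.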
About the journal:
Inteligencia Artificial is a quarterly journal promoted and sponsored by the Spanish Association for Artificial Intelligence. The journal publishes high-quality original research papers reporting theoretical or applied advances in all branches of Artificial Intelligence. In particular, the journal welcomes: new approaches, techniques, or methods for solving AI problems, which should include reproducible demonstrations of effectiveness or improvement over existing methods; integration of different technologies or approaches to solve broad problems or problems spanning different areas; and AI applications, which should describe the problem or scenario and the proposed solution in detail, emphasize its novelty, and present an evaluation of the AI techniques applied. In addition to rapid publication and dissemination of unsolicited contributions, the journal is committed to producing monographs, surveys, and special issues on topics, methods, or techniques of special relevance to the AI community. Inteligencia Artificial welcomes submissions written in English, Spanish, or Portuguese; each contribution must include at least a title, summary, and keywords in English.