Machine Learning Approach for Phishing Attack Detection
Authors: Tarun Choudhary, Siddhesh Mhapankar, Rohit Bhddha, Ashish Kharuk, Rohini Patil
Journal: Journal of Artificial Intelligence and Technology (人工智能技术学报(英文))
Published: 2023-05-10
DOI: 10.37965/jait.2023.0197 (https://doi.org/10.37965/jait.2023.0197)
Citation count: 0
Abstract
Phishing is among the easiest ways to gather sensitive information from unwary people. Phishers seek private data, including passwords, login information, and bank account details. Cybersecurity experts are actively seeking trustworthy and effective ways to identify phishing websites. To distinguish legitimate URLs from phishing URLs, we used machine learning (ML). Features were extracted from both types of URLs and analyzed. Extreme Gradient Boosting (XGBoost), Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM) classifiers were used to identify phishing websites. The goal was to identify phishing URLs and to determine the most effective ML technique by comparing the accuracy of each algorithm. Two datasets were used in the proposed methodology. Model accuracy was calculated on the PhishTank and UCI datasets using k-fold cross-validation, feature selection, and hyperparameter tuning. Precision, recall, F1-score, and the Receiver Operating Characteristic (ROC) curve were also calculated. RF achieved an accuracy of 98.80% on the PhishTank dataset and 97.87% on the UCI dataset. The highest precision, recall, and F1-score were 99% each, and the highest AUC-ROC was 99.89%, all on the PhishTank dataset. Compared with results reported by other researchers, the proposed methodology performed better. This methodology can therefore help identify phishing websites.
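
The evaluation measures named in the abstract can be illustrated with a minimal sketch. The Python below computes precision, recall, and F1-score for the positive class and builds k-fold train/test index splits using only the standard library. The function names, the label convention (1 = phishing, 0 = legitimate), and the toy labels are assumptions made for illustration; they are not the authors' code, features, or data.

```python
from typing import List, Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class.

    Assumed label convention: 1 = phishing, 0 = legitimate.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed

    # Guard against empty denominators by reporting 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds.

    Each fold serves as the test set exactly once; the remaining
    indices form the training set. No shuffling or stratification
    is done here, to keep the sketch deterministic.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # The first (n_samples % k) folds get one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        splits.append((train, test))
        start = stop
    return splits


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    p, r, f = precision_recall_f1(y_true, y_pred)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")

    for train_idx, test_idx in k_fold_indices(len(y_true), k=4):
        print("train:", train_idx, "test:", test_idx)
```

In a real comparison, each classifier would be fitted on the training indices of every fold and scored on the held-out indices, with the per-fold scores averaged. Shuffled or stratified folds are usually preferable when class balance matters.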