Comparative Analysis of K-NN, Naïve Bayes, and Logistic Regression for Credit Card Fraud Detection
Authors: Kavita Arora, Sonal Pathak, Nguyen Thi Dieu Linh
Journal: Ingenieria Solidaria (JCR Q4, Engineering, Multidisciplinary; impact factor 0.4)
Published: 2023-09-15
DOI: https://doi.org/10.16925/2357-6014.2023.03.05
Citation count: 0
Abstract
Introduction: This paper presents the outcome of the comparative study "Various Machine Learning Algorithms, namely K-NN, Naïve Bayes, and Logistic Regression, for Credit Card Fraud Detection", carried out in 2022-23 at Manav Rachna International Institute of Research and Studies on a dataset taken from UCI.com.
Problem: Credit card fraud remains rife, and its modes are increasingly varied. Fraud cases often cause banks and financial institutions irreparable harm that cannot be compensated in monetary terms. To prevent the various modes of credit card scams, we must be able to identify the modes that fraudsters use most often. This scheme uses machine learning techniques to provide such institutions with complete and appropriate information, not only about the modes that fraudsters often use but also about ways to protect against such fraud.
Objective: The paper discusses machine learning models based on classification and regression, namely K-Nearest Neighbors, Naïve Bayes, and Logistic Regression. Logistic Regression achieves a classification accuracy of 80%, with a precision of 78%, a recall of 100%, and an F1-score of 88% for fraudulent credit card transactions.
Methodology: The comparative analysis demonstrates that, on the precision, recall, and accuracy parameters, K-Nearest Neighbor is a better approach for detecting fraudulent transactions than Logistic Regression and Naïve Bayes.
Results: Accuracy is marginally higher for Logistic Regression, but the false-positive figures do not account for the imbalanced data and therefore distort the results and accuracy of Logistic Regression; K-Nearest Neighbor is deemed a better fit for such cases.
Conclusion: This scheme presents automated fraud classification systems that use machine learning techniques, namely K-Nearest Neighbor, Logistic Regression, and Naïve Bayes, to produce a model that can distinguish valid from invalid credit card transactions.
Originality: In this research, the most relevant features are used, accuracy is visualized with the confusion matrix, and the accuracy calculations are obtained from the dataset used.
Limitations: Deep learning techniques could have been used to obtain even better results.
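
The reported metrics can be related to one another with a short sketch. The Python below is illustrative only and is not the authors' code: the `ConfusionCounts` class, the helper names, and the 1,000-transaction sample are assumptions introduced here. It shows that an F1-score of about 88% follows from the stated precision of 78% and recall of 100%, and why accuracy alone can mislead on imbalanced fraud data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with fraud as the positive class."""
    tp: int  # fraudulent transactions flagged as fraud
    fp: int  # valid transactions flagged as fraud
    fn: int  # fraudulent transactions that were missed
    tn: int  # valid transactions passed as valid


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def f1_from(precision: float, recall: float) -> float:
    """F1-score: the harmonic mean of precision and recall."""
    return safe_div(2 * precision * recall, precision + recall)


def metrics(c: ConfusionCounts) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": safe_div(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Consistency check on the figures quoted in the abstract:
# precision 0.78 and recall 1.00 give F1 = 1.56 / 1.78, roughly 0.876.
reported_f1 = f1_from(0.78, 1.00)

# Hypothetical imbalanced sample (assumed, not from the paper's dataset):
# 1,000 transactions, 10 of them fraudulent, and a model that flags nothing.
all_valid = metrics(ConfusionCounts(tp=0, fp=0, fn=10, tn=990))

if __name__ == "__main__":
    print(f"F1 from reported precision/recall: {reported_f1:.3f}")
    print(f"Flag-nothing model: {all_valid}")
```

In the hypothetical sample, the model that never flags fraud still scores 99% accuracy while its recall and F1 are zero. This is the kind of effect the Results section alludes to when it notes that accuracy can disguise performance on imbalanced data.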