Credit Card Fraud Detection on Original European Credit Card Holder Dataset Using Ensemble Machine Learning Technique

Journal of Cyber Security and Mobility (Q3, Computer Science) | Pub Date: 2023-01-01 | DOI: 10.32604/jcs.2023.045422

Yih Bing Chu, Zhi Min Lim, Bryan Keane, Ping Hao Kong, Ahmed Rafat Elkilany, Osama Hisham Abusetta


Citations: 0

Abstract

The proliferation of digital payment methods, facilitated by various online platforms and applications, has led to a surge in financial fraud, particularly in credit card transactions. Advanced technologies such as machine learning have been widely employed to enhance early detection and to prevent losses arising from potentially fraudulent activities. However, a prevalent approach in the existing literature applies extensive data sampling and feature selection algorithms as a precursor to subsequent investigations. While sampling techniques can significantly reduce computational time, the resulting dataset relies on generated data and on the accuracy of the pre-processing machine learning models employed. Such datasets are often not truly representative of real-world data, potentially introducing secondary issues that affect the precision of the results. For instance, under-sampling may result in the loss of critical information, while over-sampling can lead to overfitting of the machine learning models. In this paper, we propose a classification study of credit card fraud using fundamental machine learning models applied to all the features present in the original dataset, without any sampling techniques. The results indicate that the Support Vector Machine (SVM) consistently achieves classification performance exceeding 90% across various evaluation metrics. This finding serves as a valuable reference for future research, encouraging comparative studies on the original dataset without reliance on sampling techniques. Furthermore, we explore hybrid machine learning techniques, such as ensemble learning built on SVM, K-Nearest Neighbor (KNN) and decision tree models, highlighting their potential for advancing the field. The study demonstrates that the proposed machine learning models yield promising results, suggesting that pre-processing the dataset with a sampling algorithm or an additional machine learning technique may not always be necessary. This research contributes to the field of credit card fraud detection by emphasizing the potential of employing machine learning models directly on original datasets, thereby simplifying the workflow and potentially improving the accuracy and efficiency of fraud detection systems.
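As a rough illustration of the ideas in the abstract, the sketch below combines the label predictions of three classifiers by hard (majority) voting and scores the result with several metrics on an unresampled, imbalanced label set. It is not the authors' implementation: the abstract does not say how the ensemble is constructed, so majority voting is an assumption, and the labels and per-model predictions are made-up toy values standing in for SVM, KNN and decision tree outputs. The function names `majority_vote` and `binary_metrics` are hypothetical.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Combine per-model label lists by hard voting (one vote per model)."""
    combined = []
    for votes in zip(*model_predictions):
        # With three voters and two classes there is always a strict majority.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy, made-up data: 20 transactions, 4 of them fraudulent (label 1).
# No over- or under-sampling is applied; the imbalance is kept as-is.
y_true = [0] * 16 + [1] * 4

# Hypothetical predictions standing in for three trained models.
svm_pred = [0] * 16 + [1, 1, 1, 0]
knn_pred = [1] + [0] * 15 + [1, 1, 0, 0]
tree_pred = [1, 1] + [0] * 14 + [1, 0, 1, 1]

ensemble_pred = majority_vote(svm_pred, knn_pred, tree_pred)
ensemble_scores = binary_metrics(y_true, ensemble_pred)

# Baseline that never flags fraud: high accuracy, zero recall.
baseline_scores = binary_metrics(y_true, [0] * len(y_true))

print("ensemble:", ensemble_scores)
print("baseline:", baseline_scores)
```

The never-flag baseline shows why several metrics are reported when the data are left unsampled: on imbalanced labels, accuracy alone can look acceptable while recall on the fraud class is zero.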

Source journal

Journal of Cyber Security and Mobility (Computer Science: Computer Networks and Communications)

CiteScore: 2.30

Self-citation rate: 0.00%


About the journal: Journal of Cyber Security and Mobility is an international, open-access, peer-reviewed journal publishing original research, review/survey, and tutorial papers on all cyber security fields, including information, computer and network security, cryptography, and digital forensics, as well as interdisciplinary articles that cover the privacy, ethical, legal, and economic aspects of cyber security or emerging solutions drawn from other branches of science (for example, nature-inspired approaches). The journal aims to become an international source of innovation and essential reading for IT security professionals around the world by providing an in-depth, holistic view of the whole security spectrum, with solutions ranging from practical to theoretical. Its goal is to bring together researchers and practitioners working in the diverse fields of cybersecurity and to cover topics that are equally valuable to professionals and to newcomers from all sectors: industry, commerce and academia. The journal covers diverse security issues in cyberspace and their solutions. As cyberspace has moved towards the wireless/mobile world, papers on wireless/mobile communications and on mobility aspects will also be published.