Yih Bing Chu, Zhi Min Lim, Bryan Keane, Ping Hao Kong, Ahmed Rafat Elkilany, Osama Hisham Abusetta
{"title":"基于集成机器学习技术的欧洲原始信用卡持卡人数据集的信用卡欺诈检测","authors":"Yih Bing Chu, Zhi Min Lim, Bryan Keane, Ping Hao Kong, Ahmed Rafat Elkilany, Osama Hisham Abusetta","doi":"10.32604/jcs.2023.045422","DOIUrl":null,"url":null,"abstract":"The proliferation of digital payment methods facilitated by various online platforms and applications has led to a surge in financial fraud, particularly in credit card transactions. Advanced technologies such as machine learning have been widely employed to enhance the early detection and prevention of losses arising from potentially fraudulent activities. However, a prevalent approach in existing literature involves the use of extensive data sampling and feature selection algorithms as a precursor to subsequent investigations. While sampling techniques can significantly reduce computational time, the resulting dataset relies on generated data and the accuracy of the pre-processing machine learning models employed. Such datasets often lack true representativeness of real-world data, potentially introducing secondary issues that affect the precision of the results. For instance, under-sampling may result in the loss of critical information, while over-sampling can lead to overfitting machine learning models. In this paper, we proposed a classification study of credit card fraud using fundamental machine learning models without the application of any sampling techniques on all the features present in the original dataset. The results indicate that Support Vector Machine (SVM) consistently achieves classification performance exceeding 90% across various evaluation metrics. This discovery serves as a valuable reference for future research, encouraging comparative studies on original dataset without the reliance on sampling techniques. Furthermore, we explore hybrid machine learning techniques, such as ensemble learning constructed based on SVM, K-Nearest Neighbor (KNN) and decision tree, highlighting their potential advancements in the field. The study demonstrates that the proposed machine learning models yield promising results, suggesting that pre-processing the dataset with sampling algorithm or additional machine learning technique may not always be necessary. This research contributes to the field of credit card fraud detection by emphasizing the potential of employing machine learning models directly on original datasets, thereby simplifying the workflow and potentially improving the accuracy and efficiency of fraud detection systems.","PeriodicalId":37820,"journal":{"name":"Journal of Cyber Security and Mobility","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Credit Card Fraud Detection on Original European Credit Card Holder Dataset Using Ensemble Machine Learning Technique\",\"authors\":\"Yih Bing Chu, Zhi Min Lim, Bryan Keane, Ping Hao Kong, Ahmed Rafat Elkilany, Osama Hisham Abusetta\",\"doi\":\"10.32604/jcs.2023.045422\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The proliferation of digital payment methods facilitated by various online platforms and applications has led to a surge in financial fraud, particularly in credit card transactions. Advanced technologies such as machine learning have been widely employed to enhance the early detection and prevention of losses arising from potentially fraudulent activities. However, a prevalent approach in existing literature involves the use of extensive data sampling and feature selection algorithms as a precursor to subsequent investigations. While sampling techniques can significantly reduce computational time, the resulting dataset relies on generated data and the accuracy of the pre-processing machine learning models employed. Such datasets often lack true representativeness of real-world data, potentially introducing secondary issues that affect the precision of the results. For instance, under-sampling may result in the loss of critical information, while over-sampling can lead to overfitting machine learning models. In this paper, we proposed a classification study of credit card fraud using fundamental machine learning models without the application of any sampling techniques on all the features present in the original dataset. The results indicate that Support Vector Machine (SVM) consistently achieves classification performance exceeding 90% across various evaluation metrics. This discovery serves as a valuable reference for future research, encouraging comparative studies on original dataset without the reliance on sampling techniques. Furthermore, we explore hybrid machine learning techniques, such as ensemble learning constructed based on SVM, K-Nearest Neighbor (KNN) and decision tree, highlighting their potential advancements in the field. The study demonstrates that the proposed machine learning models yield promising results, suggesting that pre-processing the dataset with sampling algorithm or additional machine learning technique may not always be necessary. This research contributes to the field of credit card fraud detection by emphasizing the potential of employing machine learning models directly on original datasets, thereby simplifying the workflow and potentially improving the accuracy and efficiency of fraud detection systems.\",\"PeriodicalId\":37820,\"journal\":{\"name\":\"Journal of Cyber Security and Mobility\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Cyber Security and Mobility\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.32604/jcs.2023.045422\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cyber Security and Mobility","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.32604/jcs.2023.045422","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Computer Science","Score":null,"Total":0}
Credit Card Fraud Detection on Original European Credit Card Holder Dataset Using Ensemble Machine Learning Technique
The proliferation of digital payment methods facilitated by various online platforms and applications has led to a surge in financial fraud, particularly in credit card transactions. Advanced technologies such as machine learning have been widely employed to enhance the early detection and prevention of losses arising from potentially fraudulent activities. However, a prevalent approach in existing literature involves the use of extensive data sampling and feature selection algorithms as a precursor to subsequent investigations. While sampling techniques can significantly reduce computational time, the resulting dataset relies on generated data and the accuracy of the pre-processing machine learning models employed. Such datasets often lack true representativeness of real-world data, potentially introducing secondary issues that affect the precision of the results. For instance, under-sampling may result in the loss of critical information, while over-sampling can lead to overfitting machine learning models. In this paper, we proposed a classification study of credit card fraud using fundamental machine learning models without the application of any sampling techniques on all the features present in the original dataset. The results indicate that Support Vector Machine (SVM) consistently achieves classification performance exceeding 90% across various evaluation metrics. This discovery serves as a valuable reference for future research, encouraging comparative studies on original dataset without the reliance on sampling techniques. Furthermore, we explore hybrid machine learning techniques, such as ensemble learning constructed based on SVM, K-Nearest Neighbor (KNN) and decision tree, highlighting their potential advancements in the field. The study demonstrates that the proposed machine learning models yield promising results, suggesting that pre-processing the dataset with sampling algorithm or additional machine learning technique may not always be necessary. This research contributes to the field of credit card fraud detection by emphasizing the potential of employing machine learning models directly on original datasets, thereby simplifying the workflow and potentially improving the accuracy and efficiency of fraud detection systems.
期刊介绍:
Journal of Cyber Security and Mobility is an international, open-access, peer reviewed journal publishing original research, review/survey, and tutorial papers on all cyber security fields including information, computer & network security, cryptography, digital forensics etc. but also interdisciplinary articles that cover privacy, ethical, legal, economical aspects of cyber security or emerging solutions drawn from other branches of science, for example, nature-inspired. The journal aims at becoming an international source of innovation and an essential reading for IT security professionals around the world by providing an in-depth and holistic view on all security spectrum and solutions ranging from practical to theoretical. Its goal is to bring together researchers and practitioners dealing with the diverse fields of cybersecurity and to cover topics that are equally valuable for professionals as well as for those new in the field from all sectors industry, commerce and academia. This journal covers diverse security issues in cyber space and solutions thereof. As cyber space has moved towards the wireless/mobile world, issues in wireless/mobile communications and those involving mobility aspects will also be published.