Efficient Resampling for Fraud Detection During Anonymised Credit Card Transactions with Unbalanced Datasets

Petr Mrozek, John Panneerselvam, O. Bagdasar
Published in: 2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC)
Publication date: 2020-12-01
DOI: 10.1109/UCC48980.2020.00067 (https://doi.org/10.1109/UCC48980.2020.00067)
Citations: 11

Abstract

The rapid growth of e-commerce and online shopping has led to an unprecedented increase in the amount of money lost each year to credit card fraudsters. To address credit card fraud, researchers are applying machine learning techniques to detect and prevent fraudulent transactions efficiently. A common issue in the analysis of credit card transactions is the highly unbalanced nature of the datasets, a problem frequently associated with binary classification. This paper reviews, analyses and implements a selection of notable machine learning algorithms, such as Logistic Regression, Random Forest, K-Nearest Neighbours and Stochastic Gradient Descent, in order to evaluate empirically how well they handle unbalanced datasets when detecting fraudulent credit card transactions. A publicly available dataset comprising 284,807 transactions by European cardholders is analysed and used to train the studied models. The paper also evaluates two notable resampling methods, Random Under-sampling and the Synthetic Minority Over-sampling Technique (SMOTE), in combination with these algorithms, to analyse their efficiency in handling unbalanced datasets. The resampling methods significantly increased detection ability: the most successful combination, Random Forest with Random Under-sampling, achieved a recall of 100%, compared with 77% for the model without resampling. The key contribution of this paper is the proposal of efficient machine learning algorithms, together with suitable resampling methods, for credit card fraud detection on unbalanced datasets.
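The two resampling ideas named in the abstract, and the recall metric used to compare them, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names (`random_undersample`, `smote_like_oversample`, `recall`), the toy two-feature data, and the choice of `k=3` are assumptions made for illustration, and fraud is assumed to be labelled 1 and to be the minority class with at least two samples.

```python
import random

def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [X[i] for i in kept], [y[i] for i in kept]

def smote_like_oversample(X, y, k=3, seed=0):
    """Add synthetic minority rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core idea of SMOTE).
    Assumes label 1 is the minority class and has at least two rows."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    n_needed = sum(1 for label in y if label == 0) - len(minority)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return X + synthetic, y + [1] * len(synthetic)

def recall(y_true, y_pred):
    """Fraction of actual fraud cases (label 1) that were predicted as fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0

# Toy data: 95 legitimate rows and 5 fraud rows (purely illustrative).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(95)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(5)]
y = [0] * 95 + [1] * 5

X_us, y_us = random_undersample(X, y)
X_sm, y_sm = smote_like_oversample(X, y)
print(len(y_us), sum(y_us))  # 10 rows, 5 of them fraud
print(len(y_sm), sum(y_sm))  # 190 rows, 95 of them fraud
```

Under-sampling discards most of the legitimate transactions, while the SMOTE-style step keeps them all and grows the fraud class instead. In either case the resampling would normally be applied to the training split only, so that recall is measured on data with the original class balance.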