Comparative Case Study of Machine Learning Classification Techniques Using Imbalanced Credit Card Fraud Datasets
G. Parthasarathy, L. Ramanathan, Y. Justindhas, J. Saravanakumar, J. Darwin
CompSciRN: Other Machine Learning (Topic), vol. 3, no. 1 (Journal Article), published 2019-02-26
DOI: 10.2139/ssrn.3351584
Citations: 7
Abstract
Today, the total volume of credit card transactions is increasing steadily, and as a result fraudulent transactions are also on the rise, causing losses of billions of dollars to financial institutions and the banking sector every year. Hence there is a need for a robust, reliable mechanism that can identify and prevent such fraudulent transactions effectively and efficiently. Some data mining techniques help detect patterns among data attributes (classifying a transaction as fraudulent or non-fraudulent) and produce a probabilistic prediction of the transaction category. In this study, multiple machine learning classification techniques are applied to highly imbalanced datasets of credit card transactions. 'Chip and PIN' is considered one of today's trusted mechanisms for securing payment transactions, but even this mechanism does not stop fraudulent card use at virtual point-of-sale (POS) terminals or in email orders, known as online credit card fraud. SVM, Random Forest, and J48 Decision Tree classifiers were observed to yield very high accuracy, but they are not recommended for classifying datasets in which class imbalance is present. Building on these observations, this study gives a comprehensive overview of various classification methods, along with their strengths and limitations in fraud detection.
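Why a classifier can score very high accuracy and still be unsuitable for imbalanced data can be illustrated with a minimal Python sketch. It assumes a hypothetical set of 1,000 transactions of which only 5 (0.5%) are fraudulent; the class split, the helper names `confusion_counts` and `metrics`, and the two toy predictors are illustrative assumptions, not the study's dataset, models, or evaluation code. The point is that a predictor that never flags fraud still reaches 99.5% accuracy, while precision, recall, and F1 on the fraud class reveal that it catches nothing.

```python
"""Sketch: why accuracy alone can mislead on imbalanced fraud data.

Labels are hypothetical: 1 = fraudulent, 0 = legitimate. The 995/5 class
split is an illustrative assumption, not taken from the study.
"""


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 (fraud) as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1  # fraud correctly flagged
        elif t == 0 and p == 1:
            fp += 1  # legitimate transaction wrongly flagged
        elif t == 1 and p == 0:
            fn += 1  # fraud missed
        else:
            tn += 1  # legitimate transaction correctly passed
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Accuracy plus precision, recall, and F1 for the fraud (minority) class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical imbalance: 995 legitimate and 5 fraudulent transactions.
    y_true = [0] * 995 + [1] * 5

    # Degenerate predictor: labels every transaction as legitimate.
    always_legit = [0] * len(y_true)
    # accuracy 0.995, but recall 0.0: every fraud case is missed.
    print("always legitimate:", metrics(y_true, always_legit))

    # Toy predictor that flags the last 20 transactions: 15 false alarms
    # plus all 5 frauds.
    flags_some = [0] * 980 + [1] * 20
    # accuracy drops to 0.985, but recall rises to 1.0.
    print("flags 20:", metrics(y_true, flags_some))
```

The second predictor is less "accurate" yet catches every fraudulent transaction, which is why minority-class metrics such as recall and F1 are usually reported alongside (or instead of) accuracy when classes are this imbalanced.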