A Minimum Error-Based PCA for Improving Classifier Performance in Detecting Financial Fraud

Jurnal Teknik Elektro. Pub Date: 2022-06-27. DOI: 10.15294/jte.v14i1.35787

B. Pambudi, Silmi Fauziati, Indriana Hidayah


Citation count: 0

Abstract

The main challenge for data mining approaches to detecting fraud in financial transaction data is class imbalance: in the available datasets, the fraud class is much smaller than the non-fraud class. This imbalance lowers the F1-score because precision and recall become unbalanced, so the model predicts one class well but not the other. The long training time and high computational cost of data mining are a further concern, so handling imbalanced data alone is insufficient to produce the expected performance. Reducing the data dimensionality can speed up the process, but it reduces the classifier's performance. This study therefore aims to improve a data mining approach based on the Support Vector Machine (SVM) classifier for detecting fraudulent financial transactions. SVM performance was refined by tuning the kernel and hyperparameters, integrated with Random Under Sampling (RUS) and our Minimum error-based Principal Component Analysis (MebPCA). RUS handles the imbalanced data, while MebPCA modifies the dimension reduction technique based on classification error to shorten computation time without degrading SVM performance. Compared with previous research on SVM-based fraud detection, this combination improves precision by 29.31% and F1-score by 19.8%, and reduces training time by 36.39%.
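To make the imbalance argument concrete, the following is a minimal sketch (not the authors' implementation) of two ideas the abstract relies on: F1 as the harmonic mean of precision and recall, and random undersampling of the majority class. The toy data, the hypothetical classifier that flags the last 60 rows, and the helper names `random_under_sample` and `precision_recall_f1` are all assumptions made for illustration. MebPCA and the SVM tuning are not reproduced here, because the abstract does not specify them.

```python
import random


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class has the
    same number of rows as the smallest class (illustrative RUS)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    sampled = []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            sampled.append((row, label))
    rng.shuffle(sampled)
    return [r for r, _ in sampled], [l for _, l in sampled]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy, made-up dataset: 990 non-fraud rows (label 0) and 10 fraud rows (label 1).
labels = [0] * 990 + [1] * 10
rows = [[float(i)] for i in range(1000)]

# Hypothetical classifier that flags the last 60 rows as fraud:
# it catches all 10 fraud cases but raises 50 false alarms.
preds = [1 if i >= 940 else 0 for i in range(1000)]

precision, recall, f1 = precision_recall_f1(labels, preds)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# prints: precision=0.167 recall=1.000 f1=0.286

# Undersampling leaves 10 rows per class for training.
balanced_rows, balanced_labels = random_under_sample(rows, labels)
print(len(balanced_rows), balanced_labels.count(0), balanced_labels.count(1))
```

In this toy case recall is perfect, yet the F1-score stays low because precision is poor, which is the kind of imbalance between precision and recall the abstract describes.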




Source journal: Jurnal Teknik Elektro

Self-citation rate: 0.00%

Review time: 12 weeks