A Minimum Error-Based PCA for Improving Classifier Performance in Detecting Financial Fraud

B. Pambudi, Silmi Fauziati, Indriana Hidayah
Jurnal Teknik Elektro, published 2022-06-27
DOI: https://doi.org/10.15294/jte.v14i1.35787

Abstract

The main challenge for data mining approaches to detecting fraud in financial transaction data is class imbalance in the available datasets: the fraud class is far smaller than the non-fraud class. This imbalance lowers the F1-score because precision and recall become unbalanced, so a model may predict one class well while performing poorly on the other. Long training times and high computational resource requirements are a further concern, so handling imbalanced data alone is not sufficient to reach the expected performance. Reducing the data dimensionality can speed up processing, but doing so reduces the classifier's performance. This study therefore aims to improve a data mining approach based on the Support Vector Machine (SVM) classifier for detecting fraudulent financial transactions. The SVM was refined by tuning its kernel and hyperparameters and was combined with Random Under Sampling (RUS) and our Minimum error-based Principal Component Analysis (MebPCA). RUS handles the imbalanced data, while MebPCA modifies dimensionality reduction based on classification error to shorten computation time without degrading SVM performance. Compared with previous research on SVM-based fraud detection, this combination improves precision by 29.31% and F1-score by 19.8%, and reduces training time by 36.39%.
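Two ideas in the abstract can be illustrated with a small, self-contained sketch: random under-sampling of the majority class, and how unbalanced precision and recall pull the F1-score down. The toy data, the function names `random_under_sample` and `precision_recall_f1`, and the hard-coded predictions are all assumptions made for illustration; they are not taken from the paper, and the sketch does not implement MebPCA or the SVM.

```python
import random
from collections import Counter


def random_under_sample(rows, labels, seed=0):
    """Randomly keep as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(members) for members in by_class.values())
    sampled = []
    for label, members in by_class.items():
        # sample without replacement so no row is duplicated
        for row in rng.sample(members, target):
            sampled.append((row, label))
    rng.shuffle(sampled)
    return [r for r, _ in sampled], [l for _, l in sampled]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the class treated as positive (fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: 10 fraud rows (label 1) among 1000 transactions.
rows = [[float(i)] for i in range(1000)]
labels = [1 if i < 10 else 0 for i in range(1000)]

balanced_rows, balanced_labels = random_under_sample(rows, labels, seed=42)
class_counts = Counter(balanced_labels)  # 10 rows of each class

# Hypothetical predictions: 100 transactions flagged, 9 of the 10 frauds caught.
y_pred = [1 if i < 9 or 10 <= i < 101 else 0 for i in range(1000)]
precision, recall, f1 = precision_recall_f1(labels, y_pred)
print(class_counts, precision, recall, round(f1, 4))
```

In this toy case recall is 0.9 but precision is only 0.09, and because F1 is the harmonic mean of the two, it lands near the lower value (about 0.16). That is the effect the abstract describes: a model can look good on one measure while the F1-score stays low.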