Enhanced financial fraud detection using cost-sensitive cascade forest with missing value imputation

IF 3.7; Q1 (Economics, Econometrics and Finance); Intelligent Systems in Accounting, Finance and Management; Pub Date: 2022-07-28; DOI: 10.1002/isaf.1517

Lukui Huang, Alan Abrahams, Peter Ractham

Intelligent Systems in Accounting, Finance and Management, volume 29, issue 3, pages 133-155. https://onlinelibrary.wiley.com/doi/10.1002/isaf.1517

Citations: 0

Abstract



Financial statement fraud is a global problem for investors, audit firms, regulators, and other stakeholders. Fraud detection can be regarded as a binary classification problem in which a false negative is more expensive than a false positive. Although existing studies have made great efforts to detect fraud using various data-mining techniques, the difference in misclassification costs is seldom considered. In this study, we propose a cost-sensitive cascade forest (CSCF) for fraud detection, which places a heavy penalty on false negative predictions and self-adjusts the depth of the cascade forest according to the classifier's recall (i.e., its sensitivity). As missing values are ubiquitous in fraud research, we also explore the effect of selected missing data treatments on prediction performance: complete case analysis, three classic statistical methods (zero, mean, and modified mean imputation), and two machine learning approaches (K-nearest neighbor [KNN] and random forest [RF]). The experimental results show that the proposed CSCF significantly improves fraud prediction in comparison with one of the latest fraud detection models, which uses the RUSBoost algorithm. Comparing different missing value treatments, even though RUSBoost and CSCF perform well when using complete case analysis, we find that the best performance is achieved when CSCF is used with missing data imputed as zero. This treatment further improves performance, yielding an area under the curve (AUC) score of 0.82, compared to the highest AUC (0.71) from the baseline model. Supplementary analysis further reveals that the low AUC of complete case analysis for the two examined models persists under different training sizes. Thus, our findings shed light on the potential benefits of missing value imputation for model performance in fraud detection.
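The kind of comparison the abstract describes can be sketched on synthetic data. The sketch below is not the authors' CSCF or their dataset: a class-weighted random forest from scikit-learn stands in for the cost-sensitive model, and the 5% fraud rate, 20% missing rate, and 20:1 cost ratio are all illustrative assumptions. It only shows how zero, mean, and KNN imputation could be swapped in ahead of a classifier that penalizes false negatives more heavily, with AUC and recall reported for each.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for fraud data (roughly 5% positives; assumption).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.95, 0.05], random_state=0)

# Blank out about 20% of the entries at random to mimic missing values (assumption).
rng = np.random.default_rng(0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X_missing, y, test_size=0.3, stratify=y, random_state=0)

# Three of the treatments named in the abstract.
imputers = {
    "zero": SimpleImputer(strategy="constant", fill_value=0.0),
    "mean": SimpleImputer(strategy="mean"),
    "knn": KNNImputer(n_neighbors=5),
}

results = {}
for name, imputer in imputers.items():
    # Fit the imputer on the training split only, to avoid leaking test data.
    X_tr = imputer.fit_transform(X_train)
    X_te = imputer.transform(X_test)

    # Hypothetical 20:1 class weighting: errors on the fraud class (label 1)
    # cost more, which pushes the forest toward fewer false negatives.
    clf = RandomForestClassifier(n_estimators=200,
                                 class_weight={0: 1, 1: 20},
                                 random_state=0)
    clf.fit(X_tr, y_train)

    scores = clf.predict_proba(X_te)[:, 1]
    preds = (scores >= 0.5).astype(int)
    results[name] = {
        "auc": roc_auc_score(y_test, scores),
        "recall": recall_score(y_test, preds),
    }

for name, metrics in results.items():
    print(f"{name:>4}: AUC={metrics['auc']:.3f} recall={metrics['recall']:.3f}")
```

The numbers this prints say nothing about the paper's reported AUCs; the point is only the structure of the experiment, with the imputer as the one component that changes between runs.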


Source journal

Intelligent Systems in Accounting, Finance and Management (Economics, Econometrics and Finance: Finance)

CiteScore: 6.00

Self-citation rate: 0.00%

About the journal: Intelligent Systems in Accounting, Finance and Management is a quarterly international journal that publishes original, high-quality material dealing with all aspects of intelligent systems as they relate to the fields of accounting, economics, finance, marketing and management. The journal is also concerned with related emerging technologies, including big data, business intelligence, social media and other technologies. It encourages the development of novel technologies, and the embedding of new and existing technologies into applications of real, practical value. Implementation issues are therefore of as much concern as development issues. The journal is designed to appeal to academics in the intelligent systems, emerging technologies and business fields, as well as to advanced practitioners who wish to improve the effectiveness, efficiency, or economy of their working practices. A special feature of the journal is its use of two groups of reviewers: those who specialize in intelligent systems work, and those who specialize in application areas. Reviewers are asked to address issues of originality and actual or potential impact on research, teaching, or practice in the accounting, finance, or management fields. Authors working on conceptual developments or on laboratory-based explorations of data sets therefore need to address the issue of potential impact at some level in submissions to the journal.