Enhanced financial fraud detection using cost-sensitive cascade forest with missing value imputation
Authors: Lukui Huang, Alan Abrahams, Peter Ractham
Journal: Intelligent Systems in Accounting, Finance and Management, vol. 29, no. 3, pp. 133-155
Published: 2022-07-28
DOI: 10.1002/isaf.1517
URL: https://onlinelibrary.wiley.com/doi/10.1002/isaf.1517
Financial statement fraud is a global problem for investors, audit firms, regulators, and other stakeholders. Fraud detection can be regarded as a binary classification problem in which a false negative is more expensive than a false positive. Although existing studies have made great efforts to detect fraud using various data-mining techniques, the difference in misclassification costs is seldom considered. In this study, we propose a cost-sensitive cascade forest (CSCF) for fraud detection, which places a heavy penalty on false-negative predictions and self-adjusts the depth of the cascade forest according to the classifier’s recall (i.e., the classifier’s sensitivity). As missing values are ubiquitous in fraud research, we also explore the effect of selected missing data treatments on prediction performance, including complete case analysis, three selected classic statistical mechanisms (zero, mean, and modified mean imputation), and two machine learning approaches (K-nearest neighbor [KNN] and random forest [RF]). The experimental results show that the proposed CSCF significantly improves fraud prediction in comparison with one of the latest fraud detection models, which uses the RUSBoost algorithm. Comparing different missing value treatments, even though RUSBoost and CSCF perform well when using complete case analysis, we find that the best performance is achieved when CSCF is used with missing data imputed as zero. This treatment further improves performance and results in an area under the curve (AUC) score of 0.82, compared to the highest AUC (0.71) from the baseline model. Supplementary analysis further reveals that the low AUC of complete case analysis for the two examined models persists under different training sizes. Thus, our findings shed light on the potential benefits of missing value imputation for model performance in fraud detection.
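To make the terms in the abstract concrete, here is a minimal Python sketch of three of the missing data treatments it names (complete case analysis, zero imputation, and mean imputation), together with recall and an asymmetric misclassification cost. It is an illustration only, not the authors' CSCF implementation. The function names, the toy data, and the cost values (`fn_cost=10.0`, `fp_cost=1.0`) are hypothetical, since the abstract does not state the penalties used. Modified mean, KNN, and RF imputation are not shown.

```python
from statistics import mean


def complete_cases(rows):
    """Complete case analysis: keep only rows with no missing (None) entries."""
    return [r for r in rows if None not in r]


def impute(rows, strategy="zero"):
    """Fill None entries column by column with zero or with the column mean."""
    if strategy not in ("zero", "mean"):
        raise ValueError(f"unknown strategy: {strategy}")
    n_cols = len(rows[0])
    fills = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fall back to zero when a column has no observed values at all.
        if strategy == "zero" or not observed:
            fills.append(0.0)
        else:
            fills.append(mean(observed))
    return [[fills[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def recall(y_true, y_pred):
    """Recall (sensitivity): share of actual fraud cases (label 1) that are caught."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def misclassification_cost(y_true, y_pred, fn_cost=10.0, fp_cost=1.0):
    """Total cost when a missed fraud (FN) is penalized more than a false alarm (FP).

    The default costs are hypothetical placeholders, not values from the paper.
    """
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn_cost * fn + fp_cost * fp


# Toy data: three firm-year rows with two features; None marks a missing value.
rows = [[1.0, None], [3.0, 4.0], [None, 8.0]]
zero_filled = impute(rows, "zero")
mean_filled = impute(rows, "mean")
kept = complete_cases(rows)

# Toy labels and predictions: 1 = fraud, 0 = non-fraud.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
print(recall(y_true, y_pred), misclassification_cost(y_true, y_pred))
```

In this toy data, complete case analysis keeps only one of the three rows, while both imputation strategies keep all of them. The cost function shows why a cost-sensitive model is judged differently from one that treats both kinds of error equally: with these placeholder costs, one missed fraud outweighs several false alarms.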
About the journal:
Intelligent Systems in Accounting, Finance and Management is a quarterly international journal that publishes original, high-quality material dealing with all aspects of intelligent systems as they relate to the fields of accounting, economics, finance, marketing, and management. The journal is also concerned with related emerging technologies, including big data, business intelligence, social media, and other technologies. It encourages the development of novel technologies and the embedding of new and existing technologies into applications of real, practical value. Therefore, implementation issues are of as much concern as development issues. The journal is designed to appeal to academics in the intelligent systems, emerging technologies, and business fields, as well as to advanced practitioners who wish to improve the effectiveness, efficiency, or economy of their working practices. A special feature of the journal is the use of two groups of reviewers: those who specialize in intelligent systems work and those who specialize in application areas. Reviewers are asked to address issues of originality and actual or potential impact on research, teaching, or practice in the accounting, finance, or management fields. Authors working on conceptual developments or on laboratory-based explorations of data sets therefore need to address the issue of potential impact at some level in submissions to the journal.