Predicting credit card fraud with Sarbanes-Oxley assessments and Fama-French risk factors
James Christopher Westland
Intelligent Systems in Accounting, Finance and Management, 27(2), 95-107. Published 2020-06-12.
DOI: 10.1002/isaf.1472
https://onlinelibrary.wiley.com/doi/10.1002/isaf.1472
Citations: 8
Abstract
This research developed and tested machine learning models to predict significant credit card fraud in corporate systems using Sarbanes-Oxley (SOX) reports, news reports of breaches, and Fama-French risk factors (FF). Exploratory analysis found that SOX information predicted several types of security breaches, with the strongest performance in predicting credit card fraud. Hyperparameters were tuned systematically for a suite of machine learning models: a random forest, an extremely randomized forest, a random grid of gradient boosting machines (GBMs), a random grid of deep neural nets, and a fixed grid of general linear models. These were assembled into two trained stacked ensemble models optimized for F1 performance: an ensemble that contained all the models, and an ensemble containing just the best-performing model from each algorithm class. Tuned GBMs performed best under all conditions. Without FF, models yielded an AUC of 99.3%, and the closeness of the training and validation matrices confirms that the model is robust. The most important predictors were firm specific, as would be expected, since control weaknesses vary at the firm level. Audit firm fees were the most important non-firm-specific predictors. Adding FF to the model yielded perfect prediction (100%) in the training confusion matrix and an AUC of 99.8%. With FF included, the most important predictor of credit card fraud was the FF coefficient on the High Minus Low (HML) book-to-market factor. The second most influential variable was the year of reporting, and the third was the R² of the Fama-French three-factor model. Together these described most of the variance in credit card fraud occurrence. In all cases, the four major SOX-specific opinions rendered by auditors and the signed SOX report had little predictive influence.
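The abstract does not say how the Fama-French coefficient and R² used as predictors were estimated. As a rough illustration only, the sketch below assumes the standard three-factor regression, in which a firm's excess return is regressed on the market excess return, the Small Minus Big size factor, and the High Minus Low book-to-market factor. The function names, dictionary keys, and sample numbers are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("factor matrix is singular or nearly so")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fama_french_fit(
    excess_returns: Sequence[float],
    mkt_rf: Sequence[float],
    smb: Sequence[float],
    hml: Sequence[float],
) -> Dict[str, float]:
    """Ordinary least squares fit of
    excess_return = alpha + b_mkt*MKT_RF + b_smb*SMB + b_hml*HML + error.
    Returns the four coefficients and the regression R^2."""
    rows = [[1.0, m, s, h] for m, s, h in zip(mkt_rf, smb, hml)]
    k = 4
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, excess_returns)) for i in range(k)]
    alpha, b_mkt, b_smb, b_hml = solve_linear_system(xtx, xty)

    coef = [alpha, b_mkt, b_smb, b_hml]
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(excess_returns) / len(excess_returns)
    ss_res = sum((y - f) ** 2 for y, f in zip(excess_returns, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in excess_returns)
    if ss_tot == 0:
        raise ValueError("excess returns are constant; R^2 is undefined")
    return {
        "alpha": alpha,
        "beta_mkt": b_mkt,
        "beta_smb": b_smb,
        "beta_hml": b_hml,
        "r_squared": 1.0 - ss_res / ss_tot,
    }


# Made-up monthly factor returns in percent (NOT data from the paper).
mkt_rf = [2.0, -1.0, 3.0, 0.0, -2.0, 1.0]
smb = [1.0, 0.0, -1.0, 2.0, 1.0, -2.0]
hml = [-1.0, 2.0, 0.0, 1.0, -2.0, 3.0]
# Synthetic firm returns built from known loadings, with no noise term.
excess = [0.1 + 1.2 * m + 0.5 * s - 0.3 * h for m, s, h in zip(mkt_rf, smb, hml)]

fit = fama_french_fit(excess, mkt_rf, smb, hml)
```

Under this assumption, `fit["beta_hml"]` and `fit["r_squared"]`, computed per firm, would be the kind of values that enter the fraud model as features alongside the SOX variables and the reporting year.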
About the journal:
Intelligent Systems in Accounting, Finance and Management is a quarterly international journal that publishes original, high-quality material dealing with all aspects of intelligent systems as they relate to the fields of accounting, economics, finance, marketing and management. The journal is also concerned with related emerging technologies, including big data, business intelligence, social media and other technologies. It encourages the development of novel technologies, and the embedding of new and existing technologies into applications of real, practical value. Therefore, implementation issues are of as much concern as development issues. The journal is designed to appeal to academics in the intelligent systems, emerging technologies and business fields, as well as to advanced practitioners who wish to improve the effectiveness, efficiency, or economy of their working practices. A special feature of the journal is its use of two groups of reviewers: those who specialize in intelligent systems work, and those who specialize in application areas. Reviewers are asked to address issues of originality and actual or potential impact on research, teaching, or practice in the accounting, finance, or management fields. Authors working on conceptual developments or on laboratory-based explorations of data sets therefore need to address the issue of potential impact at some level in submissions to the journal.