Some new invariant sum tests and MAD tests for the assessment of Benford’s law

Computational Statistics (IF 1.4; CAS Zone 4, Mathematics; JCR Q3, Statistics & Probability) | Pub Date: 2024-02-13 | DOI: 10.1007/s00180-024-01463-8

Wolfgang Kössler, Hans-J. Lenz, Xing D. Wang


Citations: 0

Abstract

Benford's law is used worldwide to detect non-conformance or fraud in numerical data. It states that the significand of a data set from the universe is not uniformly but logarithmically distributed. In particular, the first non-zero digit is 1 with probability approximately 0.3. Several tests are available for testing conformity with Benford's law; the best known are Pearson's \(\chi^2\)-test, the Kolmogorov–Smirnov test and a modified version of the MAD test. In the present paper we propose several tests. Three of the four invariant sum tests are new, and all are motivated by the sum invariance property of Benford's law. Two distance measures are investigated: the Euclidean and the Mahalanobis distance of the standardized sums to the origin. We use the significands corresponding to the first and to the second significant digit, respectively. Moreover, we suggest improved versions of the MAD test and obtain critical values that are independent of the sample size. For illustration, the tests are applied to specifically selected data sets for which prior knowledge is available about whether or not they are Benford. Furthermore, we discuss the role of truncation of distributions.
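The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' test procedure: it computes only the raw first-digit MAD and the unstandardized significand sums per leading digit, and it omits the standardization, the Euclidean and Mahalanobis distances, and any critical values. All function names are hypothetical. The sketch relies on two standard facts: the Benford probability of leading digit d is log10(1 + 1/d), and under Benford's law the expected significand sum per observation is the same for every leading digit, namely 1/ln(10), roughly 0.434.

```python
import math
import random


def benford_first_digit_probs():
    """Benford probabilities P(D = d) = log10(1 + 1/d) for d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def significand(x):
    """Significand of a non-zero number, scaled into [1, 10)."""
    x = abs(x)
    s = x / 10 ** math.floor(math.log10(x))
    # Guard against floating-point rounding at the interval boundaries.
    if s >= 10:
        s /= 10
    elif s < 1:
        s *= 10
    return s


def first_digit(x):
    """First non-zero (significant) digit of x."""
    return int(significand(x))


def mad_first_digit(data):
    """Mean absolute deviation between observed first-digit
    frequencies and the Benford probabilities (raw, unscaled)."""
    counts = [0] * 9
    for x in data:
        counts[first_digit(x) - 1] += 1
    n = len(data)
    probs = benford_first_digit_probs()
    return sum(abs(c / n - p) for c, p in zip(counts, probs)) / 9


def significand_sums(data):
    """Sum of significands grouped by first digit. Under Benford's law
    each sum has expectation n / ln(10) (sum invariance)."""
    sums = [0.0] * 9
    for x in data:
        s = significand(x)
        sums[int(s) - 1] += s
    return sums


if __name__ == "__main__":
    rng = random.Random(1)
    # 10**U with U uniform on [0, 1) has a Benford-distributed significand.
    sample = [10 ** rng.random() for _ in range(50_000)]
    print("MAD:", mad_first_digit(sample))
    print("sums / n:", [s / len(sample) for s in significand_sums(sample)])
    print("1 / ln(10):", 1 / math.log(10))
```

A sample with a uniformly distributed significand gives a visibly larger MAD and significand sums that grow with the leading digit, which is the kind of departure the tests in the paper are designed to detect.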


Source journal

Computational Statistics (Mathematics: Statistics & Probability)

CiteScore: 2.90
Self-citation rate: 0.00%
Articles published: 122
Review time: >12 weeks

Journal description: Computational Statistics (CompStat) is an international journal which promotes the publication of applications and methodological research in the field of Computational Statistics. The focus of papers in CompStat is on the contribution to and influence of computing on statistics and vice versa. The journal provides a forum for computer scientists, mathematicians, and statisticians in a variety of fields of statistics such as biometrics, econometrics, data analysis, graphics, simulation, algorithms, knowledge based systems, and Bayesian computing. CompStat publishes hardware, software plus package reports.