Pub Date: 2024-11-13 DOI: 10.1007/s10182-024-00515-0
Dennis Kant, Andreas Pick, Jasper de Winter
This paper compares the ability of several econometric and machine learning methods to nowcast GDP in (pseudo) real time. The analysis takes the example of Dutch GDP over the period 1992Q1–2018Q4 using a broad data set of monthly indicators. It discusses forecast accuracy and also analyzes the use of information from the large data set of macroeconomic and financial predictors. We find that, on average, the random forest provides the most accurate forecasts and nowcasts, whilst the dynamic factor model provides the most accurate backcasts.
Title: Nowcasting GDP using machine learning methods. Journal: Asta-Advances in Statistical Analysis 109(1), pp. 1–24. PDF: https://link.springer.com/content/pdf/10.1007/s10182-024-00515-0.pdf
Pub Date: 2024-11-09 DOI: 10.1007/s10182-024-00516-z
Seonghun Cho, Minsup Shin, Young Hyun Cho, Johan Lim
This research proposes a method to test and estimate change points in the covariance structure of high-dimensional multivariate series data. Our method uses the trace of the beta matrix, known as Pillai’s statistics, to test the change in covariance matrix at each time point. We study the asymptotic normality of Pillai’s statistics for testing the equality of two covariance matrices when both sample size and dimension increase at the same rate. We test the existence of a single change point in a given time period using the Cauchy combination test, which uses a weighted sum of Cauchy-transformed p-values, and estimate the change point as the point whose statistic is the greatest. To test and estimate multiple change points, we use the idea of wild binary segmentation and repeatedly apply the procedure for a single change point to each segmented period until no significant change point exists. We numerically evaluate the size and power of our method. We finally apply our procedure to finding abnormal behavior in the investment of a private equity fund.
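The Cauchy combination step mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard form of the test (a weighted sum of tan((0.5 - p)π) terms, compared against a standard Cauchy distribution); the function name `cauchy_combination`, the equal default weights, and the example p-values are assumptions for illustration and are not taken from the paper.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination test.

    Weights are assumed to be nonnegative and to sum to one;
    equal weights are used when none are given.
    """
    n = len(p_values)
    if weights is None:
        weights = [1.0 / n] * n
    # Cauchy-transform each p-value and form the weighted sum.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    # Under the null the statistic is approximately standard Cauchy,
    # so its upper-tail probability is the combined p-value.
    return 0.5 - math.atan(t) / math.pi


# Hypothetical p-values from tests at three candidate time points.
combined = cauchy_combination([0.01, 0.20, 0.70])
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the transformation.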
Title: Change point detection in high dimensional covariance matrix using Pillai’s statistics. Journal: Asta-Advances in Statistical Analysis 109(1), pp. 53–84. PDF: https://link.springer.com/content/pdf/10.1007/s10182-024-00516-z.pdf
Questionnaires are a useful tool for exploring respondents’ perceptions through ratings, which are assumed to result from a latent decision process (DP). The DP varies depending on whether respondents rate on Likert or Semantic Differential scales. A possible paradigm to formalize the DP is based on the presence of a feeling and an uncertainty latent component, originally proposed as the foundations of the CUB (Combination of Uniform and shifted Binomial) class. It can be assumed that with Likert scales, respondents begin reasoning from the bottom, progressing upwards based on their sensations. Conversely, Semantic Differential scale users are assumed to start from the middle and move either upward or downward. The CUM (Combination of Uniform and Multinomial), a new model in the CUB class derived from this DP, analyzes rating data on a Semantic Differential scale. This paper defines the concept of local and global unidirectional equivalence and studies, from an analytical point of view, the conditions under which CUB and CUM models generate identical theoretical probabilities, in order to enhance the interpretative understanding of the models.
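As a hedged illustration of the CUB side of this comparison, the sketch below evaluates the usual CUB(π, ξ) probability mass function, a mixture of a shifted binomial (feeling) and a discrete uniform (uncertainty) on an m-point scale. The function name `cub_pmf` and the parameter values are assumptions for illustration; the CUM model is not sketched because the abstract does not give its form.

```python
from math import comb


def cub_pmf(m, pi, xi):
    """Return P(R = r) for r = 1..m under a CUB(pi, xi) model."""
    probs = []
    for r in range(1, m + 1):
        # Shifted binomial component (feeling).
        shifted_binomial = comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        # Mix with the discrete uniform component (uncertainty).
        probs.append(pi * shifted_binomial + (1 - pi) / m)
    return probs


# Hypothetical parameters on a 7-point scale.
probs = cub_pmf(m=7, pi=0.8, xi=0.3)
```

Setting `pi=0` reduces the model to a uniform distribution over the m categories, and setting `pi=1` leaves only the shifted binomial.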
Title: On the equivalence of two mixture models for rating data. Authors: Matteo Ventura, Ambra Macis, Marica Manisera, Paola Zuccolotto. Pub Date: 2024-10-28. DOI: 10.1007/s10182-024-00513-2. Journal: Asta-Advances in Statistical Analysis 109(2), pp. 387–411.
Pub Date: 2024-10-15 DOI: 10.1007/s10182-024-00514-1
Christian H. Weiß, Osama Swidan
A common approach for modeling categorical time series is Hidden-Markov models (HMMs), where the actual observations are assumed to depend on hidden states in their behavior and transitions. Such categorical HMMs are even applicable to nominal data but suffer from a large number of model parameters. In the ordinal case, however, the natural order among the categorical outcomes offers the potential to reduce the number of parameters while improving their interpretability at the same time. The class of ordinal HMMs proposed in this article links a latent-variable approach with categorical HMMs. These models are characterized by parametric parsimony and allow the easy calculation of relevant stochastic properties, such as marginal and bivariate probabilities. These points are illustrated by numerical examples and simulation experiments, where the performance of maximum likelihood estimation is analyzed in finite samples. The developed methodology is applied to real-world data from a health application.
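For orientation, the sketch below computes the likelihood of a category sequence under a generic categorical HMM with the forward algorithm. This is the unrestricted baseline the abstract refers to, not the ordinal parametrization proposed in the article; the function name `hmm_likelihood` and all probabilities are hypothetical.

```python
def hmm_likelihood(obs, init, trans, emit):
    """Likelihood of a category sequence under a categorical HMM.

    obs   : list of category indices (0-based)
    init  : initial state probabilities
    trans : trans[i][j] = P(next state j | current state i)
    emit  : emit[i][y]  = P(category y | state i)
    """
    states = range(len(init))
    # alpha[j] = P(observations so far, current hidden state j)
    alpha = [init[j] * emit[j][obs[0]] for j in states]
    for y in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in states) * emit[j][y]
            for j in states
        ]
    return sum(alpha)


# Hypothetical two-state model with three ordered categories.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
likelihood = hmm_likelihood([0, 0, 1, 2, 2], init, trans, emit)
```

Each row of `emit` carries its own free probabilities, which is the source of the parameter growth that a parsimonious ordinal specification aims to limit.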
Title: Hidden-Markov models for ordinal time series. Journal: Asta-Advances in Statistical Analysis 109(2), pp. 217–239. PDF: https://link.springer.com/content/pdf/10.1007/s10182-024-00514-1.pdf
Pub Date: 2024-09-25 DOI: 10.1007/s10182-024-00511-4
Anna Gloria Billé, Marco Rogna
Prompted by the need to reduce the concentration of $\mathrm{CO}_2$ in the atmosphere in order to limit global warming, several countries are adopting policies to incentivize the production of clean energy. In this context, a relevant aspect to be examined is the effect of expanding renewable resources on employment. Despite the extensive use of panel and time series analysis to investigate the topic, most econometric models consider a very small number of regressors. Furthermore, the spatial component, a potentially important determinant of employment, has always been neglected. Using a relatively large dataset of 59 countries spanning 19 years (from 1996 to 2014), the present paper tries to fill these gaps by specifying a dynamic spatial panel data (SDPD) model with fixed effects and spatial error autocorrelation. The specification of both individual and time fixed effects allows us to consider both spatial and temporal heterogeneity. Moreover, their presence and the long panel dimension avoid spurious correlations (Granger and Hyung 1999). Proper marginal effects are then calculated to reveal different impacts among countries worldwide. Our results confirm the positive role of expanding renewable energy production on employment, at the 5% significance level, leading also to significant total and direct short-term and long-term marginal effects.
Title: Spillovers effects and temporal dynamics on the impact of renewables on labour force: a world perspective. Journal: Asta-Advances in Statistical Analysis 109(4), pp. 637–665.
Pub Date: 2024-09-17 DOI: 10.1007/s10182-024-00512-3
Huiqiao Wang, Christian H. Weiß, Mingming Zhang
A common choice for the marginal distribution of a bivariate count time series is the bivariate Poisson distribution. In practice, however, the count data may exhibit zero inflation, overdispersion or non-stationarity, such that a marginal bivariate Poisson distribution is not suitable. To test the discrepancy between the actual count data and the bivariate Poisson distribution, we propose a new goodness-of-fit test based on a bivariate dispersion index. The asymptotic distribution of the test statistic under the null hypothesis of a first-order bivariate integer-valued autoregressive model with marginal bivariate Poisson distribution is derived, and the finite-sample performance of the goodness-of-fit test is analyzed by simulations. A real-data example illustrates the application and usefulness of the test in practice.
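The idea behind a dispersion-based check can be sketched with the classical univariate index of dispersion, the ratio of sample variance to sample mean, which is close to one for Poisson counts. This is only the univariate building block, applied here to each component separately; the bivariate dispersion index and test statistic proposed in the paper are not reproduced, and the function name and data below are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean (about 1 for Poisson counts)."""
    return variance(counts) / mean(counts)


# Hypothetical bivariate count series; the first component is zero-inflated.
series_x = [0, 0, 0, 5, 0, 7, 0, 0, 6, 0]
series_y = [2, 3, 1, 2, 4, 2, 3, 1, 2, 3]

index_x = dispersion_index(series_x)
index_y = dispersion_index(series_y)
```

Values well above one point to overdispersion (for example from zero inflation), while values well below one point to underdispersion.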
Title: Goodness-of-fit testing in bivariate count time series based on a bivariate dispersion index. Journal: Asta-Advances in Statistical Analysis 109(2), pp. 241–279. PDF: https://link.springer.com/content/pdf/10.1007/s10182-024-00512-3.pdf
Pub Date: 2024-08-20 DOI: 10.1007/s10182-024-00509-y
YuZhu Tian, ChunHo Wu, ManLai Tang, MaoZai Tian
In this paper, we propose a Bayesian quantile regression (QR) approach to jointly model multivariate ordinal data. First, a multivariate latent variable model is used to link the multivariate ordinal data and latent continuous responses, and the multivariate asymmetric Laplace (MAL) distribution is employed to construct the joint QR-based working likelihood for the considered model. Second, adaptive-$L_{1/2}$ penalization priors on the regression parameters are incorporated into the working likelihood to implement high-dimensional Bayesian joint QR inference. A Markov chain Monte Carlo (MCMC) algorithm is used to derive the full conditional posterior distributions of all parameters. Third, a Bayesian joint relatively QR estimation approach is recommended to obtain more efficient estimates. Finally, Monte Carlo simulation studies and a real-data analysis of multirater agreement data are presented to illustrate the performance of the proposed Bayesian joint relatively QR approach.
Title: Bayesian joint relatively quantile regression of latent ordinal multivariate linear models with application to multirater agreement analysis. Journal: Asta-Advances in Statistical Analysis 109(1), pp. 85–116.
Pub Date: 2024-08-14 DOI: 10.1007/s10182-024-00510-5
Ali Al-Sharadqah, Karine Bagdasaryan, Ola Nusierat
This paper focuses on the general linear measurement error model, in which some or all predictors are measured with error, while others are measured precisely. We propose a semi-parametric estimator that works under general mechanisms of measurement error, including differential and non-differential errors. Other popular methods, such as the corrected score and conditional score methods, only work for non-differential measurement error models, but our estimator works in all scenarios. We develop our estimator by considering a family of objective functions that depend on an unspecified weight function. Using statistical error analysis and perturbation theory, we derive the optimal weight function under the small-sigma regime. The resulting estimator is statistically optimal in all senses. Even though we develop it under the small-sigma regime, we also establish its consistency and asymptotic normality under the large sample regime. Finally, we conduct a series of numerical experiments to confirm that the proposed estimator outperforms other existing methods.
Title: A Finite-sample bias correction method for general linear model in the presence of differential measurement errors. Journal: Asta-Advances in Statistical Analysis 109(1), pp. 149–195.
Pub Date: 2024-08-08 DOI: 10.1007/s10182-024-00505-2
Roy Cerqueti, Mario Maggi
Benford’s law is a particular discrete probability distribution that is often satisfied by the significant digits of a dataset. The nonconformity with Benford’s law suggests the possible presence of data manipulation. This paper introduces two novel generalized versions of Benford’s law that are less restrictive than the original Benford’s law—hence, leading to more probable conformity of a given dataset. Such generalizations are grounded on the existing mathematical relations between Benford’s law probability distribution elements. Moreover, one of them leads to a set of probability distributions that is a proper subset of that of the other one. We show that the considered versions of Benford’s law have a geometric representation on the three-dimensional Euclidean space. Through suitable optimization models, we show that all the probability distributions satisfying the more restrictive generalization exhibit at least acceptable conformity with Benford’s law, according to the most popular distance measures. We also present some examples to highlight the practical usefulness of the introduced devices.
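A small sketch may help make the conformity check concrete. It computes the first-digit probabilities of the original Benford's law, P(d) = log10(1 + 1/d), and measures how far a dataset's observed first-digit frequencies are from them using the mean absolute deviation. The choice of that distance, the helper names, and the test data (powers of two) are assumptions for illustration; the generalized versions introduced in the paper are not implemented here.

```python
import math

# First-digit probabilities under Benford's law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading significant digit of a nonzero number."""
    # Scientific notation puts the leading digit first, e.g. '1.23e-03'.
    return int(f"{abs(x):.15e}"[0])


def benford_mad(values):
    """Mean absolute deviation between observed first-digit
    frequencies and the Benford probabilities."""
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    return sum(abs(digits.count(d) / n - BENFORD[d]) for d in range(1, 10)) / 9


# Powers of two are a classic sequence whose first digits follow the law closely.
powers_of_two = [2 ** k for k in range(1, 101)]
mad = benford_mad(powers_of_two)
```

A smaller value of `mad` indicates closer agreement with Benford's law; which thresholds count as acceptable conformity depends on the distance measure and is not fixed by this sketch.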
Title: Classes of probability measures built on the properties of Benford’s law. Journal: Asta-Advances in Statistical Analysis 109(1), pp. 197–216. PDF: https://link.springer.com/content/pdf/10.1007/s10182-024-00505-2.pdf
Pub Date: 2024-07-30 DOI: 10.1007/s10182-024-00508-z
Alexander Gerharz, Andreas Groll, Gunther Schauberger
Title: Publisher Correction: Deducing neighborhoods of classes from a fitted model. Journal: Asta-Advances in Statistical Analysis 108(4), p. 915. PDF: https://link.springer.com/content/pdf/10.1007/s10182-024-00508-z.pdf