Computational Statistics: Latest Articles

Variational Bayesian Lasso for spline regression
IF 1.3 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2024-02-24 | DOI: 10.1007/s00180-024-01470-9
Larissa C. Alves, Ronaldo Dias, Helio S. Migon

This work presents a new scalable, automatic Bayesian Lasso methodology with variational inference for non-parametric spline regression that can capture the non-linear relationship between a response variable and predictor variables. From a non-parametric point of view, the regression curve is assumed to lie in an infinite-dimensional space. Regression splines use a finite approximation of this space, representing the regression function as a linear combination of basis functions. The crucial point of the approach is determining the appropriate number of basis functions, or equivalently the number of knots, so as to avoid over-fitting or under-fitting. A decision-theoretic approach was devised for knot selection. Comprehensive simulation studies in challenging scenarios were conducted to compare alternative knot-selection criteria and to assess the efficacy of the proposed algorithms. The performance of the proposed method was also assessed on real-world datasets. The new procedure captured the underlying data structure well by selecting an appropriate number of knots (basis functions).
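The idea of representing the regression function as a linear combination of spline basis functions, with the number of knots controlling flexibility, can be illustrated with a short sketch. This is not the authors' variational Bayesian Lasso: it fits a linear truncated-power basis $1, x, (x-\kappa_k)_+$ by ordinary least squares and, as a swapped-in stand-in for their decision-theoretic rule, picks the number of equally spaced knots by BIC. The test function, noise level, and knot grid are assumptions.

```python
import math
import random

def basis(x, knots):
    """Linear truncated-power spline basis: 1, x, (x - k)_+ for each knot k."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]

def solve(a, b):
    """Solve the square system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta

def fit_spline(xs, ys, n_knots):
    """Least-squares fit with n_knots equally spaced interior knots on [0, 1]; returns (beta, rss)."""
    knots = [(j + 1) / (n_knots + 1) for j in range(n_knots)]
    rows = [basis(x, knots) for x in xs]
    p = len(rows[0])
    # Normal equations X'X beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2 for r, y in zip(rows, ys))
    return beta, rss

def select_knots_bic(xs, ys, max_knots=10):
    """Knot count minimizing BIC = n*log(RSS/n) + p*log(n); a stand-in, not the paper's criterion."""
    n = len(xs)
    best = None
    for k in range(1, max_knots + 1):
        _, rss = fit_spline(xs, ys, k)
        bic = n * math.log(rss / n) + (k + 2) * math.log(n)
        if best is None or bic < best[1]:
            best = (k, bic)
    return best[0]

if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.random() for _ in range(200)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.1) for x in xs]
    print("selected number of knots:", select_knots_bic(xs, ys))
```

Too few knots leave large residuals, while each extra knot costs $\log n$ in BIC, which is the same over-/under-fitting trade-off the abstract describes.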

Citations: 0
Bayesian estimation of the number of species from Poisson-Lindley stochastic abundance model using non-informative priors
IF 1.3 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2024-02-23 | DOI: 10.1007/s00180-024-01464-7
Anurag Pathak, Manoj Kumar, Sanjay Kumar Singh, Umesh Singh, Sandeep Kumar

In this article, we propose the Poisson-Lindley distribution as a stochastic abundance model in which the sample is generated by independent Poisson processes. Jeffreys' and Bernardo's reference priors are obtained, and Bayes estimators of the number of species are proposed for this model. The proposed Bayes estimators are compared with the corresponding profile and conditional maximum likelihood estimators in terms of the square root of their risk under the squared error loss function (SELF). Jeffreys' and Bernardo's reference priors are also considered and compared within the Bayesian approach using biological data.
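For reference, the Poisson-Lindley distribution is a Poisson distribution whose rate follows a Lindley($\theta$) distribution, giving $P(X=x) = \theta^2 (x+\theta+2)/(\theta+1)^{x+3}$ for $x = 0, 1, 2, \ldots$. The sketch below evaluates this pmf, simulates abundances through the mixture representation (Lindley is a mixture of Exp($\theta$) and Gamma($2, \theta$) with weights $\theta/(\theta+1)$ and $1/(\theta+1)$), and forms a simple plug-in estimate $\hat N = D/(1 - P(X=0))$ from the $D$ species actually observed. Treating $\theta$ as known and using this plug-in estimator are illustrative assumptions; this is not the paper's Bayes estimator under Jeffreys or reference priors.

```python
import math
import random

def pl_pmf(x, theta):
    """Poisson-Lindley pmf: theta^2 (x + theta + 2) / (theta + 1)^(x + 3)."""
    return theta ** 2 * (x + theta + 2) / (theta + 1) ** (x + 3)

def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def pl_sample(rng, theta):
    """Draw lambda ~ Lindley(theta) as an Exp/Gamma(2) mixture, then X ~ Poisson(lambda)."""
    if rng.random() < theta / (theta + 1):
        lam = rng.expovariate(theta)               # Exp with rate theta
    else:
        lam = rng.gammavariate(2.0, 1.0 / theta)   # Gamma(shape 2, scale 1/theta)
    return poisson(rng, lam)

def plugin_species_estimate(counts, theta):
    """Hypothetical plug-in estimate N_hat = D / (1 - P(X = 0)), with theta assumed known."""
    d = sum(1 for c in counts if c > 0)
    return d / (1.0 - pl_pmf(0, theta))
```

In practice only the positive counts are observed; the simulated zeros are useful only because they let us compare $\hat N$ with the true number of species.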

Citations: 0
Generation of normal distributions revisited
IF 1.3 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2024-02-23 | DOI: 10.1007/s00180-024-01468-3
Takayuki Umeda

Normally distributed random numbers are widely used in scientific computing across many fields. To reduce initial fluctuations, it is important to generate a set of random numbers that is as close to a normal distribution as possible. Two types of samples from a uniform distribution are examined as source samples for inverse transform sampling. Three inverse transform sampling methods, based on new approximations of the inverse cumulative distribution function, are also discussed for converting uniformly distributed source samples into normally distributed samples.
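Inverse transform sampling maps uniform source samples $u$ through the inverse normal CDF, $z = \Phi^{-1}(u)$. As a sketch, and assuming the two kinds of uniform source samples are pseudo-random draws and evenly spaced midpoints $(i + 0.5)/n$ (the abstract does not say which two are examined), the code below uses Python's `statistics.NormalDist.inv_cdf` in place of the paper's new approximations and compares how closely each set matches the target mean 0 and variance 1.

```python
import random
from statistics import NormalDist, fmean, pvariance

STD_NORMAL = NormalDist(0.0, 1.0)

def pseudo_random_uniforms(n, seed=0):
    """Independent U(0, 1) draws, rejecting 0 because inv_cdf needs 0 < u < 1."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()          # lies in [0, 1)
        if u > 0.0:
            out.append(u)
    return out

def midpoint_uniforms(n):
    """Evenly spaced midpoints (i + 0.5) / n: a deterministic, stratified source sample."""
    return [(i + 0.5) / n for i in range(n)]

def to_normal(us):
    """Inverse transform sampling: z = Phi^{-1}(u)."""
    return [STD_NORMAL.inv_cdf(u) for u in us]

if __name__ == "__main__":
    n = 1000
    for name, us in [("pseudo-random", pseudo_random_uniforms(n)), ("midpoints", midpoint_uniforms(n))]:
        zs = to_normal(us)
        print(f"{name:13s} mean={fmean(zs):+.4f} var={pvariance(zs):.4f}")
```

The midpoint set reproduces the mean exactly up to rounding and gives a variance slightly below 1, whereas the sample variance of pseudo-random draws fluctuates on the order of $\sqrt{2/n}$.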

Citations: 0
Bayesian regression models in gretl: the BayTool package
IF 1.3 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2024-02-21 | DOI: 10.1007/s00180-024-01466-5
Luca Pedini

This article presents BayTool, a gretl package that complements the software's functionality, which is mostly concerned with frequentist approaches, with Bayesian estimation methods for commonly used econometric models. Computational efficiency is achieved by combining extensive use of Gibbs sampling for posterior simulation with the option of splitting single-threaded experiments across multiple cores or machines through parallelization. From the user's perspective, the package requires only basic knowledge of gretl scripting to access its full functionality, while a graphical interface provides a point-and-click solution for less experienced users. These features make BayTool an excellent teaching tool without sacrificing support for more advanced or complex applications.
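BayTool itself is driven by gretl scripting; the Python sketch below only illustrates the kind of Gibbs sampling the abstract refers to, for the simplest case of a no-intercept regression $y_i = \beta x_i + \varepsilon_i$, $\varepsilon_i \sim N(0, \sigma^2)$, with a normal prior on $\beta$ and an inverse-gamma prior on $\sigma^2$. Each sweep draws $\beta$ from its normal full conditional and $\sigma^2$ from its inverse-gamma full conditional. The hyperparameters and simulated data are assumptions, and nothing here reflects BayTool's actual functions.

```python
import math
import random

def gibbs_regression(xs, ys, n_iter=3000, burn_in=500, m0=0.0, s0_sq=100.0,
                     a0=2.0, b0=1.0, seed=0):
    """Gibbs sampler for y = beta * x + eps, eps ~ N(0, sigma2).

    Priors (assumed): beta ~ N(m0, s0_sq), sigma2 ~ InvGamma(a0, b0).
    Returns the retained (beta, sigma2) draws after burn-in.
    """
    rng = random.Random(seed)
    n = len(xs)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sigma2 = 1.0  # starting value
    draws = []
    for it in range(n_iter):
        # beta | sigma2, y is normal (conjugate update).
        prec = sxx / sigma2 + 1.0 / s0_sq
        mean = (sxy / sigma2 + m0 / s0_sq) / prec
        beta = rng.gauss(mean, math.sqrt(1.0 / prec))
        # sigma2 | beta, y is inverse-gamma: draw a gamma precision and invert it.
        sse = sum((y - beta * x) ** 2 for x, y in zip(xs, ys))
        shape = a0 + n / 2.0
        rate = b0 + sse / 2.0
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes a scale
        if it >= burn_in:
            draws.append((beta, sigma2))
    return draws
```

Because independent chains with different seeds share no state, running several of them on separate cores and pooling the draws is one simple way to parallelize such a sampler.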

Citations: 0
Bayesian sequential probability ratio test for vaccine efficacy trials
IF 1.3 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2024-02-20 | DOI: 10.1007/s00180-024-01458-5
Erina Paul, Santosh Sutradhar, Jonathan Hartzel, Devan V. Mehrotra

Designing vaccine efficacy (VE) trials often requires recruiting large numbers of participants when the diseases of interest have a low incidence. When developing novel vaccines, such as those against COVID-19, the plausible range of VE at the design stage is quite wide. Thus, the number of events needed to demonstrate efficacy above a pre-defined regulatory threshold can be difficult to predict, and accruing the necessary events can take a long time. It is therefore advantageous to evaluate efficacy at earlier interim analyses, potentially allowing the trial to stop early for overwhelming VE or for futility. In such cases, the sequential probability ratio test (SPRT) can be used to conduct multiple interim analyses while controlling both type I and type II errors. In this article, we propose a Bayesian SPRT for designing a vaccine trial that compares a test vaccine with a control, assuming Poisson incidence rates in the two arms. We provide guidance on how to choose the prior distribution and how to optimize the number of events for interim analyses to maximize the efficiency of the design. Through simulations, we demonstrate that the proposed Bayesian SPRT performs better than the corresponding frequentist SPRT. An R repository implementing the proposed method is available at: https://github.com/Merck/bayesiansprt.
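To make the sequential logic concrete, here is a sketch of the classical (frequentist) Wald SPRT that the abstract uses as its comparator; it is not the authors' Bayesian SPRT. It relies on the standard reduction that, with Poisson incidence in both arms, each new case falls in the vaccine arm with probability $p = r(1-\mathrm{VE})/(r(1-\mathrm{VE}) + 1)$, where $r$ is the vaccine-to-control person-time ratio; $r = 1$ (equal randomization and follow-up) is assumed by default. The hypothesized VE values and error rates are illustrative.

```python
import math

def case_split_prob(ve, ratio=1.0):
    """P(a case is in the vaccine arm) given VE and the vaccine:control person-time ratio."""
    return ratio * (1.0 - ve) / (ratio * (1.0 - ve) + 1.0)

def wald_sprt(cases, ve0=0.3, ve1=0.7, alpha=0.025, beta=0.10, ratio=1.0):
    """Wald SPRT of H0: VE = ve0 vs H1: VE = ve1 on a sequence of cases.

    cases: iterable of 1 (case in vaccine arm) or 0 (case in control arm).
    Returns (decision, cases_used) with decision in {"efficacy", "futility", "continue"}.
    """
    p0, p1 = case_split_prob(ve0, ratio), case_split_prob(ve1, ratio)
    upper = math.log((1.0 - beta) / alpha)   # crossing it rejects H0 (efficacy)
    lower = math.log(beta / (1.0 - alpha))   # crossing it accepts H0 (futility)
    llr, n = 0.0, 0
    for n, x in enumerate(cases, start=1):
        llr += math.log(p1 / p0) if x else math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "efficacy", n
        if llr <= lower:
            return "futility", n
    return "continue", n
```

With these settings, 14 consecutive control-arm cases cross the efficacy boundary, and 4 consecutive vaccine-arm cases cross the futility boundary.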

Citations: 0
Overlapping coefficient in network-based semi-supervised clustering
IF 1.3 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2024-02-19 | DOI: 10.1007/s00180-024-01457-6
Claudio Conversano, Luca Frigau, Giulia Contu

Network-based Semi-Supervised Clustering (NeSSC) is a semi-supervised approach to clustering in the presence of an outcome variable. It fits a classification or regression model to resampled versions of the original data to produce a proximity matrix that indicates how similar pairs of observations are with respect to the outcome. This matrix is transformed into a complex network, to which a community detection algorithm is applied to find an underlying community structure, that is, a partition of the instances into highly homogeneous clusters evaluated in terms of the outcome. In this paper, we focus on the case where the outcome variable used in NeSSC is numeric, and propose an alternative criterion for selecting the optimal partition, based on a measure of overlap between density curves, together with a penalization criterion that accounts for the number of clusters in a candidate partition. We then evaluate the proposed method on several artificial datasets and 20 real datasets, and compare NeSSC with three other popular methods of semi-supervised clustering with a numeric outcome. Results show that NeSSC with the overlapping criterion works particularly well when the data contain a small number of scattered, localized clusters.
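The overlap between two density curves $f$ and $g$ is commonly measured by the overlapping coefficient $\mathrm{OVL} = \int \min(f(y), g(y))\,dy$, which equals 1 for identical densities and 0 for disjoint ones. The sketch below computes it numerically for two clusters' outcome values using Gaussian kernel density estimates with Silverman's rule-of-thumb bandwidth; the estimator, bandwidth, and integration grid are assumptions, since the abstract does not specify how NeSSC estimates the densities or combines overlaps with its penalty. `statistics.NormalDist.overlap` gives a closed-form check for two normal densities.

```python
from statistics import NormalDist, stdev

def gaussian_kde(sample):
    """Gaussian KDE with Silverman's rule-of-thumb bandwidth (an assumed choice)."""
    n = len(sample)
    h = 1.06 * stdev(sample) * n ** (-1 / 5)
    kernel = NormalDist(0.0, 1.0)
    return lambda y: sum(kernel.pdf((y - s) / h) for s in sample) / (n * h)

def overlap_coefficient(f, g, lo, hi, steps=4000):
    """Midpoint-rule approximation of the integral of min(f, g) over [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * width
        total += min(f(y), g(y))
    return total * width

if __name__ == "__main__":
    cluster_a = [0.1 * i for i in range(20)]
    cluster_b = [0.1 * i + 1.0 for i in range(20)]
    print(overlap_coefficient(gaussian_kde(cluster_a), gaussian_kde(cluster_b), -5.0, 8.0))
```

A partition whose clusters have small pairwise OVL separates the outcome well, which is the intuition behind using overlap to compare candidate partitions.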

Citations: 0
First exit and Dirichlet problem for the nonisotropic tempered $\alpha$-stable processes
IF 1.3 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2024-02-15 | DOI: 10.1007/s00180-024-01462-9
Xing Liu, Weihua Deng

This paper discusses the first exit and Dirichlet problems for the nonisotropic tempered $\alpha$-stable process $X_t$. Upper bounds on all moments of the first exit position $|X_{\tau_D}|$ and of the first exit time $\tau_D$ are obtained explicitly. It is found that the probability density function of $|X_{\tau_D}|$ or $\tau_D$ decays exponentially as $|X_{\tau_D}|$ or $\tau_D$ increases, and that $\mathrm{E}[\tau_D] \sim \mathrm{E}\big[|X_{\tau_D} - \mathrm{E}[X_{\tau_D}]|^2\big]$ and $\mathrm{E}[\tau_D] \sim |\mathrm{E}[X_{\tau_D}]|$. Next, we obtain the Feynman–Kac representation of the Dirichlet problem using semigroup theory. Furthermore, averaging generated trajectories of the stochastic process yields the solution of the Dirichlet problem, which is also verified by numerical experiments.
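The final step, solving a Dirichlet problem by averaging simulated trajectories, can be illustrated with a Monte Carlo sketch. Simulating a nonisotropic tempered $\alpha$-stable process is beyond a short example, so the code below swaps in one-dimensional standard Brownian motion on $D = (-1, 1)$, where the answers are known in closed form: $u(x) = \mathrm{E}_x[g(X_{\tau_D})]$ solves the Dirichlet problem for the generator $\tfrac12 \frac{d^2}{dx^2}$ with boundary data $g$, and $\mathrm{E}_x[\tau_D] = 1 - x^2$. The process, time step, boundary data, and path count are all assumptions chosen for illustration.

```python
import math
import random

def exit_by_simulation(x0, g, dt=1e-3, n_paths=2000, seed=0):
    """Monte Carlo estimates of E[g(X_tau)] and E[tau] for Brownian motion leaving (-1, 1).

    Paths advance with N(0, dt) increments until they leave the interval; the exit
    point is projected onto the endpoint on the side where the path left.
    """
    rng = random.Random(seed)
    sd = math.sqrt(dt)
    total_g, total_tau = 0.0, 0.0
    for _ in range(n_paths):
        x, t = x0, 0.0
        while -1.0 < x < 1.0:
            x += rng.gauss(0.0, sd)
            t += dt
        total_g += g(1.0 if x >= 1.0 else -1.0)
        total_tau += t
    return total_g / n_paths, total_tau / n_paths
```

With $g(1) = 1$ and $g(-1) = 0$ the exact solution is $u(x) = (1 + x)/2$. Because exits are only detected at grid times, the exit-time estimate is biased slightly upward; shrinking `dt` reduces this bias at the cost of more steps.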

Citations: 0
Some new invariant sum tests and MAD tests for the assessment of Benford’s law
IF 1.3 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2024-02-13 | DOI: 10.1007/s00180-024-01463-8
Wolfgang Kössler, Hans-J. Lenz, Xing D. Wang

Benford's law is used worldwide to detect non-conformance or fraud in numerical data. It states that the significands of many naturally occurring data sets are not uniformly but logarithmically distributed. In particular, the first non-zero digit is 1 with probability of approximately 0.3 ($\log_{10} 2 \approx 0.301$). Several tests are available for assessing conformance with Benford's law; the best known are Pearson's $\chi^2$ test, the Kolmogorov–Smirnov test, and a modified version of the MAD test. In the present paper we propose several tests; three of the four invariant sum tests are new, and they are motivated by the sum-invariance property of Benford's law. Two distance measures of the standardized sums from the origin are investigated: the Euclidean and the Mahalanobis distance. We use the significands corresponding to the first and to the second significant digit, respectively. Moreover, we propose improved versions of the MAD test and obtain critical values that are independent of the sample size. For illustration, the tests are applied to selected data sets for which it is known in advance whether or not they follow Benford's law. Furthermore, we discuss the role of truncation of distributions.
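Under Benford's law the first significant digit $d \in \{1, \ldots, 9\}$ has probability $\log_{10}(1 + 1/d)$. The sketch below computes the classical first-digit MAD statistic, $\frac19 \sum_d |\hat p_d - p_d|$, and Pearson's $\chi^2$ statistic. It covers only these classical statistics, not the paper's invariant sum tests or its improved MAD versions; the example data (powers of two, a standard Benford-conforming sequence, versus uniformly distributed first digits) are our assumptions.

```python
import math
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """First significant digit of a non-zero number, read from its scientific notation."""
    if x == 0:
        raise ValueError("zero has no significant digit")
    return int(format(abs(x), ".14e")[0])

def mad_and_chi2(values):
    """Return (MAD, Pearson chi-square) of first-digit counts against Benford's law."""
    digits = Counter(first_digit(v) for v in values if v != 0)
    n = sum(digits.values())
    mad = sum(abs(digits[d] / n - p) for d, p in BENFORD.items()) / 9
    chi2 = sum((digits[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())
    return mad, chi2

if __name__ == "__main__":
    print("powers of two:", mad_and_chi2([2 ** n for n in range(1000)]))
    print("uniform digits:", mad_and_chi2(list(range(1, 10)) * 100))
```

For the first 1000 powers of two the MAD is well below 0.006, while the sample with uniformly distributed first digits gives a MAD near 0.06. Unlike $\chi^2$, the MAD statistic does not grow with the sample size, which is one reason it is popular in practice.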

Citations: 0
Convergence of the CUSUM estimation for a mean shift in linear processes with random coefficients
IF 1.3 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2024-02-12 | DOI: 10.1007/s00180-024-01465-6
Yi Wu, Wei Wang, Xuejun Wang

Let $\{X_i, 1 \le i \le n\}$ be a linear process based on dependent random variables with random coefficients, with a mean shift at an unknown location. We study the cumulative sum (CUSUM) estimator of the change point. Strong convergence, $L_r$ convergence, complete convergence, and the rate of strong convergence are established for the CUSUM estimator under mild conditions. These results improve and extend corresponding results in the literature. Simulation studies and two real-data examples are also provided to support the theoretical results.
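A common form of the CUSUM change-point estimator is $\hat k = \arg\max_{1 \le k < n} |S_k - \tfrac{k}{n} S_n|$ with $S_k = X_1 + \cdots + X_k$, so the first $\hat k$ observations are assigned to the pre-change regime. The sketch below applies it to a series whose noise is a moving average with randomly drawn coefficients, as a loose, hypothetical stand-in for the paper's linear processes with random coefficients; the noise model, shift size, and change location are assumptions, and some papers use a weighted version of this statistic.

```python
import random

def cusum_change_point(xs):
    """Return k maximizing |S_k - (k/n) S_n| over 1 <= k < n (first maximizer on ties)."""
    n = len(xs)
    total = sum(xs)
    best_k, best_val, s = 1, -1.0, 0.0
    for k in range(1, n):
        s += xs[k - 1]
        val = abs(s - k / n * total)
        if val > best_val:
            best_k, best_val = k, val
    return best_k

def simulate_shift(n=200, change=120, shift=2.0, seed=0):
    """Mean shift at index `change` plus noise e_i + a_i * e_{i-1}, a_i ~ U(-0.5, 0.5) (hypothetical)."""
    rng = random.Random(seed)
    e_prev = rng.gauss(0.0, 1.0)
    xs = []
    for i in range(n):
        e = rng.gauss(0.0, 1.0)
        a = rng.uniform(-0.5, 0.5)   # random coefficient
        mean = shift if i >= change else 0.0
        xs.append(mean + e + a * e_prev)
        e_prev = e
    return xs
```

On a noise-free step such as thirty zeros followed by seventy ones, the estimator returns exactly 30.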

Citations: 0
Analysis of estimating the Bayes rule for Gaussian mixture models with a specified missing-data mechanism
IF 1.3 | Zone 4 (Mathematics) | Q3 STATISTICS & PROBABILITY | Pub Date: 2024-02-10 | DOI: 10.1007/s00180-023-01447-0

Semi-supervised learning approaches have been successfully applied in a wide range of engineering and scientific fields. This paper investigates the generative model framework with a missingness mechanism for unclassified observations introduced by Ahfock and McLachlan (Stat Comput 30:1–12, 2020). We show that, for a partially classified sample from a two-class homoscedastic normal model, a classifier using Bayes' rule of allocation with a missing-data mechanism can outperform a fully supervised classifier, especially when both the overlap between classes and the proportion of missing class labels are moderate to low, or when the overlap is large but few labels are missing. It also outperforms a classifier without a missing-data mechanism regardless of the overlap or the proportion of missing class labels. Simulations with two- and three-component normal mixture models with unequal covariances further corroborate these findings. Finally, we illustrate the proposed classifier with a missing-data mechanism on interneuronal and skin lesion datasets.
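For the two-class homoscedastic normal model, Bayes' rule of allocation assigns $x$ to class 1 when $\pi_1 \phi(x; \mu_1, \sigma^2) > \pi_2 \phi(x; \mu_2, \sigma^2)$; with known parameters and equal priors its error rate is $\Phi(-\Delta/2)$ with $\Delta = |\mu_1 - \mu_2|/\sigma$. The sketch below codes this rule and the entropy of the posterior class probabilities, which peaks at $\ln 2$ where the classes overlap most. As we understand the Ahfock and McLachlan framework, the probability that a label is missing is modeled through this entropy; the logistic form and the coefficients `xi0`, `xi1` here are illustrative assumptions, and estimating the parameters from a partially classified sample (the subject of the paper) is not shown.

```python
import math
from statistics import NormalDist

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)

def posterior_class1(x, mu1, mu2, sigma, pi1=0.5):
    """P(class 1 | x) for two normal classes with common sigma, via the log-odds."""
    log_odds = math.log(pi1 / (1.0 - pi1)) + ((x - mu2) ** 2 - (x - mu1) ** 2) / (2.0 * sigma ** 2)
    return sigmoid(log_odds)

def bayes_allocate(x, mu1, mu2, sigma, pi1=0.5):
    """Bayes' rule of allocation: class 1 if its posterior exceeds 1/2, else class 2."""
    return 1 if posterior_class1(x, mu1, mu2, sigma, pi1) > 0.5 else 2

def entropy(x, mu1, mu2, sigma, pi1=0.5):
    """Entropy (natural log) of the posterior class probabilities at x."""
    p = posterior_class1(x, mu1, mu2, sigma, pi1)
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)

def missing_label_prob(x, mu1, mu2, sigma, xi0=1.0, xi1=2.0, pi1=0.5):
    """Illustrative missingness model: logit P(label missing | x) = xi0 + xi1 * log(entropy)."""
    e = entropy(x, mu1, mu2, sigma, pi1)
    if e <= 0.0:
        return 0.0
    return sigmoid(xi0 + xi1 * math.log(e))

def bayes_error_equal_priors(mu1, mu2, sigma):
    """Optimal error rate Phi(-Delta / 2) with Delta = |mu1 - mu2| / sigma."""
    return NormalDist().cdf(-abs(mu1 - mu2) / sigma / 2.0)
```

With `xi1 > 0`, labels are more likely to be missing near the decision boundary, where the posterior entropy is high.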

Citations: 0