Statistical Papers: latest articles

A sequential feature selection approach to change point detection in mean-shift change point models

IF 1.3 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistical Papers

Pub Date : 2024-04-03 DOI: 10.1007/s00362-024-01548-y

Abstract

Change point detection is an important area of scientific research and has applications in a wide range of fields. In this paper, we propose a sequential change point detection (SCPD) procedure for mean-shift change point models. Unlike classical feature selection based approaches, the SCPD method detects change points in the order of the conditional change sizes and makes full use of the identified change points information. The extended Bayesian information criterion (EBIC) is employed as the stopping rule in the SCPD procedure. We investigate the theoretical property of the procedure and compare its performance with other methods existing in the literature. It is established that the SCPD procedure has the property of detection consistency. Simulation studies and real data analyses demonstrate that the SCPD procedure has the edge over the other methods in terms of detection accuracy and robustness.
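As a point of reference for the mean-shift setting described above, the following is a minimal sketch of the classical least-squares estimate of a single change point. It is not the SCPD procedure and does not use EBIC; the function name `best_single_change_point` and the simulated data are assumptions made purely for illustration.

```python
import random


def best_single_change_point(y):
    """Return the split index k (1 <= k < n) that minimizes the residual sum
    of squares when y[:k] and y[k:] are each fitted by their own mean."""
    n = len(y)
    total = sum(y)
    total_sq = sum(v * v for v in y)
    best_k, best_rss = None, float("inf")
    left = 0.0      # running sum of y[:k]
    left_sq = 0.0   # running sum of squares of y[:k]
    for k in range(1, n):
        left += y[k - 1]
        left_sq += y[k - 1] ** 2
        right = total - left
        right_sq = total_sq - left_sq
        # RSS of a segment = sum of squares - (sum)^2 / length
        rss = (left_sq - left * left / k) + (right_sq - right * right / (n - k))
        if rss < best_rss:
            best_k, best_rss = k, rss
    return best_k


# Hypothetical data: the mean shifts from 0 to 3 after observation 100.
random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(100)]
y += [random.gauss(3.0, 1.0) for _ in range(100)]
k_hat = best_single_change_point(y)
print(k_hat)
```

With several change points, such a single-split search is usually applied repeatedly or replaced by a model selection step; the abstract indicates that SCPD instead orders detections by conditional change size and stops via EBIC.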

Citations: 0

Hypothesis testing for varying coefficient models in tail index regression

IF 1.3 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistical Papers

Pub Date : 2024-04-02 DOI: 10.1007/s00362-024-01538-0

Koki Momoki, Takuma Yoshida

This study examines the varying coefficient model in tail index regression. The varying coefficient model is an efficient semiparametric model that avoids the curse of dimensionality when many covariates are included in the model. It has proved useful in mean, quantile, and other regressions, and tail index regression is no exception. Although the varying coefficient model is flexible, leaner and simpler models are preferred in applications. Therefore, it is important to evaluate whether the estimated coefficient function varies significantly with covariates. If the effect of the non-linearity of the model is weak, the varying coefficient structure can be reduced to a simpler model, such as a constant or zero coefficient. Accordingly, hypothesis tests for model assessment in the varying coefficient model have been discussed for mean and quantile regression. However, there are no such results for tail index regression. In this study, we investigate the asymptotic properties of an estimator and provide a hypothesis testing method for varying coefficient models in tail index regression.

Citations: 0

Minimum contrast for the first-order intensity estimation of spatial and spatio-temporal point processes

IF 1.3 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistical Papers

Pub Date : 2024-03-26 DOI: 10.1007/s00362-024-01541-5

Nicoletta D’Angelo, Giada Adelfio

In this paper, we harness a result in point process theory, specifically the expectation of the weighted K-function, where the weighting is done by the true first-order intensity function. This theoretical result can be employed as an estimation method to derive parameter estimates for a particular model assumed for the data. The underlying motivation is to avoid the difficulties associated with dealing with complex likelihoods in point process models and their maximization. The exploited result makes our method theoretically applicable to any model specification. In this paper, we restrict our study to Poisson models, whose likelihood represents the base for many more complex point process models. In this context, our proposed method can estimate the vector of local parameters that correspond to the points within the analyzed point pattern without introducing any additional complexity compared to the global estimation. We illustrate the method through simulation studies for both purely spatial and spatio-temporal point processes and show complex scenarios based on the Poisson model through the analysis of two real datasets concerning environmental problems.

Citations: 0

The resampling method via representative points

IF 1.3 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistical Papers

Pub Date : 2024-03-18 DOI: 10.1007/s00362-024-01536-2

Long-Hao Xu, Yinan Li, Kai-Tai Fang

The bootstrap method relies on resampling from the empirical distribution to provide inferences about the population with a distribution F. The empirical distribution serves as an approximation to the population. It is possible, however, to resample from another approximating distribution of F to conduct simulation-based inferences. In this paper, we utilize representative points to form an alternative approximating distribution of F for resampling. The representative points in terms of minimum mean squared error from F have been widely applied to numerical integration, simulation, and the problems of grouping, quantization, and classification. The method of resampling via representative points can be used to estimate the sampling distribution of a statistic of interest. A basic theory for the proposed method is established. We prove the convergence of higher-order moments of the new approximating distribution of F, and establish the consistency of sampling distribution approximation in the cases of the sample mean and sample variance under the Kolmogorov metric and Mallows–Wasserstein metric. Based on some numerical studies, it has been shown that the proposed resampling method improves the nonparametric bootstrap in terms of confidence intervals for mean and variance.
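For orientation, the baseline that the abstract compares against, the nonparametric bootstrap, can be sketched as follows. This is only the standard percentile bootstrap for the mean, resampling from the empirical distribution; it does not construct representative points, and the function name `bootstrap_mean_ci` and the sample values are illustrative assumptions.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean.

    Each replicate draws len(sample) values with replacement from the
    empirical distribution and records the replicate mean."""
    rng = random.Random(seed)
    n = len(sample)
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    cut = round(alpha / 2 * n_boot)  # number of replicates trimmed per tail
    return means[cut], means[n_boot - cut - 1]


# Hypothetical sample, used only to exercise the function.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 6.1, 3.3, 4.9]
lo, hi = bootstrap_mean_ci(sample)
print(lo, hi)
```

The approach in the abstract would replace the draw from the empirical distribution (`rng.choices(sample, ...)` here) with a draw from a distribution supported on representative points of F.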

Citations: 0

An heuristic scree plot criterion for the number of factors

IF 1.3 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistical Papers

Pub Date : 2024-03-18 DOI: 10.1007/s00362-023-01517-x

Abstract

Cattell’s (Multivar Behav Res 1:245–276, 1966) heuristic determines the number of factors as the elbow point between ‘steep’ and ‘not steep’ in the scree plot. In contrast, an elbow is by definition absent in points on a hyperbola with corresponding equisized surfaces. We formalize this heuristic and propose a criterion to determine the number of factors by comparing surfaces under the scree plot. Monte Carlo simulations show that the finite-sample properties of our proposed criterion outperform benchmarks in the dynamic factor model literature.

Citations: 0

A semi-orthogonal nonnegative matrix tri-factorization algorithm for overlapping community detection

IF 1.3 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistical Papers

Pub Date : 2024-03-14 DOI: 10.1007/s00362-024-01537-1

Zhaoyang Li, Yuehan Yang

In this paper, we focus on overlapping community detection and propose an efficient semi-orthogonal nonnegative matrix tri-factorization (semi-ONMTF) algorithm. This method factorizes a matrix $X$ into an orthogonal matrix $U$, a nonnegative matrix $B$, and the transposed matrix $U^{\mathrm{T}}$. We use the Cayley transformation to maintain strict orthogonality of $U$, so that each iterate stays on the Stiefel manifold. The algorithm is computationally efficient because the solutions for $U$ and $B$ are simplified into a matrix-wise update algorithm. Applying this method, we detect overlapping communities by the belonging coefficient vector and analyse associations between communities by the unweighted network of communities. We conduct simulations and applications to show that the proposed method has wide applicability. In a real data example, we apply the semi-ONMTF to a stock data set and construct a directed association network of companies. Based on the modularity for directed and overlapping communities, we obtain five overlapping communities, 17 overlapping nodes, and five outlier nodes in the network. We also discuss the associations between communities, providing insights into the overlapping community detection on the stock market network.
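To illustrate why a Cayley transformation keeps an iterate orthogonal, here is a small pure-Python sketch. For any skew-symmetric matrix W, the matrix Q = (I + (τ/2)W)^{-1}(I − (τ/2)W) is orthogonal, so QU has orthonormal columns whenever U does. The abstract does not specify how the paper chooses W or the step size τ; the values below, and the helper names `matmul`, `transpose`, `solve`, and `cayley`, are assumptions for illustration only.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def cayley(W, tau=1.0):
    """Cayley transform Q = (I + tau/2 W)^{-1} (I - tau/2 W) of a
    skew-symmetric matrix W; the result is an orthogonal matrix."""
    n = len(W)
    eye = lambda i, j: 1.0 if i == j else 0.0
    plus = [[eye(i, j) + 0.5 * tau * W[i][j] for j in range(n)] for i in range(n)]
    minus = [[eye(i, j) - 0.5 * tau * W[i][j] for j in range(n)] for i in range(n)]
    return solve(plus, minus)


# Arbitrary skew-symmetric W and a 3x2 matrix U with orthonormal columns.
W = [[0.0, 1.0, -2.0], [-1.0, 0.0, 3.0], [2.0, -3.0, 0.0]]
U = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
U_new = matmul(cayley(W, tau=0.5), U)
gram = matmul(transpose(U_new), U_new)  # should be close to the 2x2 identity
print(gram)
```

Because orthogonality holds exactly for every τ, no re-orthogonalization step is needed after an update of this form, up to floating-point error.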

Citations: 0

Statistical simulations with LR random fuzzy numbers

IF 1.3 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistical Papers

Pub Date : 2024-03-08 DOI: 10.1007/s00362-024-01533-5

Abbas Parchami, Przemyslaw Grzegorzewski, Maciej Romaniuk

Computer simulations are a powerful tool in many fields of research. This also applies to the analysis of experimental data in a broad sense, which are frequently burdened with multiple imperfections. Often the underlying imprecision or vagueness can be suitably described in terms of fuzzy numbers, which also allow subjectivity to be captured. On the other hand, due to the random nature of the experimental data, the tools for their description must take into account their statistical nature. In this way, we come to random fuzzy numbers that model fuzzy data and are also solidly formalized within the probabilistic setting. In this contribution, we introduce the so-called LR random fuzzy numbers that can be used in various Monte Carlo simulations on fuzzy data. The proposed method of generating fuzzy numbers with membership functions given by probability densities is both simple and rich, well-grounded mathematically, and has a high application potential.

Citations: 0

Minimax weight learning for absorbing MDPs

IF 1.3 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistical Papers

Pub Date : 2024-03-06 DOI: 10.1007/s00362-023-01491-4

Fengying Li, Yuqiang Li, Xianyi Wu

Reinforcement learning policy evaluation problems are often modeled as finite or discounted/averaged infinite-horizon Markov Decision Processes (MDPs). In this paper, we study undiscounted off-policy evaluation for absorbing MDPs. Given a dataset consisting of i.i.d. episodes under a given truncation level, we propose an algorithm (referred to as MWLA in the text) to directly estimate the expected return via the importance ratio of the state-action occupancy measure. The Mean Square Error (MSE) bound of the MWLA method is provided, and the dependence of statistical errors on the data size and the truncation level is analyzed. The performance of the algorithm is illustrated by means of computational experiments under an episodic taxi environment.

Citations: 0

Homogeneity tests and interval estimations of risk differences for stratified bilateral and unilateral correlated data

IF 1.3 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistical Papers

Pub Date : 2024-03-04 DOI: 10.1007/s00362-024-01532-6

Shuyi Liang, Kai-Tai Fang, Xin-Wei Huang, Yijing Xin, Chang-Xing Ma

In clinical trials studying paired parts of a subject with binary outcomes, measurements are expected to be collected bilaterally. However, there are cases where subjects contribute measurements for only one part. By utilizing combined data, it is possible to gain additional information compared to using bilateral or unilateral data alone. With the combined data, this article investigates homogeneity tests of risk differences in the presence of stratification effects and proposes interval estimations of a common risk difference if stratification does not introduce underlying dissimilarities. Under Dallal’s model (Biometrics 44:253–257, 1988), we propose three test statistics and evaluate their performance regarding type I error control and power. Confidence intervals of a common risk difference with satisfactory coverage probabilities and interval lengths are constructed. Our simulation results show that the score test is the most robust and the profile likelihood confidence interval outperforms the other methods proposed. Data from a study of acute otitis media are used to illustrate our proposed procedures.

Citations: 0

Welch’s t test is more sensitive to real world violations of distributional assumptions than student’s t test but logistic regression is more robust than either

IF 1.3 Zone 3 (Mathematics) Q2 STATISTICS & PROBABILITY

Statistical Papers

Pub Date : 2024-03-04 DOI: 10.1007/s00362-024-01531-7

David Curtis

It has previously been pointed out that Student’s t test, which assumes that samples are drawn from populations with equal standard deviations, can have an inflated Type I error rate if this assumption is violated. Hence it has been recommended that Welch’s t test should be preferred. In the context of carrying out gene-wise weighted burden tests for detecting association of rare variants with psoriasis we observe that Welch’s test performs unsatisfactorily. We show that if the assumption of normality is violated and observations follow a Poisson distribution, then with unequal sample sizes Welch’s t test has an inflated Type I error rate, is systematically biased and is prone to produce extremely low p values. We argue that such data can arise in a variety of real world situations and believe that researchers should be aware of this issue. Student’s t test performs much better in this scenario but a likelihood ratio test based on logistic regression models performs better still and we suggest that this might generally be a preferable method to test for a difference in distributions between two samples.

This research has been conducted using the UK Biobank Resource.
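The two test statistics contrasted in this abstract differ only in how the standard error and degrees of freedom are formed. The sketch below computes both from their textbook definitions using the standard library; it does not reproduce the paper's simulations or the logistic regression likelihood ratio test, and the count data are hypothetical values chosen only to show unequal sample sizes.

```python
import math
import statistics


def student_t(x, y):
    """Student's two-sample t statistic with pooled variance and its
    degrees of freedom (assumes equal population variances)."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    return (statistics.fmean(x) - statistics.fmean(y)) / se, nx + ny - 2


def welch_t(x, y):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom
    (does not assume equal variances). Fails if both variances are zero."""
    nx, ny = len(x), len(y)
    a = statistics.variance(x) / nx
    b = statistics.variance(y) / ny
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a * a / (nx - 1) + b * b / (ny - 1))
    return t, df


# Hypothetical count data with unequal sample sizes.
small = [0, 1, 0, 2, 1, 0]
large = [1, 0, 3, 0, 0, 1, 2, 0, 1, 0, 0, 4]
print(student_t(small, large))
print(welch_t(small, large))
```

When the two samples have the same size the two statistics coincide and only the degrees of freedom differ, which is consistent with the abstract's emphasis on unequal sample sizes as the setting where the tests behave differently.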

Citations: 0
