
Latest Publications in Computational Statistics

Ranking handball teams from statistical strength estimation
IF 1.3 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-06-24 · DOI: 10.1007/s00180-024-01522-0
Florian Felice

In this work, we present a methodology to estimate the strength of handball teams. We propose modelling the number of goals scored by a team with the Conway-Maxwell-Poisson distribution, a flexible discrete distribution that can handle non-equidispersion. From its parameters, we derive a mathematical formula for the strength of a team. We propose a ranking based on the estimated strengths to compare teams across different championships. Applied to women's handball club data from European competitions over the 2022/2023 season, we show that the proposed ranking is reflected in real sporting events and is consistent with recent results from European competitions.
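
As a rough illustration of the modelling idea described in the abstract, the sketch below (not the authors' implementation) fits a Conway-Maxwell-Poisson distribution to one team's goal counts by maximum likelihood in base R and uses the fitted mean number of goals as a crude strength score; the truncation point K and this strength definition are assumptions made for the example.

```r
# Illustrative sketch: COM-Poisson fit by maximum likelihood; K truncates the
# infinite normalising sum and "strength" here is simply the fitted mean.
dcmp <- function(k, lambda, nu, K = 200) {
  logw <- (0:K) * log(lambda) - nu * lgamma((0:K) + 1)  # unnormalised log-weights
  logZ <- max(logw) + log(sum(exp(logw - max(logw))))   # log normalising constant
  exp(k * log(lambda) - nu * lgamma(k + 1) - logZ)
}

fit_cmp <- function(goals) {
  nll <- function(p) -sum(log(dcmp(goals, exp(p[1]), exp(p[2]))))
  opt <- optim(c(log(mean(goals)), 0), nll)             # parameters on log scale
  list(lambda = exp(opt$par[1]), nu = exp(opt$par[2]))
}

set.seed(1)
goals <- rpois(30, lambda = 27)                         # toy season of 30 matches
fit <- fit_cmp(goals)
sum((0:200) * dcmp(0:200, fit$lambda, fit$nu))          # fitted mean = crude strength
```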

Citations: 0
Hypothesis testing in Cox models when continuous covariates are dichotomized: bias analysis and bootstrap-based test
IF 1.3 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-06-23 · DOI: 10.1007/s00180-024-01520-2
Hyunman Sim, Sungjeong Lee, Bo-Hyung Kim, Eun Shin, Woojoo Lee

Hypothesis testing for the regression coefficient associated with a dichotomized continuous covariate in a Cox proportional hazards model has been considered in clinical research. Although most existing testing methods do not allow covariates other than the dichotomized continuous covariate, they have generally been applied anyway. Through an analytic bias analysis and a numerical study, we show that this practice is not free from inflated type I error and loss of power. To overcome this limitation, we develop a bootstrap-based test that allows additional covariates and dichotomizes two-dimensional covariates into a binary variable. In addition, we develop an efficient algorithm to speed up the calculation of the proposed test statistic. Our numerical study demonstrates that the proposed bootstrap-based test maintains the type I error well at the nominal level and exhibits higher power than other methods, and that the proposed algorithm reduces computational costs.
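
For readers who want to experiment, the sketch below (not the authors' proposed test) dichotomises a continuous covariate at its median, fits a Cox model that also includes an additional covariate using the survival package, and bootstraps the coefficient of the dichotomised covariate; the data-generating process, cut-off and bootstrap size are illustrative assumptions.

```r
# Illustrative only: naive bootstrap of a dichotomised covariate's effect.
library(survival)

set.seed(42)
n <- 200
x <- rnorm(n)                     # continuous covariate to be dichotomised
z <- rbinom(n, 1, 0.5)            # additional covariate
time <- rexp(n, rate = exp(0.5 * x - 0.3 * z))
status <- rbinom(n, 1, 0.8)       # crude censoring indicator
dat <- data.frame(time, status, xbin = as.numeric(x > median(x)), z)

boot_coef <- replicate(500, {
  idx <- sample(n, replace = TRUE)
  coef(coxph(Surv(time, status) ~ xbin + z, data = dat[idx, ]))["xbin"]
})
quantile(boot_coef, c(0.025, 0.975))   # bootstrap percentile interval
```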

Citations: 0
Trend of high dimensional time series estimation using low-rank matrix factorization: heuristics and numerical experiments via the TrendTM package
IF 1.3 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-06-20 · DOI: 10.1007/s00180-024-01519-9
Emilie Lebarbier, Nicolas Marie, Amélie Rosier

This article focuses on the practical use of a recently proposed theoretical method for trend estimation in high dimensional time series. The method falls within the scope of low-rank matrix factorization methods in which the temporal structure is taken into account. It consists of minimizing a penalized criterion that is theoretically efficient but depends on two constants to be chosen in practice. We propose a two-step strategy to solve this question, based on two different known heuristics. The performance of the strategies is studied and compared through a large simulation study covering various scenarios. In order to make the estimation method with the best strategy available to the community, we implemented it in the R package TrendTM, which is presented and used here. Finally, we give a geometric interpretation of the results by linking the method to PCA, and we use the results to solve a high-dimensional curve clustering problem. The package is available on CRAN.
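
The sketch below illustrates only the underlying low-rank idea and does not use the TrendTM interface: a common smooth trend shared by a panel of noisy series is recovered by truncating the SVD of the data matrix; the rank r is an assumption made for the example.

```r
# Illustrative sketch: rank-r trend recovery via truncated SVD in base R.
set.seed(7)
n <- 100; d <- 50
trend <- outer(sin(seq(0, 2 * pi, length.out = n)), runif(d, 0.5, 1.5))
X <- trend + matrix(rnorm(n * d, sd = 0.3), n, d)

r <- 1
s <- svd(X)
X_hat <- s$u[, 1:r, drop = FALSE] %*% diag(s$d[1:r], r) %*% t(s$v[, 1:r, drop = FALSE])
mean((X_hat - trend)^2)   # reconstruction error of the rank-r trend estimate
```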

Citations: 0
Some aspects of nonlinear dimensionality reduction
IF 1.3 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-06-16 · DOI: 10.1007/s00180-024-01514-0
Liwen Wang, Yongda Wang, Shifeng Xiong, Jiankui Yang

In this paper we discuss nonlinear dimensionality reduction within the framework of principal curves. We formulate dimensionality reduction as problems of estimating principal subspaces for both noiseless and noisy cases, and propose the corresponding iterative algorithms that modify existing principal curve algorithms. An R squared criterion is introduced to estimate the dimension of the principal subspace. In addition, we present new regression and density estimation strategies based on our dimensionality reduction algorithms. Theoretical analyses and numerical experiments show the effectiveness of the proposed methods.
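
A minimal example of principal-curve dimensionality reduction is sketched below, using the existing princurve package rather than the authors' modified algorithms: a principal curve is fitted to noisy two-dimensional data and the arc-length parameter serves as the one-dimensional representation.

```r
# Illustrative sketch with the princurve package; data are simulated.
library(princurve)

set.seed(3)
t0 <- runif(300, -1, 1)
X <- cbind(t0, t0^2) + matrix(rnorm(600, sd = 0.05), ncol = 2)

fit <- principal_curve(X)
head(fit$lambda)               # arc-length positions: the reduced 1-d coordinates
plot(X, asp = 1); lines(fit)   # data cloud and the fitted curve
```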

Citations: 0
Double truncation method for controlling local false discovery rate in case of spiky null
IF 1.3 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-06-05 · DOI: 10.1007/s00180-024-01510-4
Shinjune Kim, Youngjae Oh, Johan Lim, DoHwan Park, Erin M. Green, Mark L. Ramos, Jaesik Jeong

Many multiple test procedures, which control the false discovery rate, have been developed to identify some cases (e.g. genes) showing statistically significant difference between two different groups. However, a common issue encountered in some practical data sets is the presence of highly spiky null distributions. Existing methods struggle to control type I error in such cases due to the “inflated false positives," but this problem has not been addressed in previous literature. Our team recently encountered this issue while analyzing SET4 gene deletion data and proposed modeling the null distribution using a scale mixture normal distribution. However, the use of this approach is limited due to strong assumptions on the spiky peak. In this paper, we present a novel multiple test procedure that can be applied to any type of spiky peak data, including situations with no spiky peak or with one or two spiky peaks. Our approach involves truncating the central statistics around 0, which primarily contribute to the null spike, as well as the two tails that may be contaminated by alternative distributions. We refer to this method as the “double truncation method." After applying double truncation, we estimate the null density using the doubly truncated maximum likelihood estimator. We demonstrate numerically that our proposed method effectively controls the false discovery rate at the desired level using simulated data. Furthermore, we apply our method to two real data sets, namely the SET protein data and peony data.
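
The sketch below illustrates the double-truncation idea on simulated z-statistics; it is not the authors' exact procedure. Values near zero and in the extreme tails are discarded, a normal null is fitted to the remaining values by doubly truncated maximum likelihood, and a local fdr is formed as the ratio of the null density to a crude mixture density; the truncation points and the null proportion are illustrative assumptions.

```r
# Illustrative sketch: doubly truncated normal MLE for the null, then local fdr.
set.seed(11)
z <- c(rnorm(9000), rep(0, 300), rnorm(500, 3), rnorm(500, -3))  # spiky null at 0

a <- 0.2; b <- 3                          # inner and outer truncation points
keep <- abs(z) > a & abs(z) < b
trunc_mass <- function(mu, sd) {
  (pnorm(b, mu, sd) - pnorm(a, mu, sd)) + (pnorm(-a, mu, sd) - pnorm(-b, mu, sd))
}
nll <- function(p) {
  mu <- p[1]; sd <- exp(p[2])
  -sum(dnorm(z[keep], mu, sd, log = TRUE)) + sum(keep) * log(trunc_mass(mu, sd))
}
opt <- optim(c(0, 0), nll)
mu0 <- opt$par[1]; sd0 <- exp(opt$par[2])

dens <- density(z)                                        # crude mixture density
lfdr <- pmin(0.9 * dnorm(dens$x, mu0, sd0) / dens$y, 1)   # pi0 = 0.9 assumed
head(cbind(z = dens$x, lfdr))
```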

Citations: 0
Asymptotic properties of kernel density and hazard rate function estimators with censored widely orthant dependent data
IF 1.3 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-06-03 · DOI: 10.1007/s00180-024-01509-x
Yi Wu, Wei Wang, Wei Yu, Xuejun Wang

Kernel estimators of density function and hazard rate function are very important in nonparametric statistics. The paper aims to investigate the uniformly strong representations and the rates of uniformly strong consistency for kernel smoothing density and hazard rate function estimation with censored widely orthant dependent data based on the Kaplan–Meier estimator. Under some mild conditions, the rates of the remainder term and strong consistency are shown to be $O\big(\sqrt{\log(ng(n))/\big(nb_{n}^{2}\big)}\big)$ a.s. and $O\big(\sqrt{\log(ng(n))/\big(nb_{n}^{2}\big)}\big)+O\big(b_{n}^{2}\big)$ a.s., respectively, where $g(n)$ are the dominating coefficients of widely orthant dependent random variables. Some numerical simulations and a real data analysis are also presented to confirm the theoretical results based on finite sample performances.
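
In the spirit of the estimators studied here (but with independent rather than widely orthant dependent data), the sketch below builds a kernel hazard-rate estimate from Nelson-Aalen increments of right-censored data using the survival package; the bandwidth $b_n$ and the simulated data are illustrative choices.

```r
# Illustrative sketch: kernel-smoothed hazard from Nelson-Aalen increments.
library(survival)

set.seed(5)
n <- 300
true_time <- rexp(n, rate = 0.5)
cens_time <- rexp(n, rate = 0.2)
time <- pmin(true_time, cens_time)
status <- as.numeric(true_time <= cens_time)

fit <- survfit(Surv(time, status) ~ 1)
dH <- fit$n.event / fit$n.risk                 # Nelson-Aalen increments
grid <- seq(0.1, quantile(time, 0.9), length.out = 100)
bn <- 0.3                                      # illustrative bandwidth
haz <- sapply(grid, function(t) sum(dnorm((t - fit$time) / bn) / bn * dH))
plot(grid, haz, type = "l", xlab = "t", ylab = "hazard")
abline(h = 0.5, lty = 2)                       # true constant hazard rate
```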

Citations: 0
Expectile regression averaging method for probabilistic forecasting of electricity prices
IF 1.3 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-05-29 · DOI: 10.1007/s00180-024-01508-y
Joanna Janczura

In this paper we propose a new method for probabilistic forecasting of electricity prices. It is based on averaging point forecasts from different models, combined with expectile regression. We show that deriving the predicted distribution in terms of expectiles can, in some cases, be advantageous compared with the commonly used quantiles. We apply the proposed method to day-ahead electricity prices from the German market and compare its accuracy with the Quantile Regression Averaging method and with quantile- as well as expectile-based historical simulation. The obtained results indicate that using expectile regression improves the accuracy of probabilistic forecasts of electricity prices, but a variance-stabilizing transformation should be applied prior to modelling.
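
The sketch below is not the authors' full averaging scheme but shows the core ingredient: combining two point forecasts by expectile (asymmetric least squares) regression, implemented with a short iteratively reweighted least-squares loop in base R; the simulated data and the expectile level tau are assumptions made for the example.

```r
# Illustrative sketch: expectile regression of prices on two point forecasts.
expectile_lm <- function(X, y, tau = 0.5, iter = 50) {
  w <- rep(0.5, length(y))
  for (i in seq_len(iter)) {
    beta <- solve(t(X) %*% (w * X), t(X) %*% (w * y))   # weighted least squares step
    w <- ifelse(drop(y > X %*% beta), tau, 1 - tau)     # asymmetric weights
  }
  drop(beta)
}

set.seed(9)
n <- 500
f1 <- rnorm(n, 50, 5)                    # point forecast from model 1
f2 <- f1 + rnorm(n, 0, 2)                # point forecast from model 2
price <- 0.6 * f1 + 0.4 * f2 + rnorm(n, 0, 3)
X <- cbind(intercept = 1, f1, f2)

expectile_lm(X, price, tau = 0.5)        # averaging weights at the central expectile
expectile_lm(X, price, tau = 0.95)       # upper-tail expectile of the price
```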

Citations: 0
Projection predictive variable selection for discrete response families with finite support
IF 1.3 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-05-29 · DOI: 10.1007/s00180-024-01506-0
Frank Weber, Änne Glass, Aki Vehtari

The projection predictive variable selection is a decision-theoretically justified Bayesian variable selection approach achieving an outstanding trade-off between predictive performance and sparsity. Its projection problem is not easy to solve in general because it is based on the Kullback–Leibler divergence from a restricted posterior predictive distribution of the so-called reference model to the parameter-conditional predictive distribution of a candidate model. Previous work showed how this projection problem can be solved for response families employed in generalized linear models and how an approximate latent-space approach can be used for many other response families. Here, we present an exact projection method for all response families with discrete and finite support, called the augmented-data projection. A simulation study for an ordinal response family shows that the proposed method performs better than or similarly to the previously proposed approximate latent-space projection. The cost of the slightly better performance of the augmented-data projection is a substantial increase in runtime. Thus, if the augmented-data projection’s runtime is too high, we recommend the latent projection in the early phase of the model-building workflow and the augmented-data projection for final results. The ordinal response family from our simulation study is supported by both projection methods, but we also include a real-world cancer subtyping example with a nominal response family, a case that is not supported by the latent projection.
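
A minimal projection predictive workflow with the projpred package is sketched below for a binary response handled by the standard projection; the augmented-data projection introduced in the paper targets ordinal and nominal families and is not demonstrated here, and the data and reference model are illustrative.

```r
# Illustrative sketch of a basic projpred workflow (binary response).
library(rstanarm)
library(projpred)

set.seed(13)
n <- 200; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(X[, 1] - 0.5 * X[, 2]))
df <- data.frame(y = y, X)

ref <- stan_glm(y ~ ., family = binomial(), data = df, refresh = 0)  # reference model
vs <- cv_varsel(ref)        # search over submodels + cross-validated evaluation
suggest_size(vs)            # suggested number of covariates to keep
summary(vs)
```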

Citations: 0
Efficient regression analyses with zero-augmented models based on ranking
IF 1.3 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-05-14 · DOI: 10.1007/s00180-024-01503-3
Deborah Kanda, Jingjing Yin, Xinyan Zhang, Hani Samawi

Several zero-augmented models exist for estimation involving outcomes with large numbers of zeros. Two such models for handling count endpoints are zero-inflated and hurdle regression models. In this article, we apply the extreme ranked set sampling (ERSS) scheme to estimation using zero-inflated and hurdle regression models. We provide theoretical derivations showing the superiority of ERSS compared to simple random sampling (SRS) under these zero-augmented models. A simulation study is also conducted to compare the efficiency of ERSS to SRS, and lastly, we illustrate applications with real data sets.
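
The sketch below fits zero-inflated and hurdle Poisson regressions with the pscl package on simulated zero-heavy counts under simple random sampling; it does not reproduce the ERSS scheme studied in the paper.

```r
# Illustrative sketch: zero-inflated and hurdle Poisson fits on simulated counts.
library(pscl)

set.seed(21)
n <- 500
x <- rnorm(n)
structural_zero <- rbinom(n, 1, plogis(-0.5 + x))
y <- ifelse(structural_zero == 1, 0, rpois(n, exp(0.3 + 0.5 * x)))
dat <- data.frame(y = y, x = x)

zi <- zeroinfl(y ~ x | x, data = dat)    # count component | zero component
hd <- hurdle(y ~ x | x, data = dat)      # truncated count component | hurdle
cbind(zero_inflated = coef(zi, model = "count"),
      hurdle        = coef(hd, model = "count"))
```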

Citations: 0
Exact and approximate computation of the scatter halfspace depth
IF 1.3 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-05-09 · DOI: 10.1007/s00180-024-01500-6
Xiaohui Liu, Yuzi Liu, Petra Laketa, Stanislav Nagy, Yuting Chen

The scatter halfspace depth (sHD) is an extension of the location halfspace (also called Tukey) depth that is applicable in the nonparametric analysis of scatter. Using sHD, it is possible to define minimax optimal robust scatter estimators for multivariate data. The problem of exact computation of sHD for data of dimension $d \ge 2$ has, however, not been addressed in the literature. We develop an exact algorithm for the computation of sHD in any dimension $d$ and implement it efficiently for any dimension $d \ge 1$. Since the exact computation of sHD is slow, especially for higher dimensions, we also propose two fast approximate algorithms. All our programs are freely available in the R package scatterdepth.
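
As a rough point of reference, the sketch below approximates the scatter halfspace depth by a naive random-direction search, assuming a known location at the origin and the standard projection-based definition of sHD; it is neither the exact nor the approximate algorithm implemented in the scatterdepth package, and the number of directions is an illustrative choice.

```r
# Illustrative sketch: random-direction approximation of sHD, location at origin.
approx_sHD <- function(X, Sigma, n_dir = 2000) {
  d <- ncol(X)
  depths <- replicate(n_dir, {
    u <- rnorm(d); u <- u / sqrt(sum(u^2))      # random unit direction
    s <- sqrt(drop(t(u) %*% Sigma %*% u))       # scatter-implied scale along u
    p <- mean(abs(X %*% u) <= s)                # mass of one closed halfspace
    min(p, 1 - p)                               # ties ignored for continuous data
  })
  min(depths)
}

set.seed(8)
X <- matrix(rnorm(500 * 2), ncol = 2)
approx_sHD(X, diag(2))        # scatter matrix close to the truth: higher depth
approx_sHD(X, 4 * diag(2))    # badly scaled scatter matrix: lower depth
```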

Citations: 0