Ranking handball teams from statistical strength estimation
Pub Date : 2024-06-24, DOI: 10.1007/s00180-024-01522-0
Florian Felice
In this work, we present a methodology to estimate the strength of handball teams. We propose the Conway–Maxwell–Poisson distribution to model the number of goals scored by a team, a flexible discrete distribution that can handle situations of non-equidispersion. From its parameters, we derive a mathematical formula to determine the strength of a team. We propose a ranking based on the estimated strengths to compare teams across different championships. Applied to women's handball club data from European competitions over the 2022/2023 season, we show that our proposed ranking is reflected in real sports events and is consistent with recent results from European competitions.
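The article itself gives no code; the following is a minimal base-R sketch of the Conway–Maxwell–Poisson pmf, P(Y = y) = lambda^y / (y!)^nu / Z(lambda, nu), with the normalizing constant Z truncated at an assumed upper bound ymax. The parameter values are illustrative, not estimates from the paper.

```r
## Conway-Maxwell-Poisson pmf with a truncated normalizing constant
dcmp <- function(y, lambda, nu, ymax = 200) {
  ys   <- 0:ymax
  logw <- ys * log(lambda) - nu * lfactorial(ys)        # unnormalized log-weights
  logZ <- max(logw) + log(sum(exp(logw - max(logw))))   # log-sum-exp for stability
  exp(y * log(lambda) - nu * lfactorial(y) - logZ)
}

## nu < 1 gives over-dispersion, nu > 1 under-dispersion, nu = 1 recovers the Poisson
sum((0:200) * dcmp(0:200, lambda = 25, nu = 1))   # approximately 25, the Poisson(25) mean
```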
{"title":"Ranking handball teams from statistical strength estimation","authors":"Florian Felice","doi":"10.1007/s00180-024-01522-0","DOIUrl":"https://doi.org/10.1007/s00180-024-01522-0","url":null,"abstract":"<p>In this work, we present a methodology to estimate the strength of handball teams. We propose the use of the Conway-Maxwell-Poisson distribution to model the number of goals scored by a team as a flexible discrete distribution which can handle situations of non equi-dispersion. From its parameters, we derive a mathematical formula to determine the strength of a team. We propose a ranking based on the estimated strengths to compare teams across different championships. Applied to female handball club data from European competitions over the 2022/2023 season, we show that our new proposed ranking can have an echo in real sports events and is linked to recent results from European competitions.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"24 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141532487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hypothesis testing in Cox models when continuous covariates are dichotomized: bias analysis and bootstrap-based test
Pub Date : 2024-06-23, DOI: 10.1007/s00180-024-01520-2
Hyunman Sim, Sungjeong Lee, Bo-Hyung Kim, Eun Shin, Woojoo Lee
Hypothesis testing for the regression coefficient associated with a dichotomized continuous covariate in a Cox proportional hazards model has frequently been considered in clinical research. Although most existing testing methods do not allow covariates other than the dichotomized continuous covariate, they have nevertheless been widely applied. Through an analytic bias analysis and a numerical study, we show that the current practice is not free from an inflated type I error and a loss of power. To overcome this limitation, we develop a bootstrap-based test that allows additional covariates and dichotomizes a two-dimensional covariate into a binary variable. In addition, we develop an efficient algorithm to speed up the calculation of the proposed test statistic. Our numerical study demonstrates that the proposed bootstrap-based test maintains the type I error at the nominal level and exhibits higher power than other methods, and that the proposed efficient algorithm reduces computational costs.
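For orientation only, here is a generic sketch (not the authors' procedure) of fitting a Cox model with a median-dichotomized covariate plus an adjustment covariate and computing a naive case-resampling bootstrap distribution of the Wald statistic; the simulated data, cutpoint, and resampling scheme are all illustrative assumptions.

```r
library(survival)

## Illustrative data: continuous biomarker x, adjustment covariate z,
## exponential event times and independent censoring
set.seed(1)
n <- 300
x <- rnorm(n); z <- rbinom(n, 1, 0.5)
event <- rexp(n, rate = exp(0.5 * (x > 0) + 0.3 * z))
cens  <- rexp(n, rate = 0.2)
dat <- data.frame(time = pmin(event, cens), status = as.numeric(event <= cens), x, z)

wald_stat <- function(d) {
  d$xb <- as.numeric(d$x > median(d$x))       # dichotomize at the sample median
  fit  <- coxph(Surv(time, status) ~ xb + z, data = d)
  summary(fit)$coefficients["xb", "z"]        # Wald z-statistic for the binary covariate
}

obs  <- wald_stat(dat)
## Naive case-resampling bootstrap reference distribution (illustration only,
## not the calibrated test proposed in the paper)
boot <- replicate(500, wald_stat(dat[sample(n, replace = TRUE), ]))
mean(abs(boot - mean(boot)) >= abs(obs))      # bootstrap-type p-value
```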
{"title":"Hypothesis testing in Cox models when continuous covariates are dichotomized: bias analysis and bootstrap-based test","authors":"Hyunman Sim, Sungjeong Lee, Bo-Hyung Kim, Eun Shin, Woojoo Lee","doi":"10.1007/s00180-024-01520-2","DOIUrl":"https://doi.org/10.1007/s00180-024-01520-2","url":null,"abstract":"<p>Hypothesis testing for the regression coefficient associated with a dichotomized continuous covariate in a Cox proportional hazards model has been considered in clinical research. Although most existing testing methods do not allow covariates, except for a dichotomized continuous covariate, they have generally been applied. Through an analytic bias analysis and a numerical study, we show that the current practice is not free from an inflated type I error and a loss of power. To overcome this limitation, we develop a bootstrap-based test that allows additional covariates and dichotomizes two-dimensional covariates into a binary variable. In addition, we develop an efficient algorithm to speed up the calculation of the proposed test statistic. Our numerical study demonstrates that the proposed bootstrap-based test maintains the type I error well at the nominal level and exhibits higher power than other methods, as well as that the proposed efficient algorithm reduces computational costs.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"28 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141510737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Trend of high dimensional time series estimation using low-rank matrix factorization: heuristics and numerical experiments via the TrendTM package
Pub Date : 2024-06-20, DOI: 10.1007/s00180-024-01519-9
Emilie Lebarbier, Nicolas Marie, Amélie Rosier
This article focuses on practical issues arising in a recently proposed theoretical method for trend estimation in high-dimensional time series. The method falls within the scope of low-rank matrix factorization methods in which the temporal structure is taken into account. It consists of minimizing a penalized criterion that is theoretically efficient but depends on two constants to be chosen in practice. We propose a two-step strategy to address this question, based on two different known heuristics. The performance of the strategies is studied and compared through an extensive simulation study covering various scenarios. To make the estimation method with the best strategy available to the community, we implemented it in the R package TrendTM, which is presented and used here. Finally, we give a geometric interpretation of the results by linking the method to PCA and use the results to solve a high-dimensional curve clustering problem. The package is available on CRAN.
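The TrendTM interface is not reproduced here; as a hedged base-R illustration of the underlying idea, the sketch below recovers a smooth low-rank trend from a high-dimensional time-series matrix by truncated SVD, with the rank fixed by hand rather than by the paper's penalized criterion.

```r
## Rows = time points, columns = series; the trend is a rank-2 signal plus noise
set.seed(2)
n <- 200; p <- 50
tt <- seq(0, 1, length.out = n)
U  <- cbind(sin(2 * pi * tt), tt)              # two smooth temporal factors
V  <- matrix(rnorm(2 * p), 2, p)               # loadings
Y  <- U %*% V + matrix(rnorm(n * p, sd = 0.5), n, p)

k <- 2                                         # target rank, chosen by hand here
s <- svd(Y)
trend_hat <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])

norm(trend_hat - U %*% V, "F") / norm(U %*% V, "F")   # relative estimation error
```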
{"title":"Trend of high dimensional time series estimation using low-rank matrix factorization: heuristics and numerical experiments via the TrendTM package","authors":"Emilie Lebarbier, Nicolas Marie, Amélie Rosier","doi":"10.1007/s00180-024-01519-9","DOIUrl":"https://doi.org/10.1007/s00180-024-01519-9","url":null,"abstract":"<p>This article focuses on the practical issue of a recent theoretical method proposed for trend estimation in high dimensional time series. This method falls within the scope of the low-rank matrix factorization methods in which the temporal structure is taken into account. It consists of minimizing a penalized criterion, theoretically efficient but which depends on two constants to be chosen in practice. We propose a two-step strategy to solve this question based on two different known heuristics. The performance and a comparison of the strategies are studied through an important simulation study in various scenarios. In order to make the estimation method with the best strategy available to the community, we implemented the method in an R package <span>TrendTM</span> which is presented and used here. Finally, we give a geometric interpretation of the results by linking it to PCA and use the results to solve a high-dimensional curve clustering problem. The package is available on CRAN.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"3 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141530231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Some aspects of nonlinear dimensionality reduction
Pub Date : 2024-06-16, DOI: 10.1007/s00180-024-01514-0
Liwen Wang, Yongda Wang, Shifeng Xiong, Jiankui Yang
In this paper we discuss nonlinear dimensionality reduction within the framework of principal curves. We formulate dimensionality reduction as the problem of estimating principal subspaces in both noiseless and noisy cases, and propose corresponding iterative algorithms that modify existing principal curve algorithms. An R-squared criterion is introduced to estimate the dimension of the principal subspace. In addition, we present new regression and density estimation strategies based on our dimensionality reduction algorithms. Theoretical analyses and numerical experiments show the effectiveness of the proposed methods.
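As an illustrative analogue (an assumption, not the paper's algorithm), in the linear special case a principal subspace can be estimated by PCA and its dimension chosen as the smallest one whose R-squared (proportion of variance explained) exceeds a threshold; the 0.95 cutoff below is arbitrary.

```r
set.seed(3)
n <- 500
z <- matrix(rnorm(n * 2), n, 2)                      # latent 2-d structure
X <- cbind(z, z %*% matrix(c(1, -1, 0.5, 2), 2, 2))  # embedded in 4 dimensions
X <- X + matrix(rnorm(n * 4, sd = 0.1), n, 4)        # plus noise

pc <- prcomp(X, center = TRUE, scale. = FALSE)
r2 <- cumsum(pc$sdev^2) / sum(pc$sdev^2)             # R-squared of k-dimensional subspaces
dim_hat <- which(r2 >= 0.95)[1]                      # smallest k with R-squared >= 0.95
dim_hat                                              # typically 2 for this example
```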
{"title":"Some aspects of nonlinear dimensionality reduction","authors":"Liwen Wang, Yongda Wang, Shifeng Xiong, Jiankui Yang","doi":"10.1007/s00180-024-01514-0","DOIUrl":"https://doi.org/10.1007/s00180-024-01514-0","url":null,"abstract":"<p>In this paper we discuss nonlinear dimensionality reduction within the framework of principal curves. We formulate dimensionality reduction as problems of estimating principal subspaces for both noiseless and noisy cases, and propose the corresponding iterative algorithms that modify existing principal curve algorithms. An R squared criterion is introduced to estimate the dimension of the principal subspace. In addition, we present new regression and density estimation strategies based on our dimensionality reduction algorithms. Theoretical analyses and numerical experiments show the effectiveness of the proposed methods.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"202 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141510736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Double truncation method for controlling local false discovery rate in case of spiky null
Pub Date : 2024-06-05, DOI: 10.1007/s00180-024-01510-4
Shinjune Kim, Youngjae Oh, Johan Lim, DoHwan Park, Erin M. Green, Mark L. Ramos, Jaesik Jeong
Many multiple testing procedures that control the false discovery rate have been developed to identify cases (e.g., genes) showing a statistically significant difference between two groups. However, a common issue encountered in some practical data sets is the presence of highly spiky null distributions. Existing methods struggle to control the type I error in such cases due to the “inflated false positives,” but this problem has not been addressed in the previous literature. Our team recently encountered this issue while analyzing SET4 gene deletion data and proposed modeling the null distribution using a scale mixture normal distribution. However, the use of this approach is limited due to strong assumptions on the spiky peak. In this paper, we present a novel multiple testing procedure that can be applied to any type of spiky peak data, including situations with no spiky peak or with one or two spiky peaks. Our approach involves truncating the central statistics around 0, which primarily contribute to the null spike, as well as the two tails that may be contaminated by alternative distributions. We refer to this method as the “double truncation method.” After applying double truncation, we estimate the null density using the doubly truncated maximum likelihood estimator. We demonstrate numerically, using simulated data, that the proposed method effectively controls the false discovery rate at the desired level. Furthermore, we apply our method to two real data sets, namely the SET protein data and the peony data.
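A minimal sketch of the general idea, under simplifying assumptions (normal null, symmetric hand-picked cutpoints): the null is fitted by maximum likelihood using only statistics kept in the two bands c1 < |z| < c2, with the likelihood conditioned on that doubly truncated region. The cutpoint selection and the local false discovery rate step of the paper are not reproduced.

```r
set.seed(4)
z <- c(rnorm(9000), rep(0, 500), rnorm(500, mean = 3))  # null + spike at 0 + signal

c1 <- 0.3; c2 <- 2.5                        # hand-picked truncation points (assumption)
keep <- abs(z) > c1 & abs(z) < c2           # drop the spike region and the tails

## Negative log-likelihood of N(mu, sigma) conditioned on c1 < |z| < c2
negll <- function(par) {
  mu <- par[1]; s <- exp(par[2])
  pkeep <- pnorm(c2, mu, s) - pnorm(c1, mu, s) +
           pnorm(-c1, mu, s) - pnorm(-c2, mu, s)
  -sum(dnorm(z[keep], mu, s, log = TRUE)) + sum(keep) * log(pkeep)
}
fit <- optim(c(0, 0), negll)
c(mu = fit$par[1], sigma = exp(fit$par[2]))   # estimated null, close to (0, 1)
```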
{"title":"Double truncation method for controlling local false discovery rate in case of spiky null","authors":"Shinjune Kim, Youngjae Oh, Johan Lim, DoHwan Park, Erin M. Green, Mark L. Ramos, Jaesik Jeong","doi":"10.1007/s00180-024-01510-4","DOIUrl":"https://doi.org/10.1007/s00180-024-01510-4","url":null,"abstract":"<p>Many multiple test procedures, which control the false discovery rate, have been developed to identify some cases (e.g. genes) showing statistically significant difference between two different groups. However, a common issue encountered in some practical data sets is the presence of highly spiky null distributions. Existing methods struggle to control type I error in such cases due to the “inflated false positives,\" but this problem has not been addressed in previous literature. Our team recently encountered this issue while analyzing SET4 gene deletion data and proposed modeling the null distribution using a scale mixture normal distribution. However, the use of this approach is limited due to strong assumptions on the spiky peak. In this paper, we present a novel multiple test procedure that can be applied to any type of spiky peak data, including situations with no spiky peak or with one or two spiky peaks. Our approach involves truncating the central statistics around 0, which primarily contribute to the null spike, as well as the two tails that may be contaminated by alternative distributions. We refer to this method as the “double truncation method.\" After applying double truncation, we estimate the null density using the doubly truncated maximum likelihood estimator. We demonstrate numerically that our proposed method effectively controls the false discovery rate at the desired level using simulated data. Furthermore, we apply our method to two real data sets, namely the SET protein data and peony data.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"25 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Asymptotic properties of kernel density and hazard rate function estimators with censored widely orthant dependent data
Pub Date : 2024-06-03, DOI: 10.1007/s00180-024-01509-x
Yi Wu, Wei Wang, Wei Yu, Xuejun Wang
Kernel estimators of density function and hazard rate function are very important in nonparametric statistics. The paper aims to investigate the uniformly strong representations and the rates of uniformly strong consistency for kernel smoothing density and hazard rate function estimation with censored widely orthant dependent data based on the Kaplan–Meier estimator. Under some mild conditions, the rates of the remainder term and strong consistency are shown to be \(O\big(\sqrt{\log(ng(n))/\big(nb_{n}^{2}\big)}\big)~a.s.\) and \(O\big(\sqrt{\log(ng(n))/\big(nb_{n}^{2}\big)}\big)+O\big(b_{n}^{2}\big)~a.s.\), respectively, where \(g(n)\) are the dominating coefficients of widely orthant dependent random variables. Some numerical simulations and a real data analysis are also presented to confirm the theoretical results based on finite sample performances.
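Not from the paper: a standard kernel-smoothed hazard rate estimate for right-censored data, obtained by smoothing the Nelson–Aalen increments from survival::survfit with a Gaussian kernel. The data, bandwidth, and kernel are illustrative assumptions.

```r
library(survival)

set.seed(5)
n  <- 400
T0 <- rexp(n, rate = 0.5)                 # true hazard 0.5
C  <- rexp(n, rate = 0.2)                 # independent censoring
time <- pmin(T0, C); status <- as.numeric(T0 <= C)

km <- survfit(Surv(time, status) ~ 1)
dA <- km$n.event / km$n.risk              # Nelson-Aalen increments at observed times

## Kernel-smoothed hazard: h(t) = (1/b) * sum K((t - t_i)/b) * dA_i
bw <- 0.5
hazard_hat <- function(t)
  sapply(t, function(u) sum(dnorm((u - km$time) / bw) * dA) / bw)

hazard_hat(c(1, 2, 3))                    # roughly 0.5 away from the boundary
```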
{"title":"Asymptotic properties of kernel density and hazard rate function estimators with censored widely orthant dependent data","authors":"Yi Wu, Wei Wang, Wei Yu, Xuejun Wang","doi":"10.1007/s00180-024-01509-x","DOIUrl":"https://doi.org/10.1007/s00180-024-01509-x","url":null,"abstract":"<p>Kernel estimators of density function and hazard rate function are very important in nonparametric statistics. The paper aims to investigate the uniformly strong representations and the rates of uniformly strong consistency for kernel smoothing density and hazard rate function estimation with censored widely orthant dependent data based on the Kaplan–Meier estimator. Under some mild conditions, the rates of the remainder term and strong consistency are shown to be <span>(Obig (sqrt{log (ng(n))/big (nb_{n}^{2}big )}big )~a.s.)</span> and <span>(Obig (sqrt{log (ng(n))/big (nb_{n}^{2}big )}big )+Obig (b_{n}^{2}big )~a.s.)</span>, respectively, where <i>g</i>(<i>n</i>) are the dominating coefficients of widely orthant dependent random variables. Some numerical simulations and a real data analysis are also presented to confirm the theoretical results based on finite sample performances.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"128 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141256196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Expectile regression averaging method for probabilistic forecasting of electricity prices
Pub Date : 2024-05-29, DOI: 10.1007/s00180-024-01508-y
Joanna Janczura
In this paper we propose a new method for probabilistic forecasting of electricity prices. It is based on averaging point forecasts from different models, combined with expectile regression. We show that deriving the predicted distribution in terms of expectiles can in some cases be advantageous compared with the commonly used quantiles. We apply the proposed method to day-ahead electricity prices from the German market and compare its accuracy with the Quantile Regression Averaging method and with quantile- as well as expectile-based historical simulation. The obtained results indicate that using expectile regression improves the accuracy of probabilistic forecasts of electricity prices, but a variance-stabilizing transformation should be applied prior to modelling.
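A compact base-R sketch of expectile regression fitted by iteratively reweighted (asymmetric) least squares; the forecast-averaging setup and the variance-stabilizing transformation discussed in the paper are not included, and the simulated forecasts are purely illustrative.

```r
## tau-expectile regression: minimize sum w_i (y_i - x_i'b)^2,
## with w_i = tau if the residual is nonnegative and 1 - tau otherwise
expectile_reg <- function(X, y, tau = 0.9, maxit = 100, tol = 1e-8) {
  b <- solve(crossprod(X), crossprod(X, y))        # OLS start
  for (i in seq_len(maxit)) {
    w <- ifelse(drop(y - X %*% b) >= 0, tau, 1 - tau)
    b_new <- solve(crossprod(X * sqrt(w)), crossprod(X * sqrt(w), y * sqrt(w)))
    if (max(abs(b_new - b)) < tol) break
    b <- b_new
  }
  drop(b)
}

## Example: combine two point forecasts into a 90%-expectile prediction
set.seed(6)
n  <- 500
f1 <- rnorm(n, 50, 5); f2 <- f1 + rnorm(n, 0, 2)   # two competing point forecasts
price <- 0.6 * f1 + 0.4 * f2 + rnorm(n, 0, 4)      # observed day-ahead prices
X <- cbind(1, f1, f2)
expectile_reg(X, price, tau = 0.9)                 # 90%-expectile combination weights
```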
{"title":"Expectile regression averaging method for probabilistic forecasting of electricity prices","authors":"Joanna Janczura","doi":"10.1007/s00180-024-01508-y","DOIUrl":"https://doi.org/10.1007/s00180-024-01508-y","url":null,"abstract":"<p>In this paper we propose a new method for probabilistic forecasting of electricity prices. It is based on averaging point forecasts from different models combined with expectile regression. We show that deriving the predicted distribution in terms of expectiles, might be in some cases advantageous to the commonly used quantiles. We apply the proposed method to the day-ahead electricity prices from the German market and compare its accuracy with the Quantile Regression Averaging method and quantile- as well as expectile-based historical simulation. The obtained results indicate that using the expectile regression improves the accuracy of the probabilistic forecasts of electricity prices, but a variance stabilizing transformation should be applied prior to modelling.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"28 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141165757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Projection predictive variable selection for discrete response families with finite support
Pub Date : 2024-05-29, DOI: 10.1007/s00180-024-01506-0
Frank Weber, Änne Glass, Aki Vehtari
The projection predictive variable selection is a decision-theoretically justified Bayesian variable selection approach achieving an outstanding trade-off between predictive performance and sparsity. Its projection problem is not easy to solve in general because it is based on the Kullback–Leibler divergence from a restricted posterior predictive distribution of the so-called reference model to the parameter-conditional predictive distribution of a candidate model. Previous work showed how this projection problem can be solved for response families employed in generalized linear models and how an approximate latent-space approach can be used for many other response families. Here, we present an exact projection method for all response families with discrete and finite support, called the augmented-data projection. A simulation study for an ordinal response family shows that the proposed method performs better than or similarly to the previously proposed approximate latent-space projection. The cost of the slightly better performance of the augmented-data projection is a substantial increase in runtime. Thus, if the augmented-data projection’s runtime is too high, we recommend the latent projection in the early phase of the model-building workflow and the augmented-data projection for final results. The ordinal response family from our simulation study is supported by both projection methods, but we also include a real-world cancer subtyping example with a nominal response family, a case that is not supported by the latent projection.
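Neither the projpred machinery nor the augmented-data projection is shown here; as a toy illustration of the projection idea in the Gaussian linear case (an assumption for simplicity), projecting a reference model onto a submodel reduces to regressing the reference model's fitted values on the submodel's covariates.

```r
set.seed(7)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2 * x1 + 0.5 * x2 + rnorm(n)

ref    <- lm(y ~ x1 + x2 + x3)        # "reference" model with all candidate covariates
mu_ref <- fitted(ref)

## Projection onto the submodel {x1}: fit the submodel to the reference fit
proj <- lm(mu_ref ~ x1)
coef(proj)

## In the Gaussian case the KL divergence between the two predictives reduces,
## up to constants, to the mean squared discrepancy between the fits
mean((mu_ref - fitted(proj))^2)
```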
{"title":"Projection predictive variable selection for discrete response families with finite support","authors":"Frank Weber, Änne Glass, Aki Vehtari","doi":"10.1007/s00180-024-01506-0","DOIUrl":"https://doi.org/10.1007/s00180-024-01506-0","url":null,"abstract":"<p>The projection predictive variable selection is a decision-theoretically justified Bayesian variable selection approach achieving an outstanding trade-off between predictive performance and sparsity. Its projection problem is not easy to solve in general because it is based on the Kullback–Leibler divergence from a restricted posterior predictive distribution of the so-called reference model to the parameter-conditional predictive distribution of a candidate model. Previous work showed how this projection problem can be solved for response families employed in generalized linear models and how an approximate latent-space approach can be used for many other response families. Here, we present an exact projection method for all response families with discrete and finite support, called the augmented-data projection. A simulation study for an ordinal response family shows that the proposed method performs better than or similarly to the previously proposed approximate latent-space projection. The cost of the slightly better performance of the augmented-data projection is a substantial increase in runtime. Thus, if the augmented-data projection’s runtime is too high, we recommend the latent projection in the early phase of the model-building workflow and the augmented-data projection for final results. The ordinal response family from our simulation study is supported by both projection methods, but we also include a real-world cancer subtyping example with a nominal response family, a case that is not supported by the latent projection.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"42 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141165753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient regression analyses with zero-augmented models based on ranking
Pub Date : 2024-05-14, DOI: 10.1007/s00180-024-01503-3
Deborah Kanda, Jingjing Yin, Xinyan Zhang, Hani Samawi
Several zero-augmented models exist for estimation involving outcomes with large numbers of zeros. Two such models for handling count endpoints are zero-inflated and hurdle regression models. In this article, we apply the extreme ranked set sampling (ERSS) scheme to estimation using zero-inflated and hurdle regression models. We provide theoretical derivations showing the superiority of ERSS over simple random sampling (SRS) with these zero-augmented models. A simulation study is also conducted to compare the efficiency of ERSS with SRS, and lastly we illustrate applications with real data sets.
Exact and approximate computation of the scatter halfspace depth
Pub Date : 2024-05-09, DOI: 10.1007/s00180-024-01500-6
Xiaohui Liu, Yuzi Liu, Petra Laketa, Stanislav Nagy, Yuting Chen
The scatter halfspace depth (sHD) is an extension of the location halfspace (also called Tukey) depth that is applicable in the nonparametric analysis of scatter. Using sHD, it is possible to define minimax optimal robust scatter estimators for multivariate data. The problem of exact computation of sHD for data of dimension \(d \ge 2\) has, however, not been addressed in the literature. We develop an exact algorithm for the computation of sHD in any dimension d and implement it efficiently for any dimension \(d \ge 1\). Since the exact computation of sHD is slow especially for higher dimensions, we also propose two fast approximate algorithms. All our programs are freely available in the R package scatterdepth.
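The exact algorithm and the scatterdepth package internals are not reproduced here; the sketch below is a crude random-direction approximation, assuming centred data and the commonly used form sHD(Sigma) = inf over unit directions u of min{P((u'X)^2 <= u'Sigma u), P((u'X)^2 >= u'Sigma u)}.

```r
## Monte-Carlo approximation of the scatter halfspace depth of Sigma for
## centred data X (n x d), scanning ndir random unit directions
shd_approx <- function(X, Sigma, ndir = 2000) {
  d <- ncol(X)
  U <- matrix(rnorm(ndir * d), ndir, d)
  U <- U / sqrt(rowSums(U^2))                 # random unit directions
  depth <- 1
  for (i in seq_len(ndir)) {
    u   <- U[i, ]
    q   <- drop(X %*% u)^2
    thr <- drop(crossprod(u, Sigma %*% u))    # u' Sigma u
    depth <- min(depth, mean(q <= thr), mean(q >= thr))
  }
  depth
}

set.seed(9)
X <- matrix(rnorm(1000 * 2), 1000, 2)         # centred bivariate normal data
shd_approx(X, diag(2))                        # clearly positive for a well-fitting scatter
shd_approx(X, 10 * diag(2))                   # near zero for a badly overscaled scatter
```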
{"title":"Exact and approximate computation of the scatter halfspace depth","authors":"Xiaohui Liu, Yuzi Liu, Petra Laketa, Stanislav Nagy, Yuting Chen","doi":"10.1007/s00180-024-01500-6","DOIUrl":"https://doi.org/10.1007/s00180-024-01500-6","url":null,"abstract":"<p>The scatter halfspace depth (<b>sHD</b>) is an extension of the location halfspace (also called Tukey) depth that is applicable in the nonparametric analysis of scatter. Using <b>sHD</b>, it is possible to define minimax optimal robust scatter estimators for multivariate data. The problem of exact computation of <b>sHD</b> for data of dimension <span>(d ge 2)</span> has, however, not been addressed in the literature. We develop an exact algorithm for the computation of <b>sHD</b> in any dimension <i>d</i> and implement it efficiently for any dimension <span>(d ge 1)</span>. Since the exact computation of <b>sHD</b> is slow especially for higher dimensions, we also propose two fast approximate algorithms. All our programs are freely available in the <span>R</span> package <span>scatterdepth</span>.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"43 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140942041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}