Pub Date: 2024-03-21 | DOI: 10.1007/s11222-024-10420-w
Resampling-based confidence intervals and bands for the average treatment effect in observational studies with competing risks
Jasmin Rühl, Sarah Friedrich
The g-formula can be used to estimate the treatment effect while accounting for confounding bias in observational studies. With regard to time-to-event endpoints, possibly subject to competing risks, however, the construction of valid pointwise confidence intervals and time-simultaneous confidence bands for the causal risk difference is complicated. A convenient solution is to approximate the asymptotic distribution of the corresponding stochastic process by means of resampling approaches. In this paper, we consider three different resampling methods: the classical nonparametric bootstrap, the influence function combined with a resampling approach, and a martingale-based bootstrap version, the so-called wild bootstrap. For the latter, three sub-versions based on differing distributions of the underlying random multipliers are examined. We set up a simulation study to compare the accuracy of the different techniques, which reveals that the wild bootstrap should in general be preferred if the sample size is moderate and sufficient data on the event of interest have been accrued. For illustration, the resampling methods are further applied to data on long-term survival in patients with early-stage Hodgkin’s disease.
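To make the multiplier idea concrete, here is a minimal R sketch of a wild bootstrap for a martingale-type statistic: residuals are perturbed by i.i.d. mean-zero, unit-variance multipliers and the statistic is recomputed. The three multiplier laws shown (standard normal, centred Poisson(1), Rademacher) are common choices in the multiplier-bootstrap literature; the paper's exact sub-versions may differ.

```r
## Minimal wild-bootstrap sketch (illustrative only): perturb residuals with
## i.i.d. mean-zero, unit-variance multipliers and recompute the statistic.
set.seed(1)
n <- 200
res <- rnorm(n)                           # stand-in for martingale residuals
multipliers <- function(n, law) {
  switch(law,
         normal     = rnorm(n),
         poisson    = rpois(n, 1) - 1,    # centred Poisson(1)
         rademacher = sample(c(-1, 1), n, replace = TRUE))
}
B <- 2000
stat_star <- replicate(B, sqrt(n) * mean(multipliers(n, "rademacher") * res))
quantile(abs(stat_star), 0.95)            # resampling-based critical value
```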
Pub Date: 2024-03-19 | DOI: 10.1007/s11222-024-10416-6
A constant-per-iteration likelihood ratio test for online changepoint detection for exponential family models
Kes Ward, Gaetano Romano, Idris Eckley, Paul Fearnhead
Online changepoint detection algorithms that are based on (generalised) likelihood-ratio tests have been shown to have excellent statistical properties. However, a simple online implementation is computationally infeasible as, at time T, it involves considering O(T) possible locations for the change. Recently, the FOCuS algorithm has been introduced for detecting changes in mean in Gaussian data; it decreases the per-iteration cost to O(log T). This is possible by using pruning ideas, which reduce the set of changepoint locations that need to be considered at time T to approximately log T. We show that if one wishes to perform the likelihood ratio test for a different one-parameter exponential family model, then exactly the same pruning rule can be used, and again one need only consider approximately log T locations at iteration T. Furthermore, we show how we can adaptively perform the maximisation step of the algorithm so that we need only maximise the test statistic over a small subset of these possible locations. Empirical results show that the resulting online algorithm, which can detect changes under a wide range of models, has a constant per-iteration cost on average.
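For intuition, the statistic being maximised can be written down directly. The brute-force version below scans all O(T) candidate locations for a change in Gaussian mean (pre-change mean zero, unit variance); FOCuS computes the same maximum while pruning the candidate set to roughly log T locations. This is a sketch of the statistic only, not the paper's algorithm.

```r
## Brute-force O(T) scan for a change in mean at time T: the maximised
## 2 * log likelihood ratio is (S_T - S_tau)^2 / (T - tau), with S the
## cumulative sum. FOCuS evaluates the same maximum at O(log T) cost.
lr_stats <- function(x) {
  T   <- length(x)
  cs  <- cumsum(x)
  tau <- seq_len(T - 1)                    # candidate change locations
  (cs[T] - cs[tau])^2 / (T - tau)
}
set.seed(2)
x <- c(rnorm(100), rnorm(50, mean = 1))    # true change at time 100
which.max(lr_stats(x))                     # estimated change location
max(lr_stats(x))                           # compare against a threshold
```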
Pub Date: 2024-03-19 | DOI: 10.1007/s11222-024-10410-y
Improving model choice in classification: an approach based on clustering of covariance matrices
David Rodríguez-Vítores, Carlos Matrán
This work introduces a refinement of the Parsimonious Model for fitting a Gaussian Mixture. The improvement is based on considering clusters of the involved covariance matrices according to a similarity criterion, such as shared Principal Directions. This and other similarity criteria that arise from the spectral decomposition of a matrix are the bases of the Parsimonious Model. We show that such groupings of covariance matrices can be achieved through simple modifications of the CEM (Classification Expectation Maximization) algorithm. Our approach leads us to propose Gaussian Mixture Models for model-based clustering and discriminant analysis in which covariance matrices are clustered according to a parsimonious criterion, creating intermediate steps between the fourteen widely known parsimonious models. The added versatility not only allows us to obtain models with fewer parameters for fitting the data, but also provides greater interpretability. We show its usefulness for model-based clustering and discriminant analysis, providing algorithms to find approximate solutions satisfying suitable size, shape and orientation constraints, and applying them to both simulated and real data examples.
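For context, the fourteen parsimonious models referenced above are those generated by the eigen-decomposition of the cluster covariance matrices, as implemented for instance in the R package mclust. A baseline fit under those models is sketched below (assuming mclust is installed); the refinement proposed here would additionally cluster the covariance matrices themselves, e.g. by shared principal directions.

```r
## Baseline: fit the standard parsimonious Gaussian mixtures and let BIC
## choose among the fourteen covariance structures. The proposed refinement
## adds intermediate models by clustering the covariance matrices themselves.
library(mclust)
fit <- Mclust(iris[, 1:4], G = 3)   # model-based clustering with 3 components
summary(fit)                        # reports the selected covariance model,
                                    # e.g. "VEV" (equal shape, varying volume
                                    # and orientation across clusters)
```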
{"title":"Improving model choice in classification: an approach based on clustering of covariance matrices","authors":"David Rodríguez-Vítores, Carlos Matrán","doi":"10.1007/s11222-024-10410-y","DOIUrl":"https://doi.org/10.1007/s11222-024-10410-y","url":null,"abstract":"<p>This work introduces a refinement of the Parsimonious Model for fitting a Gaussian Mixture. The improvement is based on the consideration of clusters of the involved covariance matrices according to a criterion, such as sharing Principal Directions. This and other similarity criteria that arise from the spectral decomposition of a matrix are the bases of the Parsimonious Model. We show that such groupings of covariance matrices can be achieved through simple modifications of the CEM (Classification Expectation Maximization) algorithm. Our approach leads to propose Gaussian Mixture Models for model-based clustering and discriminant analysis, in which covariance matrices are clustered according to a parsimonious criterion, creating intermediate steps between the fourteen widely known parsimonious models. The added versatility not only allows us to obtain models with fewer parameters for fitting the data, but also provides greater interpretability. We show its usefulness for model-based clustering and discriminant analysis, providing algorithms to find approximate solutions verifying suitable size, shape and orientation constraints, and applying them to both simulation and real data examples.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140169060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-18 | DOI: 10.1007/s11222-023-10379-0
Functional mixtures-of-experts
Faïcel Chamroukhi, Nhat Thien Pham, Van Hà Hoang, Geoffrey J. McLachlan
We consider the statistical analysis of heterogeneous data for prediction in situations where the observations include functions, typically time series. We extend modeling with mixtures-of-experts (ME), a framework of choice for modeling heterogeneity in data for prediction with vectorial observations, to this functional data analysis context. We first present a new family of ME models, named functional ME (FME), in which the predictors are potentially noisy observations from entire functions. Furthermore, the data-generating process of the predictor and the real response is governed by a hidden discrete variable representing an unknown partition. Second, by imposing sparsity on derivatives of the underlying functional parameters via Lasso-like regularizations, we provide sparse and interpretable functional representations of the FME models, called iFME. We develop dedicated expectation–maximization algorithms for Lasso-like regularized maximum-likelihood parameter estimation to fit the models. The proposed models and algorithms are studied in simulated scenarios and in applications to two real data sets, and the obtained results demonstrate their performance in accurately capturing complex nonlinear relationships and in clustering heterogeneous regression data.
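As background, a (vectorial) mixture-of-experts density combines a softmax gating network with expert-specific regressions; FME lets functional predictors enter both parts through functional coefficients. A minimal sketch of the vectorial backbone, with hypothetical parameter values:

```r
## Mixture-of-experts density: softmax gating plus Gaussian experts, in the
## vectorial setting that FME extends to functional predictors.
me_density <- function(y, x, W, beta, sigma) {
  Z   <- cbind(1, x)                     # n x 2 design with intercept
  eta <- Z %*% W                         # gating scores, n x K
  pik <- exp(eta) / rowSums(exp(eta))    # softmax mixing proportions
  mu  <- Z %*% beta                      # expert-specific means, n x K
  dens <- sapply(seq_along(sigma), function(k) dnorm(y, mu[, k], sigma[k]))
  rowSums(pik * dens)                    # mixture density at each observation
}
set.seed(3)
x <- runif(100)
W    <- matrix(c(0, 0, -2, 4), 2, 2)     # gate: expert 2 favoured for x > 0.5
beta <- matrix(c(0, 1, 3, -1), 2, 2)     # hypothetical expert regression lines
y <- rnorm(100, mean = 1 + 2 * x)        # toy responses
loglik <- sum(log(me_density(y, x, W, beta, sigma = c(1, 1))))
```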
{"title":"Functional mixtures-of-experts","authors":"Faïcel Chamroukhi, Nhat Thien Pham, Van Hà Hoang, Geoffrey J. McLachlan","doi":"10.1007/s11222-023-10379-0","DOIUrl":"https://doi.org/10.1007/s11222-023-10379-0","url":null,"abstract":"<p>We consider the statistical analysis of heterogeneous data for prediction, in situations where the observations include functions, typically time series. We extend the modeling with mixtures-of-experts (ME), as a framework of choice in modeling heterogeneity in data for prediction with vectorial observations, to this functional data analysis context. We first present a new family of ME models, named functional ME (FME), in which the predictors are potentially noisy observations, from entire functions. Furthermore, the data generating process of the predictor and the real response, is governed by a hidden discrete variable representing an unknown partition. Second, by imposing sparsity on derivatives of the underlying functional parameters via Lasso-like regularizations, we provide sparse and interpretable functional representations of the FME models called iFME. We develop dedicated expectation–maximization algorithms for Lasso-like regularized maximum-likelihood parameter estimation strategies to fit the models. The proposed models and algorithms are studied in simulated scenarios and in applications to two real data sets, and the obtained results demonstrate their performance in accurately capturing complex nonlinear relationships and in clustering the heterogeneous regression data.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140152764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-17 | DOI: 10.1007/s11222-024-10396-7
Expectile and M-quantile regression for panel data
Ian Meneghel Danilevicz, Valdério Anselmo Reisen, Pascal Bondon
Linear fixed effect models are a general way to fit panel or longitudinal data with a distinct intercept for each unit. Based on expectile and M-quantile approaches, we propose alternative regression estimation methods for the parameters of linear fixed effect models. The estimation functions are penalized by the least absolute shrinkage and selection operator to reduce the dimensionality of the data. Some asymptotic properties of the estimators are established, and finite-sample investigations are conducted to verify the empirical performance of the estimation methods. The computational implementation of the procedures is discussed, and real economic panel data from the Organisation for Economic Cooperation and Development are analyzed to show the usefulness of the methods in a practical problem.
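The expectile loss underlying one of the two approaches is an asymmetrically weighted squared error, so estimation stays close to least squares while targeting the tails. A minimal unpenalised sketch without fixed effects; the paper's estimators add unit-specific intercepts and the lasso penalty.

```r
## Expectile regression by direct minimisation of the asymmetric squared
## loss: weight tau on positive residuals, 1 - tau on negative ones.
expectile_loss <- function(beta, y, X, tau) {
  r <- y - X %*% beta
  sum(abs(tau - (r < 0)) * r^2)
}
set.seed(4)
n <- 200
X <- cbind(1, rnorm(n))
y <- X %*% c(1, 2) + rnorm(n)
fit <- optim(c(0, 0), expectile_loss, y = y, X = X, tau = 0.8)
fit$par   # slope near 2; intercept shifted to the 0.8-expectile of the noise
```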
{"title":"Expectile and M-quantile regression for panel data","authors":"Ian Meneghel Danilevicz, Valdério Anselmo Reisen, Pascal Bondon","doi":"10.1007/s11222-024-10396-7","DOIUrl":"https://doi.org/10.1007/s11222-024-10396-7","url":null,"abstract":"<p>Linear fixed effect models are a general way to fit panel or longitudinal data with a distinct intercept for each unit. Based on expectile and M-quantile approaches, we propose alternative regression estimation methods to estimate the parameters of linear fixed effect models. The estimation functions are penalized by the least absolute shrinkage and selection operator to reduce the dimensionality of the data. Some asymptotic properties of the estimators are established, and finite sample size investigations are conducted to verify the empirical performances of the estimation methods. The computational implementations of the procedures are discussed, and real economic panel data from the Organisation for Economic Cooperation and Development are analyzed to show the usefulness of the methods in a practical problem.\u0000</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140152765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-16 | DOI: 10.1007/s11222-024-10401-z
Matrix regression heterogeneity analysis
Fengchuan Zhang, Sanguo Zhang, Shi-Ming Li, Mingyang Ren
The development of modern science and technology has facilitated the collection of large amounts of matrix data in fields such as biomedicine. Matrix data modeling has been extensively studied, advancing beyond the naive approach of flattening the matrix into a vector. However, existing matrix modeling methods mainly focus on homogeneous data and fail to handle the data heterogeneity frequently encountered in the biomedical field, where samples from the same study belong to several underlying subgroups and different subgroups follow different models. In this paper, we focus on regression-based heterogeneity analysis. We propose a matrix data heterogeneity analysis framework that combines matrix bilinear sparse decomposition and penalized fusion techniques, enabling data-driven subgroup detection, including determining the number of subgroups and the subgroup membership. A rigorous theoretical analysis is conducted, including asymptotic consistency in terms of subgroup detection, the number of subgroups, and regression coefficients. Numerous numerical studies based on simulated and real data are conducted, showcasing the superior performance of the proposed method in analyzing matrix heterogeneous data.
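To see what a bilinear decomposition buys, consider a rank-one coefficient matrix C = u v', which turns the matrix regression on <X, C> into u' X v and cuts the parameter count from p*q to p + q. Below is a toy alternating-least-squares sketch of this regression component alone; the subgroup-detection machinery (penalized fusion across samples) is not shown.

```r
## Rank-one bilinear matrix regression y_i = u' X_i v + noise, fitted by
## alternating least squares. Illustrates the decomposition only.
set.seed(5)
p <- 8; q <- 6; n <- 200
u_true <- c(1, -1, rep(0, p - 2))
v_true <- c(2, rep(0, q - 1))
X <- array(rnorm(n * p * q), c(n, p, q))
y <- apply(X, 1, function(Xi) as.numeric(t(u_true) %*% Xi %*% v_true)) +
  rnorm(n, sd = 0.1)
v_hat <- rnorm(q)                                      # random start
for (it in 1:25) {
  Zv <- t(apply(X, 1, function(Xi) Xi %*% v_hat))      # n x p design, v fixed
  u_hat <- qr.solve(Zv, y)
  Zu <- t(apply(X, 1, function(Xi) t(Xi) %*% u_hat))   # n x q design, u fixed
  v_hat <- qr.solve(Zu, y)
}
max(abs(outer(u_hat, v_hat) - outer(u_true, v_true)))  # C recovered up to noise
```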
{"title":"Matrix regression heterogeneity analysis","authors":"Fengchuan Zhang, Sanguo Zhang, Shi-Ming Li, Mingyang Ren","doi":"10.1007/s11222-024-10401-z","DOIUrl":"https://doi.org/10.1007/s11222-024-10401-z","url":null,"abstract":"<p>The development of modern science and technology has facilitated the collection of a large amount of matrix data in fields such as biomedicine. Matrix data modeling has been extensively studied, which advances from the naive approach of flattening the matrix into a vector. However, existing matrix modeling methods mainly focus on homogeneous data, failing to handle the data heterogeneity frequently encountered in the biomedical field, where samples from the same study belong to several underlying subgroups, and different subgroups follow different models. In this paper, we focus on regression-based heterogeneity analysis. We propose a matrix data heterogeneity analysis framework, by combining matrix bilinear sparse decomposition and penalized fusion techniques, which enables data-driven subgroup detection, including determining the number of subgroups and subgrouping membership. A rigorous theoretical analysis is conducted, including asymptotic consistency in terms of subgroup detection, the number of subgroups, and regression coefficients. Numerous numerical studies based on simulated and real data have been constructed, showcasing the superior performance of the proposed method in analyzing matrix heterogeneous data.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140152763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-16 | DOI: 10.1007/s11222-024-10407-7
Doubly robust estimation of optimal treatment regimes for survival data using an instrumental variable
Xia Junwen, Zhan Zishu, Zhang Jingxiao
In survival contexts, substantial literature exists on estimating optimal treatment regimes, where treatments are assigned based on personal characteristics to maximize the survival probability. These methods assume that a set of covariates is sufficient to deconfound the treatment-outcome relationship. However, this assumption may fail in observational studies or in randomized trials in which non-adherence occurs. Therefore, we propose a novel approach to estimating optimal treatment regimes when certain confounders are unobservable and a binary instrumental variable is available. Specifically, via a binary instrumental variable, we propose a semiparametric estimator for optimal treatment regimes by maximizing a Kaplan–Meier-like estimator of the survival function. Furthermore, to increase resistance to model misspecification, we construct novel doubly robust estimators. Since the estimators of the survival function are jagged step functions, we incorporate kernel smoothing methods to improve performance. Under appropriate regularity conditions, the asymptotic properties are rigorously established. Moreover, the finite sample performance is evaluated through simulation studies. Finally, we illustrate our method using data from the National Cancer Institute’s prostate, lung, colorectal, and ovarian cancer screening trial.
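The doubly robust construction is easiest to see in its simplest uncensored form, sketched below: the AIPW estimator of an average treatment effect is consistent if either the outcome regressions or the propensity model is correctly specified. The paper extends this principle to censored survival outcomes with an instrumental variable; this sketch covers only the no-unmeasured-confounding backbone.

```r
## AIPW (doubly robust) estimate of the average treatment effect:
## consistent if the outcome models OR the propensity model is correct.
set.seed(6)
n <- 500
x <- rnorm(n)
a <- rbinom(n, 1, plogis(0.5 * x))              # confounded treatment
y <- 1 + 0.5 * x + a * (1 + x) + rnorm(n)       # true ATE = 1
e_hat <- fitted(glm(a ~ x, family = binomial))  # propensity model
m1 <- predict(lm(y ~ x, subset = a == 1), newdata = data.frame(x = x))
m0 <- predict(lm(y ~ x, subset = a == 0), newdata = data.frame(x = x))
mean(m1 - m0 +
     a * (y - m1) / e_hat -
     (1 - a) * (y - m0) / (1 - e_hat))          # close to 1
```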
{"title":"Doubly robust estimation of optimal treatment regimes for survival data using an instrumental variable","authors":"Xia Junwen, Zhan Zishu, Zhang Jingxiao","doi":"10.1007/s11222-024-10407-7","DOIUrl":"https://doi.org/10.1007/s11222-024-10407-7","url":null,"abstract":"<p>In survival contexts, substantial literature exists on estimating optimal treatment regimes, where treatments are assigned based on personal characteristics to maximize the survival probability. These methods assume that a set of covariates is sufficient to deconfound the treatment-outcome relationship. However, this assumption can be limited in observational studies or randomized trials in which non-adherence occurs. Therefore, we propose a novel approach to estimating optimal treatment regimes when certain confounders are unobservable and a binary instrumental variable is available. Specifically, via a binary instrumental variable, we propose a semiparametric estimator for optimal treatment regimes by maximizing a Kaplan–Meier-like estimator of the survival function. Furthermore, to increase resistance to model misspecification, we construct novel doubly robust estimators. Since the estimators of the survival function are jagged, we incorporate kernel smoothing methods to improve performance. Under appropriate regularity conditions, the asymptotic properties are rigorously established. Moreover, the finite sample performance is evaluated through simulation studies. Finally, we illustrate our method using data from the National Cancer Institute’s prostate, lung, colorectal, and ovarian cancer screening trial.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140156495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-14 | DOI: 10.1007/s11222-024-10406-8
Quantile ratio regression
Alessio Farcomeni, Marco Geraci
We introduce quantile ratio regression. Our proposed model assumes that the ratio of two arbitrary quantiles of a continuous response distribution is a function of a linear predictor. Thanks to basic quantile properties, estimation can be carried out on the scale of either the response or the link function. The advantage of using the latter becomes tangible when implementing fast optimizers for linear regression on large datasets. We establish the theoretical properties of the estimator and derive an efficient method for obtaining standard errors. The good performance and merit of our methods are illustrated by means of a simulation study and a real data analysis, in which we investigate income inequality in the European Union (EU) using data from a sample of about two million households. We find a significant association between inequality, as measured by quantile ratios, and certain macroeconomic indicators, and we identify countries with outlying income inequality relative to the rest of the EU. An R implementation of the proposed methods is freely available.
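As a back-of-envelope view of the estimand: under a log-linear model, the log of the ratio of two conditional quantiles equals the difference of two quantile-regression fits on log(y). This is not the authors' joint estimator, which models the ratio directly, but it makes the target quantity visible (assuming the quantreg package is installed).

```r
## The log quantile ratio log(q_0.9 / q_0.1) as a linear function of x,
## read off from two separate quantile regressions on the log scale.
library(quantreg)
set.seed(7)
n <- 5000
x <- runif(n)
y <- exp(1 + x + (0.5 + x) * rnorm(n))  # spread (inequality) grows with x
f9 <- rq(log(y) ~ x, tau = 0.9)
f1 <- rq(log(y) ~ x, tau = 0.1)
coef(f9) - coef(f1)   # approx (0.5 + x) * (z_0.9 - z_0.1): positive slope
```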
Pub Date: 2024-03-13 | DOI: 10.1007/s11222-024-10412-w
Robust score matching for compositional data
Janice L. Scealy, Kassel L. Hingee, John T. Kent, Andrew T. A. Wood
The restricted polynomially-tilted pairwise interaction (RPPI) distribution gives a flexible model for compositional data. It is particularly well suited to situations where some of the marginal distributions of the components of a composition are concentrated near zero, possibly with right skewness. This article develops a method of tractable robust estimation for the model by combining two ideas. The first idea is to use score matching estimation after an additive log-ratio transformation; the resulting estimator is automatically insensitive to zeros in the data compositions. The second idea is to incorporate suitable weights in the estimating equations, which additionally makes the estimator resistant to outliers. These properties are confirmed in simulation studies, where we also demonstrate that the new outlier-robust estimator is efficient in high-concentration settings, even when there is no model contamination. An example is given using microbiome data. A user-friendly R package accompanies the article.
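To show why score matching sidesteps the normalising constant, here is a one-dimensional Gaussian toy version of the (unweighted) objective: the sample average of 0.5 l'(y)^2 + l''(y), where l is the log-density. The paper's estimator works on alr-transformed compositions and adds weights for outlier resistance; this sketch illustrates the score matching step only.

```r
## Score matching for N(mu, s2): l'(y) = -(y - mu)/s2 and l''(y) = -1/s2, so
## the objective is mean( 0.5 * ((y - mu)/s2)^2 - 1/s2 ) -- no normalising
## constant is ever evaluated.
sm_obj <- function(par, y) {
  mu <- par[1]
  s2 <- exp(par[2])                        # keep the variance positive
  mean(0.5 * ((y - mu) / s2)^2 - 1 / s2)
}
set.seed(8)
y <- rnorm(300, mean = 2, sd = 1.5)
est <- optim(c(0, 0), sm_obj, y = y)
c(mu = est$par[1], sigma2 = exp(est$par[2]))  # close to (2, 2.25)
```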
Pub Date: 2024-03-12 | DOI: 10.1007/s11222-024-10414-8
Quantile generalized measures of correlation
Xinyu Zhang, Hongwei Shi, Niwen Zhou, Falong Tan, Xu Guo
In this paper, we introduce a quantile Generalized Measure of Correlation (GMC) to describe the nonlinear quantile relationship between a response variable and predictors. The introduced correlation takes values between zero and one; it is zero if and only if the conditional quantile function is equal to the unconditional quantile. We also introduce a quantile partial Generalized Measure of Correlation. Estimators of these correlations are developed. Notably, by adopting machine learning methods, our estimation procedures allow the dimension of the predictors to be very large. Under mild conditions, we establish the estimators’ consistency. For the construction of confidence intervals, we adopt sample splitting and show that the corresponding estimators are asymptotically normal. We also consider a composite quantile GMC that integrates information from different quantile levels. Numerical studies are conducted to illustrate our methods. Moreover, we apply our methods to analyze genome-wide association study data from Carworth Farms White mice.
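A plug-in flavour of such a quantity can be sketched as a quantile analogue of R^2: one minus the ratio of the check loss under a conditional quantile fit to the check loss under the unconditional quantile, in the spirit of Koenker and Machado's R1(tau). The paper's quantile GMC is defined at the population level and estimated with machine-learning fits and sample splitting; the sketch below (assuming quantreg) only conveys the idea.

```r
## Quantile-R2-type plug-in: zero when the conditional tau-quantile adds
## nothing over the unconditional tau-quantile, approaching one otherwise.
library(quantreg)
check <- function(u, tau) u * (tau - (u < 0))   # quantile check loss
set.seed(9)
n <- 500
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
tau <- 0.5
r_cond   <- resid(rq(y ~ x, tau = tau))         # conditional quantile fit
r_uncond <- y - quantile(y, tau)                # unconditional quantile
1 - sum(check(r_cond, tau)) / sum(check(r_uncond, tau))   # in [0, 1]
```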
{"title":"Quantile generalized measures of correlation","authors":"Xinyu Zhang, Hongwei Shi, Niwen Zhou, Falong Tan, Xu Guo","doi":"10.1007/s11222-024-10414-8","DOIUrl":"https://doi.org/10.1007/s11222-024-10414-8","url":null,"abstract":"<p>In this paper, we introduce a quantile Generalized Measure of Correlation (GMC) to describe nonlinear quantile relationship between response variable and predictors. The introduced correlation takes values between zero and one. It is zero if and only if the conditional quantile function is equal to the unconditional quantile. We also introduce a quantile partial Generalized Measure of Correlation. Estimators of these correlations are developed. Notably by adopting machine learning methods, our estimation procedures allow the dimension of predictors very large. Under mild conditions, we establish the estimators’ consistency. For construction of confidence interval, we adopt sample splitting and show that the corresponding estimators are asymptotic normal. We also consider composite quantile GMC by integrating information from different quantile levels. Numerical studies are conducted to illustrate our methods. Moreover, we apply our methods to analyze genome-wide association study data from Carworth Farms White mice.</p>","PeriodicalId":22058,"journal":{"name":"Statistics and Computing","volume":null,"pages":null},"PeriodicalIF":2.2,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140116788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}