Statistics and Its Interface: Latest Articles

Variable selection for doubly robust causal inference.

IF 0.7, Zone 4 (Mathematics), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics and Its Interface

Pub Date : 2025-01-01 Epub Date: 2024-10-22 DOI: 10.4310/sii.241023040813

Eunah Cho, Shu Yang

Confounding control is crucial and yet challenging for causal inference based on observational studies. Under the typical unconfoundedness assumption, augmented inverse probability weighting (AIPW) has been popular for estimating the average causal effect (ACE) due to its double robustness, in the sense that it requires only one of the propensity score model and the outcome mean model to be correctly specified. To ensure that this key assumption holds, effort is often made to collect a sufficiently rich set of pretreatment variables, rendering variable selection imperative. It is well known that variable selection for the propensity score targeted at accurate prediction may produce a highly variable ACE estimator by including instrumental variables. Thus, many recent works recommend selecting all outcome predictors for both confounding control and efficient estimation. This article shows that the AIPW estimator with variable selection targeted at efficient estimation may lose the desirable double robustness property. Instead, we propose controlling, in the propensity score model, for any covariate that is a predictor of the treatment, the outcome, or both, which preserves the double robustness of the AIPW estimator. Using this principle, we propose a two-stage procedure with penalization for variable selection and the AIPW estimator for estimation. We show that the proposed procedure benefits from the desirable double robustness property. We evaluate the finite-sample performance of the AIPW estimator with various variable selection criteria through simulation and an application.
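The double robustness described in this abstract can be illustrated with a minimal sketch of the standard AIPW estimator. This is not the authors' two-stage variable selection procedure; the function names `aipw_ace` and `simulate`, the toy data-generating model, and the deliberately misspecified nuisance models are all assumptions made for illustration.

```python
import numpy as np


def aipw_ace(y, a, e_hat, m1_hat, m0_hat):
    """AIPW estimate of the average causal effect E[Y(1)] - E[Y(0)].

    y: outcomes; a: binary treatment (0/1); e_hat: fitted propensity scores
    P(A=1 | X); m1_hat, m0_hat: fitted outcome means under treatment/control.
    """
    y, a = np.asarray(y, dtype=float), np.asarray(a, dtype=float)
    # Outcome-model prediction plus an inverse-probability-weighted residual correction.
    mu1 = m1_hat + a * (y - m1_hat) / e_hat
    mu0 = m0_hat + (1.0 - a) * (y - m0_hat) / (1.0 - e_hat)
    return float(np.mean(mu1 - mu0))


def simulate(n=20000, seed=0):
    """Hypothetical toy data: one confounder x and a true ACE of 2."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    e = 1.0 / (1.0 + np.exp(-0.5 * x))    # true propensity score
    a = rng.binomial(1, e)
    y = 2.0 * a + x + rng.normal(size=n)  # true outcome means: x and 2 + x
    return x, a, y, e


x, a, y, e = simulate()
zeros = np.zeros_like(x)
half = np.full_like(x, 0.5)

# Correct propensity score, deliberately wrong outcome model (all zeros).
ace_ps_only = aipw_ace(y, a, e, zeros, zeros)
# Correct outcome model, deliberately wrong propensity score (constant 0.5).
ace_om_only = aipw_ace(y, a, half, 2.0 + x, x)
print(ace_ps_only, ace_om_only)
```

In this toy setting both estimates should land near the true effect of 2, since each call has one correctly specified nuisance model. In practice the nuisance functions are fitted from data, and which covariates enter each fit is exactly the variable selection question the abstract addresses.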

Statistics and Its Interface, 18(1), 93-105. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12395465/pdf/

Citations: 0

Composite quantile regression based robust empirical likelihood for partially linear spatial autoregressive models

IF 0.8, Zone 4 (Mathematics), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics and Its Interface

Pub Date : 2024-07-19 DOI: 10.4310/22-sii764

Peixin Zhao, Suli Cheng, Xiaoshuang Zhou

In this paper, we consider robust estimation for a class of partially linear spatial autoregressive models. By combining empirical likelihood and composite quantile regression methods, we propose a robust empirical likelihood estimation procedure. Under some regularity conditions, the proposed empirical log-likelihood ratio is proved to be asymptotically chi-squared, and the convergence rate of the estimator for the nonparametric component is also derived. Simulation analyses are conducted to further illustrate the performance of the proposed method, and the results show that the proposed method is more robust.

Citations: 0

A consistent specification test for functional linear quantile regression models

IF 0.8, Zone 4 (Mathematics), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics and Its Interface

Pub Date : 2024-07-19 DOI: 10.4310/22-sii754

Lili Xia, Zhongzhan Zhang, Gongming Shi

This paper focuses on the specification test of functional linear quantile regression models. A nonparametric test statistic is proposed based on the orthogonality of the residual and its conditional expectation. It is proved under mild assumptions that the proposed statistic asymptotically follows the standard normal distribution under the null hypothesis, but tends to infinity under the alternative hypothesis. The asymptotic power of the test is also presented for some local alternative hypotheses. The test is easy to implement, and simulations show it to be powerful even for small sample sizes. A real data example with the Capital Bikeshare data is presented for illustration.

Citations: 0

A latent class selection model for categorical response variables with nonignorably missing data

IF 0.8, Zone 4 (Mathematics), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics and Its Interface

Pub Date : 2024-07-19 DOI: 10.4310/22-sii753

Jung Wun Lee, Ofer Harel

We develop a new selection model for nonignorable missing values in multivariate categorical response variables by assuming that the response variables and their missingness can be summarized into categorical latent variables. Our proposed model contains two categorical latent variables. One latent variable summarizes the response patterns while the other describes the response variables’ missingness. Our selection model is an alternative method to other incomplete data methods when the incomplete data mechanism is nonignorable. We implement simulation studies to evaluate the performance of the proposed method and analyze the General Social Survey 2018 data to demonstrate its performance.

Citations: 0

Empirical likelihood-based weighted estimation of average treatment effects in randomized clinical trials with missing outcomes

IF 0.8, Zone 4 (Mathematics), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics and Its Interface

Pub Date : 2024-07-19 DOI: 10.4310/sii.2024.v17.n4.a7

Yuanyao Tan, Xialing Wen, Wei Liang, Ying Yan

There has been growing attention to covariate adjustment for treatment effect estimation in an objective and efficient manner in randomized clinical trials. In this paper, we propose a weighting approach, based on the empirical likelihood method, to extract covariate information in randomized clinical trials with possible missingness in the outcomes. Multiple regression models are imposed to delineate the missing data mechanism and the covariate-outcome relationship, respectively. We demonstrate that the proposed estimator is suitable for objective inference of treatment effects. Theoretically, we prove that the proposed approach is multiply robust and semiparametrically efficient. We conduct simulations and a real data study to make comparisons with other existing methods.

Citations: 0

Modeling and identifiability of non-homogenous Poisson process cure rate model

IF 0.8, Zone 4 (Mathematics), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics and Its Interface

Pub Date : 2024-07-19 DOI: 10.4310/22-sii763

Soorya Surendren, Asha Gopalakrishnan, Anup Dewanji

The promotion time cure model, also known as the bounded cumulative hazards (BCH) model, was proposed as an alternative to mixture cure models. In the present paper, this model is modified to provide a class of cure rate models based on a non-homogeneous Poisson process (NHPP). The properties of this class are studied. Also, when censored observations are present, distinguishing censored individuals from the cured group leads to identifiability issues for the members of this class. These identifiability issues are investigated, and finally a few members of this class are provided. Simulation results are provided for an example of the NHPP cure rate model with exponentiated intensity and exponential baseline. The application of the model is illustrated using real data from E1684, a study that included 284 patients from the Eastern Cooperative Oncology Group (ECOG) phase III clinical trial.

Citations: 0

Variable selection and estimation for high-dimensional partially linear spatial autoregressive models with measurement errors

IF 0.8, Zone 4 (Mathematics), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics and Its Interface

Pub Date : 2024-07-19 DOI: 10.4310/22-sii758

Zhensheng Huang, Shuyu Meng, Linlin Zhang

In this paper, we develop a class of corrected post-model selection estimation methods to identify important explanatory variables in the parametric component of a high-dimensional partially linear spatial autoregressive model with measurement errors. Compared with existing methods, the proposed method adds a new step that re-estimates the selected model parameters after model selection. We show that the post-model selection estimator performs at least as well as the Lasso penalty estimator by establishing some theorems on model selection and estimation properties. Extensive simulation studies not only evaluate the finite sample performance of the proposed method, but also show its superiority over other methods. As an empirical illustration, we apply the proposed model and method to two real data sets.

Citations: 0

A double regression method for graphical modeling of high-dimensional nonlinear and non-Gaussian data

IF 0.8, Zone 4 (Mathematics), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics and Its Interface

Pub Date : 2024-07-19 DOI: 10.4310/22-sii756

Siqi Liang, Faming Liang

Graphical models have long been studied in statistics as a tool for inferring conditional independence relationships among a large set of random variables. Most existing works in graphical modeling focus on cases where the data are Gaussian or mixed and the variables are linearly dependent. In this paper, we propose a double regression method for learning graphical models under the high-dimensional nonlinear and non-Gaussian setting, and prove that the proposed method is consistent under mild conditions. The proposed method works by performing a series of nonparametric conditional independence tests. The conditioning set of each test is reduced via a double regression procedure in which a model-free sure independence screening procedure or a sparse deep neural network can be employed. The numerical results indicate that the proposed method works well for high-dimensional nonlinear and non-Gaussian data.

Citations: 0

Flexible quasi-beta prime regression models for dependent continuous positive data

IF 0.8, Zone 4 (Mathematics), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics and Its Interface

Pub Date : 2024-07-19 DOI: 10.4310/22-sii762

João Freitas, Juvêncio Nobre, Caio Azevedo

In many situations of interest, it is common to observe positive responses measured under several assessment conditions within the same subjects. Usually, such a scenario implies positive skewness in the response distributions, along with within-subject dependency. It is known that neglecting these features can lead to misleading inference. In this paper, we extend the beta prime regression model for modeling asymmetric positive data, while taking into account the dependence structure. We consider a useful predictor for modeling a suitable transformation of the mean, along with a homogeneous covariance structure. The proposed model is an interesting competitor of the flexible Tweedie regression models, which include distributions such as the Gamma and Inverse Gaussian. Furthermore, residual analysis and influence diagnostic tools are proposed. A Monte Carlo experiment is conducted to evaluate the performance of the proposed methodology under small and moderate sample sizes, along with suitable discussions. The methodology is illustrated with the analysis of a real longitudinal dataset. An R package was developed to allow practitioners to use the methodology described in this paper.

Citations: 0

Default Bayesian testing for the zero-inflated Poisson distribution

IF 0.8, Zone 4 (Mathematics), Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Statistics and Its Interface

Pub Date : 2024-07-19 DOI: 10.4310/22-sii750

Yewon Han, Haewon Hwang, Hon Keung Ng, Seong Kim

In Bayesian model selection and hypothesis testing, the choice of prior distributions is an important problem and calls for caution. More often than not, objective Bayesian analyses utilize noninformative priors such as Jeffreys priors. However, since these noninformative priors are often improper, the Bayes factor associated with them is not well-defined. To circumvent this indeterminacy, the Bayes factor can be corrected by intrinsic and fractional methods. These adjusted Bayes factors are asymptotically equivalent to the ordinary Bayes factors calculated with proper priors, called intrinsic priors. In this article, we derive intrinsic priors for testing the point null hypothesis under a zero-inflated Poisson distribution. Extensive simulation studies are performed to support the theoretical results on asymptotic equivalence, and two real datasets are analyzed to illustrate the methodology developed in this paper.

Citations: 0
