This paper develops a method based on model-X knockoffs to find conditional associations that are consistent across environments, controlling the false discovery rate. The motivation for this problem is that large data sets may contain numerous associations that are statistically significant and yet misleading, as they are induced by confounders or sampling imperfections. However, associations replicated under different conditions may be more interesting. In fact, consistency sometimes provably leads to valid causal inferences even if conditional associations do not. While the proposed method is widely applicable, this paper highlights its relevance to genome-wide association studies, in which robustness across populations with diverse ancestries mitigates confounding due to unmeasured variants. The effectiveness of this approach is demonstrated by simulations and applications to the UK Biobank data.
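The multi-environment selection step can be sketched in a few lines. The sketch below assumes knockoff statistics have already been computed separately within each environment (for example, by lasso coefficient-difference statistics); the sign-consistency rule and signed minimum-magnitude combination are illustrative stand-ins rather than the paper's exact combining function, while the thresholding is the standard knockoff+ filter.

```python
import numpy as np

def knockoff_threshold(w, q=0.1):
    """Knockoff+ threshold: smallest t with (1 + #{w_j <= -t}) / #{w_j >= t} <= q."""
    for t in np.sort(np.abs(w[w != 0])):
        if (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t)) <= q:
            return t
    return np.inf  # no feasible threshold: select nothing

def consistent_selections(W, q=0.1):
    """W: (n_environments, p) array of per-environment knockoff statistics.

    Illustrative consistency rule: keep a variable only if its statistic
    has the same (nonzero) sign in every environment, combining the
    statistics by the signed minimum magnitude before filtering.
    """
    signs = np.sign(W)
    consistent = np.all(signs == signs[0], axis=0) & (signs[0] != 0)
    w = np.where(consistent, signs[0] * np.abs(W).min(axis=0), 0.0)
    return np.flatnonzero(w >= knockoff_threshold(w, q))
```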
A general framework is set up to study the asymptotic properties of the intent-to-treat Wilcoxon-Mann-Whitney test in randomized experiments with nonignorable noncompliance. Under location-shift alternatives, the Pitman efficiencies of the intent-to-treat Wilcoxon-Mann-Whitney and t-tests are derived. It is shown that the former is superior if the compliers are more likely to be found in high-density regions of the outcome distribution or, equivalently, if the noncompliers tend to reside in the tails. By logical extension, the relative efficiency of the two tests is sharply bounded by least and most favourable scenarios in which the compliers are segregated into regions of lowest and highest density, respectively. Such bounds can be derived analytically as a function of the compliance rate for common location families such as Gaussian, Laplace, logistic and t distributions. These results can help empirical researchers choose the more efficient test for existing data, and calculate sample size for future trials in anticipation of noncompliance. Results for nonadditive alternatives and other tests follow along similar lines.
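For intuition, under full compliance the intent-to-treat tests coincide with the ordinary two-sample tests, so the efficiencies reduce to the classical Pitman efficiency of Wilcoxon-Mann-Whitney relative to the t-test, 12σ²(∫f²)² for a location family with density f and variance σ². A minimal numerical check of these classical values (not the paper's noncompliance bounds):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def are_wmw_vs_t(density, var):
    """Classical Pitman ARE of Wilcoxon-Mann-Whitney vs the two-sample
    t-test under a location shift: 12 * var * (integral of f^2)^2."""
    i2, _ = quad(lambda x: density(x) ** 2, -np.inf, np.inf)
    return 12.0 * var * i2 ** 2

print(are_wmw_vs_t(stats.norm.pdf, 1.0))               # ~0.955, i.e. 3/pi
print(are_wmw_vs_t(stats.logistic.pdf, np.pi**2 / 3))  # ~1.097, i.e. pi^2/9
print(are_wmw_vs_t(stats.laplace.pdf, 2.0))            # 1.5
```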
Zero-inflated nonnegative outcomes are common in many applications. In this work, motivated by freemium mobile game data, we propose a class of multiplicative structural nested mean models for zero-inflated nonnegative outcomes that flexibly describe the joint effect of a sequence of treatments in the presence of time-varying confounders. The proposed estimator solves a doubly robust estimating equation, in which the nuisance functions, namely the propensity score and the conditional outcome means given confounders, are estimated parametrically or nonparametrically. To improve accuracy, we exploit the zero-inflated structure of the outcomes by estimating the conditional means in two parts: separately modelling the probability of a positive outcome given confounders, and the mean outcome given that it is positive and given the confounders. We show that the proposed estimator is consistent and asymptotically normal as either the sample size or the follow-up time goes to infinity. Moreover, the usual sandwich formula consistently estimates the variance of the treatment effect estimators, without accounting for the variation due to estimating the nuisance functions. Simulation studies and an application to a freemium mobile game dataset demonstrate the empirical performance of the proposed method and support our theoretical findings.
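The two-part conditional-mean idea can be sketched directly. Assuming a numpy feature matrix X and zero-inflated outcome vector y, the hypothetical helper below fits a logistic model for P(Y > 0 | X) and a separate regression for E(Y | Y > 0, X), returning their product as the estimate of E(Y | X); the specific learners are illustrative choices, not those of the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingRegressor

def fit_two_part_mean(X, y):
    """Two-part estimate of E[Y | X] for a zero-inflated nonnegative Y:
    P(Y > 0 | X) * E[Y | Y > 0, X]. Learner choices are illustrative."""
    pos = y > 0
    p_model = LogisticRegression(max_iter=1000).fit(X, pos)
    m_model = GradientBoostingRegressor().fit(X[pos], y[pos])
    def predict(X_new):
        return p_model.predict_proba(X_new)[:, 1] * m_model.predict(X_new)
    return predict
```

Usage is then a single call, e.g. predict = fit_two_part_mean(X, y) followed by predict(X_new); in the doubly robust estimating equation these fitted means play the role of one of the two nuisance functions.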
We propose a reinforcement learning method for estimating an optimal dynamic treatment regime for survival outcomes with dependent censoring. The estimator allows the failure time to be conditionally independent of censoring and dependent on the treatment decision times, supports a flexible number of treatment arms and treatment stages, and can maximize either the mean survival time or the survival probability at a given time-point. The estimator is constructed using generalized random survival forests and can attain polynomial rates of convergence. Simulations and an analysis of data from the Atherosclerosis Risk in Communities study suggest that the new estimator yields higher expected outcomes than existing methods in a variety of settings.
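The backward-induction structure of such estimators can be sketched as stage-wise Q-learning. The sketch below substitutes ordinary random-forest regression for the paper's generalized random survival forests and omits all censoring adjustments (for example, inverse-probability-of-censoring weights), so it illustrates only the recursion, not the proposed estimator.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_backward(stages):
    """stages: list ordered last stage first; each element (X, A, R) holds
    covariates, observed treatment arm and stage-wise reward (e.g. restricted
    survival time) for the same n subjects."""
    policies, value = [], 0.0
    for X, A, R in stages:
        pseudo = R + value                       # reward-to-go pseudo-outcome
        arms = np.unique(A)
        # Q(x, a): regress reward-to-go on covariates within each arm
        q = {a: RandomForestRegressor(n_estimators=200).fit(X[A == a], pseudo[A == a])
             for a in arms}
        Q = np.column_stack([q[a].predict(X) for a in arms])
        policies.append(lambda X_new, q=q, arms=arms:
                        arms[np.argmax(np.column_stack(
                            [q[a].predict(X_new) for a in arms]), axis=1)])
        value = Q.max(axis=1)                    # plug-in optimal value for the earlier stage
    return policies[::-1]                        # policies in chronological order
```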
Sparse principal component analysis is an important technique for simultaneous dimensionality reduction and variable selection with high-dimensional data. In this work we combine the unique geometric structure of the sparse principal component analysis problem with recent advances in convex optimization to develop novel gradient-based sparse principal component analysis algorithms. These algorithms enjoy the same global convergence guarantee as the original alternating direction method of multipliers, and can be more efficiently implemented with the rich toolbox developed for gradient methods from the deep learning literature. Most notably, these gradient-based algorithms can be combined with stochastic gradient descent methods to produce efficient online sparse principal component analysis algorithms with provable numerical and statistical performance guarantees. The practical performance and usefulness of the new algorithms are demonstrated in various simulation studies. As an application, we show how the scalability and statistical accuracy of our method enable us to find interesting functional gene groups in high-dimensional RNA sequencing data.
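As a point of reference, a single sparse component can be extracted by a simple gradient scheme: take a gradient ascent step on x'Ax, soft-threshold, and renormalize. This is a generic truncated-power-style sketch on a covariance matrix A, not the paper's ADMM-equivalent algorithm; the step size and penalty level are illustrative.

```python
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_pc(A, lam=0.1, n_iter=500, seed=0):
    """One sparse principal component of a covariance matrix A via
    gradient ascent on x'Ax with l1 soft-thresholding and renormalization."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    step = 1.0 / np.linalg.norm(A, 2)            # scale step by the spectral norm
    for _ in range(n_iter):
        x = soft_threshold(x + step * (A @ x), step * lam)
        nrm = np.linalg.norm(x)
        if nrm == 0.0:                           # penalty too aggressive: all zeros
            break
        x /= nrm
    return x
```

Because each iteration is a matrix-vector product plus elementwise operations, the same update extends naturally to minibatch or stochastic variants, which is the scalability argument the abstract makes for the online setting.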