An important step for any causal inference study design is understanding the distribution of the subjects in terms of measured baseline covariates. However, not all baseline variation is equally important. We propose a set of visualizations that reduce the space of measured covariates into two components of baseline variation important to the design of an observational causal inference study: a propensity score summarizing baseline variation associated with treatment assignment, and a prognostic score summarizing baseline variation associated with the untreated potential outcome. These assignment-control plots and variations thereof visualize study design trade-offs and illustrate core methodological concepts in causal inference. As a practical demonstration, we apply assignment-control plots to a hypothetical study of cardiothoracic surgery. To demonstrate how these plots can be used to illustrate nuanced concepts, we use them to visualize unmeasured confounding and to consider the relationship between propensity scores and instrumental variables. While the family of visualization tools for studies of causality is relatively sparse, simple visual tools can be an asset to education, application, and methods development.
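As a rough illustration of the idea (a sketch, not the authors' software), the following base-R example builds an assignment-control plot for simulated data with hypothetical covariates x1 and x2, a treatment indicator z, and an outcome y: the estimated propensity score comes from a logistic regression of treatment on the covariates, and the estimated prognostic score from a regression of the outcome fit among controls only and predicted for everyone.

    # Hypothetical simulated data: covariates x1, x2, treatment z, outcome y
    set.seed(1)
    n  <- 500
    x1 <- rnorm(n); x2 <- rnorm(n)
    z  <- rbinom(n, 1, plogis(0.8 * x1))
    y  <- 1 + x2 + 0.5 * z + rnorm(n)
    d  <- data.frame(x1, x2, z, y)

    # Propensity score: baseline variation associated with treatment assignment
    ps <- predict(glm(z ~ x1 + x2, family = binomial, data = d), type = "response")

    # Prognostic score: baseline variation associated with the untreated outcome,
    # estimated from the control group and predicted for all subjects
    prog <- predict(lm(y ~ x1 + x2, data = subset(d, z == 0)), newdata = d)

    # Assignment-control plot: prognostic score against propensity score
    plot(ps, prog, col = ifelse(d$z == 1, "red", "blue"),
         xlab = "Propensity score", ylab = "Prognostic score")
    legend("topleft", legend = c("treated", "control"),
           col = c("red", "blue"), pch = 1)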
In the paired data setting, the sign test is often described in statistical textbooks as a test for comparing the medians of two marginal distributions. There is an implicit assumption that the median of the differences is equivalent to the difference of the medians when employing the sign test in this fashion. We demonstrate, however, that given asymmetry in the bivariate distribution of the paired data, there are often scenarios where the median of the differences is not equal to the difference of the medians. Further, we show that these scenarios will lead to a false interpretation of the sign test for its intended use in the paired data setting. We illustrate this false-interpretation concept via theory, a simulation study, and a real-world example based on breast cancer RNA sequencing data obtained from The Cancer Genome Atlas (TCGA).
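The following simulation (a hypothetical construction, not the TCGA analysis) gives a concrete instance: an asymmetric, dependent pair of measurements for which the median of the differences and the difference of the marginal medians even have opposite signs.

    # Paired data where the median of the differences and the difference of the
    # marginal medians have opposite signs (hypothetical asymmetric dependence)
    set.seed(1)
    n <- 1e5
    x <- rnorm(n)
    # below-median x values tend to increase a lot; the rest decrease slightly
    d <- ifelse(x <= qnorm(0.45), 1, -0.1)
    y <- x + d

    median(y - x)              # about -0.10: most pairs decrease
    median(y) - median(x)      # positive (about +0.37): the marginal median increases

    # The sign test speaks to P(Y > X), i.e. the median of the differences,
    # and here points in the opposite direction from the difference of medians
    binom.test(sum(y > x), n)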
Posterior uncertainty is typically summarized as a credible interval, an interval in the parameter space that contains a fixed proportion - usually 95% - of the posterior probability. For multivariate parameters, credible sets perform the same role. There are, of course, many potential 95% intervals from which to choose, yet even standard choices are rarely justified in any formal way. In this paper we give a general method, focusing on the loss function that motivates an estimate - the Bayes rule - around which we construct a credible set. The set contains all points which, as estimates, would have minimally worse expected loss than the Bayes rule: we call this excess expected loss 'regret'. The approach can be used for any model and prior, and we show how it justifies all widely-used choices of credible interval/set. Further examples show how it provides insights into more complex estimation problems.
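To make the recipe concrete, here is a sketch under squared-error loss using simulated draws from a hypothetical gamma posterior: the Bayes rule is the posterior mean, the regret of a candidate estimate is its excess posterior expected loss, and the regret threshold is calibrated so that the resulting set carries 95% posterior probability.

    # Regret-based credible interval under squared-error loss, from posterior draws
    # (an illustrative sketch; 'theta' holds draws from a hypothetical posterior)
    set.seed(1)
    theta <- rgamma(1e5, shape = 3, rate = 1)

    bayes_rule <- mean(theta)                      # Bayes rule for squared-error loss
    exp_loss   <- function(t) mean((theta - t)^2)  # posterior expected loss of estimate t
    regret     <- function(t) exp_loss(t) - exp_loss(bayes_rule)

    # Collect candidate estimates whose regret is below a threshold, and calibrate
    # the threshold so that the resulting set carries 95% posterior probability
    grid <- seq(min(theta), max(theta), length.out = 2000)
    reg  <- sapply(grid, regret)
    mass <- function(r) {                          # posterior mass of the regret set
      keep <- range(grid[reg <= r])                # an interval under squared-error loss
      mean(theta >= keep[1] & theta <= keep[2])
    }
    r95 <- uniroot(function(r) mass(r) - 0.95, c(min(reg), max(reg)))$root
    range(grid[reg <= r95])                        # regret-based 95% credible interval

Under squared-error loss the regret set is an interval centred at the posterior mean; other loss functions change the shape of the set and lead to the other standard interval choices mentioned in the abstract.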
Health inequities are assessed by health departments to identify social groups disproportionately burdened by disease and by academic researchers to understand how social, economic, and environmental inequities manifest as health inequities. To characterize inequities, group-specific small-area health data are often modeled using log-linear generalized linear models (GLM) or generalized linear mixed models (GLMM) with a random intercept. These approaches estimate the same marginal rate ratio comparing disease rates across groups under standard assumptions. Here we explore how residential segregation combined with social group differences in disease risk can lead to contradictory findings from the GLM and GLMM. We show that this occurs because small-area disease rate data collected under these conditions induce endogeneity in the GLMM due to correlation between the model's offset and random effect. This results in GLMM estimates that represent conditional rather than marginal associations. We refer to endogeneity arising from the offset, which to our knowledge has not been noted previously, as "offset endogeneity". We illustrate this phenomenon in simulated data and real premature mortality data, and we propose alternative modeling approaches to address it. We also introduce to a statistical audience the social epidemiologic terminology for framing health inequities, which enables responsible interpretation of results.
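A small simulation can reproduce the mechanism described above (illustrative parameter values only; the GLMM fit assumes the lme4 package is available): residential segregation ties each area's group composition to its random effect, so the population offsets are correlated with the random intercepts and the GLM and GLMM rate-ratio estimates diverge.

    # Simulated small-area counts where segregation links group composition to area risk
    library(lme4)
    set.seed(1)
    n_area   <- 200
    b        <- rnorm(n_area, 0, 0.5)          # area-level random effects
    p_grp1   <- plogis(1.5 * b)                # segregation: group 1 share tied to area risk
    pop_area <- rep(1e4, n_area)
    d <- data.frame(
      area  = rep(seq_len(n_area), each = 2),
      group = rep(c(1, 0), n_area),
      pop   = as.vector(rbind(pop_area * p_grp1, pop_area * (1 - p_grp1))),
      b     = rep(b, each = 2)
    )
    rate <- exp(-6 + 0.5 * d$group + d$b)      # within-area group rate ratio = exp(0.5)
    d$y  <- rpois(nrow(d), d$pop * rate)

    # GLM recovers the marginal (crude) rate ratio, which includes the
    # compositional effect of segregation
    glm_fit  <- glm(y ~ group + offset(log(pop)), family = poisson, data = d)
    # GLMM: the offset is correlated with the random intercept ("offset endogeneity"),
    # and its estimate has a conditional, within-area interpretation
    glmm_fit <- glmer(y ~ group + (1 | area) + offset(log(pop)),
                      family = poisson, data = d)
    exp(coef(glm_fit)["group"]); exp(fixef(glmm_fit)["group"])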
Hamiltonian Monte Carlo (HMC) is a powerful tool for Bayesian computation. In comparison with the traditional Metropolis-Hastings algorithm, HMC offers greater computational efficiency, especially in higher-dimensional or more complex modeling situations. To most statisticians, however, the idea of HMC comes from a less familiar origin, one that is based on the theory of classical mechanics. Its implementation, either through Stan or one of its derivative programs, can appear opaque to beginners. A lack of understanding of the inner workings of HMC, in our opinion, has hindered its application to a broader range of statistical problems. In this article, we review the basic concepts of HMC in a language that is more familiar to statisticians, and we describe an HMC implementation in R, one of the most frequently used statistical software environments. We also present hmclearn, an R package for learning HMC. This package contains a general-purpose HMC function for data analysis. We illustrate the use of this package in common statistical models. In doing so, we hope to promote this powerful computational tool for wider use. Example code for common statistical models is presented as supplementary material for online publication.
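For readers who want to see the moving parts, the following is a bare-bones leapfrog HMC sampler written directly in base R; it is a textbook-style sketch of the algorithm, not the interface of the hmclearn package.

    # Minimal HMC: sample from a target with log-density log_p and gradient grad_log_p
    hmc_step <- function(theta, log_p, grad_log_p, eps, L) {
      p <- rnorm(length(theta))                    # draw auxiliary momentum
      theta_new <- theta; p_new <- p
      # leapfrog integration of the Hamiltonian dynamics
      p_new <- p_new + 0.5 * eps * grad_log_p(theta_new)
      for (l in seq_len(L)) {
        theta_new <- theta_new + eps * p_new
        if (l < L) p_new <- p_new + eps * grad_log_p(theta_new)
      }
      p_new <- p_new + 0.5 * eps * grad_log_p(theta_new)
      # Metropolis acceptance based on the change in total "energy"
      log_acc <- log_p(theta_new) - 0.5 * sum(p_new^2) -
                 (log_p(theta) - 0.5 * sum(p^2))
      if (log(runif(1)) < log_acc) theta_new else theta
    }

    # Example: standard bivariate normal target
    log_p      <- function(th) -0.5 * sum(th^2)
    grad_log_p <- function(th) -th
    draws <- matrix(NA, 2000, 2); draws[1, ] <- c(3, -3)
    for (i in 2:2000) {
      draws[i, ] <- hmc_step(draws[i - 1, ], log_p, grad_log_p, eps = 0.2, L = 20)
    }
    colMeans(draws)   # should be near zero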
Gaussian Markov random fields (GMRFs) are popular for modeling dependence in large areal datasets due to their ease of interpretation and the computational convenience afforded by their sparse precision matrices when generating random variables. Typically in Bayesian computation, GMRFs are updated jointly in a block Gibbs sampler or componentwise in a single-site sampler via the full conditional distributions. The former approach can speed convergence by updating correlated variables all at once, while the latter avoids solving large linear systems. We consider a sampling approach in which the underlying graph can be cut so that conditionally independent sites are updated simultaneously. This algorithm allows a practitioner to parallelize updates of subsets of locations or to take advantage of 'vectorized' calculations in a high-level language such as R. Through both simulated and real data, we demonstrate computational savings that can be achieved versus both single-site and block updating, regardless of whether the data are on a regular or an irregular lattice. The approach provides a good compromise between statistical and computational efficiency and is accessible to statisticians without expertise in numerical analysis or advanced computing.
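The base-R sketch below shows the idea on a regular lattice with a two-colour (checkerboard) cut of the graph, using an assumed proper CAR/GMRF model in which each site's full conditional is normal with mean rho times the neighbour average and variance sig2 divided by the number of neighbours; all sites of one colour are conditionally independent given the other colour and are drawn in a single vectorized step.

    # Chromatic ("checkerboard") Gibbs updates for a proper CAR/GMRF on an m x m lattice
    # (illustrative sketch with assumed parameters rho and sig2)
    set.seed(1)
    m <- 100; rho <- 0.9; sig2 <- 1
    x <- matrix(0, m, m)
    colour <- (row(x) + col(x)) %% 2              # 0/1 checkerboard colouring

    nbr_sum <- function(x) {                      # sum of the 4 lattice neighbours
      s <- matrix(0, m, m)
      s[-1, ] <- s[-1, ] + x[-m, ];  s[-m, ] <- s[-m, ] + x[-1, ]
      s[, -1] <- s[, -1] + x[, -m];  s[, -m] <- s[, -m] + x[, -1]
      s
    }
    n_nbr <- nbr_sum(matrix(1, m, m))             # number of neighbours per site

    for (iter in 1:100) {
      for (k in 0:1) {                            # update one colour class at a time
        mu  <- rho * nbr_sum(x) / n_nbr           # full conditional means
        sd_ <- sqrt(sig2 / n_nbr)                 # full conditional std. deviations
        idx <- colour == k
        x[idx] <- rnorm(sum(idx), mu[idx], sd_[idx])
      }
    }

For irregular lattices the same scheme applies, with a general graph colouring taking the place of the checkerboard.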
Sample size derivation is a crucial element of planning any confirmatory trial. The required sample size is typically derived based on constraints on the maximal acceptable Type I error rate and minimal desired power. Power depends on the unknown true effect and tends to be calculated either for the smallest relevant effect or a likely point alternative. The former might be problematic if the minimal relevant effect is close to the null, thus requiring an excessively large sample size, while the latter is dubious since it does not account for the a priori uncertainty about the likely alternative effect. A Bayesian perspective on sample size derivation for a frequentist trial can reconcile arguments about the relative a priori plausibility of alternative effects with ideas based on the relevance of effect sizes. Many suggestions as to how such "hybrid" approaches could be implemented in practice have been put forward. However, key quantities are often defined in subtly different ways in the literature. Starting from the traditional entirely frequentist approach to sample size derivation, we derive consistent definitions for the most commonly used hybrid quantities and highlight connections, before discussing and demonstrating their use in sample size derivation for clinical trials.
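As a simple illustration of one such hybrid quantity (with an assumed design prior and thresholds, not values from the paper), the sketch below contrasts a traditional sample size based on power at a point alternative with one based on expected power, i.e. the power curve averaged over a design prior on the standardized effect.

    # Sample size for a one-sided one-sample z-test at level alpha = 0.025
    alpha   <- 0.025
    power_n <- function(n, delta) pnorm(sqrt(n) * delta - qnorm(1 - alpha))

    # Traditional frequentist: power 0.8 at a single point alternative
    delta_point <- 0.3
    n_point <- min(which(sapply(1:500, power_n, delta = delta_point) >= 0.8))

    # Hybrid: expected power, averaging the power curve over a design prior
    # on the standardized effect, here N(0.3, 0.1^2)
    expected_power <- function(n) {
      integrate(function(d) power_n(n, d) * dnorm(d, 0.3, 0.1), -Inf, Inf)$value
    }
    n_hybrid <- min(which(sapply(1:500, expected_power) >= 0.8))

    c(point = n_point, hybrid = n_hybrid)   # the hybrid requirement is larger here

Accounting for a priori uncertainty about the effect typically shifts the required sample size relative to the point-alternative calculation, which is the trade-off the hybrid quantities are meant to make explicit.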
Personalized medicine asks whether a new treatment will help a particular patient, rather than whether it improves the average response in a population. Without a causal model to distinguish these questions, interpretational mistakes arise. These mistakes are seen in an article by Demidenko [2016] that recommends the "D-value," which is the probability that a randomly chosen person from the new-treatment group has a higher value for the outcome than a randomly chosen person from the control-treatment group. The abstract states "The D-value has a clear interpretation as the proportion of patients who get worse after the treatment" with similar assertions appearing later. We show these statements are incorrect because they require assumptions about the potential outcomes which are neither testable in randomized experiments nor plausible in general. The D-value will not equal the proportion of patients who get worse after treatment if (as expected) the potential outcomes are correlated. Independence of potential outcomes is unrealistic and eliminates any personalized treatment effects; with dependence, the D-value can imply that treatment is better than control even though most patients are harmed by the treatment. Thus, D-values are misleading for personalized medicine. To prevent misunderstandings, we advise incorporating causal models into basic statistics education.
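The gap between the two quantities is easy to see in a potential-outcomes simulation (hypothetical numbers, with higher outcomes taken as better): most patients are made worse off by treatment, yet the D-value, which compares independently drawn treated and control outcomes, suggests the treatment is beneficial.

    # Simulated potential outcomes: the D-value and the proportion of patients
    # who get worse after treatment can point in opposite directions
    set.seed(1)
    n  <- 1e5
    y0 <- rnorm(n)                           # outcome under control
    # treatment slightly harms 70% of patients and greatly benefits 30%
    effect <- ifelse(runif(n) < 0.7, -0.2, 2)
    y1 <- y0 + effect                        # outcome under treatment

    # Proportion of patients who actually get worse after treatment
    mean(y1 < y0)                            # about 0.70

    # D-value analogue: a randomly chosen treated outcome vs an independently
    # chosen control outcome
    mean(y1 > sample(y0))                    # about 0.60, i.e. treatment "looks better"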