The American Statistician: Latest Articles

Comment on “On the Power of the F-test for Hypotheses in a Linear Model” by Griffiths and Hill (2022)

The American Statistician

Pub Date : 2022-07-03 DOI: 10.1080/00031305.2022.2074541

R. Christensen

Griffiths and Hill (2022) showed that when testing a hypothesis in linear models, it can sometimes be advantageous to incorporate true components into the hypothesis. They assume a true linear model and that a constraint $R_2\beta = d_2$ is true. They are interested in comparing the powers of three F tests for $H_0\colon R_1\beta = d_1$:
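A standard partitioned-inverse argument shows why adding a true constraint can help. This is our sketch under the usual assumptions, not necessarily the proof used in the comment or in the original article. Assume $y = X\beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I)$ and $X$ of full column rank, stack $R = (R_1', R_2')'$ with full row rank, write $V = R(X'X)^{-1}R'$ partitioned conformably into blocks $V_{11}, V_{12}, V_{21}, V_{22}$, and let $\delta_1 = R_1\beta - d_1$. Because $R_2\beta = d_2$ holds, the discrepancy for the joint hypothesis is $(\delta_1', 0')'$. Using the convention without a factor of 1/2, the noncentrality parameters of the test of $R_1\beta = d_1$ alone and of the joint test are

$$
\lambda_1 = \frac{\delta_1' V_{11}^{-1} \delta_1}{\sigma^2},
\qquad
\lambda_{12} = \frac{\delta_1' \left(V_{11} - V_{12} V_{22}^{-1} V_{21}\right)^{-1} \delta_1}{\sigma^2} \;\ge\; \lambda_1 .
$$

The expression for $\lambda_{12}$ uses the formula for the upper-left block of $V^{-1}$. The inequality holds because the Schur complement $V_{11} - V_{12} V_{22}^{-1} V_{21}$ is no larger than $V_{11}$ in the Loewner order, so its inverse is no smaller than $V_{11}^{-1}$. The joint test therefore gains noncentrality (none when $V_{12} = 0$) but pays with extra numerator degrees of freedom, and the balance between the two determines whether its power is higher.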

Citations: 0

Leadership in Statistics and Data Science: Planning for Inclusive Excellence

The American Statistician

Pub Date : 2022-07-03 DOI: 10.1080/00031305.2022.2088201

Emilija Perkovic

the output. The sponsor NDA may warn against other inadvertent disclosures, such as having lunch or dinner with another DMC member while diners at nearby tables can overhear the discussion. Some publicly traded companies impose a blackout period during an interim analysis: while a DSMB review is underway or about to occur, all employees are forbidden to trade stock or to share internal corporate news about DSMB activities. In the United States, the Securities and Exchange Commission (SEC) does investigate unusual stock trades that coincide with public announcements of trial results, such as a DSMB recommendation, or with other sponsor announcements to investors. The SEC may ask sponsor employees what they knew about a DSMB meeting or trial results. I encourage readers to purchase this book. Other DSMB activities during the pandemic have been reported, often briefly in sponsor investor news and subsequently in the press; the interested reader can find additional examples by perusing sponsor investor news and by general internet searches.

Citations: 0

Data Monitoring Committees in Clinical Trials: A Practical Perspective

The American Statistician

Pub Date : 2022-07-03 DOI: 10.1080/00031305.2022.2088199

C. Barker

Citations: 0

Rejoinder to Harville (2022) and Christensen (2022) Comments on “On the Power of the F-test for Hypotheses in a Linear Model,” by Griffiths and Hill (2022)

The American Statistician

Pub Date : 2022-07-03 DOI: 10.1080/00031305.2022.2074542

W. Griffiths, R. Carter Hill

The authors would like to thank Professors Christensen and Harville for their comments. These two authors take different approaches to generalizing and improving the proof of the theorem in Griffiths and Hill (2022). Professor Christensen’s geometric approach and Professor Harville’s meticulous matrix algebra approach are suitable for graduate courses in linear models for statisticians in various fields. We believe that there is pedagogic value in discussing the tradeoff between the noncentrality parameter and the numerator degrees of freedom in the F-test, and how this affects the power of the F-test. However, to be clear, we are not advocating adding true hypotheses as a strategy. Some readers may take Professor Christensen’s first sentence to imply that. The main points in our article are also easily demonstrated via simulation, as shown in the supplementary materials, making the ideas accessible to undergraduate students.
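The tradeoff between noncentrality and numerator degrees of freedom can also be checked numerically without simulation. The sketch below is our illustration, not the authors' supplementary code. It assumes the denominator degrees of freedom are large enough that $qF$ is approximately noncentral $\chi^2_q(\lambda)$, it restricts $q$ to even values so that the chi-square tail has a closed form using only the standard library, and the function names are hypothetical.

```python
import math


def chi2_sf_even(x: float, df: int) -> float:
    """P(X > x) for a central chi-square X with an even number of degrees of freedom.

    For df = 2m this has the closed form exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = math.exp(-half)  # j = 0 term
    total = term
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return total


def chi2_crit_even(alpha: float, df: int) -> float:
    """Upper-alpha critical value of a central chi-square (even df), by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, df) > alpha:  # bracket the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power_large_n(q: int, lam: float, alpha: float = 0.05, kmax: int = 500) -> float:
    """Approximate power of a size-alpha F-test with q numerator df and
    noncentrality lam, in the limit of infinite denominator df.

    The noncentral chi-square tail is a Poisson(lam / 2) mixture of central
    chi-square tails with q + 2k degrees of freedom, truncated at kmax terms.
    """
    if lam < 0:
        raise ValueError("noncentrality must be nonnegative")
    crit = chi2_crit_even(alpha, q)
    weight = math.exp(-lam / 2.0)  # Poisson probability of k = 0
    total = 0.0
    for k in range(kmax):
        if k > 0:
            weight *= (lam / 2.0) / k  # Poisson probability of k
        total += weight * chi2_sf_even(crit, q + 2 * k)
    return total
```

Holding `lam` fixed and increasing `q` lowers the power, while increasing `lam` raises it. Comparing `power_large_n(q1, lam1)` with `power_large_n(q1 + q2, lam12)`, where `lam12 >= lam1`, therefore shows which effect dominates when true restrictions are added. With few residual degrees of freedom, an exact noncentral F calculation or a simulation is needed instead.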

Citations: 0

Statistical Inference for Method of Moments Estimators of a Semi-Supervised Two-Component Mixture Model

The American Statistician

Pub Date : 2022-06-30 DOI: 10.1080/00031305.2022.2096695

Bradley Lubich, D. Jeske, W. Yao

ABSTRACT A mixture of the distribution of responses from untreated patients and a shifted version of that distribution is a useful model for the responses of a group of treated patients. The mixture model accounts for the fact that not all patients in the treated group will respond to the treatment; the nonresponders' responses follow the same distribution as those of untreated patients. The treatment effect in this context consists of both the fraction of treated patients who are responders and the magnitude of the shift in the responders' distribution. In this article, we investigate asymptotic properties of method-of-moments estimators of the treatment effect based on a semi-supervised two-component mixture model. From these properties, we develop asymptotic confidence intervals and demonstrate their superior inferential performance compared with the computationally intensive bootstrap intervals and their bias-corrected versions.
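One natural set of moment equations for this model can be sketched as follows. Assume control responses follow $F$ with finite variance and treated responses follow $(1-p)F + pF(\cdot - \Delta)$; then $E(Y) - E(X) = p\Delta$ and $\operatorname{Var}(Y) - \operatorname{Var}(X) = p(1-p)\Delta^2$, which can be solved for $p$ and $\Delta$. This is our illustration under those assumptions, not necessarily the estimator studied in the article, and it omits the asymptotic confidence intervals. The function names and the choice $F = N(0, 1)$ in the simulator are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def mom_mixture_shift(control: Sequence[float], treated: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments estimates (p_hat, delta_hat).

    Assumed model: control ~ F, treated ~ (1 - p) F + p F(. - delta), with F having
    finite variance. Then
        mean(treated) - mean(control) = p * delta
        var(treated)  - var(control)  = p * (1 - p) * delta ** 2
    """
    a = statistics.fmean(treated) - statistics.fmean(control)
    b = statistics.variance(treated) - statistics.variance(control)
    if a == 0 or b <= 0:
        raise ValueError("sample moments admit no solution with 0 < p < 1")
    # b / a**2 = (1 - p) / p  =>  p = a**2 / (a**2 + b), and delta = a / p
    p_hat = a * a / (a * a + b)
    delta_hat = (a * a + b) / a
    return p_hat, delta_hat


def simulate_shift_mixture(n: int, p: float, delta: float,
                           seed: int = 0) -> Tuple[List[float], List[float]]:
    """Hypothetical data generator with F = N(0, 1): each treated patient is a
    responder (response shifted by delta) with probability p."""
    rng = random.Random(seed)
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(0.0, 1.0) + (delta if rng.random() < p else 0.0) for _ in range(n)]
    return control, treated
```

With $a$ and $b$ the differences in sample means and sample variances, the estimates are $\hat p = a^2/(a^2 + b)$ and $\hat\Delta = (a^2 + b)/a$. The function raises an error when $b \le 0$ or $a = 0$, since then no solution with $0 < p < 1$ and $\Delta \ne 0$ exists.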

Citations: 1

Evidential Calibration of Confidence Intervals

The American Statistician

Pub Date : 2022-06-24 DOI: 10.1080/00031305.2023.2216239

Samuel Pawel, A. Ly, E. Wagenmakers

We present a novel and easy-to-use method for calibrating error-rate-based confidence intervals to evidence-based support intervals. Support intervals are obtained by inverting Bayes factors based on a parameter estimate and its standard error. A $k$ support interval can be interpreted as "the observed data are at least $k$ times more likely under the included parameter values than under a specified alternative". Support intervals depend on the specification of prior distributions for the parameter under the alternative, and we present several types that allow different forms of external knowledge to be encoded. We also show how prior specification can to some extent be avoided by considering a class of prior distributions and then computing so-called minimum support intervals which, for a given class of priors, have a one-to-one mapping with confidence intervals. We also illustrate how the sample size of a future study can be determined based on the concept of support. Finally, we show how the bound for the type I error rate of Bayes factors leads to a bound for the coverage of support intervals. An application to data from a clinical trial illustrates how support intervals can lead to inferences that are both intuitive and informative.
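A minimal sketch of a support interval follows. It assumes the normal approximation $\hat\theta \sim N(\theta, \mathrm{se}^2)$ and, under the alternative, a normal prior $\theta \sim N(m, \tau^2)$; this is only one of the prior choices the abstract alludes to, and the function names are ours. The $k$ support interval collects every $\theta_0$ with $\mathrm{BF}_{01}(\theta_0) \ge k$, which here has a closed form. The minimum support interval uses the least favourable alternative over all priors, a point mass at $\hat\theta$, so that $\mathrm{BF}_{01}(\theta_0) = \exp\{-(\hat\theta - \theta_0)^2 / (2\,\mathrm{se}^2)\}$.

```python
import math
from statistics import NormalDist
from typing import Optional, Tuple


def bf01(theta0: float, est: float, se: float, prior_mean: float, prior_sd: float) -> float:
    """Bayes factor of H0: theta = theta0 against H1: theta ~ N(prior_mean, prior_sd^2),
    using the approximate likelihood est ~ N(theta, se^2)."""
    marginal_h0 = NormalDist(theta0, se).pdf(est)
    marginal_h1 = NormalDist(prior_mean, math.sqrt(se ** 2 + prior_sd ** 2)).pdf(est)
    return marginal_h0 / marginal_h1


def support_interval(est: float, se: float, k: float, prior_mean: float,
                     prior_sd: float) -> Optional[Tuple[float, float]]:
    """All theta0 with bf01(theta0, ...) >= k, or None if no value reaches k."""
    tau2 = se ** 2 + prior_sd ** 2  # marginal variance of est under H1
    # bf01 >= k  <=>  (est - theta0)^2 / se^2 <= c
    c = 2.0 * math.log(math.sqrt(tau2) / (k * se)) + (est - prior_mean) ** 2 / tau2
    if c < 0:
        return None
    half_width = se * math.sqrt(c)
    return est - half_width, est + half_width


def min_support_interval(est: float, se: float, k: float) -> Tuple[float, float]:
    """Minimum support interval: the alternative is a point mass at est,
    so BF01(theta0) = exp(-(est - theta0)^2 / (2 se^2)); requires 0 < k <= 1."""
    if not 0.0 < k <= 1.0:
        raise ValueError("k must lie in (0, 1]")
    half_width = se * math.sqrt(-2.0 * math.log(k))
    return est - half_width, est + half_width
```

The one-to-one mapping with confidence intervals is visible in `min_support_interval`: choosing $k = \exp(-1.96^2/2) \approx 1/6.83$ reproduces the usual 95% interval $\hat\theta \pm 1.96\,\mathrm{se}$.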

Citations: 4

Hypothesis testing for matched pairs with missing data by maximum mean discrepancy: An application to continuous glucose monitoring

The American Statistician

Pub Date : 2022-06-03 DOI: 10.1080/00031305.2023.2200512

M. Matabuena, Paulo Félix, Marc Ditzhaus, J. Vidal, F. Gudé

A frequent problem in statistical science is how to properly handle missing data in matched-pair observations. There is a large body of literature dealing with the univariate case. Yet the ongoing technological progress in measuring biological systems raises the need to address more complex data, e.g., graphs, strings and probability distributions, among others. To fill this gap, this paper proposes new estimators of the maximum mean discrepancy (MMD) to handle complex matched pairs with missing data. These estimators can detect differences in data distributions under different missingness mechanisms. The validity of this approach is proven and further studied in an extensive simulation study, and statistical consistency results are provided. Data from continuous glucose monitoring in a longitudinal population-based diabetes study are used to illustrate the application of this approach. By employing the new distributional representations together with cluster analysis, new clinical criteria describing how glucose changes vary at the distributional level over five years can be explored.
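The statistic underlying the article is the maximum mean discrepancy, $\mathrm{MMD}^2 = E\,k(X, X') + E\,k(Y, Y') - 2E\,k(X, Y)$. As a sketch of the basic quantity only, the code below computes the standard unbiased estimate of squared MMD between two independent, complete, univariate samples with a Gaussian kernel. The article's estimators for matched pairs with missing data, and for complex objects such as distributions, are not reproduced, and the kernel and bandwidth are our assumptions.

```python
import math
import random
from typing import Sequence


def gaussian_kernel(x: float, y: float, bandwidth: float = 1.0) -> float:
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs: Sequence[float], ys: Sequence[float], bandwidth: float = 1.0) -> float:
    """Unbiased estimate of squared MMD between the distributions of xs and ys.

    The within-sample averages exclude the i == j terms, which makes the
    estimate unbiased (it can therefore be slightly negative).
    """
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("need at least two observations per sample")
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    rng = random.Random(0)
    same = [rng.gauss(0.0, 1.0) for _ in range(200)]
    other = [rng.gauss(0.0, 1.0) for _ in range(200)]
    shifted = [rng.gauss(3.0, 1.0) for _ in range(200)]
    print(mmd2_unbiased(same, other))    # near 0: same distribution
    print(mmd2_unbiased(same, shifted))  # clearly positive: shifted distribution
```

A test would compare the observed value with its distribution under equal distributions (for example, by permutation); that step, and any use of the pairing, is left out here.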

Citations: 2

Interactive Exploration of Large Dendrograms with Prototypes

The American Statistician

Pub Date : 2022-06-03 DOI: 10.1080/00031305.2022.2087734

Andee Kaplan, J. Bien

ABSTRACT Hierarchical clustering is one of the standard methods taught for identifying and exploring the underlying structures that may be present within a dataset. Students are shown examples in which the dendrogram, a visual representation of the hierarchical clustering, reveals a clear clustering structure. However, in practice, data analysts today frequently encounter datasets whose large scale undermines the usefulness of the dendrogram as a visualization tool. Densely packed branches obscure structure, and overlapping labels are impossible to read. In this article, we present a new workflow for performing hierarchical clustering via the R package protoshiny that aims to restore hierarchical clustering to its former role as an effective and versatile visualization tool. Our proposal combines interactivity with the ability to label internal nodes of a dendrogram with a representative data point (called a prototype). After presenting the workflow, we provide three case studies to demonstrate its utility.
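Labelling internal dendrogram nodes with prototypes can be sketched with minimax linkage. In minimax linkage, the height of a merge is the smallest radius $\min_{x \in G} \max_{y \in G} d(x, y)$ of the merged cluster $G$, and the minimizing point $x$ is its prototype. We assume this linkage because we believe protoshiny builds on the minimax-linkage R package protoclust; protoshiny is itself an R package, and this naive Python version is an illustration only, not its implementation.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Dist = Callable[[float, float], float]


def prototype(cluster: List[int], points: Sequence[float], dist: Dist) -> Tuple[int, float]:
    """Return (index of prototype, minimax radius) for a cluster of point indices.

    The prototype is the member whose largest distance to any other member is smallest.
    """
    def radius(i: int) -> float:
        return max(dist(points[i], points[j]) for j in cluster)

    best = min(cluster, key=radius)
    return best, radius(best)


def minimax_clustering(points: Sequence[float], dist: Dist):
    """Naive agglomerative clustering with minimax linkage (roughly O(n^4); illustration only).

    Returns a list of merges (sorted member indices, merge height, prototype index);
    each internal dendrogram node can then be labelled with its prototype.
    """
    clusters: List[List[int]] = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Try every pair of current clusters; keep the merge with the smallest radius.
        for a, b in combinations(range(len(clusters)), 2):
            proto, height = prototype(clusters[a] + clusters[b], points, dist)
            if best is None or height < best[0]:
                best = (height, a, b, proto)
        height, a, b, proto = best
        merged = clusters[a] + clusters[b]
        merges.append((sorted(merged), height, proto))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return merges
```

Because every merge carries an actual data point as its prototype, a viewer can show that point (an image, a document, a row of data) at the node instead of an unreadable stack of leaf labels.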

Citations: 0

Revisiting the name variant of the two-children problem

The American Statistician

Pub Date : 2022-05-25 DOI: 10.1080/00031305.2023.2173293

D. Paindaveine, P. Spindel

Initially proposed by Martin Gardner in the 1950s, the famous two-children problem is often presented as a paradox in probability theory. A relatively recent variant of this paradox states that, while in a two-children family for which at least one child is a girl, the probability that the other child is a boy is 2/3, this probability becomes 1/2 if the first name of the girl is disclosed (provided that two sisters may not be given the same first name). We revisit this variant of the problem and show that, if one adopts a natural model for the way first names are given to girls, then the probability that the other child is a boy may take any value in $(0, 2/3)$. By exploiting the concept of Schur-concavity, we study how this probability depends on model parameters.
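The dependence on the naming model can be checked by exact enumeration. The sketch below assumes one hypothetical model, not necessarily the article's: sexes are independent and equally likely, a girl's name is drawn from a fixed distribution, and a younger sister's name is drawn from the same distribution conditioned on differing from her elder sister's name. Under this model, uniform name probabilities give exactly 1/2, while unequal probabilities can move the answer away from 1/2 in either direction.

```python
from fractions import Fraction
from typing import Dict


def prob_other_is_boy(name_probs: Dict[str, Fraction], name: str) -> Fraction:
    """P(the other child is a boy | the family has a girl called `name`),
    under the hypothetical naming model described above."""
    if sum(name_probs.values()) != 1:
        raise ValueError("name probabilities must sum to 1")
    p_name = name_probs[name]
    # BG or GB (total probability 1/2): the single girl is called `name` w.p. p_name.
    one_girl = Fraction(1, 2) * p_name
    # GG (probability 1/4): at least one of the two sisters is called `name`.
    p_named = Fraction(0)
    for elder, p_elder in name_probs.items():
        if elder == name:
            p_named += p_elder
        else:
            # Younger sister redrawn from name_probs excluding the elder's name.
            p_named += p_elder * p_name / (1 - p_elder)
    two_girls = Fraction(1, 4) * p_named
    return one_girl / (one_girl + two_girls)


def prob_other_is_boy_no_name() -> Fraction:
    """Classic version: P(other child is a boy | at least one girl)."""
    families = ["BB", "BG", "GB", "GG"]  # equally likely
    with_girl = [f for f in families if "G" in f]
    return Fraction(sum("B" in f for f in with_girl), len(with_girl))
```

Without names the enumeration returns 2/3. With `skewed = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 4)}`, disclosing the common name A gives 6/11, while disclosing the rarer name B gives 6/13.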

Citations: 0

Linearity of Unbiased Linear Model Estimators

The American Statistician

Pub Date : 2022-05-11 DOI: 10.1080/00031305.2022.2076743

S. Portnoy

ABSTRACT Best linear unbiased estimators (BLUEs) are known to be optimal in many respects under normality assumptions. Since variance minimization does not depend on normality and unbiasedness is often considered reasonable, many statisticians have felt that BLUEs ought to perform relatively well in some generality. The result here considers the general linear model and shows that any measurable estimator that is unbiased over a moderately large family of distributions must be linear. Thus, imposing unbiasedness cannot offer any improvement over imposing linearity. The problem was suggested by Hansen, who showed that any estimator unbiased for nearly all error distributions (with finite covariance) must have a variance no smaller than that of the best linear estimator in some parametric subfamily. Specifically, the hypothesis of linearity can be dropped from the classical Gauss–Markov Theorem. This might suggest that the best unbiased estimator should provide superior performance, but the result here shows that the best unbiased regression estimator can be no better than the best linear estimator.
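For reference, the classical Gauss–Markov argument that the abstract builds on runs as follows (a standard textbook sketch, not the article's proof). Assume $y = X\beta + \varepsilon$ with $X$ of full column rank, $E(\varepsilon) = 0$ and $\operatorname{Cov}(\varepsilon) = \sigma^2 I$. A linear estimator $\tilde\beta = Ay$ is unbiased for every $\beta$ exactly when $AX = I$. Writing $A = (X'X)^{-1}X' + D$ then forces $DX = 0$, so the cross terms $(X'X)^{-1}X'D' = (X'X)^{-1}(DX)'$ vanish and

$$
\operatorname{Cov}(\tilde\beta) = \sigma^2 A A' = \sigma^2\left[(X'X)^{-1} + D D'\right] \succeq \sigma^2 (X'X)^{-1} = \operatorname{Cov}(\hat\beta_{\mathrm{OLS}}).
$$

The article's result concerns the restriction this argument takes for granted: over a sufficiently rich family of error distributions, unbiasedness alone already forces a measurable estimator to be linear.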

Citations: 2
