
Recent Publications in The American Statistician

Technical Validation of Plot Designs by Use of Deep Learning
Pub Date: 2023-10-13, DOI: 10.1080/00031305.2023.2270649
Anne Helby Petersen, Claus Ekstrøm
Abstract: When does inspecting a certain graphical plot allow an investigator to reach the right statistical conclusion? Visualizations are commonly used for various tasks in statistics – including model diagnostics and exploratory data analysis – and though attractive due to their intuitive nature, the lack of available methods for validating plots is a major drawback. We propose a new technical validation method for visual reasoning. Our method trains deep neural networks to distinguish between plots simulated under two different data generating mechanisms (null or alternative), and we use the classification accuracy as a technical validation score (TVS). The TVS measures the information content in the plots, and TVS values can be used to compare different plots or different choices of data generating mechanisms, thereby providing a meaningful scale against which new visual reasoning procedures can be validated. We apply the method to three popular diagnostic plots for linear regression, namely scatter plots, quantile-quantile plots, and residual plots. We consider various types and degrees of misspecification, as well as different within-plot sample sizes. Our method produces TVSs that increase with sample size and decrease with difficulty, and hence the TVS is a meaningful measure of validity.

Keywords: deep learning; graphical inference; linear regression; neural network; model diagnostics; visualization

Disclaimer: As a service to authors and researchers we are providing this version of an accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proofs will be undertaken on this manuscript before final publication of the Version of Record (VoR). During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to these versions also.
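The TVS construction can be sketched in miniature: simulate plots under a null and an alternative data generating mechanism, summarize each plot, and score a classifier's accuracy at telling them apart. The sketch below substitutes a trivial threshold classifier on two hand-picked residual summaries for the paper's deep networks; the model, misspecification, and parameter choices are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_plot_features(n, misspecified, n_plots):
    """Simulate `n_plots` residual 'plots' of size `n` under a correct (null)
    or quadratically misspecified (alternative) linear model, and summarize
    each plot by the correlations of the residuals with x and x^2."""
    feats = []
    for _ in range(n_plots):
        x = rng.uniform(-1.0, 1.0, n)
        y = 1.0 + 2.0 * x + (0.8 * x**2 if misspecified else 0.0) + rng.normal(0.0, 1.0, n)
        slope, intercept = np.polyfit(x, y, 1)       # fit the (possibly wrong) linear model
        r = y - (intercept + slope * x)              # residuals the plot would display
        feats.append([np.corrcoef(x, r)[0, 1], np.corrcoef(x**2, r)[0, 1]])
    return np.array(feats)

def technical_validation_score(n=100, n_plots=300):
    """Accuracy of a toy classifier at separating null from alternative
    plots; this plays the role the deep networks play in the paper."""
    null_f = residual_plot_features(n, False, n_plots)
    alt_f = residual_plot_features(n, True, n_plots)
    labels = np.r_[np.zeros(n_plots), np.ones(n_plots)]
    score = np.r_[null_f[:, 1], alt_f[:, 1]]         # corr(x^2, residuals)
    pred = (score > np.median(score)).astype(float)  # threshold 'classifier'
    acc = (pred == labels).mean()
    return max(acc, 1.0 - acc)
```

As in the paper, the score sits between 0.5 (plots carry no usable information) and 1.0 (the mechanisms are perfectly distinguishable from the plots).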
Citations: 0
The Phistogram
Pub Date: 2023-10-09, DOI: 10.1080/00031305.2023.2267639
Adriana Verónica Blanc
Abstract: This article introduces a new kind of histogram-based representation for univariate random variables, named the phistogram because of its perceptual qualities. The technique relies on shifted groupings of data, creating a color-gradient zone that evidences the uncertainty from smoothing and highlights sampling issues. In this way, the phistogram offers a deep and visually appealing perspective on finite-sample peculiarities, while also being capable of depicting the underlying distribution, thus becoming a useful complement to histograms and other statistical summaries. Although not limited to it, the present construction is derived from the equal-area histogram, a variant that differs conceptually from the traditional one. As such a distinction is not greatly emphasized in the literature, the graphical fundamentals are described in detail, and an alternative terminology is proposed to separate some concepts. Additionally, a compact notation is adopted to integrate the representation's metadata into the graphic itself.

Keywords: statistical graphic; data visualization tool; perception; color-gradient technique; smoothing uncertainty; equal-area histogram
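The "shifted groupings" idea can be illustrated with an averaged-shifted-histogram-style construction: bin the same data several times with offset bin origins, so the stack of counts conveys how sensitive the display is to the binning. This is a simplified stand-in for the phistogram, not the article's exact construction; the function name and parameters are illustrative.

```python
import numpy as np

def shifted_histograms(data, bins=10, shifts=5):
    """Bin the same data `shifts` times, offsetting the bin origin by a
    fraction of a bin width each time. Overlaying these counts is the kind
    of shifted grouping from which a color-gradient zone can be rendered."""
    lo, hi = float(data.min()), float(data.max())
    width = (hi - lo) / bins
    all_counts = []
    for s in range(shifts):
        # Start one bin below the minimum so every shifted grid covers the data.
        edges = lo - width + np.arange(bins + 2) * width + s * width / shifts
        counts, _ = np.histogram(data, bins=edges)
        all_counts.append(counts)
    return np.array(all_counts)
```

Each row is a complete histogram of the full sample; where the rows disagree is exactly where the smoothing uncertainty the abstract describes is largest.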
Citations: 0
A Note on Monte Carlo Integration in High Dimensions
Pub Date: 2023-10-09, DOI: 10.1080/00031305.2023.2267637
Yanbo Tang
Monte Carlo integration is a commonly used technique to compute intractable integrals and is typically thought to perform poorly for very high-dimensional integrals. To show that this is not always the case, we examine Monte Carlo integration using techniques from the high-dimensional statistics literature, allowing the dimension of the integral to increase. In doing so, we derive non-asymptotic bounds for the relative and absolute error of the approximation for some general classes of functions through concentration inequalities. We provide concrete examples in which the number of points needed to guarantee a consistent estimate varies from polynomial to exponential, and show that in theory arbitrarily fast or slow rates are possible. This demonstrates that the behaviour of Monte Carlo integration in high dimensions is not uniform. Through our methods we also obtain non-asymptotic confidence intervals which are valid regardless of the number of points sampled.
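The note's point, that dimension alone does not doom Monte Carlo, shows up in a toy comparison (the integrands and dimensions below are chosen for illustration, they are not the note's examples): an average-of-coordinates integrand stays accurate even at d = 1000, while a product integrand whose relative variance is (4/3)^d − 1 fails already at moderate d with the same number of points.

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_relative_error(f, true_value, d, n=10_000):
    """Plain Monte Carlo estimate of the integral of f over [0, 1]^d from n
    uniform draws, reported as relative error against the known value."""
    x = rng.random((n, d))
    return abs(f(x).mean() - true_value) / abs(true_value)

# Benign integrand: the average of the coordinates integrates to 1/2 in any
# dimension, and its variance shrinks as d grows, so d = 1000 is no obstacle.
err_benign = mc_relative_error(lambda x: x.mean(axis=1), 0.5, d=1000)

# Hostile integrand: prod(2 x_i) also has integral 1, but its relative
# variance is (4/3)^d - 1, astronomically large already at d = 50.
err_hostile = mc_relative_error(lambda x: np.prod(2.0 * x, axis=1), 1.0, d=50)
```

The contrast is the non-uniform behaviour the abstract refers to: the required sample size depends on the function class, not just on d.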
Citations: 0
One-step weighting to generalize and transport treatment effect estimates to a target population*
Pub Date: 2023-10-09, DOI: 10.1080/00031305.2023.2267598
Ambarish Chattopadhyay, Eric R. Cohn, José R. Zubizarreta
Abstract: The problems of generalization and transportation of treatment effect estimates from a study sample to a target population are central to empirical research and statistical methodology. In both randomized experiments and observational studies, weighting methods are often used with this objective. Traditional methods construct the weights by separately modeling the treatment assignment and study selection probabilities and then multiplying functions (e.g., inverses) of their estimates. In this work, we provide a justification and an implementation for weighting in a single step. We show a formal connection between this one-step method and inverse probability and inverse odds weighting. We demonstrate that the resulting estimator for the target average treatment effect is consistent, asymptotically normal, multiply robust, and semiparametrically efficient. We evaluate the performance of the one-step estimator in a simulation study. We illustrate its use in a case study of the effects of physician racial diversity on preventive healthcare utilization among Black men in California. We provide R code implementing the methodology.

Keywords: causal inference; generalization; transportation; randomized experiments; observational studies; weighting methods
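As context for the one-step proposal, the traditional two-model construction it improves on can be sketched: obtain the treatment-assignment and study-selection probabilities (known here, for simplicity), multiply their inverses, and weight the study sample so it stands in for the target population. The data generating process below is an illustrative assumption, not the paper's case study.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Target population: covariate X; selection S into the study depends on X;
# treatment A is randomized within the study; the effect of A varies with X.
X = rng.normal(size=n)
p_sel = 1.0 / (1.0 + np.exp(-(0.5 - 0.5 * X)))    # P(S = 1 | X), known here
S = rng.random(n) < p_sel
p_trt = 0.5                                       # randomized assignment
A = rng.random(n) < p_trt
Y = X + A * (1.0 + 0.5 * X) + rng.normal(size=n)  # individual effect 1 + 0.5 X

# Traditional weighting: inverse treatment probability times inverse
# selection probability. The target-population average treatment effect
# is E[1 + 0.5 X] = 1, and the weighted contrast below recovers it from
# the selected sample alone.
w1 = (A / p_trt) / p_sel
w0 = ((1 - A) / (1 - p_trt)) / p_sel
ate_hat = (w1[S] * Y[S]).sum() / w1[S].sum() - (w0[S] * Y[S]).sum() / w0[S].sum()
```

The paper's contribution is to justify estimating such combined weights in a single step rather than by multiplying two separately fitted models, with the robustness and efficiency properties listed in the abstract.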
Citations: 5
Causal quartets: Different ways to attain the same average treatment effect*
Pub Date: 2023-10-05, DOI: 10.1080/00031305.2023.2267597
Andrew Gelman, Jessica Hullman, Lauren Kennedy
Abstract: The average causal effect can often be best understood in the context of its variation. We demonstrate with two sets of four graphs, all of which represent the same average effect but with much different patterns of heterogeneity. As with the famous correlation quartet of Anscombe (1973), these graphs dramatize the way in which real-world variation can be more complex than simple numerical summaries. The graphs also give insight into why the average effect is often much smaller than anticipated.
Citations: 7
ANOVA and Mixed Models: A Short Introduction Using R, Lukas Meier, Boca Raton, FL: Chapman & Hall/CRC Press, 2023, xiv + 187 pp., $66.95 (P), ISBN: 978-0-367-70420-9.
Pub Date: 2023-10-02, DOI: 10.1080/00031305.2023.2261817
Brady T. West
Citations: 0
Missing data imputation with high-dimensional data
Pub Date: 2023-10-02, DOI: 10.1080/00031305.2023.2259962
Alberto Brini, Edwin R. van den Heuvel
Abstract: Imputation of missing data in high-dimensional datasets with more variables P than samples N (P ≫ N) is hampered by the data dimensionality. For multivariate imputation, the covariance matrix is ill conditioned and cannot be properly estimated. For fully conditional imputation, the regression models for imputation cannot include all the variables. Thus, the high dimension requires special imputation approaches. In this paper, we provide an overview and realistic comparisons of imputation approaches for high-dimensional data applied within a linear mixed modelling (LMM) framework. We examine approaches from three different classes using simulation studies: multiple imputation with penalized regression, multiple imputation with recursive partitioning and predictive mean matching, and multiple imputation with principal component analysis (PCA). We illustrate the methods on a real case study in which a multivariate outcome, i.e., an extracted set of correlated biomarkers from human urine samples, was collected and monitored over time, and we discuss the proposed methods alongside more standard imputation techniques that could be applied by ignoring either the multivariate or the longitudinal dimension. Our simulations demonstrate the superiority of the recursive partitioning and predictive mean matching algorithm over the other methods in terms of bias, mean squared error, and coverage of the LMM parameter estimates when compared to those obtained from a data analysis without missingness, although this comes at the expense of high computational costs. It is worthwhile reconsidering much faster methodologies like the one relying on PCA.

Keywords: high-dimensional data; longitudinal data; linear mixed models; missing data; multiple imputation; principal component analysis; penalized regression; recursive partitioning
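One of the compared building blocks, predictive mean matching, can be sketched in its simplest single-variable form. This is a generic PMM sketch with illustrative names and defaults, not the paper's multivariate longitudinal implementation.

```python
import numpy as np

rng = np.random.default_rng(4)

def pmm_impute(x, y, k=5):
    """Predictive mean matching for one incomplete variable y given a
    complete covariate x: fit a regression on the observed cases, then fill
    each missing y with the observed value of a donor drawn at random from
    the k cases whose predicted means are closest."""
    obs = ~np.isnan(y)
    slope, intercept = np.polyfit(x[obs], y[obs], 1)
    pred = intercept + slope * x
    y_filled = y.copy()
    donors_y = y[obs]
    donors_pred = pred[obs]
    for i in np.where(~obs)[0]:
        nearest = np.argsort(np.abs(donors_pred - pred[i]))[:k]
        y_filled[i] = donors_y[rng.choice(nearest)]
    return y_filled
```

Because every imputed value is an actually observed value, PMM respects the support of the data, one reason the recursive partitioning + PMM combination performs well in the paper's simulations; run repeatedly with different donor draws, it yields the multiple imputations the abstract describes.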
Citations: 0
A First Course in Linear Model Theory, 2nd ed., Nalini Ravishanker, Zhiyi Chi, and Dipak K. Dey, Boca Raton, FL: Chapman & Hall/CRC Press, 2022, xvi + 513 pp., $110.00 (H), ISBN: 978-1-439-85805-9.
Pub Date: 2023-10-02, DOI: 10.1080/00031305.2023.2261819
Carlos Cinelli
Citations: 0
Bayesian Modeling and Computation in Python, Osvaldo A. Martin, Ravin Kumar, and Junpeng Lao, Boca Raton, FL: Chapman & Hall/CRC Press, 2022, xxii + 398 pp., $99.95 (H), ISBN: 978-0-367-89436-8.
Pub Date: 2023-10-02, DOI: 10.1080/00031305.2023.2261818
P. Richard Hahn
Citations: 0
The Application of the Likelihood Ratio Test and the Cochran-Mantel-Haenszel Test to Discrimination Cases
Pub Date: 2023-09-15, DOI: 10.1080/00031305.2023.2259969
Weiwen Miao, Joseph L. Gastwirth
Abstract: In practice, the ultimate outcome of many important discrimination cases, e.g., the Wal-Mart, Nike, and Goldman-Sachs equal pay cases, is determined at the stage when the plaintiffs request that the case be certified as a class action. The primary statistical issue at this time is whether the employment practice in question leads to a common pattern of outcomes disadvantaging most plaintiffs. However, there are no formal procedures or government guidelines for checking whether an employment practice results in a common pattern of disparity. This paper proposes using a slightly modified likelihood ratio test and the one-sided Cochran-Mantel-Haenszel (CMH) test to examine data relevant to deciding whether this commonality requirement is satisfied. Data considered at the class certification stage of several actual cases are analyzed by the proposed procedures. The results often show that the employment practice at issue created a common pattern of disparity; however, based on the evidence presented to the courts, the class action requests were denied.

Keywords: class action; Cochran-Mantel-Haenszel test; disparate impact; employment discrimination; likelihood ratio test; stratified data
{"title":"The Application of the Likelihood Ratio Test and the Cochran-Mantel-Haenszel Test to Discrimination Cases","authors":"Weiwen Miao, Joseph L. Gastwirth","doi":"10.1080/00031305.2023.2259969","DOIUrl":"https://doi.org/10.1080/00031305.2023.2259969","url":null,"abstract":"ABSTRACTIn practice, the ultimate outcome of many important discrimination cases, e.g. the Wal-Mart, Nike and Goldman-Sachs equal pay cases, is determined at the stage when the plaintiffs request that the case be certified as a class action. The primary statistical issue at this time is whether the employment practice in question leads to a common pattern of outcomes disadvantaging most plaintiffs. However, there are no formal procedures or government guidelines for checking whether an employment practice results in a common pattern of disparity. This paper proposes using the slightly modified likelihood ratio test and the one-sided Cochran-Mantel-Haenszel (CMH) test to examine data relevant to deciding whether this commonality requirement is satisfied. Data considered at the class certification stage from several actual cases are analyzed by the proposed procedures. The results often show that the employment practice at issue created a common pattern of disparity, however, based on the evidence presented to the courts, the class action requests were denied.KEYWORDS: Class actionCochran-Mantel-Haenszel testDisparate impactEmployment discriminationLikelihood ratio testStratified dataDisclaimerAs a service to authors and researchers we are providing this version of an accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proofs will be undertaken on this manuscript before final publication of the Version of Record (VoR). 
During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to these versions also.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135394720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}