Confidence Distributions for the Autoregressive Parameter
Pub Date: 2023-03-27 | DOI: 10.1080/00031305.2023.2226184
Rolf Larsson
The notion of confidence distributions is applied to inference about the parameter in a simple autoregressive model, allowing the parameter to take the value one. This makes it possible to compare with asymptotic approximations in the stationary and the nonstationary cases at the same time. The main point, however, is to compare with a Bayesian analysis of the same problem. A noninformative prior for a parameter, in the sense of Jeffreys, is given as the ratio of the confidence density and the likelihood. In this way, the similarity between the confidence and noninformative Bayesian frameworks is exploited. It is shown that, in the stationary case, the prior so induced is asymptotically flat. However, if a unit parameter is allowed, the induced prior must have a spike of some size at one. Simulation studies and two empirical examples illustrate the ideas.
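As a point of reference for readers unfamiliar with confidence densities, the following is a minimal sketch of the general idea in the AR(1) setting, using the standard normal approximation to the least-squares estimator of the autoregressive parameter; it is an illustrative assumption, not the construction developed in the paper.

```r
# Minimal sketch: an approximate confidence density for the AR(1) parameter rho,
# based on the usual normal approximation to the least-squares estimator.
# Illustrative only; not the paper's confidence-distribution construction.
set.seed(1)
n   <- 200
rho <- 0.8
y   <- as.numeric(arima.sim(list(ar = rho), n = n))   # simulate a stationary AR(1)

y_lag   <- y[-n]
y_lead  <- y[-1]
rho_hat <- sum(y_lag * y_lead) / sum(y_lag^2)          # least-squares estimate
se_hat  <- sqrt((1 - rho_hat^2) / n)                   # stationary asymptotic s.e.

# Approximate confidence density and a 95% confidence interval for rho
grid      <- seq(rho_hat - 4 * se_hat, rho_hat + 4 * se_hat, length.out = 400)
conf_dens <- dnorm(grid, mean = rho_hat, sd = se_hat)
ci_95     <- rho_hat + c(-1, 1) * qnorm(0.975) * se_hat

plot(grid, conf_dens, type = "l", xlab = "rho", ylab = "confidence density")
abline(v = ci_95, lty = 2)
```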
{"title":"Confidence Distributions for the Autoregressive Parameter","authors":"Rolf Larsson","doi":"10.1080/00031305.2023.2226184","DOIUrl":"https://doi.org/10.1080/00031305.2023.2226184","url":null,"abstract":"The notion of confidence distributions is applied to inference about the parameter in a simple autoregressive model, allowing the parameter to take the value one. This makes it possible to compare to asymptotic approximations in both the stationary and the non stationary cases at the same time. The main point, however, is to compare to a Bayesian analysis of the same problem. A non informative prior for a parameter, in the sense of Jeffreys, is given as the ratio of the confidence density and the likelihood. In this way, the similarity between the confidence and non-informative Bayesian frameworks is exploited. It is shown that, in the stationary case, asymptotically the so induced prior is flat. However, if a unit parameter is allowed, the induced prior has to have a spike at one of some size. Simulation studies and two empirical examples illustrate the ideas.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123582788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Wald Confidence Interval for a Binomial p as an Illuminating “Bad” Example
Pub Date: 2023-03-21 | DOI: 10.1080/00031305.2023.2183257
Per Gösta Andersson
When teaching, we usually not only demonstrate/discuss how a certain method works but also, no less importantly, why it works. In contrast, the Wald confidence interval for a binomial p constitutes an excellent example of a case where we might be interested in why a method does not work. It has been in use for many years and, sadly enough, it is still found in many textbooks in mathematical statistics/statistics. The reasons for not using this interval are plentiful, and this fact gives us a good opportunity to discuss all of its deficiencies and to draw conclusions of more general interest. We mostly use already known results and bring them together in a manner appropriate to the teaching situation. The main purpose of this article is to show how to stimulate students to take a more critical view of simplifications and approximations. We primarily aim at master’s students who have previously been confronted with the Wilson (score) interval, but parts of the presentation may also be suitable for bachelor’s students.
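A small simulation of the kind such a teaching discussion suggests is sketched below, comparing the coverage of the Wald and Wilson intervals; the settings (n = 30, p = 0.05) are illustrative choices, not the article's own examples.

```r
# Coverage comparison of the Wald and Wilson intervals for a binomial p.
# Illustrative simulation; the article's own examples may differ.
set.seed(42)
wald_ci <- function(x, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  p <- x / n
  p + c(-1, 1) * z * sqrt(p * (1 - p) / n)
}
wilson_ci <- function(x, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  p <- x / n
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  centre + c(-1, 1) * half
}
coverage <- function(p, n, reps = 10000, ci_fun) {
  x <- rbinom(reps, n, p)
  mean(vapply(x, function(xi) {
    ci <- ci_fun(xi, n)
    ci[1] <= p && p <= ci[2]
  }, logical(1)))
}
# The Wald interval's coverage collapses for small n and p near 0 or 1,
# while the Wilson interval stays close to the nominal 0.95.
c(wald   = coverage(0.05, 30, ci_fun = wald_ci),
  wilson = coverage(0.05, 30, ci_fun = wilson_ci))
```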
{"title":"The Wald Confidence Interval for a Binomial p as an Illuminating “Bad” Example","authors":"Per Gösta Andersson","doi":"10.1080/00031305.2023.2183257","DOIUrl":"https://doi.org/10.1080/00031305.2023.2183257","url":null,"abstract":"Abstract When teaching we usually not only demonstrate/discuss how a certain method works, but, not less important, why it works. In contrast, the Wald confidence interval for a binomial p constitutes an excellent example of a case where we might be interested in why a method does not work. It has been in use for many years and, sadly enough, it is still to be found in many textbooks in mathematical statistics/statistics. The reasons for not using this interval are plentiful and this fact gives us a good opportunity to discuss all of its deficiencies and draw conclusions which are of more general interest. We will mostly use already known results and bring them together in a manner appropriate to the teaching situation. The main purpose of this article is to show how to stimulate students to take a more critical view of simplifications and approximations. We primarily aim for master’s students who previously have been confronted with the Wilson (score) interval, but parts of the presentation may as well be suitable for bachelor’s students.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133861694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Characterization of Most(More) Powerful Test Statistics with Simple Nonparametric Applications
Pub Date: 2023-03-14 | DOI: 10.1080/00031305.2023.2192746
A. Vexler, Alan D. Hutson
Data-driven most powerful tests are statistical hypothesis decision-making tools that deliver, for a fixed null hypothesis, the greatest power among all corresponding data-based tests of a given size. When the underlying data distributions are known, the likelihood ratio principle can be applied to conduct most powerful tests. Reversing this notion, we consider the following questions. (a) Assuming a test statistic, say T, is given, how can we transform T to improve the power of the test? (b) Can T be used to generate the most powerful test? (c) How does one compare test statistics with respect to an attribute of the desired most powerful decision-making procedure? To examine these questions, we propose a one-to-one mapping of the term 'most powerful' to the distributional properties of a given test statistic via matching characterization. This form of characterization has practical applicability and aligns well with the general principle of sufficiency. Findings indicate that to improve a given test, we can employ relevant ancillary statistics whose distributions do not change under the hypotheses being tested. As an example, the present method is illustrated by modifying the usual t-test in nonparametric settings. Numerical studies based on generated data and a real data set confirm that the proposed approach can be useful in practice.
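The likelihood ratio principle mentioned above can be illustrated with a textbook Neyman-Pearson example; the sketch below is a generic illustration of that principle (with an arbitrarily chosen competitor test), not the characterization developed in the paper.

```r
# Generic Neyman-Pearson illustration of the likelihood ratio principle:
# for N(mu, 1) data, testing H0: mu = 0 versus H1: mu = 1, the likelihood ratio
# is monotone in the sample mean, so the most powerful size-alpha test rejects
# when xbar exceeds qnorm(1 - alpha) / sqrt(n).  (Not the paper's construction.)
set.seed(7)
n     <- 25
alpha <- 0.05
crit  <- qnorm(1 - alpha) / sqrt(n)

power_mp <- function(mu, reps = 20000) {
  xbar <- replicate(reps, mean(rnorm(n, mean = mu)))
  mean(xbar > crit)
}
power_sign <- function(mu, reps = 20000) {
  # a cruder competitor: reject when the count of positive observations is large
  k   <- qbinom(1 - alpha, n, 0.5)   # approximate size-alpha cutoff
  pos <- replicate(reps, sum(rnorm(n, mean = mu) > 0))
  mean(pos > k)
}
# Both tests hold their size; the likelihood-ratio-based test has higher power.
c(size_mp = power_mp(0),   power_mp = power_mp(1),
  size_sign = power_sign(0), power_sign = power_sign(1))
```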
{"title":"A Characterization of Most(More) Powerful Test Statistics with Simple Nonparametric Applications","authors":"A. Vexler, Alan D. Hutson","doi":"10.1080/00031305.2023.2192746","DOIUrl":"https://doi.org/10.1080/00031305.2023.2192746","url":null,"abstract":"Data-driven most powerful tests are statistical hypothesis decision-making tools that deliver the greatest power against a fixed null hypothesis among all corresponding data-based tests of a given size. When the underlying data distributions are known, the likelihood ratio principle can be applied to conduct most powerful tests. Reversing this notion, we consider the following questions. (a) Assuming a test statistic, say T, is given, how can we transform T to improve the power of the test? (b) Can T be used to generate the most powerful test? (c) How does one compare test statistics with respect to an attribute of the desired most powerful decision-making procedure? To examine these questions, we propose one-to-one mapping of the term 'Most Powerful' to the distribution properties of a given test statistic via matching characterization. This form of characterization has practical applicability and aligns well with the general principle of sufficiency. Findings indicate that to improve a given test, we can employ relevant ancillary statistics that do not have changes in their distributions with respect to tested hypotheses. As an example, the present method is illustrated by modifying the usual t-test under nonparametric settings. Numerical studies based on generated data and a real-data set confirm that the proposed approach can be useful in practice.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127477126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Out-of-sample R2: estimation and inference
Pub Date: 2023-02-10 | DOI: 10.1080/00031305.2023.2216252
Stijn Hawinkel, W. Waegeman, Steven Maere
Out-of-sample prediction is the acid test of predictive models, yet an independent test dataset is often not available for assessment of the prediction error. For this reason, out-of-sample performance is commonly estimated using data-splitting algorithms such as cross-validation or the bootstrap. For quantitative outcomes, the ratio of variance explained to total variance can be summarized by the coefficient of determination, or in-sample $R^2$, which is easy to interpret and to compare across different outcome variables. In contrast to the in-sample $R^2$, the out-of-sample $R^2$ has not been well defined, and the variability of the out-of-sample $\hat{R}^2$ has been largely ignored. Usually only its point estimate is reported, hampering formal comparison of the predictability of different outcome variables. Here we explicitly define the out-of-sample $R^2$ as a comparison of two predictive models, provide an unbiased estimator, and exploit recent theoretical advances on the uncertainty of data-splitting estimates to provide a standard error for the $\hat{R}^2$. The performance of the estimators of the $R^2$ and its standard error is investigated in a simulation study. We demonstrate our new method by constructing confidence intervals and comparing models for prediction of quantitative \textit{Brassica napus} and \textit{Zea mays} phenotypes based on gene expression data.
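The general notion involved here, one minus the ratio of cross-validated prediction error to the error of a benchmark mean-only model, can be sketched as below on simulated data; this is an illustration of the quantity being studied, not the paper's unbiased estimator or its standard error.

```r
# Out-of-sample R^2 via K-fold cross-validation, comparing a regression model
# to the mean-only benchmark.  Generic sketch on simulated data; the paper's
# unbiased estimator and standard error are not reproduced here.
set.seed(123)
n <- 200
x <- matrix(rnorm(n * 3), n, 3)
y <- as.numeric(x %*% c(1, 0.5, 0) + rnorm(n))
dat <- data.frame(y = y, x)

K     <- 10
folds <- sample(rep(1:K, length.out = n))
se_model <- se_mean <- numeric(n)
for (k in 1:K) {
  test <- folds == k
  fit  <- lm(y ~ ., data = dat[!test, ])                      # train on K-1 folds
  se_model[test] <- (dat$y[test] - predict(fit, dat[test, ]))^2
  se_mean[test]  <- (dat$y[test] - mean(dat$y[!test]))^2       # benchmark: training mean
}
r2_oos <- 1 - mean(se_model) / mean(se_mean)
r2_oos
```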
{"title":"Out-of-sample R2: estimation and inference","authors":"Stijn Hawinkel, W. Waegeman, Steven Maere","doi":"10.1080/00031305.2023.2216252","DOIUrl":"https://doi.org/10.1080/00031305.2023.2216252","url":null,"abstract":"Out-of-sample prediction is the acid test of predictive models, yet an independent test dataset is often not available for assessment of the prediction error. For this reason, out-of-sample performance is commonly estimated using data splitting algorithms such as cross-validation or the bootstrap. For quantitative outcomes, the ratio of variance explained to total variance can be summarized by the coefficient of determination or in-sample $R^2$, which is easy to interpret and to compare across different outcome variables. As opposed to the in-sample $R^2$, the out-of-sample $R^2$ has not been well defined and the variability on the out-of-sample $hat{R}^2$ has been largely ignored. Usually only its point estimate is reported, hampering formal comparison of predictability of different outcome variables. Here we explicitly define the out-of-sample $R^2$ as a comparison of two predictive models, provide an unbiased estimator and exploit recent theoretical advances on uncertainty of data splitting estimates to provide a standard error for the $hat{R}^2$. The performance of the estimators for the $R^2$ and its standard error are investigated in a simulation study. We demonstrate our new method by constructing confidence intervals and comparing models for prediction of quantitative $text{Brassica napus}$ and $text{Zea mays}$ phenotypes based on gene expression data.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127156188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhanced Inference for Finite Population Sampling-Based Prevalence Estimation with Misclassification Errors
Pub Date: 2023-02-07 | DOI: 10.1080/00031305.2023.2250401
Lin Ge, Yuzi Zhang, L. Waller, R. Lyles
Epidemiologic screening programs often make use of tests with small but nonzero probabilities of misdiagnosis. In this article, we assume that the target population is finite with a fixed number of true cases, and that we apply an imperfect test with known sensitivity and specificity to a sample of individuals from the population. In this setting, we propose an enhanced inferential approach for use in conjunction with sampling-based bias-corrected prevalence estimation. While ignoring the finite nature of the population can yield markedly conservative estimates, direct application of a standard finite population correction (FPC) conversely leads to underestimation of variance. We uncover a way to leverage the typical FPC indirectly toward valid statistical inference. In particular, we derive a readily estimable extra variance component induced by misclassification in this specific but arguably common diagnostic testing scenario. Our approach yields a standard error estimate that properly captures the sampling variability of the usual bias-corrected maximum likelihood estimator of disease prevalence. Finally, we develop an adapted Bayesian credible interval for the true prevalence that offers improved frequentist properties (i.e., coverage and width) relative to a Wald-type confidence interval. We report simulation results to demonstrate the enhanced performance of the proposed inferential methods.
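The standard bias correction underlying this kind of prevalence estimation (the Rogan-Gladen correction) is sketched below with illustrative numbers; the paper's extra variance component, finite-population adjustment, and adapted credible interval are not reproduced here.

```r
# Rogan-Gladen bias-corrected prevalence estimate from an imperfect test with
# known sensitivity (Se) and specificity (Sp).  Only the standard correction is
# shown; the paper's finite-population variance component and adapted Bayesian
# interval are not reproduced here.
corrected_prevalence <- function(x_pos, n, Se, Sp) {
  apparent <- x_pos / n                   # apparent (test-positive) prevalence
  (apparent + Sp - 1) / (Se + Sp - 1)     # Rogan-Gladen correction
}
# Illustrative example: 120 positives out of 1000 sampled, Se = 0.90, Sp = 0.95
corrected_prevalence(120, 1000, Se = 0.90, Sp = 0.95)

# A naive delta-method standard error that ignores finite-population effects,
# for contrast with the enhanced inference described in the abstract:
naive_se <- function(x_pos, n, Se, Sp) {
  apparent <- x_pos / n
  sqrt(apparent * (1 - apparent) / n) / (Se + Sp - 1)
}
naive_se(120, 1000, 0.90, 0.95)
```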
{"title":"Enhanced Inference for Finite Population Sampling-Based Prevalence Estimation with Misclassification Errors","authors":"Lin Ge, Yuzi Zhang, L. Waller, R. Lyles","doi":"10.1080/00031305.2023.2250401","DOIUrl":"https://doi.org/10.1080/00031305.2023.2250401","url":null,"abstract":"Epidemiologic screening programs often make use of tests with small, but non-zero probabilities of misdiagnosis. In this article, we assume the target population is finite with a fixed number of true cases, and that we apply an imperfect test with known sensitivity and specificity to a sample of individuals from the population. In this setting, we propose an enhanced inferential approach for use in conjunction with sampling-based bias-corrected prevalence estimation. While ignoring the finite nature of the population can yield markedly conservative estimates, direct application of a standard finite population correction (FPC) conversely leads to underestimation of variance. We uncover a way to leverage the typical FPC indirectly toward valid statistical inference. In particular, we derive a readily estimable extra variance component induced by misclassification in this specific but arguably common diagnostic testing scenario. Our approach yields a standard error estimate that properly captures the sampling variability of the usual bias-corrected maximum likelihood estimator of disease prevalence. Finally, we develop an adapted Bayesian credible interval for the true prevalence that offers improved frequentist properties (i.e., coverage and width) relative to a Wald-type confidence interval. We report the simulation results to demonstrate the enhanced performance of the proposed inferential methods.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114532117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MOVER-R and Penalized MOVER-R Confidence Intervals for the Ratio of Two Quantities
Pub Date: 2023-01-30 | DOI: 10.1080/00031305.2023.2173294
Peng Wang, Yi-lei Ma, Siqi Xu, Yixin Wang, Yu Zhang, Xiangyang Lou, Ming Li, Baolin Wu, Guimin Gao, P. Yin, Nianjun Liu
Developing a confidence interval for the ratio of two quantities is an important task in statistics because of its omnipresence in real-world applications. For such a problem, the MOVER-R (method of variance recovery for the ratio) technique, which is based on recovering variance estimates from the separate confidence limits of the numerator and the denominator, has been proposed as a useful and efficient approach. However, this method implicitly assumes that the confidence interval for the denominator never includes zero, which may be violated in practice. In this article, we first use a new framework to derive the MOVER-R confidence interval, which does not require the above assumption and covers the whole parameter space. We find that MOVER-R can produce an unbounded confidence interval, just like the well-known Fieller method. To overcome this issue, we further propose the penalized MOVER-R. We prove that the new method differs from MOVER-R only at the second order, yet it always gives a bounded and analytic confidence interval. Through simulation studies and a real data application, we show that the penalized MOVER-R generally provides a better confidence interval than MOVER-R in terms of controlling the coverage probability and the median width.
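For context, the classical Fieller interval that the abstract uses as a point of comparison can be written down directly, as in the sketch below (independent normal estimates assumed); the MOVER-R and penalized MOVER-R limits themselves are not reproduced here.

```r
# Classical Fieller interval for a ratio theta = mu1 / mu2, mentioned in the
# abstract as a comparator.  Assumes independent, approximately normal estimates
# m1, m2 with variances v1, v2.  The MOVER-R limits are not reproduced here.
fieller_ci <- function(m1, m2, v1, v2, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  # Solve (m1 - theta * m2)^2 <= z^2 * (v1 + theta^2 * v2) as a quadratic in theta
  A <- m2^2 - z^2 * v2
  B <- -2 * m1 * m2
  C <- m1^2 - z^2 * v1
  disc <- B^2 - 4 * A * C
  if (A <= 0 || disc < 0) {
    return(c(-Inf, Inf))   # unbounded confidence set: report the whole real line
  }
  sort((-B + c(-1, 1) * sqrt(disc)) / (2 * A))
}
# Example: numerator estimate 5 (variance 1), denominator estimate 2 (variance 0.25)
fieller_ci(5, 2, 1, 0.25)
```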
{"title":"MOVER-R and Penalized MOVER-R Confidence Intervals for the Ratio of Two Quantities","authors":"Peng Wang, Yi-lei Ma, Siqi Xu, Yixin Wang, Yu Zhang, Xiangyang Lou, Ming Li, Baolin Wu, Guimin Gao, P. Yin, Nianjun Liu","doi":"10.1080/00031305.2023.2173294","DOIUrl":"https://doi.org/10.1080/00031305.2023.2173294","url":null,"abstract":"Abstract Developing a confidence interval for the ratio of two quantities is an important task in statistics because of its omnipresence in real world applications. For such a problem, the MOVER-R (method of variance recovery for the ratio) technique, which is based on the recovery of variance estimates from confidence limits of the numerator and the denominator separately, was proposed as a useful and efficient approach. However, this method implicitly assumes that the confidence interval for the denominator never includes zero, which might be violated in practice. In this article, we first use a new framework to derive the MOVER-R confidence interval, which does not require the above assumption and covers the whole parameter space. We find that MOVER-R can produce an unbounded confidence interval, just like the well-known Fieller method. To overcome this issue, we further propose the penalized MOVER-R. We prove that the new method differs from MOVER-R only at the second order. It, however, always gives a bounded and analytic confidence interval. Through simulation studies and a real data application, we show that the penalized MOVER-R generally provides a better confidence interval than MOVER-R in terms of controlling the coverage probability and the median width.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125420368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Consultancy Style Dissertations in Statistics and Data Science: Why and How
Pub Date: 2023-01-11 | DOI: 10.1080/00031305.2022.2163689
Serveh Sharifi Far, Vanda Inácio, D. Paulin, M. de Carvalho, Nicole Augustin, Mike Allerhand, Gail S Robertson
In this article, we chronicle the development of the consultancy-style dissertations of the MSc program in Statistics with Data Science at the University of Edinburgh. These dissertations are based on real-world data problems, are jointly supervised with industrial and academic partners, and aim to bring all students in the cohort together to develop consultancy skills and best practices and to promote their statistical leadership. Aligning with recently published research on statistical education that suggests the need for a greater focus on statistical consultancy skills, we summarize our experience in organizing and supervising such consultancy-style dissertations, describe the logistics of implementing them, and review the students’ and supervisors’ feedback about these dissertations.
{"title":"Consultancy Style Dissertations in Statistics and Data Science: Why and How","authors":"Serveh Sharifi Far, Vanda Inácio, D. Paulin, M. de Carvalho, Nicole Augustin, Mike Allerhand, Gail S Robertson","doi":"10.1080/00031305.2022.2163689","DOIUrl":"https://doi.org/10.1080/00031305.2022.2163689","url":null,"abstract":"Abstract In this article, we chronicle the development of the consultancy style dissertations of the MSc program in Statistics with Data Science at the University of Edinburgh. These dissertations are based on real-world data problems, in joint supervision with industrial and academic partners, and aim to get all students in the cohort together to develop consultancy skills and best practices, and also to promote their statistical leadership. Aligning with recently published research on statistical education suggesting the need for a greater focus on statistical consultancy skills, we summarize our experience in organizing and supervising such consultancy style dissertations, describe the logistics of implementing them, and review the students’ and supervisors’ feedback about these dissertations.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122887517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Log-Rank Test
Pub Date: 2023-01-06 | DOI: 10.1080/00031305.2022.2161637
Jiaqi Gu, Y. Zhang, G. Yin
Comparison of two survival curves is a fundamental problem in survival analysis. Although abundant frequentist methods have been developed for comparing survival functions, inference procedures from the Bayesian perspective are rather limited. In this article, we extract the quantity of interest from the classic log-rank test and propose its Bayesian counterpart. Monte Carlo methods, including a Gibbs sampler and a sequential importance sampling procedure, are developed to draw posterior samples of the survival functions, and a decision rule for hypothesis testing is constructed for making inference. Via simulations and real data analysis, the proposed Bayesian log-rank test is shown to be asymptotically equivalent to the classic one when noninformative prior distributions are used, which provides a Bayesian interpretation of the log-rank test. When correct prior information from historical data is used, the Bayesian log-rank test is shown to outperform the classic one in terms of power. R code to implement the Bayesian log-rank test is also provided, with step-by-step instructions.
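The classical (frequentist) log-rank test from which the quantity of interest is extracted can be run in R with the survival package, as in the small simulated example below; the Bayesian counterpart, Gibbs sampler, and sequential importance sampling procedure are provided with the paper's own code and are not reproduced here.

```r
# Classical log-rank test on simulated two-group survival data, using the
# survival package.  The paper's Bayesian counterpart is not reproduced here.
library(survival)
set.seed(2023)
n     <- 100
group <- rep(c(0, 1), each = n / 2)
time  <- rexp(n, rate = ifelse(group == 0, 0.10, 0.15))   # group 1 has higher hazard
cens  <- rexp(n, rate = 0.05)                             # independent censoring times
obs_time <- pmin(time, cens)
status   <- as.integer(time <= cens)                      # 1 = event observed

survdiff(Surv(obs_time, status) ~ group)   # classical log-rank test
```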
{"title":"Bayesian Log-Rank Test","authors":"Jiaqi Gu, Y. Zhang, G. Yin","doi":"10.1080/00031305.2022.2161637","DOIUrl":"https://doi.org/10.1080/00031305.2022.2161637","url":null,"abstract":"Abstract Comparison of two survival curves is a fundamental problem in survival analysis. Although abundant frequentist methods have been developed for comparing survival functions, inference procedures from the Bayesian perspective are rather limited. In this article, we extract the quantity of interest from the classic log-rank test and propose its Bayesian counterpart. Monte Carlo methods, including a Gibbs sampler and a sequential importance sampling procedure, are developed to draw posterior samples of survival functions and a decision rule of hypothesis testing is constructed for making inference. Via simulations and real data analysis, the proposed Bayesian log-rank test is shown to be asymptotically equivalent to the classic one when noninformative prior distributions are used, which provides a Bayesian interpretation of the log-rank test. When using the correct prior information from historical data, the Bayesian log-rank test is shown to outperform the classic one in terms of power. R codes to implement the Bayesian log-rank test are also provided with step-by-step instructions.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122762360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semi-Structured Distributional Regression
Pub Date: 2023-01-03 | DOI: 10.1080/00031305.2022.2164054
D. Rügamer, Chris Kolb, N. Klein
{"title":"Semi-Structured Distributional Regression","authors":"D. Rügamer, Chris Kolb, N. Klein","doi":"10.1080/00031305.2022.2164054","DOIUrl":"https://doi.org/10.1080/00031305.2022.2164054","url":null,"abstract":"","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"9 16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124349615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introducing Variational Inference in Statistics and Data Science Curriculum
Pub Date: 2023-01-03 | DOI: 10.1080/00031305.2023.2232006
Vojtech Kejzlar, Jingchen Hu
Probabilistic models such as logistic regression, Bayesian classification, neural networks, and models for natural language processing are increasingly present in both undergraduate and graduate statistics and data science curricula because of their wide range of applications. In this paper, we present a one-week course module for students in advanced undergraduate and applied graduate courses on variational inference, a popular optimization-based approach to approximate inference with probabilistic models. Our proposed module is guided by active learning principles: in addition to lecture materials on variational inference, we provide an accompanying class activity, an R Shiny app, and guided labs, with R code, based on real-data applications of logistic regression and of clustering documents using Latent Dirichlet Allocation. The main goal of our module is to expose students to a method that facilitates statistical modeling and inference with large datasets. Using our proposed module as a foundation, instructors can adopt and adapt it to introduce more realistic case studies and applications in data science, Bayesian statistics, multivariate analysis, and statistical machine learning courses.
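To give a flavor of the coordinate-ascent iterations that such a module would introduce, here is a minimal CAVI example for a conjugate normal model with unknown mean and precision; it is a generic textbook-style toy problem and is not taken from the authors' course materials.

```r
# Minimal CAVI sketch: mean-field variational inference for x_i ~ N(mu, 1/tau),
# with priors mu | tau ~ N(mu0, 1/(lambda0 * tau)) and tau ~ Gamma(a0, b0).
# Generic toy illustration; not taken from the authors' module.
set.seed(1)
x    <- rnorm(100, mean = 2, sd = 1.5)
N    <- length(x); xbar <- mean(x)
mu0  <- 0; lambda0 <- 1; a0 <- 1; b0 <- 1

# Factorized approximation q(mu) q(tau): q(mu) = N(mu_N, 1/lambda_N),
# q(tau) = Gamma(a_N, b_N).  Iterate the coordinate updates until convergence.
mu_N <- xbar; lambda_N <- 1
a_N  <- a0 + (N + 1) / 2           # shape update is fixed across iterations
b_N  <- b0
for (iter in 1:50) {
  E_tau    <- a_N / b_N
  mu_N     <- (lambda0 * mu0 + N * xbar) / (lambda0 + N)
  lambda_N <- (lambda0 + N) * E_tau
  b_N <- b0 + 0.5 * (sum((x - mu_N)^2) + N / lambda_N +
                     lambda0 * ((mu_N - mu0)^2 + 1 / lambda_N))
}
c(post_mean_mu = mu_N, post_mean_tau = a_N / b_N, true_tau = 1 / 1.5^2)
```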
{"title":"Introducing Variational Inference in Statistics and Data Science Curriculum","authors":"Vojtech Kejzlar, Jingchen Hu","doi":"10.1080/00031305.2023.2232006","DOIUrl":"https://doi.org/10.1080/00031305.2023.2232006","url":null,"abstract":"Probabilistic models such as logistic regression, Bayesian classification, neural networks, and models for natural language processing, are increasingly more present in both undergraduate and graduate statistics and data science curricula due to their wide range of applications. In this paper, we present a one-week course module for studnets in advanced undergraduate and applied graduate courses on variational inference, a popular optimization-based approach for approximate inference with probabilistic models. Our proposed module is guided by active learning principles: In addition to lecture materials on variational inference, we provide an accompanying class activity, an texttt{R shiny} app, and guided labs based on real data applications of logistic regression and clustering documents using Latent Dirichlet Allocation with texttt{R} code. The main goal of our module is to expose students to a method that facilitates statistical modeling and inference with large datasets. Using our proposed module as a foundation, instructors can adopt and adapt it to introduce more realistic case studies and applications in data science, Bayesian statistics, multivariate analysis, and statistical machine learning courses.","PeriodicalId":342642,"journal":{"name":"The American Statistician","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127092154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}