On the number of replications in resampling tests and Monte Carlo simulation studies
Daniel Gaigall, Julian Gerstenberg
American Statistician. Pub Date: 2026-01-08. DOI: 10.1080/00031305.2025.2612197
A Comparison of DeepSeek and Other LLMs
Tianchen Gao, Jiashun Jin, Zheng Tracy Ke, Gabriel Moryoussef
Pub Date: 2025-12-31. DOI: 10.1080/00031305.2025.2611010
Recently, DeepSeek has been the focus of attention in and beyond the AI community. An interesting problem is how DeepSeek compares to other large language models (LLMs). There are many tasks an LLM can do, and in this paper we use the task of predicting an outcome from a short text for comparison. We consider two settings: an authorship classification setting and a citation classification setting. In the first, the goal is to determine whether a short text was written by a human or by AI. In the second, the goal is to classify a citation into one of four types based on its textual content. For each experiment, we compare DeepSeek with four popular LLMs: Claude, Gemini, GPT, and Llama. We find that, in terms of classification accuracy, DeepSeek outperforms Gemini, GPT, and Llama in most cases, but underperforms Claude. We also find that DeepSeek is considerably slower than the others but inexpensive to use, while Claude is much more expensive than all the others. Finally, in terms of similarity, the output of DeepSeek is most similar to those of Gemini and Claude (and among all five LLMs, Claude and Gemini have the most similar outputs). We also present a fully labeled dataset that we collected ourselves, and propose a recipe for using the LLMs together with a recent dataset, MADStat, to generate new datasets. The datasets in our paper can be used as benchmarks for future studies of LLMs.
Facilitating a Collaborative Relationship between Generative AI and the Statistics Student
Richard A. Levine
Pub Date: 2025-12-29. DOI: 10.1080/00031305.2025.2608724. Pages: 1-22
This article examines how students can engage with generative artificial intelligence (genAI) as collaborators in the statistics learning process. Prompt engineering is positioned as a transferable, tool-agnostic competency that reinforces core elements of statistical thinking, including clarity, iteration, and purposeful inquiry. Through illustrative collaborations, we explore applications such as automating and optimizing code, acquiring programming syntax, and designing simulation studies. While these tasks are drawn from upper-level undergraduate and graduate coursework, the running example (a chi-squared test of association) is intended to spur ideas for incorporating genAI into the introductory statistics classroom. Supplementary materials include a) an outline of a learning-management module and the structure of the discussion and activities during my class periods covering this module on responsible use of generative AI; b) R Markdown files and compiled PDF documents intended to support classroom integration; and c) illustrative comparisons across three widely used platforms (ChatGPT, Copilot, and Gemini) to highlight how differences in output style and reasoning can inform instructional design, rather than to rank or evaluate the tools technically. The article concludes with a discussion of strategies for promoting ethical, transparent, and inclusive uses of genAI in statistics education.
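The running example above, a chi-squared test of association, can be sketched in a few lines. The 2×2 table and the treatment/outcome framing below are invented for illustration; only the Python standard library is used.

```python
# Minimal sketch of a chi-squared test of association on a 2x2 table.
# The counts are made up for illustration; stdlib only.

def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a two-way contingency table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

table = [[20, 30],
         [40, 10]]  # e.g. rows = groups, columns = outcomes (hypothetical counts)
stat = chi_squared_statistic(table)
# df = (2-1)*(2-1) = 1; the 5% critical value of chi-squared(1) is about 3.841
print(f"chi-squared = {stat:.3f}; reject at 5% level: {stat > 3.841}")
```

In practice one would reach for a library routine (e.g. `scipy.stats.chi2_contingency` in Python, or `chisq.test` in R, which fits the article's R Markdown materials) that also returns the p-value.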
Abraham Wald and the Origins of the Sequential Probability Ratio Test
Joel B. Greenhouse, Christopher J. Phillips
Pub Date: 2025-12-23. DOI: 10.1080/00031305.2025.2604805. Pages: 1-12
Feature selection in Cox model with partially observed covariates: Application to oncology trials
Ujjwal Das, Ranojoy Basu
Pub Date: 2025-12-22. DOI: 10.1080/00031305.2025.2606077
Probabilistic parameter estimates that require less small print
James A. Hanley
Pub Date: 2025-12-22. DOI: 10.1080/00031305.2025.2606079
Although we have had nearly a century to refine it, our teaching of confidence intervals for parameters is still imperfect. Despite all of our warnings regarding these intervals, it is not uncommon for end-users to misinterpret them. We discuss some possible reasons for this and, using a printed figure and a Shiny app, work through a simple and close-to-home example while trying to avoid many of these traps. We urge teachers to (a) begin with contexts that require less technical knowledge, or where the technical details can be kept out of the way; (b) avoid the traditional (and symmetric) 'point estimate ± a z- or t-based margin of error' confidence intervals that lead to lazy and muddled thinking; (c) start with a direct approach rather than an indirect frequentist one that can end up being misinterpreted; and (d) encourage the reverse logic that asks what parameter values might have produced the data we see, rather than what data values will be produced by a parameter value.
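The "reverse logic" in point (d) can be illustrated by inverting an exact binomial test: keep every parameter value that would not have made the observed data surprising. This is a stdlib-only sketch; the data (8 successes in 20 trials), the level, and the grid resolution are my own illustrative choices, not the article's.

```python
# Test inversion for a binomial proportion: the interval is the set of
# parameter values p under which the observed count k is not rejected.
# n, k, alpha, and the grid step are illustrative choices.
from math import comb

def binom_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def inverted_interval(n, k, alpha=0.05, step=0.0005):
    kept = []
    for i in range(1, int(1 / step)):
        p = i * step
        lower_tail = sum(binom_pmf(n, j, p) for j in range(0, k + 1))
        upper_tail = sum(binom_pmf(n, j, p) for j in range(k, n + 1))
        # keep p if neither tail probability drops below alpha/2
        # (an equal-tailed, Clopper-Pearson-style rule)
        if lower_tail > alpha / 2 and upper_tail > alpha / 2:
            kept.append(p)
    return min(kept), max(kept)

lo, hi = inverted_interval(n=20, k=8)
print(f"8/20 successes -> parameter values not rejected: ({lo:.3f}, {hi:.3f})")
```

The resulting set of retained p values answers Hanley's question directly: "which parameter values might have produced the data we see?"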
Probability Proofs for Stirling (and More): the Ubiquitous Role of 2π
Nils Lid Hjort, Emil Aas Stoltenberg
Pub Date: 2025-12-22. DOI: 10.1080/00031305.2025.2603256
The Stirling approximation formula for n! dates from 1730. Here we give new and instructive proofs of this and related approximation formulae via tools of probability and statistics. There are connections to the Central Limit Theorem and also to approximations of marginal distributions in Bayesian setups, with arguments that can be worked through by Master's- and PhD-level students (and above). Certain formulae emerge by working through particular instances, some independently verifiable but others perhaps not. A particular case yielding new formulae is that of summing independent uniforms, related to the Irwin–Hall distribution. Yet further proofs of Stirling's formula flow from examining aspects of the limiting normality of the sample median of uniforms, and from these we again find a proof of the Wallis product formula for π. A section detailing historical aspects and development is included, from Wallis 1656 and de Moivre and Stirling 1730 to Laplace 1778, etc.
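The two classical formulas this abstract revolves around are easy to check numerically: Stirling's approximation n! ≈ √(2πn)(n/e)^n and the Wallis product for π. The particular values of n and the number of product terms below are arbitrary choices for the demonstration.

```python
# Numerical check of Stirling's approximation and the Wallis product.
from math import exp, factorial, pi, sqrt

def stirling(n):
    """Stirling's approximation: n! ~ sqrt(2*pi*n) * (n/e)**n."""
    return sqrt(2 * pi * n) * (n / exp(1)) ** n

for n in (5, 10, 20):
    rel_err = abs(stirling(n) - factorial(n)) / factorial(n)
    print(f"n={n:2d}  relative error {rel_err:.4%}")  # shrinks roughly like 1/(12n)

def wallis(terms):
    """Partial Wallis product: 2 * prod_{k=1..K} (2k/(2k-1))*(2k/(2k+1)) -> pi."""
    prod = 1.0
    for k in range(1, terms + 1):
        prod *= (2 * k) / (2 * k - 1) * (2 * k) / (2 * k + 1)
    return 2 * prod

print(f"Wallis with 10,000 terms: {wallis(10_000):.6f} vs pi = {pi:.6f}")
```

The slowly shrinking relative error reflects the well-known refinement n! ≈ √(2πn)(n/e)^n · (1 + 1/(12n) + …), and the Wallis partial products converge to π at a similarly leisurely O(1/K) rate.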