Generic E-variables for exact sequential k-sample tests that allow for optional stopping
Rosanne J. Turner, Alexander Ly, Peter D. Grünwald
Journal of Statistical Planning and Inference, volume 230, Article 106116. Published 2023-10-26. DOI: 10.1016/j.jspi.2023.106116
https://www.sciencedirect.com/science/article/pii/S037837582300085X
Citations: 0
Abstract
We develop E-variables for testing whether two or more data streams come from the same source or not, and more generally, whether the difference between the sources is larger than some minimal effect size. These E-variables lead to exact, nonasymptotic tests that remain safe, i.e., keep their type-I error guarantees, under flexible sampling scenarios such as optional stopping and continuation. In special cases our E-variables also have an optimal ‘growth’ property under the alternative. While the construction is generic, we illustrate it through the special case of k × 2 contingency tables, i.e., k Bernoulli streams, allowing for the incorporation of different restrictions on the composite alternative. Comparison to p-value analysis in simulations and a real-world 2 × 2 contingency table example shows that E-variables, through their flexibility, often allow for early stopping of data collection, thereby retaining similar power as classical methods, while also retaining the option of extending or combining data afterwards.
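To make the idea of an E-variable for two Bernoulli streams concrete, here is a minimal sketch. The abstract does not state a formula, so everything below is an assumption for illustration: each block holds one outcome per stream, the alternative (`theta_a`, `theta_b`) is fixed in advance, and the E-value is a likelihood ratio against a common null parameter set to the average of the two. All function names are hypothetical. The check enumerates the four possible block outcomes and confirms that the expected E-value is at most 1 whenever both streams share the same success probability.

```python
from itertools import product


def bernoulli_pmf(theta: float, y: int) -> float:
    """P(Y = y) for Y ~ Bernoulli(theta), y in {0, 1}."""
    return theta if y == 1 else 1.0 - theta


def block_e_value(y_a: int, y_b: int, theta_a: float, theta_b: float) -> float:
    """E-value for one block with one outcome from each of two streams.

    Assumed form (not taken from the abstract): likelihood ratio of the
    alternative (theta_a, theta_b) against a shared null parameter equal
    to the average of theta_a and theta_b.
    """
    theta_0 = 0.5 * (theta_a + theta_b)
    numerator = bernoulli_pmf(theta_a, y_a) * bernoulli_pmf(theta_b, y_b)
    denominator = bernoulli_pmf(theta_0, y_a) * bernoulli_pmf(theta_0, y_b)
    return numerator / denominator


def expected_e_under_null(theta: float, theta_a: float, theta_b: float) -> float:
    """Exact expectation of the block E-value when both streams are Bernoulli(theta)."""
    return sum(
        bernoulli_pmf(theta, y_a)
        * bernoulli_pmf(theta, y_b)
        * block_e_value(y_a, y_b, theta_a, theta_b)
        for y_a, y_b in product((0, 1), repeat=2)
    )


def e_process(blocks, theta_a: float, theta_b: float):
    """Running product of block E-values over a sequence of (y_a, y_b) blocks."""
    running = 1.0
    path = []
    for y_a, y_b in blocks:
        running *= block_e_value(y_a, y_b, theta_a, theta_b)
        path.append(running)
    return path


if __name__ == "__main__":
    # Null check on a grid of shared success probabilities.
    grid = [i / 20 for i in range(1, 20)]
    worst = max(expected_e_under_null(t, 0.3, 0.7) for t in grid)
    print("largest null expectation on grid:", worst)

    # Hypothetical data: stream b succeeds, stream a fails, five blocks in a row.
    print(e_process([(0, 1)] * 5, 0.3, 0.7))
```

Under these assumptions the running product can be monitored after every block: by Ville's inequality, rejecting the null the first time the product reaches 1/α keeps the type-I error at most α regardless of when data collection stops, which is the sense in which such tests remain safe under optional stopping.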
About the journal:
The Journal of Statistical Planning and Inference offers itself as a multifaceted and all-inclusive bridge between classical aspects of statistics and probability, and the emerging interdisciplinary aspects that have the potential to revolutionize the subject. While we maintain our traditional strength in statistical inference, design, classical probability, and large sample methods, we also have a far more inclusive and broadened scope to keep up with the new problems that confront us as statisticians, mathematicians, and scientists.
We publish high-quality articles in all branches of statistics, probability, discrete mathematics, machine learning, and bioinformatics. We also especially welcome well-written and up-to-date review articles on fundamental themes of statistics, probability, machine learning, and general biostatistics. Thoughtful letters to the editors, interesting problems in need of a solution, and short notes carrying an element of elegance or beauty are equally welcome.