Can expected error costs justify testing a hypothesis at multiple alpha levels rather than searching for an elusive optimal alpha?

Janet Aisbett
arXiv - STAT - Applications, 2024-07-31. DOI: arXiv:2407.21303.
Simultaneous testing of one hypothesis at multiple alpha levels can be
performed within a conventional Neyman-Pearson framework. This is achieved by
treating the hypothesis as a family of hypotheses, each member of which
explicitly concerns test level as well as effect size. Such testing encourages
researchers to think about error rates and strength of evidence in both the
statistical design and reporting stages of a study. Here, we show that these
multi-alpha level tests can deliver acceptable expected total error costs. We
first present formulas for expected error costs from tests at single and
multiple alpha levels, given prior probabilities of effect sizes with either
dichotomous or continuous distributions. Error costs are tied to decisions,
with different decisions assumed for each of the potential outcomes in the
multi-alpha level case. Expected total costs for tests at single and multiple
alpha levels are then compared with optimal costs. This comparison highlights
how sensitive optimization is to estimated error costs and to assumptions about
prevalence. Testing at multiple default thresholds removes the need to formally
identify decisions, or to model costs and prevalence as required in
optimization approaches. Although total expected error costs with this approach
will not be optimal, our results suggest they may, on average, be lower than
when so-called optimal test levels are derived from misspecified models.
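The core cost calculation behind the comparison can be illustrated for the simplest case the abstract mentions: a dichotomous prior on effect size. The sketch below is not the paper's model, just a standard Neyman-Pearson expected-cost calculation for a one-sided z-test with known unit variance; the function name, default costs, and the grid search for a cost-minimizing alpha are all illustrative assumptions. Re-running the grid search with a different prior or cost ratio shows the sensitivity of the "optimal" alpha that the abstract highlights.

```python
from statistics import NormalDist  # stdlib normal distribution (Python 3.8+)

def expected_error_cost(alpha, effect, n, prior, cost_I=1.0, cost_II=1.0):
    """Expected total error cost of a one-sided z-test of H0: mu <= 0.

    Dichotomous prior: the true standardized effect is 0 with probability
    1 - prior, or `effect` with probability `prior`. Costs and defaults
    are illustrative, not taken from the paper.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)                 # rejection threshold
    beta = z.cdf(z_crit - effect * n ** 0.5)      # Type II error rate
    return (1 - prior) * alpha * cost_I + prior * beta * cost_II

# Tabulate costs at conventional default levels (the multi-alpha view
# attaches a different decision to each threshold crossed).
for a in (0.05, 0.01, 0.005):
    print(a, round(expected_error_cost(a, effect=0.3, n=50, prior=0.5), 4))

# Grid search for the cost-minimizing alpha under one assumed model.
alphas = [i / 1000 for i in range(1, 500)]
best = min(alphas, key=lambda a: expected_error_cost(a, 0.3, 50, 0.5))
print("cost-minimizing alpha under these assumptions:", best)
```

Changing `prior` from 0.5 to, say, 0.1 shifts `best` substantially, which is the misspecification risk the abstract contrasts with simply reporting outcomes at several default thresholds.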