Multiple Contrast Tests for Count Data: Small Sample Approximations and Their Limitations
Mareen Pigorsch, Ludwig A. Hothorn, Frank Konietschke
Biometrical Journal, 67(6). DOI: 10.1002/bimj.70098. Published 2025-12-07. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683215/pdf/

Although count data are collected in many experiments, their analysis remains challenging, especially with small sample sizes. Linear or generalized linear models with Poisson or Negative Binomial distributional families have often been used. However, these data frequently show signs of over- or underdispersion, or even zero-inflation, casting doubt on these distributional assumptions and leading to inaccurate test results. Since the distributions are usually skewed, data transformations (e.g., the log-transformation) are also common. This underscores the need for statistical methods that do not hinge on specific distributional assumptions. We investigate multiple contrast tests that allow general contrasts (e.g., many-to-one or all-pairs comparisons) for analyzing count data in multi-arm trials. The methods vary in their effect and variance estimation, as well as in how they approximate the joint distribution of the multiple test statistics, and include frequently used approaches such as linear and generalized linear models and data transformations. An extensive simulation study demonstrates that a resampling version effectively controls the Type I error rate in various situations, while also revealing the method's limitations, including settings with overly liberal Type I error rates. Some standard methods exhibit inflated Type I error rates, further underscoring the need for alternative approaches. Real data applications emphasize the applicability of these methods.
Dimension Reduction for the Conditional Quantiles of Functional Data With Categorical Predictors
Shanshan Wang, Eliana Christou, Eftychia Solea, Jun Song
Biometrical Journal, 67(6), e70102. DOI: 10.1002/bimj.70102. Published 2025-12-01.

Functional data analysis has received significant attention due to the frequent occurrence of functional data in modern applications, such as in the medical field, where electrocardiograms or electroencephalograms can be used for a better understanding of various medical conditions. Due to the infinite-dimensional nature of functional elements, much current work focuses on dimension reduction techniques. This study shifts the focus to modeling the conditional quantiles of functional data, noting that existing work is limited to quantitative predictors. We therefore introduce the first approach to partial dimension reduction for conditional quantiles in the presence of both functional and categorical predictors. We present the proposed algorithm and derive the convergence rates of the estimators. Moreover, we demonstrate the finite sample performance of the method using simulation examples and a real dataset based on functional magnetic resonance imaging.
Empirical Likelihood Comparison of Absolute Risks
Paul Blanche, Frank Eriksson
Biometrical Journal, 67(6), e70104. DOI: 10.1002/bimj.70104. Published 2025-12-01.

In the competing risks setting, the t-year absolute risk for a specific time t (e.g., 2 years), also called the cumulative incidence function at time t, is often of interest. It is routinely estimated using the nonparametric Aalen-Johansen estimator. This estimator handles right-censored data and has desirable large sample properties, as it is the nonparametric maximum likelihood estimator (NPMLE). Inference for comparing absolute risks, via either a risk difference or a risk ratio, can therefore be performed via the usual asymptotic normal approximations and the delta method. However, the small sample performance of this approach is not fully satisfactory: (i) coverage of confidence intervals may be inaccurate, and (ii) comparisons made using a risk ratio and a risk difference can lead to inconsistent conclusions in terms of statistical significance. We therefore introduce an alternative empirical likelihood approach. One advantage of this approach is that it always leads to conclusions that agree, in terms of significance, whether absolute risks are compared via a risk ratio or a risk difference. Simulation results also suggest that small sample inference using this approach can be more accurate. We present the computation of confidence intervals and p-values using this approach and the asymptotic properties that justify them. We provide formulas and algorithms to compute the constrained NPMLE, from which empirical likelihood ratios and inference procedures are derived. The novel approach has been implemented in the timeEL package for R, and some of its advantages are demonstrated via reproducible analyses of bone marrow transplant data.
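The Aalen-Johansen point estimate referenced above is itself straightforward to compute. The sketch below is a hedged plain-Python illustration (not the timeEL implementation), assuming event codes 0 = censored and 1, 2, ... = competing causes: the overall Kaplan-Meier survival just before each failure time is multiplied by the cause-specific hazard increment.

```python
def aalen_johansen(times, events, cause=1):
    """Aalen-Johansen estimate of the cumulative incidence function (CIF)
    for one competing cause under right censoring.

    times  : observed event/censoring times
    events : 0 = censored; 1, 2, ... = cause of failure
    Returns (failure_times, cif) evaluated at the distinct failure times.
    """
    data = sorted(zip(times, events))
    fail_times = sorted({t for t, e in data if e > 0})
    surv = 1.0            # overall KM survival just before the current time
    cif, F = [], 0.0
    for t in fail_times:
        n_at_risk = sum(1 for ti, _ in data if ti >= t)
        d_any = sum(1 for ti, e in data if ti == t and e > 0)
        d_cause = sum(1 for ti, e in data if ti == t and e == cause)
        F += surv * d_cause / n_at_risk        # Aalen-Johansen increment
        cif.append(F)
        surv *= 1.0 - d_any / n_at_risk        # update overall survival
    return fail_times, cif
```

With no censoring the estimate reduces to the empirical proportion of failures from the given cause, which makes it easy to sanity-check.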
Bayesian Nonparametric Sensitivity Analysis of Multiple Test Procedures Under Dependence
George Karabatsos
Biometrical Journal, 67(6), e70101. DOI: 10.1002/bimj.70101. Published 2025-12-01. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12703229/pdf/

This paper introduces a sensitivity analysis method for multiple testing procedures (MTPs) using marginal p-values. The method is based on a Dirichlet process (DP) prior distribution specified to support the entire space of MTPs in which each MTP controls either the family-wise error rate (FWER) or the false discovery rate (FDR) under arbitrary dependence between p-values. This DP-MTP sensitivity analysis method provides uncertainty quantification for MTPs: it accounts for uncertainty in the selection of an MTP and in its threshold-based decision about how many of the smallest p-values, from a given set of null hypotheses tested, are significant discoveries; it measures each p-value's probability of significance over the DP prior predictive distribution of the space of all MTPs; and it reduces the possible conservativeness of relying on a single MTP for multiple testing. The DP-MTP sensitivity analysis method is illustrated through the analysis of over 28,000 p-values arising from hypothesis tests performed on a 2022 dataset of a representative sample of three million U.S. high school students observed on 239 variables. These tests relate variables about the disruption caused by school closures during the COVID-19 pandemic to various mathematical cognition, academic achievement, and student background variables. R software code for the DP-MTP sensitivity analysis method is provided in the Code and Data Supplement (CDS) of this paper.
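The space of MTPs described above is built from threshold rules that remain valid under arbitrary dependence. As a hedged illustration (two classical members of that space, not the paper's DP machinery), the sketch below computes how many of the smallest p-values each rule declares significant: the Holm step-down procedure for FWER control, and the Benjamini-Yekutieli step-up procedure for FDR control.

```python
def holm_rejections(pvals, alpha=0.05):
    """Holm step-down: FWER control under arbitrary dependence.
    Returns the number of smallest p-values declared significant."""
    m = len(pvals)
    k = 0
    for i, p in enumerate(sorted(pvals)):      # i = 0, ..., m - 1
        if p <= alpha / (m - i):
            k = i + 1
        else:
            break                              # step-down: stop at failure
    return k

def by_rejections(pvals, alpha=0.05):
    """Benjamini-Yekutieli step-up: FDR control under arbitrary
    dependence, via the harmonic-sum correction c(m)."""
    m = len(pvals)
    c_m = sum(1.0 / j for j in range(1, m + 1))
    k = 0
    for i, p in enumerate(sorted(pvals), start=1):
        if p <= i * alpha / (m * c_m):         # step-up: keep the largest i
            k = i
    return k
```

Different rules generally reject different numbers of hypotheses on the same p-values, which is exactly the selection uncertainty the DP-MTP method quantifies.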