In this article, we use e-values in the context of multiple hypothesis testing, assuming that the base tests produce independent, or sequential, e-values. Our simulation and empirical studies, as well as theoretical considerations, suggest that, under this assumption, our new algorithms are superior to the known algorithms using independent p-values and to our recent algorithms designed for e-values without the assumption of independence.
True and false discoveries with independent and sequential e-values. Vladimir Vovk, Ruodu Wang. Canadian Journal of Statistics-Revue Canadienne De Statistique, 52(4). Published 2024-10-21. DOI: 10.1002/cjs.11833. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cjs.11833
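As background for the abstract above, here is a minimal sketch of why independence matters for e-values. It assumes the standard definition: an e-value is a nonnegative statistic whose expectation is at most 1 under the null hypothesis. Under that definition the product of independent (or sequential) e-values is again an e-value, whereas arbitrarily dependent e-values can only be safely averaged. By Markov's inequality, rejecting when the combined value is at least 1/alpha keeps the type I error at or below alpha. The function names are hypothetical, and this is not the authors' algorithm for true and false discoveries.

```python
from math import prod


def combine_by_product(e_values):
    """Valid combination when the e-values are independent or sequential."""
    return prod(e_values)


def combine_by_mean(e_values):
    """Valid combination under arbitrary dependence between the e-values."""
    return sum(e_values) / len(e_values)


def rejects(e_value, alpha=0.05):
    """Markov's inequality: P(E >= 1/alpha) <= alpha under the null."""
    return e_value >= 1 / alpha


# Hypothetical e-values from three base tests of the same null.
e_values = [2.0, 3.0, 4.0]
product = combine_by_product(e_values)
mean = combine_by_mean(e_values)
print(product, rejects(product))
print(mean, rejects(mean))
```

With these illustrative inputs the product clears the 1/alpha threshold while the average does not, which is the kind of gain that an independence assumption can buy.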
Nonparametric estimation of a regression curve becomes crucial when the underlying dependence structure between covariates and responses is not explicit. While existing literature has addressed single change-point estimation for regression curves, the problem of multiple change points remains unresolved. In an effort to bridge this gap, this article introduces a nonparametric estimator for multiple change points by minimizing a penalized weighted sum of squared residuals, presenting consistent results under mild conditions. Additionally, we propose a cross-validation-based procedure that possesses the advantage of being tuning-free. Our simulation results showcase the competitive performance of these new procedures when compared with state-of-the-art methods. As an illustration of their utility, we apply these procedures to a real dataset.
Multiple change-point detection for regression curves. Yunlong Wang. Canadian Journal of Statistics-Revue Canadienne De Statistique, 52(4). Published 2024-07-25. DOI: 10.1002/cjs.11816
The additive hazards model is one of the most commonly used models for regression analysis of failure time data, and many methods have been developed for its estimation. In this article, we consider the situation where one observes informatively interval-censored data arising from case-cohort studies where covariate information is collected only for a small subcohort of study subjects. By informative or dependent censoring, we mean that the failure time of interest and the censoring mechanism may be correlated. For estimation, we will develop a sieve inverse probability weighting estimation procedure with the use of Bernstein polynomials. The resulting estimators of regression parameters are shown to be consistent and asymptotically normal. An extensive simulation study is conducted and suggests that the proposed method works well in practical situations. An example is also provided.
Estimation of the additive hazards model based on case-cohort interval-censored data with dependent censoring. Yuqing Ma, Peijie Wang, Yichen Lou, Jianguo Sun, Alzheimer's Disease Neuroimaging Initiative. Canadian Journal of Statistics-Revue Canadienne De Statistique, 52(4). Published 2024-07-12. DOI: 10.1002/cjs.11818
Samantha Morrison, Constantine Gatsonis, Issa J. Dahabreh, Bing Li, Jon A. Steingrimsson
We present methods for estimating loss-based measures of the performance of a prediction model in a target population that differs from the source population in which the model was developed, in settings where outcome and covariate data are available from the source population but only covariate data are available on a simple random sample from the target population. Prior work adjusting for differences between the two populations has used various weighting estimators with inverse odds or density ratio weights. Here, we develop more robust estimators for the target population risk (expected loss) that can be used with data-adaptive (e.g., machine learning-based) estimation of nuisance parameters. We examine the large-sample properties of the estimators and evaluate finite-sample performance in simulations. Last, we apply the methods to data from lung cancer screening using nationally representative data from the National Health and Nutrition Examination Survey (NHANES) and extend our methods to account for the complex survey design of the NHANES.
Robust estimation of loss-based measures of model performance under covariate shift. Samantha Morrison, Constantine Gatsonis, Issa J. Dahabreh, Bing Li, Jon A. Steingrimsson. Canadian Journal of Statistics-Revue Canadienne De Statistique, 52(4). Published 2024-07-12. DOI: 10.1002/cjs.11815
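The inverse-odds weighting that the abstract above describes as prior work can be sketched as follows. This is a toy version under strong assumptions: a single categorical covariate, so that the odds of target versus source membership within a stratum can be estimated by a ratio of counts, and every source stratum also appearing in the target sample. The function name and data are hypothetical, and this is the simple weighting estimator, not the more robust estimators developed in the article.

```python
from collections import Counter


def inverse_odds_risk(source, target_x):
    """Estimate target-population risk from source losses.

    source   : list of (x, loss) pairs observed in the source population
    target_x : list of covariate values sampled from the target population
    """
    n_source = Counter(x for x, _ in source)
    n_target = Counter(target_x)
    weighted_loss = 0.0
    total_weight = 0.0
    for x, loss in source:
        # Estimated odds of target vs. source membership given x.
        w = n_target[x] / n_source[x]
        weighted_loss += w * loss
        total_weight += w
    # Normalized weighted mean of the source losses.
    return weighted_loss / total_weight


# Hypothetical data: losses are larger in stratum 1, which is
# more common in the target population than in the source.
source = [(0, 1.0), (0, 1.0), (1, 3.0), (1, 3.0)]
target_x = [1, 1, 1, 0]
print(inverse_odds_risk(source, target_x))
```

The unweighted source mean of these losses is 2.0; the weights shift the estimate toward the stratum that dominates the target sample.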
Estimating the COVID-19 infection fatality rate, inferring the latent incidence, and predicting the future epidemic evolution are critical to public health surveillance, but often challenging due to limited data availability or quality. Recently, a Bayesian framework combining time series deconvolution of deaths with a parametric Susceptible–Infectious–Recovered (SIR) model was proposed by Irons and Raftery (2021). We assess the parameter identifiability of the model using the profile likelihood approach and simulations, when only the time series of deaths and seroprevalence survey data are available. The robustness of the model to the more complex but also more realistic Susceptible–Exposed–Infectious–Recovered (SEIR)-based epidemics is evaluated through simulations; the influence of potential biases in the serosurveys on the inference is also investigated. We use a stationary first-order autoregressive prior to account for the variability of the transmission rate over time. The results suggest that the model is relatively robust to SEIR-based epidemics, especially when the reproductive number is low, given sufficient information from serosurveys or priors. However, the lack of parameter identifiability under limited data availability cannot be neglected. We apply the model to infer the COVID-19 infections in Ontario and Quebec, Canada, during the Omicron era.
An SIR-based Bayesian framework for COVID-19 infection estimation. Haoyu Wu, David A. Stephens, Erica E. M. Moodie. Canadian Journal of Statistics-Revue Canadienne De Statistique, 52(4). Published 2024-07-12. DOI: 10.1002/cjs.11817. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cjs.11817
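For readers unfamiliar with the compartmental model named in the abstract above, the following is a minimal sketch of a deterministic SIR epidemic, integrated with small Euler steps. It illustrates only the textbook dynamics (dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I); it does not include the Bayesian inference, the deconvolution of deaths, or the autoregressive prior on the transmission rate. The function name, parameter values, and step size are assumptions chosen for illustration.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Return a list of (S, I, R) tuples, one per day, starting at day 0."""
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    path = [(s, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        path.append((s, i, r))
    return path


# Hypothetical parameters: basic reproductive number beta / gamma = 3.
path = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, days=100)
peak_day = max(range(len(path)), key=lambda d: path[d][1])
print("day of peak infections:", peak_day)
```

The three compartments always sum to the population size, which is a convenient sanity check on any implementation of the model.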
We consider estimation of the mean squared prediction error (MSPE) for observed best prediction (OBP) in small area estimation with count data. The OBP method has been previously developed in this context by Chen et al. (Journal of Survey Statistics and Methodology, 3, 136–161, 2015). However, estimation of the MSPE remains a challenging problem due to potential model misspecification that is considered in this setting. Those authors proposed a bootstrap method for estimating the MSPE, whose theoretical justification is not clear. We propose to use a Prasad–Rao-type linearization method to estimate the MSPE. Unlike traditional linearization approaches, our method is computationally oriented and, in that respect, easier to implement. Theoretical properties and empirical performance of the proposed method are studied. A real-data application is considered.
Estimating the mean squared prediction error of the observed best predictor associated with small area counts: A computationally oriented approach. Thuan Nguyen, Jiming Jiang. Canadian Journal of Statistics-Revue Canadienne De Statistique, 52(4). Published 2024-07-06. DOI: 10.1002/cjs.11810
Order-restricted hypothesis testing problems frequently arise in practice, including studies involving regression models for longitudinal data. These tests are known to be more powerful than tests that ignore such restrictions. In this article, we consider order-restricted tests for nonlinear mixed-effects models with measurement errors in time-dependent covariates. We propose to use a multiple imputation method to address measurement errors, since this approach allows us to use existing complete-data methods for order-restricted tests. Some theoretical results are presented. We evaluate our proposed methods via simulation studies that demonstrate they are more powerful than either a competing naive method or a two-step approach to testing hypotheses. We illustrate the use of our proposed approach by analyzing data from an HIV/AIDS study.
Order-restricted hypothesis tests for nonlinear mixed-effects models with measurement errors in covariates. Yixin Zhang, Wei Liu, Lang Wu. Canadian Journal of Statistics-Revue Canadienne De Statistique, 52(4). Published 2024-06-29. DOI: 10.1002/cjs.11812
We propose a general mixture of Markov jump processes. The key novel feature of the proposed mixture is that the generator matrices of the Markov processes comprising the mixture are entirely unconstrained. The Markov processes are mixed with distributions that depend on the initial state of the mixture process. The maximum likelihood (ML) estimates of the mixture's parameters are obtained from continuous realizations of the mixture process and their standard errors from an explicit form of the observed Fisher information matrix, which simplifies the Louis (Journal of the Royal Statistical Society Series B, 44:226–233, 1982) general formula for the same matrix. The asymptotic properties of the ML estimators are also derived. A simulation study verifies the estimates' accuracy. The proposed mixture provides an exploratory tool for identifying the homogeneous subpopulations in a heterogeneous population. This is illustrated with an application to a medical dataset.
Estimation in a general mixture of Markov jump processes. Halina Frydman, Budhi Arta Surya. Canadian Journal of Statistics-Revue Canadienne De Statistique, 52(4). Published 2024-06-29. DOI: 10.1002/cjs.11814
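To make the phrase "continuous realizations" in the abstract above concrete, here is a minimal sketch of the classical maximum likelihood estimate of a generator matrix for a single Markov jump process (not a mixture). When the path is observed continuously, the off-diagonal estimate is the number of i-to-j jumps divided by the total time spent in state i. The function name and the input format are hypothetical; the mixture model, its state-dependent mixing distributions, and the Fisher information formula from the article are not reproduced here.

```python
from collections import defaultdict


def estimate_generator(paths, states):
    """MLE of the generator from fully observed paths.

    paths  : list of paths, each a list of (state, holding_time) pairs;
             the last pair of a path is the censored final sojourn
    states : iterable of all state labels
    """
    time_in = defaultdict(float)   # total time spent in each state
    jumps = defaultdict(int)       # number of observed i -> j transitions
    for path in paths:
        for k, (state, hold) in enumerate(path):
            time_in[state] += hold
            if k + 1 < len(path):
                jumps[(state, path[k + 1][0])] += 1
    q = {}
    for i in states:
        row = {}
        for j in states:
            if j != i:
                row[j] = jumps[(i, j)] / time_in[i] if time_in[i] > 0 else 0.0
        # Rows of a generator matrix sum to zero.
        row[i] = -sum(row.values())
        q[i] = row
    return q


# Hypothetical path: 2 time units in "a", 1 in "b", then 3 more in "a".
q_hat = estimate_generator([[("a", 2.0), ("b", 1.0), ("a", 3.0)]], ["a", "b"])
print(q_hat)
```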
We study the first-order stochastic dominance (SD) test in the context of two independent random samples. We introduce several test statistics that effectively capture violations of the dominance relationship, particularly in the tail regions. Additionally, we develop a resampling procedure to compute the