Samantha Morrison, Constantine Gatsonis, Issa J. Dahabreh, Bing Li, Jon A. Steingrimsson
We present methods for estimating loss-based measures of the performance of a prediction model in a target population that differs from the source population in which the model was developed, in settings where outcome and covariate data are available from the source population but only covariate data are available on a simple random sample from the target population. Prior work adjusting for differences between the two populations has used various weighting estimators with inverse odds or density ratio weights. Here, we develop more robust estimators for the target population risk (expected loss) that can be used with data-adaptive (e.g., machine learning-based) estimation of nuisance parameters. We examine the large-sample properties of the estimators and evaluate finite-sample performance in simulations. Last, we apply the methods to data from lung cancer screening using nationally representative data from the National Health and Nutrition Examination Survey (NHANES) and extend our methods to account for the complex survey design of the NHANES.
Robust estimation of loss-based measures of model performance under covariate shift. Samantha Morrison, Constantine Gatsonis, Issa J. Dahabreh, Bing Li, Jon A. Steingrimsson. Canadian Journal of Statistics-Revue Canadienne De Statistique, 52(4). Published 2024-07-12. DOI: 10.1002/cjs.11815
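The inverse-odds weighting idea that the abstract above attributes to prior work can be illustrated with a minimal sketch. This is not the more robust estimator the authors develop; it is only the simple normalised weighting estimator, and the function name `inverse_odds_weighted_risk`, its arguments, and the numbers below are hypothetical. It assumes the probability of source-sample membership given covariates has already been estimated for each source observation.

```python
def inverse_odds_weighted_risk(losses, p_source):
    """Weighted average of source-sample losses with inverse-odds weights.

    losses   : loss of the fixed prediction model on each source observation
    p_source : estimated Pr(observation is from the source sample | covariates),
               evaluated at each source observation's covariates (assumed given)
    """
    if len(losses) != len(p_source):
        raise ValueError("losses and p_source must have the same length")
    weights = []
    for p in p_source:
        if not 0.0 < p < 1.0:
            raise ValueError("membership probabilities must lie strictly in (0, 1)")
        # Inverse odds of source membership: Pr(target | x) / Pr(source | x).
        weights.append((1.0 - p) / p)
    total_weight = sum(weights)
    # Normalised form: the weights are rescaled so that they sum to one.
    return sum(w * loss for w, loss in zip(weights, losses)) / total_weight


# Hypothetical inputs: two source observations with squared-error losses.
squared_errors = [1.0, 3.0]
membership_probs = [0.5, 0.25]  # hypothetical fitted values
risk = inverse_odds_weighted_risk(squared_errors, membership_probs)
```

Observations that look more like the target population (low source-membership probability) receive larger weights, which is why the second loss dominates the result here.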
Estimating the COVID-19 infection fatality rate, inferring the latent incidence and predicting the future epidemic evolution are critical to public health surveillance, but often challenging due to limited data availability or quality. Recently, a Bayesian framework combining time series deconvolution of deaths with a parametric Susceptible–Infectious–Recovered (SIR) model was proposed by Irons and Raftery (2021). We assess the parameter identifiability of the model using the profile likelihood approach and simulations, when only the time series of deaths and seroprevalence survey data are available. The robustness of the model to the more complex but also more realistic Susceptible–Exposed–Infectious–Recovered (SEIR)-based epidemics is evaluated through simulations; the influence of potential biases in the serosurveys on the inference is also investigated. We use a stationary first-order autoregressive prior to account for the variability of the transmission rate over time. The results suggest that the model is relatively robust to SEIR-based epidemics, especially when the reproductive number is low, given sufficient information from serosurveys or priors. However, the lack of parameter identifiability under limited data availability cannot be neglected. We apply the model to infer COVID-19 infections in Ontario and Quebec, Canada, during the Omicron era.
An SIR-based Bayesian framework for COVID-19 infection estimation. Haoyu Wu, David A. Stephens, Erica E. M. Moodie. Canadian Journal of Statistics-Revue Canadienne De Statistique, 52(4). Published 2024-07-12. DOI: 10.1002/cjs.11817. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cjs.11817
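For readers unfamiliar with the SIR model named in the abstract above, the following is a minimal deterministic, discrete-time sketch. It is not the Bayesian framework of Irons and Raftery or the authors' implementation: the daily time step, the constant transmission rate `beta`, the recovery rate `gamma`, and all numeric values are illustrative assumptions.

```python
def simulate_sir(population, initial_infected, beta, gamma, n_days):
    """Discrete-time (daily step) deterministic SIR compartment model.

    beta  : transmission rate per day (assumed constant in this sketch)
    gamma : recovery rate per day
    Returns a list of (susceptible, infectious, recovered) tuples, one per day,
    including day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    trajectory = [(s, i, r)]
    for _ in range(n_days):
        new_infections = beta * s * i / population  # flow S -> I
        new_recoveries = gamma * i                  # flow I -> R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical parameters with beta < gamma, so the outbreak dies out.
path = simulate_sir(population=1000, initial_infected=10,
                    beta=0.1, gamma=0.2, n_days=30)
```

An SEIR variant would add an exposed compartment between S and I; a time-varying transmission rate, as in the abstract, would replace the constant `beta` with a daily sequence.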
We consider estimation of the mean squared prediction error (MSPE) for observed best prediction (OBP) in small area estimation with count data. The OBP method has been previously developed in this context by Chen et al. (Journal of Survey Statistics and Methodology, 3, 136–161, 2015). However, estimation of the MSPE remains a challenging problem due to potential model misspecification that is considered in this setting. Those authors proposed a bootstrap method for estimating the MSPE, but its theoretical justification is not clear. We propose to use a Prasad–Rao-type linearization method to estimate the MSPE. Unlike traditional linearization approaches, our method is computationally oriented and, in that respect, easier to implement. Theoretical properties and empirical performance of the proposed method are studied. A real-data application is considered.
Estimating the mean squared prediction error of the observed best predictor associated with small area counts: A computationally oriented approach. Thuan Nguyen, Jiming Jiang. Canadian Journal of Statistics-Revue Canadienne De Statistique, 52(4). Published 2024-07-06. DOI: 10.1002/cjs.11810
Order-restricted hypothesis testing problems frequently arise in practice, including studies involving regression models for longitudinal data. These tests are known to be more powerful than tests that ignore such restrictions. In this article, we consider order-restricted tests for nonlinear mixed-effects models with measurement errors in time-dependent covariates. We propose to use a multiple imputation method to address measurement errors, since this approach allows us to use existing complete-data methods for order-restricted tests. Some theoretical results are presented. We evaluate our proposed methods via simulation studies that demonstrate they are more powerful than either a competing naive method or a two-step approach to testing hypotheses. We illustrate the use of our proposed approach by analyzing data from an HIV/AIDS study.
Order-restricted hypothesis tests for nonlinear mixed-effects models with measurement errors in covariates. Yixin Zhang, Wei Liu, Lang Wu. Canadian Journal of Statistics-Revue Canadienne De Statistique, 52(4). Published 2024-06-29. DOI: 10.1002/cjs.11812
We propose a general mixture of Markov jump processes. The key novel feature of the proposed mixture is that the generator matrices of the Markov processes comprising the mixture are entirely unconstrained. The Markov processes are mixed with distributions that depend on the initial state of the mixture process. The maximum likelihood (ML) estimates of the mixture's parameters are obtained from continuous realizations of the mixture process and their standard errors from an explicit form of the observed Fisher information matrix, which simplifies the Louis (Journal of the Royal Statistical Society Series B, 44:226–233, 1982) general formula for the same matrix. The asymptotic properties of the ML estimators are also derived. A simulation study verifies the estimates' accuracy. The proposed mixture provides an exploratory tool for identifying the homogeneous subpopulations in a heterogeneous population. This is illustrated with an application to a medical dataset.
Estimation in a general mixture of Markov jump processes. Halina Frydman, Budhi Arta Surya. Canadian Journal of Statistics-Revue Canadienne De Statistique, 52(4). Published 2024-06-29. DOI: 10.1002/cjs.11814
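As background for the abstract above, the sketch below shows the standard maximum likelihood estimate of a generator matrix for a single Markov jump process observed continuously: each off-diagonal rate is the number of observed i-to-j jumps divided by the total time spent in state i. This is only the single-process building block, not the mixture estimator or the Fisher information formula from the paper; the function name `estimate_generator`, the path representation, and the example path are hypothetical.

```python
def estimate_generator(path, n_states):
    """ML estimate of a generator matrix from one continuously observed path.

    path : list of (state, holding_time) pairs in the order visited; the last
           sojourn contributes time in state but no jump (it is censored).
    """
    time_in_state = [0.0] * n_states
    jumps = [[0] * n_states for _ in range(n_states)]
    for state, holding_time in path:
        time_in_state[state] += holding_time
    # Count transitions between consecutive sojourns.
    for (current, _), (following, _) in zip(path, path[1:]):
        jumps[current][following] += 1

    generator = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if time_in_state[i] == 0.0:
            continue  # state never visited: its rates are left at zero
        for j in range(n_states):
            if i != j:
                generator[i][j] = jumps[i][j] / time_in_state[i]
        # Each row of a generator matrix sums to zero.
        generator[i][i] = -sum(generator[i][j] for j in range(n_states) if j != i)
    return generator


# Hypothetical two-state path: 0 for 2.0, 1 for 1.0, 0 for 2.0, 1 for 3.0.
example_path = [(0, 2.0), (1, 1.0), (0, 2.0), (1, 3.0)]
q_hat = estimate_generator(example_path, n_states=2)
```

In a mixture, the subpopulation membership of each path is unobserved, so these counts and holding times would have to be weighted by estimated membership probabilities rather than used directly.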
We study the first-order stochastic dominance (SD) test in the context of two independent random samples. We introduce several test statistics that effectively capture violations of the dominance relationship, particularly in the tail regions. Additionally, we develop a resampling procedure to compute the