The response envelope model proposed by Cook et al. (2010) is an efficient method for estimating regression coefficients in the context of the multivariate linear regression model. It improves estimation efficiency by identifying the material and immaterial parts of the responses and removing the immaterial variation. The response envelope model has been investigated only for continuous response variables. In this paper, we propose the multivariate probit model with latent envelope, in short the probit envelope model, as a response envelope model for multivariate binary response variables. The probit envelope model accounts for the relations among the Gaussian latent variables of the multivariate probit model by using the idea of the response envelope model. We address the identifiability of the probit envelope model by employing the concept of essential identifiability and propose a Bayesian method for parameter estimation. We illustrate the probit envelope model via simulation studies and a real-data analysis. The simulation studies show that the probit envelope model has the potential to gain estimation efficiency relative to the multivariate probit model. The real-data analysis shows that the probit envelope model is useful for multi-label classification.
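As a rough guide to the construction, the display below sketches one way such a probit envelope structure can be written. The notation (Γ, Γ0, η, Ω, Ω0) is the usual response-envelope parameterization of Cook et al. (2010) applied to the Gaussian latent vector of the multivariate probit model; it is an illustrative assumption, not the paper's exact formulation.

```latex
% Hedged sketch of a probit response-envelope structure (notation assumed, not quoted from the paper).
% Z_i is the Gaussian latent vector of the multivariate probit model; Y_i is the observed binary vector.
\begin{align*}
  Z_i &= \mu + \Gamma \eta X_i + \varepsilon_i, \qquad \varepsilon_i \sim N_r(0, \Sigma),\\
  \Sigma &= \Gamma \Omega \Gamma^{\top} + \Gamma_0 \Omega_0 \Gamma_0^{\top},\\
  Y_{ij} &= \mathbf{1}\{Z_{ij} > 0\}, \qquad j = 1, \dots, r.
\end{align*}
```

Here the columns of Γ span the envelope (the material part of the latent responses) and Γ0 spans its orthogonal complement (the immaterial part), so variation in the Γ0 directions carries no information about X.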
{"title":"Bayesian inference for multivariate probit model with latent envelope.","authors":"Kwangmin Lee, Yeonhee Park","doi":"10.1093/biomtc/ujae059","DOIUrl":"https://doi.org/10.1093/biomtc/ujae059","url":null,"abstract":"<p><p>The response envelope model proposed by Cook et al. (2010) is an efficient method to estimate the regression coefficient under the context of the multivariate linear regression model. It improves estimation efficiency by identifying material and immaterial parts of responses and removing the immaterial variation. The response envelope model has been investigated only for continuous response variables. In this paper, we propose the multivariate probit model with latent envelope, in short, the probit envelope model, as a response envelope model for multivariate binary response variables. The probit envelope model takes into account relations between Gaussian latent variables of the multivariate probit model by using the idea of the response envelope model. We address the identifiability of the probit envelope model by employing the essential identifiability concept and suggest a Bayesian method for the parameter estimation. We illustrate the probit envelope model via simulation studies and real-data analysis. The simulation studies show that the probit envelope model has the potential to gain efficiency in estimation compared to the multivariate probit model. The real data analysis shows that the probit envelope model is useful for multi-label classification.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141475824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article addresses the challenge of estimating receiver operating characteristic (ROC) curves and the areas under these curves (AUCs) in the context of an imperfect gold standard, a common issue in diagnostic accuracy studies. We delve into the nonparametric identification and estimation of ROC curves and AUCs when the reference standard for disease status is prone to error. Our approach hinges on the known or estimable accuracy of this imperfect reference standard and the conditional independence assumption, under which we demonstrate the identifiability of ROC curves and propose a nonparametric estimation method. In cases where the accuracy of the imperfect reference standard remains unknown, we establish that, while ROC curves are unidentifiable, the sign of the difference between two AUCs is identifiable. This insight leads us to develop a hypothesis-testing method for assessing the relative superiority of AUCs. Compared with existing methods, the proposed methods are nonparametric and therefore do not rely on parametric model assumptions. In addition, they are applicable both to ROC/AUC analysis of continuous biomarkers and to AUC analysis of ordinal biomarkers. Our theoretical results and simulation studies validate the proposed methods, which we further illustrate through application in two real-world diagnostic studies.
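For orientation, the sketch below computes the standard nonparametric (Mann-Whitney) AUC estimator that such analyses start from, against a disease label assumed to be observed without error; correcting this quantity for an imperfect reference standard is exactly what the paper addresses. The function name and toy data are ours, not the authors'.

```python
import numpy as np

def empirical_auc(scores, disease):
    """Standard nonparametric (Mann-Whitney) AUC estimate.

    scores  : biomarker values (higher values taken as more disease-like)
    disease : 0/1 disease status, assumed observed without error here
    """
    scores = np.asarray(scores, float)
    disease = np.asarray(disease, int)
    pos, neg = scores[disease == 1], scores[disease == 0]
    greater = (pos[:, None] > neg[None, :]).mean()   # concordant pairs
    ties = (pos[:, None] == neg[None, :]).mean()     # tied pairs count one half
    return greater + 0.5 * ties

# Toy example: a biomarker that separates the two groups imperfectly.
rng = np.random.default_rng(1)
d = rng.integers(0, 2, size=200)
x = d + rng.normal(size=200)
print(round(empirical_auc(x, d), 3))
```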
{"title":"Nonparametric receiver operating characteristic curve analysis with an imperfect gold standard.","authors":"Jiarui Sun, Chao Tang, Wuxiang Xie, Xiao-Hua Zhou","doi":"10.1093/biomtc/ujae063","DOIUrl":"https://doi.org/10.1093/biomtc/ujae063","url":null,"abstract":"<p><p>This article addresses the challenge of estimating receiver operating characteristic (ROC) curves and the areas under these curves (AUC) in the context of an imperfect gold standard, a common issue in diagnostic accuracy studies. We delve into the nonparametric identification and estimation of ROC curves and AUCs when the reference standard for disease status is prone to error. Our approach hinges on the known or estimable accuracy of this imperfect reference standard and the conditional independent assumption, under which we demonstrate the identifiability of ROC curves and propose a nonparametric estimation method. In cases where the accuracy of the imperfect reference standard remains unknown, we establish that while ROC curves are unidentifiable, the sign of the difference between two AUCs is identifiable. This insight leads us to develop a hypothesis-testing method for assessing the relative superiority of AUCs. Compared to the existing methods, the proposed methods are nonparametric so that they do not rely on the parametric model assumptions. In addition, they are applicable to both the ROC/AUC analysis of continuous biomarkers and the AUC analysis of ordinal biomarkers. Our theoretical results and simulation studies validate the proposed methods, which we further illustrate through application in two real-world diagnostic studies.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141589542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We estimate relative hazards and absolute risks (also called cumulative incidence or crude risk) under cause-specific proportional hazards models for competing risks from double nested case-control (DNCC) data. In the DNCC design, controls are time-matched not only to cases from the cause of primary interest, but also to cases from competing risks (the phase-two sample). Complete covariate data are available in the phase-two sample, but other cohort members have information only on survival outcomes and some covariates. Design-weighted estimators use inverse sampling probabilities computed from Samuelsen-type calculations for DNCC. To take advantage of additional information available on all cohort members, we augment the estimating equations with a term that is unbiased for zero but improves the efficiency of estimates from the cause-specific proportional hazards model. We establish the asymptotic properties of the proposed estimators, including the estimator of absolute risk, and derive consistent variance estimators. We show that augmented design-weighted estimators are more efficient than design-weighted estimators. Through simulations, we show that the proposed asymptotic methods yield nominal operating characteristics at practical sample sizes. We illustrate the methods using prostate cancer mortality data from the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial of the National Cancer Institute.
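As an illustration of the kind of design weight involved, the sketch below computes classic Samuelsen-type inverse inclusion-probability weights for an ordinary nested case-control design with a single event type; the DNCC calculations in the paper extend this to controls matched to competing-risk cases as well. Function and variable names are ours.

```python
import numpy as np

def samuelsen_weights(time, is_case, m):
    """Samuelsen-type inverse inclusion-probability weights for an ordinary
    nested case-control design with m controls sampled per case (single event
    type; the DNCC design also matches controls to competing-risk cases).

    time    : follow-up time for each cohort member
    is_case : 1 if the subject is a case, 0 otherwise
    """
    time = np.asarray(time, float)
    is_case = np.asarray(is_case, int)
    n = len(time)
    p_never = np.ones(n)                       # P(never sampled as a control)
    for t in time[is_case == 1]:               # loop over case event times
        at_risk = time >= t
        n_risk = at_risk.sum()
        if n_risk <= 1:
            continue
        prob = min(1.0, m / (n_risk - 1))      # chance of being drawn at this time
        p_never[at_risk] *= 1.0 - prob
    p_incl = np.where(is_case == 1, 1.0, 1.0 - p_never)   # cases are always included
    return 1.0 / np.clip(p_incl, 1e-12, None)              # design weights
```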
{"title":"Absolute risk from double nested case-control designs: cause-specific proportional hazards models with and without augmented estimating equations.","authors":"Minjung Lee, Mitchell H Gail","doi":"10.1093/biomtc/ujae062","DOIUrl":"https://doi.org/10.1093/biomtc/ujae062","url":null,"abstract":"<p><p>We estimate relative hazards and absolute risks (or cumulative incidence or crude risk) under cause-specific proportional hazards models for competing risks from double nested case-control (DNCC) data. In the DNCC design, controls are time-matched not only to cases from the cause of primary interest, but also to cases from competing risks (the phase-two sample). Complete covariate data are available in the phase-two sample, but other cohort members only have information on survival outcomes and some covariates. Design-weighted estimators use inverse sampling probabilities computed from Samuelsen-type calculations for DNCC. To take advantage of additional information available on all cohort members, we augment the estimating equations with a term that is unbiased for zero but improves the efficiency of estimates from the cause-specific proportional hazards model. We establish the asymptotic properties of the proposed estimators, including the estimator of absolute risk, and derive consistent variance estimators. We show that augmented design-weighted estimators are more efficient than design-weighted estimators. Through simulations, we show that the proposed asymptotic methods yield nominal operating characteristics in practical sample sizes. We illustrate the methods using prostate cancer mortality data from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial Study of the National Cancer Institute.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141589541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interval-censored failure time data frequently arise in various scientific studies in which each subject undergoes periodic examinations for the occurrence of the failure event of interest, so the failure time is known only to lie in a specific time interval. In addition, the collected data may include multiple observed variables with a certain degree of correlation, leading to severe multicollinearity issues. This work proposes a factor-augmented transformation model that analyzes interval-censored failure time data while reducing model dimensionality and avoiding the multicollinearity elicited by multiple correlated covariates. We provide a joint modeling framework comprising a factor analysis model, which groups the multiple observed variables into a few latent factors, and a class of semiparametric transformation models with the augmented factors, which examine the effects of the factors and of other covariates on the failure event. Furthermore, we propose a nonparametric maximum likelihood estimation approach and develop a computationally stable and reliable expectation-maximization algorithm for its implementation. We establish the asymptotic properties of the proposed estimators and conduct simulation studies to assess the empirical performance of the proposed method. An application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) study is provided. An R package, ICTransCFA, is also available for practitioners. Data used in the preparation of this article were obtained from the ADNI database.
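To fix ideas, a joint model of this general type might be written as below; the notation (loading matrix A, latent factors ω_i, transformation G, baseline Λ0) is assumed for illustration and is not taken from the paper.

```latex
% Hedged sketch of a factor-augmented transformation model (notation assumed).
\begin{align*}
  \text{Measurement model:}\quad & X_i = \mu + A\,\omega_i + e_i,
      \qquad \omega_i \sim N_q(0, I_q),\; e_i \sim N_p(0, \Psi),\\
  \text{Failure-time model:}\quad & \Lambda(t \mid Z_i, \omega_i)
      = G\!\left\{ \Lambda_0(t)\,\exp\!\big(\beta^{\top} Z_i + \gamma^{\top} \omega_i\big) \right\},
\end{align*}
```

where the correlated observed covariates X_i load on a few latent factors ω_i, G is a prespecified transformation (for example, G(x) = x gives proportional hazards and G(x) = log(1 + x) gives proportional odds), and Λ0 is an unspecified baseline cumulative hazard.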
{"title":"Factor-augmented transformation models for interval-censored failure time data.","authors":"Hongxi Li, Shuwei Li, Liuquan Sun, Xinyuan Song","doi":"10.1093/biomtc/ujae078","DOIUrl":"https://doi.org/10.1093/biomtc/ujae078","url":null,"abstract":"<p><p>Interval-censored failure time data frequently arise in various scientific studies where each subject experiences periodical examinations for the occurrence of the failure event of interest, and the failure time is only known to lie in a specific time interval. In addition, collected data may include multiple observed variables with a certain degree of correlation, leading to severe multicollinearity issues. This work proposes a factor-augmented transformation model to analyze interval-censored failure time data while reducing model dimensionality and avoiding multicollinearity elicited by multiple correlated covariates. We provide a joint modeling framework by comprising a factor analysis model to group multiple observed variables into a few latent factors and a class of semiparametric transformation models with the augmented factors to examine their and other covariate effects on the failure event. Furthermore, we propose a nonparametric maximum likelihood estimation approach and develop a computationally stable and reliable expectation-maximization algorithm for its implementation. We establish the asymptotic properties of the proposed estimators and conduct simulation studies to assess the empirical performance of the proposed method. An application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) study is provided. An R package ICTransCFA is also available for practitioners. Data used in preparation of this article were obtained from the ADNI database.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142035125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Erin E Gabriel, Michael C Sachs, Ingeborg Waernbaum, Els Goetghebeur, Paul F Blanche, Stijn Vansteelandt, Arvid Sjölander, Thomas Scheike
Recently, it has become common for applied work to combine commonly used survival analysis modeling methods, such as the multivariable Cox model and propensity score weighting, with the intention of forming a doubly robust estimator of an exposure effect hazard ratio that is unbiased in large samples when either the Cox model or the propensity score model is correctly specified. This combination does not, in general, produce a doubly robust estimator, even after regression standardization, when there is truly a causal effect. We demonstrate via simulation this lack of double robustness for the semiparametric Cox model, the Weibull proportional hazards model, and a simple proportional hazards flexible parametric model, with the latter two models fit via maximum likelihood. We provide a novel proof that the combination of propensity score weighting and a proportional hazards survival model, fit via either full or partial likelihood, is consistent under the null of no causal effect of the exposure on the outcome, under particular censoring mechanisms, if either the propensity score or the outcome model is correctly specified and contains all confounders. Given our results suggesting that double robustness holds only under the null, we outline two simple alternative estimators that are doubly robust for the survival difference at a given time point (in the above sense), provided the censoring mechanism can be correctly modeled, and one doubly robust method of estimation for the full survival curve. We provide R code to use these estimators for estimation and inference in the supporting information.
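To make the survival-difference target concrete, the sketch below computes only the inverse-probability-weighting ingredient of such an estimator: a propensity-weighted Kaplan-Meier difference at a fixed time point. It is not the doubly robust estimator described in the paper (which additionally models the outcome and the censoring mechanism), and the function names are ours.

```python
import numpy as np

def weighted_km_at(t_star, time, event, weights):
    """Weighted Kaplan-Meier survival probability at time t_star."""
    s = 1.0
    for t in np.unique(time[(event == 1) & (time <= t_star)]):
        at_risk = weights[time >= t].sum()
        deaths = weights[(time == t) & (event == 1)].sum()
        if at_risk > 0:
            s *= 1.0 - deaths / at_risk
    return s

def ipw_survival_difference(t_star, time, event, treat, ps):
    """Propensity-weighted difference in survival at t_star between exposure groups.
    This is only the inverse-probability-weighting ingredient; a doubly robust
    estimator additionally uses outcome and censoring models.
    """
    time, event = np.asarray(time, float), np.asarray(event, int)
    treat, ps = np.asarray(treat, int), np.asarray(ps, float)
    w = np.where(treat == 1, 1.0 / ps, 1.0 / (1.0 - ps))   # inverse propensity weights
    s1 = weighted_km_at(t_star, time[treat == 1], event[treat == 1], w[treat == 1])
    s0 = weighted_km_at(t_star, time[treat == 0], event[treat == 0], w[treat == 0])
    return s1 - s0
```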
{"title":"Propensity weighting plus adjustment in proportional hazards model is not doubly robust.","authors":"Erin E Gabriel, Michael C Sachs, Ingeborg Waernbaum, Els Goetghebeur, Paul F Blanche, Stijn Vansteelandt, Arvid Sjölander, Thomas Scheike","doi":"10.1093/biomtc/ujae069","DOIUrl":"https://doi.org/10.1093/biomtc/ujae069","url":null,"abstract":"<p><p>Recently, it has become common for applied works to combine commonly used survival analysis modeling methods, such as the multivariable Cox model and propensity score weighting, with the intention of forming a doubly robust estimator of an exposure effect hazard ratio that is unbiased in large samples when either the Cox model or the propensity score model is correctly specified. This combination does not, in general, produce a doubly robust estimator, even after regression standardization, when there is truly a causal effect. We demonstrate via simulation this lack of double robustness for the semiparametric Cox model, the Weibull proportional hazards model, and a simple proportional hazards flexible parametric model, with both the latter models fit via maximum likelihood. We provide a novel proof that the combination of propensity score weighting and a proportional hazards survival model, fit either via full or partial likelihood, is consistent under the null of no causal effect of the exposure on the outcome under particular censoring mechanisms if either the propensity score or the outcome model is correctly specified and contains all confounders. Given our results suggesting that double robustness only exists under the null, we outline 2 simple alternative estimators that are doubly robust for the survival difference at a given time point (in the above sense), provided the censoring mechanism can be correctly modeled, and one doubly robust method of estimation for the full survival curve. We provide R code to use these estimators for estimation and inference in the supporting information.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141733497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peisong Han, Haoyue Li, Sung Kyun Park, Bhramar Mukherjee, Jeremy M G Taylor
We consider the setting where (1) an internal study builds a linear regression model for prediction based on individual-level data, (2) some external studies have fitted similar linear regression models that use only subsets of the covariates and provide coefficient estimates for the reduced models without individual-level data, and (3) there is heterogeneity across these study populations. The goal is to integrate the external model summary information into fitting the internal model to improve prediction accuracy. We adapt the James-Stein shrinkage method to propose estimators that are no worse, and are often better, in prediction mean squared error after information integration, regardless of the degree of study population heterogeneity. We conduct comprehensive simulation studies to investigate the numerical performance of the proposed estimators. We also apply the method to enhance a prediction model for patella bone lead level in terms of blood lead level and other covariates by integrating summary information from published literature.
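As a reminder of the underlying idea, the sketch below applies classic positive-part James-Stein shrinkage of an internal estimate toward an external-model-informed target. The paper's estimators are tailored to the summary-data integration setting and its population heterogeneity, so this is a generic illustration only, with hypothetical function names.

```python
import numpy as np

def james_stein_combine(beta_internal, beta_external, cov_internal):
    """Positive-part James-Stein shrinkage of the internal coefficient estimate
    toward an external-model-informed target (generic illustration only).

    cov_internal : estimated covariance matrix of beta_internal
    """
    beta_internal = np.asarray(beta_internal, float)
    beta_external = np.asarray(beta_external, float)
    diff = beta_internal - beta_external
    p = len(diff)
    dist2 = float(diff @ np.linalg.solve(cov_internal, diff))   # Mahalanobis distance
    if p <= 2 or dist2 <= 0:
        return beta_internal                                    # no shrinkage applied
    shrink = max(0.0, 1.0 - (p - 2) / dist2)                    # positive-part factor
    return beta_external + shrink * diff
```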
{"title":"Improving prediction of linear regression models by integrating external information from heterogeneous populations: James-Stein estimators.","authors":"Peisong Han, Haoyue Li, Sung Kyun Park, Bhramar Mukherjee, Jeremy M G Taylor","doi":"10.1093/biomtc/ujae072","DOIUrl":"10.1093/biomtc/ujae072","url":null,"abstract":"<p><p>We consider the setting where (1) an internal study builds a linear regression model for prediction based on individual-level data, (2) some external studies have fitted similar linear regression models that use only subsets of the covariates and provide coefficient estimates for the reduced models without individual-level data, and (3) there is heterogeneity across these study populations. The goal is to integrate the external model summary information into fitting the internal model to improve prediction accuracy. We adapt the James-Stein shrinkage method to propose estimators that are no worse and are oftentimes better in the prediction mean squared error after information integration, regardless of the degree of study population heterogeneity. We conduct comprehensive simulation studies to investigate the numerical performance of the proposed estimators. We also apply the method to enhance a prediction model for patella bone lead level in terms of blood lead level and other covariates by integrating summary information from published literature.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11299067/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141888418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We develop a methodology for valid inference after variable selection in logistic regression when the responses are partially observed, that is, when one observes a set of error-prone testing outcomes instead of the true values of the responses. Aiming to select important covariates while accounting for the missing information in the response data, we apply the expectation-maximization algorithm to compute maximum likelihood estimators subject to LASSO penalization. Subsequent to variable selection, we make inferences on the selected covariate effects by extending post-selection inference methodology based on the polyhedral lemma. Empirical evidence from our extensive simulation study suggests that our post-selection inference results are more reliable than those from naive inference methods that use the same data to perform variable selection and inference without adjusting for variable selection.
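The sketch below illustrates the E- and M-steps for the simpler case of individually tested, misclassified binary responses with known sensitivity and specificity, using an L1-penalized logistic M-step via scikit-learn. The paper's group testing likelihood and its polyhedral-lemma post-selection inference are more involved, and all names here are ours.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def em_lasso_logistic(X, test_result, sens, spec, C=1.0, n_iter=50):
    """EM sketch for LASSO-penalized logistic regression when only an error-prone
    test of the binary response is observed.

    X           : 2-D array of covariates
    test_result : 0/1 array of observed (error-prone) test outcomes
    sens, spec  : sensitivity and specificity of the test, assumed known here
    """
    X = np.asarray(X, float)
    test_result = np.asarray(test_result, int)
    n = X.shape[0]
    w = np.full(n, 0.5)                                  # current P(Y_i = 1 | data)
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    for _ in range(n_iter):
        # M-step: weighted fit with fractional responses, handled by duplicating rows
        Xd = np.vstack([X, X])
        yd = np.concatenate([np.ones(n), np.zeros(n)])
        sw = np.concatenate([w, 1.0 - w])
        model.fit(Xd, yd, sample_weight=sw)
        # E-step: posterior probability of the true response given the test result
        p = model.predict_proba(X)[:, 1]
        lik1 = np.where(test_result == 1, sens, 1.0 - sens)   # P(T | Y = 1)
        lik0 = np.where(test_result == 1, 1.0 - spec, spec)   # P(T | Y = 0)
        w = lik1 * p / (lik1 * p + lik0 * (1.0 - p))
    return model
```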
{"title":"Post-selection inference in regression models for group testing data.","authors":"Qinyan Shen, Karl Gregory, Xianzheng Huang","doi":"10.1093/biomtc/ujae101","DOIUrl":"https://doi.org/10.1093/biomtc/ujae101","url":null,"abstract":"<p><p>We develop a methodology for valid inference after variable selection in logistic regression when the responses are partially observed, that is, when one observes a set of error-prone testing outcomes instead of the true values of the responses. Aiming at selecting important covariates while accounting for missing information in the response data, we apply the expectation-maximization algorithm to compute maximum likelihood estimators subject to LASSO penalization. Subsequent to variable selection, we make inferences on the selected covariate effects by extending post-selection inference methodology based on the polyhedral lemma. Empirical evidence from our extensive simulation study suggests that our post-selection inference results are more reliable than those from naive inference methods that use the same data to perform variable selection and inference without adjusting for variable selection.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142280082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The dynamics that govern disease spread are hard to model because infections are functions of both the underlying pathogen and human or animal behavior. This challenge increases when modeling how diseases spread between different spatial locations. Many proposed spatial epidemiological models require trade-offs to fit: abstracting away theoretical spread dynamics, fitting a deterministic model, or requiring large computational resources for many simulations. We propose an approach that approximates the complex spatial spread dynamics with a Gaussian process. We first propose a flexible spatial extension of the well-known stochastic SIR process, and then we derive a moment-closure approximation to this stochastic process. The moment-closure approximation yields ordinary differential equations (ODEs) for the evolution of the means and covariances of the susceptible and infectious counts through time. Because these ODEs are a bottleneck to fitting our model by MCMC, we approximate them using a low-rank emulator. This approximation serves as the basis for our hierarchical model for noisy, underreported counts of new infections by spatial location and time. We demonstrate the use of our model to conduct inference on infections simulated from the underlying, true spatial SIR jump process. We then apply our method to model counts of new Zika infections in Brazil from late 2015 through early 2016.
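For context, the sketch below integrates only the non-spatial, first-order (mean-field) SIR system, in which the covariance terms of the moment closure are dropped; the paper's approximation additionally evolves covariances across compartments and spatial locations. Parameter values are illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

def sir_mean_field(t, y, beta, gamma, N):
    """Mean-field SIR ODEs: first-order moment equations with all covariance
    terms dropped and no spatial structure."""
    S, I = y
    dS = -beta * S * I / N
    dI = beta * S * I / N - gamma * I
    return [dS, dI]

# Illustrative parameter values only.
N, beta, gamma = 1_000_000, 0.4, 0.2
sol = solve_ivp(sir_mean_field, (0.0, 120.0), y0=[N - 10.0, 10.0],
                args=(beta, gamma, N), dense_output=True)
print(sol.y[1, -1])   # infectious count at the final time
```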
{"title":"A Gaussian-process approximation to a spatial SIR process using moment closures and emulators.","authors":"Parker Trostle, Joseph Guinness, Brian J Reich","doi":"10.1093/biomtc/ujae068","DOIUrl":"10.1093/biomtc/ujae068","url":null,"abstract":"<p><p>The dynamics that govern disease spread are hard to model because infections are functions of both the underlying pathogen as well as human or animal behavior. This challenge is increased when modeling how diseases spread between different spatial locations. Many proposed spatial epidemiological models require trade-offs to fit, either by abstracting away theoretical spread dynamics, fitting a deterministic model, or by requiring large computational resources for many simulations. We propose an approach that approximates the complex spatial spread dynamics with a Gaussian process. We first propose a flexible spatial extension to the well-known SIR stochastic process, and then we derive a moment-closure approximation to this stochastic process. This moment-closure approximation yields ordinary differential equations for the evolution of the means and covariances of the susceptibles and infectious through time. Because these ODEs are a bottleneck to fitting our model by MCMC, we approximate them using a low-rank emulator. This approximation serves as the basis for our hierarchical model for noisy, underreported counts of new infections by spatial location and time. We demonstrate using our model to conduct inference on simulated infections from the underlying, true spatial SIR jump process. We then apply our method to model counts of new Zika infections in Brazil from late 2015 through early 2016.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11261348/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141733496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A dynamic treatment regime (DTR) is a mathematical representation of a multistage decision process. When applied to sequential treatment selection in medical settings, DTRs are useful for identifying optimal therapies for chronic diseases such as AIDS, mental illnesses, substance abuse, and many cancers. Sequential multiple assignment randomized trials (SMARTs) provide a useful framework for constructing DTRs and providing unbiased between-DTR comparisons. A limitation of SMARTs is that they ignore data from past patients that may be useful for reducing the probability of exposing new patients to inferior treatments. In practice, this may result in decreased treatment adherence or dropout. To address this problem, we propose a generalized outcome-adaptive (GO) SMART design that adaptively unbalances stage-specific randomization probabilities in favor of treatments observed to be more effective in previous patients. To correct for the bias induced by outcome-adaptive randomization, we propose G-estimators and inverse-probability-weighted estimators of DTR effects embedded in a GO-SMART and show analytically that they are consistent. We report simulation results showing that, compared to a SMART, a response-adaptive SMART, and a SMART with adaptive randomization, a GO-SMART design treats significantly more patients with the optimal DTR and achieves a larger number of total responses while maintaining similar or better statistical power.
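As a simple illustration of outcome-adaptive randomization at a single stage, the sketch below computes randomization probabilities from beta-binomial posteriors, favoring arms more likely to have the highest response rate while keeping a floor on every arm. This is a generic rule, not the GO-SMART rule, which unbalances stage-specific probabilities for treatments embedded in DTRs; names are ours.

```python
import numpy as np

def adaptive_randomization_probs(successes, failures, n_draws=10_000, floor=0.1):
    """Randomization probabilities for the next patient at one stage, favoring
    arms with a higher posterior probability of having the best response rate
    (independent beta-binomial models), with a floor so no arm is dropped.
    """
    rng = np.random.default_rng(0)
    successes = np.asarray(successes, float)
    failures = np.asarray(failures, float)
    k = len(successes)
    draws = rng.beta(successes + 1.0, failures + 1.0, size=(n_draws, k))
    p_best = np.bincount(draws.argmax(axis=1), minlength=k) / n_draws
    probs = np.maximum(p_best, floor)
    return probs / probs.sum()

# Example: the second treatment looks better so far, so it is favored.
print(adaptive_randomization_probs(successes=[3, 8], failures=[7, 4]))
```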
{"title":"A generalized outcome-adaptive sequential multiple assignment randomized trial design.","authors":"Xue Yang, Yu Cheng, Peter F Thall, Abdus S Wahed","doi":"10.1093/biomtc/ujae073","DOIUrl":"https://doi.org/10.1093/biomtc/ujae073","url":null,"abstract":"<p><p>A dynamic treatment regime (DTR) is a mathematical representation of a multistage decision process. When applied to sequential treatment selection in medical settings, DTRs are useful for identifying optimal therapies for chronic diseases such as AIDs, mental illnesses, substance abuse, and many cancers. Sequential multiple assignment randomized trials (SMARTs) provide a useful framework for constructing DTRs and providing unbiased between-DTR comparisons. A limitation of SMARTs is that they ignore data from past patients that may be useful for reducing the probability of exposing new patients to inferior treatments. In practice, this may result in decreased treatment adherence or dropouts. To address this problem, we propose a generalized outcome-adaptive (GO) SMART design that adaptively unbalances stage-specific randomization probabilities in favor of treatments observed to be more effective in previous patients. To correct for bias induced by outcome adaptive randomization, we propose G-estimators and inverse-probability-weighted estimators of DTR effects embedded in a GO-SMART and show analytically that they are consistent. We report simulation results showing that, compared to a SMART, Response-Adaptive SMART and SMART with adaptive randomization, a GO-SMART design treats significantly more patients with the optimal DTR and achieves a larger number of total responses while maintaining similar or better statistical power.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141896689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Santeri Karppinen, Liviu Ene, Lovisa Engberg Sundström, Juha Karvanen
We address a Bayesian two-stage decision problem in operational forestry where the inner stage considers scheduling the harvesting to fulfill demand targets and the outer stage considers selecting the accuracy of pre-harvest inventories that are used to estimate the timber volumes of the forest tracts. The higher accuracy of the inventory enables better scheduling decisions but also implies higher costs. We focus on the outer stage, which we formulate as a maximization of the posterior value of the inventory decision under a budget constraint. The posterior value depends on the solution to the inner stage problem and its computation is analytically intractable, featuring an NP-hard binary optimization problem within a high-dimensional integral. In particular, the binary optimization problem is a special case of a generalized quadratic assignment problem. We present a practical method that solves the outer stage problem with an approximation which combines Monte Carlo sampling with a greedy, randomized method for the binary optimization problem. We derive inventory decisions for a dataset of 100 Swedish forest tracts across a range of inventory budgets and estimate the value of the information to be obtained.
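The general pattern, Monte Carlo approximation of the posterior value combined with a greedy search under the budget constraint, might look like the sketch below; `mc_value` is a placeholder for one simulated draw of the inner-stage scheduling value, and this is not the paper's algorithm.

```python
import numpy as np

def greedy_inventory_plan(costs, mc_value, budget, n_mc=200):
    """Greedily choose which tracts receive the accurate (costly) pre-harvest
    inventory under a budget, ranking candidates by Monte Carlo-estimated gain
    in expected inner-stage value per unit cost.

    costs    : array of inventory costs per tract
    mc_value : mc_value(selected, rng) -> one simulated draw of the inner-stage
               (scheduling) value given the boolean selection vector (placeholder)
    """
    rng = np.random.default_rng(0)
    costs = np.asarray(costs, float)
    n = len(costs)
    selected = np.zeros(n, dtype=bool)
    spent = 0.0
    current = np.mean([mc_value(selected, rng) for _ in range(n_mc)])
    improved = True
    while improved:
        improved = False
        best_gain, best_j, best_value = 0.0, None, current
        for j in range(n):
            if selected[j] or spent + costs[j] > budget:
                continue
            trial = selected.copy()
            trial[j] = True
            value = np.mean([mc_value(trial, rng) for _ in range(n_mc)])
            gain = (value - current) / costs[j]          # value gained per unit cost
            if gain > best_gain:
                best_gain, best_j, best_value = gain, j, value
        if best_j is not None:
            selected[best_j] = True
            spent += costs[best_j]
            current = best_value
            improved = True
    return selected
```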
{"title":"Planning cost-effective operational forest inventories.","authors":"Santeri Karppinen, Liviu Ene, Lovisa Engberg Sundström, Juha Karvanen","doi":"10.1093/biomtc/ujae104","DOIUrl":"https://doi.org/10.1093/biomtc/ujae104","url":null,"abstract":"<p><p>We address a Bayesian two-stage decision problem in operational forestry where the inner stage considers scheduling the harvesting to fulfill demand targets and the outer stage considers selecting the accuracy of pre-harvest inventories that are used to estimate the timber volumes of the forest tracts. The higher accuracy of the inventory enables better scheduling decisions but also implies higher costs. We focus on the outer stage, which we formulate as a maximization of the posterior value of the inventory decision under a budget constraint. The posterior value depends on the solution to the inner stage problem and its computation is analytically intractable, featuring an NP-hard binary optimization problem within a high-dimensional integral. In particular, the binary optimization problem is a special case of a generalized quadratic assignment problem. We present a practical method that solves the outer stage problem with an approximation which combines Monte Carlo sampling with a greedy, randomized method for the binary optimization problem. We derive inventory decisions for a dataset of 100 Swedish forest tracts across a range of inventory budgets and estimate the value of the information to be obtained.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"80 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142340683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}