Empirical Sandwich Variance Estimator for Iterated Conditional Expectation g-Computation
Pub Date: 2024-12-20 | Epub Date: 2024-11-03 | DOI: 10.1002/sim.10255
Paul N Zivich, Rachael K Ross, Bonnie E Shook-Sa, Stephen R Cole, Jessie K Edwards
Iterated conditional expectation (ICE) g-computation is an estimation approach for addressing time-varying confounding for both longitudinal and time-to-event data. Unlike other g-computation implementations, ICE avoids the need to specify models for each time-varying covariate. For variance estimation, previous work has suggested the bootstrap. However, bootstrapping can be computationally intensive. Here, we present ICE g-computation as a set of stacked estimating equations. Therefore, the variance for the ICE g-computation estimator can be consistently estimated using the empirical sandwich variance estimator. Performance of the variance estimator was evaluated empirically with a simulation study. The proposed approach is also demonstrated with an illustrative example on the effect of cigarette smoking on the prevalence of hypertension. In the simulation study, the empirical sandwich variance estimator appropriately estimated the variance. When comparing runtimes between the sandwich variance estimator and the bootstrap for the applied example, the sandwich estimator was substantially faster, even when bootstraps were run in parallel. The empirical sandwich variance estimator is a viable option for variance estimation with ICE g-computation.
Statistics in Medicine, pp. 5562-5572.
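To make the pseudo-outcome recursion concrete, here is a minimal Python sketch of ICE g-computation for two treatment times on simulated data. It is an illustration under assumed models, not the authors' implementation; in the paper, the variance comes from stacking the score equations of these regressions with the marginal mean and applying the empirical sandwich, the kind of computation that general M-estimation software (for example, the delicatessen Python package) automates.

```python
# Minimal sketch of ICE g-computation for a binary end-of-study outcome with
# two treatment times (A1, A2) and time-varying confounders (L1, L2).
# Simulated data; illustrates the pseudo-outcome recursion only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
L1 = rng.normal(size=n)
A1 = rng.binomial(1, 1 / (1 + np.exp(-L1)))
L2 = rng.normal(L1 + A1, 1)
A2 = rng.binomial(1, 1 / (1 + np.exp(-L2)))
Y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * L2 - A1 - A2))))

def ice_gcomp(a1, a2):
    # Step 1: model E[Y | A1, L1, A2, L2], then predict with treatment set to (a1, a2).
    X2 = sm.add_constant(np.column_stack([A1, L1, A2, L2]))
    m2 = sm.GLM(Y, X2, family=sm.families.Binomial()).fit()
    ybar2 = m2.predict(sm.add_constant(np.column_stack([np.full(n, a1), L1, np.full(n, a2), L2])))
    # Step 2: regress the pseudo-outcome on the earlier history, predict with A1 = a1.
    X1 = sm.add_constant(np.column_stack([A1, L1]))
    m1 = sm.GLM(ybar2, X1, family=sm.families.Binomial()).fit()  # fractional outcome is fine here
    ybar1 = m1.predict(sm.add_constant(np.column_stack([np.full(n, a1), L1])))
    # Step 3: average the final pseudo-outcome to get the mean under the plan (a1, a2).
    return ybar1.mean()

rd = ice_gcomp(1, 1) - ice_gcomp(0, 0)  # always-treat vs. never-treat risk difference
print(f"ICE g-computation risk difference: {rd:.3f}")
```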
Drug Efficacy Estimation for Follow-on Companion Diagnostic Devices Through External Studies
Pub Date: 2024-12-20 | Epub Date: 2024-11-05 | DOI: 10.1002/sim.10231
Jiarui Sun, Wenjie Hu, Xiao-Hua Zhou
A therapeutic product is usually not suitable for all patients but only for a subpopulation. The safe and effective use of such a therapeutic product requires the co-approval of a companion diagnostic device that can be used to identify suitable patients. While the first-of-a-kind companion diagnostic device is often developed in conjunction with its intended therapeutic product and simultaneously validated through a randomized clinical trial, there remains room for the innovation of new and improved follow-on companion diagnostic devices designed for the same therapeutic product. However, conducting a new randomized trial or a bridging study for the follow-on companion devices may be unethical, expensive, or impractical. Hence, there arises a need for an external study to evaluate the concordance between the FDA-approved comparator companion diagnostic device (CCD) and a subsequent follow-on companion diagnostic device (FCD), indirectly validating the latter. In this article, we introduce a novel external study design, referred to as the targeted treatment design, as an extension of the existing concordance design. Additionally, we present corresponding statistical analysis methods. Our approach combines the CCD randomized trial data and the FCD external study data, enabling the estimation of drug efficacy within the FCD+ and FCD- subpopulations, the parameters crucial for the validation of the FCD. Theoretical results and simulation studies validate the proposed methods, and we further illustrate them through an application to a real example of non-small-cell lung cancer.
Statistics in Medicine, pp. 5605-5617.
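The logic of indirect validation can be seen in a toy calculation. The sketch below is not the paper's estimator: purely for illustration, it assumes that response depends on biomarker status only through the CCD result, and combines hypothetical CCD-trial response rates with a concordance estimate from an external study via the law of total probability.

```python
# Toy back-of-envelope combination of CCD-trial efficacy with external
# concordance data, under the simplifying assumption that response depends on
# biomarker status only through the CCD result. All numbers are hypothetical.
p_resp_ccd_pos = 0.40            # response rate among CCD+ patients (randomized trial)
p_resp_ccd_neg = 0.10            # response rate among CCD- patients
p_ccd_pos_given_fcd_pos = 0.90   # concordance, estimated from the external study

# Law of total probability within the FCD+ subpopulation:
p_resp_fcd_pos = (p_resp_ccd_pos * p_ccd_pos_given_fcd_pos
                  + p_resp_ccd_neg * (1 - p_ccd_pos_given_fcd_pos))
print(f"Implied response rate among FCD+ patients: {p_resp_fcd_pos:.2f}")  # 0.37
```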
New Quadratic Discriminant Analysis Algorithms for Correlated Audiometric Data
Pub Date: 2024-12-20 | Epub Date: 2024-10-25 | DOI: 10.1002/sim.10257
Fuyu Guo, David M Zucker, Kenneth I Vaden, Sharon Curhan, Judy R Dubno, Molin Wang
Paired organs like eyes, ears, and lungs in humans exhibit similarities, and data from these organs often display remarkable correlations. Accounting for these correlations could enhance classification models used in predicting disease phenotypes. To our knowledge, there is limited, if any, literature addressing this topic, and existing methods do not exploit such correlations. For example, the conventional approach treats each ear as an independent observation when predicting audiometric phenotypes and is agnostic about the correlation between data from the two ears of the same person. This approach may lead to information loss and reduced model performance. In response to this gap, particularly in the context of audiometric phenotype prediction, this paper proposes new quadratic discriminant analysis (QDA) algorithms that appropriately deal with the dependence between ears. We propose two-stage analysis strategies: (1) conducting data transformations to reduce data dimensionality before applying QDA; and (2) developing new QDA algorithms that partially utilize the dependence between the phenotypes of the two ears. We conducted simulation studies to compare different transformation methods and to assess the performance of different QDA algorithms. The empirical results suggested that the transformation may only be beneficial when the sample size is relatively small. Moreover, our proposed new QDA algorithms performed better than the conventional approach in both person-level and ear-level accuracy. As an illustration, we applied them to audiometric data from the Medical University of South Carolina Longitudinal Cohort Study of Age-related Hearing Loss. In addition, we developed an R package, PairQDA, to implement the proposed algorithms.
Statistics in Medicine, pp. 5473-5483.
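A rough sketch of the person-level versus ear-level contrast, using scikit-learn's QDA on simulated correlated ear data. The feature dimension, correlation, and accuracy comparison are all illustrative assumptions; the authors' actual algorithms are implemented in their R package PairQDA.

```python
# Person-level QDA on the stacked two-ear feature vector vs. the conventional
# ear-level QDA that treats each ear as an independent observation.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(0)
n, p = 500, 4                               # persons, features per ear
labels = rng.integers(0, 2, n)              # one phenotype label per person
cov = np.kron([[1.0, 0.6], [0.6, 1.0]], np.eye(p))   # ears correlated at 0.6
X = np.array([rng.multivariate_normal(np.repeat(float(y), 2 * p), cov) for y in labels])

# Person-level model: one observation = both ears (dimension 2p), so the class
# covariances absorb the between-ear correlation.
qda_person = QuadraticDiscriminantAnalysis().fit(X, labels)

# Ear-level model: each ear enters as if independent.
X_ears = np.vstack([X[:, :p], X[:, p:]])
y_ears = np.concatenate([labels, labels])
qda_ear = QuadraticDiscriminantAnalysis().fit(X_ears, y_ears)

print("person-level accuracy:", qda_person.score(X, labels))
print("ear-level accuracy:   ", qda_ear.score(X_ears, y_ears))
```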
Dose Individualization for Phase I Cancer Trials With Broadened Eligibility
Pub Date: 2024-12-20 | Epub Date: 2024-10-31 | DOI: 10.1002/sim.10264
Rebecca B Silva, Bin Cheng, Richard D Carvajal, Shing M Lee
Broadening eligibility criteria in cancer trials has been advocated to represent the intended patient population more accurately. The advantages are clear in terms of generalizability and recruitment; however, there are important considerations in terms of designing for efficiency and patient safety. While toxicity may be expected to be homogeneous across these subpopulations, designs should be able to recommend safe and precise doses if subpopulations with different toxicity profiles exist. Dose-finding designs accounting for patient heterogeneity have been proposed, but existing methods assume that the source of heterogeneity is known. We propose a broadened-eligibility dose-finding design to address unknown patient heterogeneity in phase I cancer clinical trials where eligibility is expanded and multiple eligibility criteria could potentially lead to different optimal doses for patient subgroups. The design offers a two-in-one approach to dose-finding: it simultaneously selects the patient criteria that differentiate the maximum tolerated dose (MTD), using stochastic search variable selection, and recommends subpopulation-specific MTDs when needed. Our simulation study compares the proposed design to the naive approach of assuming patient homogeneity and demonstrates favorable operating characteristics across a wide range of scenarios: allocating patients more often to their true MTD during the trial, recommending more than one MTD when needed, and identifying the criteria that differentiate the patient population. The proposed design highlights the advantages of allowing more variability at an early stage and demonstrates how assuming patient homogeneity can lead to unsafe or sub-therapeutic dose recommendations.
Statistics in Medicine, pp. 5534-5547.
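The stochastic search variable selection at the core of the design is beyond a short sketch, but the recommendation step it enables is easy to show: once a criterion is selected as differentiating toxicity, each resulting subgroup receives the dose whose estimated toxicity probability is closest to the target. All numbers below are hypothetical.

```python
# Schematic of the final recommendation step only: given posterior toxicity
# estimates per dose, recommend the dose closest to the target toxicity rate
# within each subgroup defined by a selected eligibility criterion.
import numpy as np

target = 0.25
doses = np.array([10, 20, 30, 40, 50])       # mg, hypothetical dose grid
tox_hat = {                                  # posterior mean P(toxicity) per dose
    "criterion met":     np.array([0.05, 0.10, 0.22, 0.35, 0.50]),
    "criterion not met": np.array([0.02, 0.05, 0.09, 0.18, 0.27]),
}
for subgroup, probs in tox_hat.items():
    mtd = doses[np.argmin(np.abs(probs - target))]
    print(f"{subgroup}: recommended MTD = {mtd} mg")   # 30 mg vs. 50 mg
```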
A Brief Introduction on Latent Variable Based Ordinal Regression Models With an Application to Survey Data
Pub Date: 2024-12-20 | Epub Date: 2024-10-28 | DOI: 10.1002/sim.10208
Johannes Wieditz, Clemens Miller, Jan Scholand, Marcus Nemeth
The analysis of survey data is a frequently arising issue in clinical trials, particularly when capturing quantities which are difficult to measure. Typical examples are questionnaires about patients' well-being, pain, or consent to an intervention. In these, data are captured on a discrete scale containing only a limited number of possible answers, from which the respondent has to pick the answer that best fits his or her personal opinion. Such data are generally located on an ordinal scale, as answers can usually be arranged in ascending order, for example, "bad", "neutral", "good" for well-being. Since responses are usually stored numerically for data processing purposes, analyses of survey data using ordinary linear regression models are commonly applied. However, the assumptions of these models are often not met, as linear regression requires constant variability of the response variable and can yield predictions outside the range of response categories. By using linear models, one gains insight only into the mean response, which may affect representativeness. In contrast, ordinal regression models can provide probability estimates for all response categories and yield information about the full response scale beyond the mean. In this work, we provide a concise overview of the fundamentals of latent variable based ordinal models, apply them to a real data set, and outline the use of state-of-the-art software for this purpose. Moreover, we discuss strengths, limitations, and typical pitfalls. This is a companion work to a current vignette-based structured interview study in pediatric anesthesia.
Statistics in Medicine, pp. 5618-5634. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11588990/pdf/
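As a small worked example of the model class described above, the sketch below fits a latent-variable (proportional odds) ordinal regression with statsmodels and returns a probability estimate for every response category. The data and covariate are simulated stand-ins; the paper outlines its own software workflow.

```python
# Latent-variable ordinal regression: a continuous latent score plus logistic
# noise is cut at thresholds into "bad" / "neutral" / "good" responses, and
# the fitted model yields per-category probabilities, not just a mean.
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(42)
n = 400
age = rng.uniform(4, 16, n)                    # hypothetical covariate
latent = 0.3 * age + rng.logistic(size=n)      # latent well-being score
y = np.digitize(latent, [2.0, 4.0])            # 0 = "bad", 1 = "neutral", 2 = "good"

res = OrderedModel(y, age.reshape(-1, 1), distr="logit").fit(method="bfgs", disp=False)
# Predicted probabilities of each category for a 6- and a 14-year-old
# (each row sums to one across the three categories):
print(res.predict(np.array([[6.0], [14.0]])))
```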
Heterogeneous Mediation Analysis for Cox Proportional Hazards Model With Multiple Mediators
Pub Date: 2024-12-20 | Epub Date: 2024-10-28 | DOI: 10.1002/sim.10239
Rongqian Sun, Xinyuan Song
This study proposes a heterogeneous mediation analysis for survival data that accommodates multiple mediators and sparsity of the predictors. We introduce a joint modeling approach that links the mediation regression and proportional hazards models through Bayesian additive regression trees with shared typologies. The shared tree component is motivated by the fact that confounders and effect modifiers on the causal pathways linked by different mediators often overlap. A sparsity-inducing prior is incorporated to capture the most relevant confounders and effect modifiers on different causal pathways. The individual-specific interventional direct and indirect effects are derived on the scale of the logarithm of hazards and survival function. A Bayesian approach with an efficient Markov chain Monte Carlo algorithm is developed to estimate the conditional interventional effects through the Monte Carlo implementation of the mediation formula. Simulation studies are conducted to verify the empirical performance of the proposed method. An application to the ACTG175 study further demonstrates the method's utility in causal discovery and heterogeneity quantification.
Statistics in Medicine, pp. 5497-5512. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11588993/pdf/
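The Monte Carlo implementation of the mediation formula can be sketched schematically: draw mediators under each treatment level and average the outcome model over those draws. The toy below uses hand-set linear stand-ins for the paper's BART-based mediator and outcome fits, so it shows only the bookkeeping of interventional direct and indirect effects on the log-hazard scale.

```python
# Schematic Monte Carlo for interventional effects with a single mediator.
# draw_mediator and log_hazard are hand-set stand-ins for fitted models.
import numpy as np

rng = np.random.default_rng(7)
n_mc = 100_000
x = rng.normal(size=n_mc)                      # baseline covariate draws

def draw_mediator(a):                          # stand-in mediator model M | A, X
    return 0.8 * a + 0.5 * x + rng.normal(size=n_mc)

def log_hazard(a, m):                          # stand-in outcome model on log-hazard scale
    return -1.0 + 0.4 * a + 0.6 * m + 0.3 * x

m1, m0 = draw_mediator(1), draw_mediator(0)
total    = np.mean(log_hazard(1, m1) - log_hazard(0, m0))
direct   = np.mean(log_hazard(1, m0) - log_hazard(0, m0))  # mediator held at a = 0 draws
indirect = np.mean(log_hazard(1, m1) - log_hazard(1, m0))  # only the mediator shifts
print(f"total {total:.3f} = direct {direct:.3f} + indirect {indirect:.3f}")
```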
A Comparison of Variance Estimators for Logistic Regression Models Estimated Using Generalized Estimating Equations (GEE) in the Context of Observational Health Services Research
Pub Date: 2024-12-20 | Epub Date: 2024-10-31 | DOI: 10.1002/sim.10260
Peter C Austin
In observational health services research, researchers often use clustered data to estimate the independent association between individual outcomes and several cluster-level covariates after adjusting for individual-level characteristics. Generalized estimating equations are a popular method for estimating generalized linear models using clustered data. The conventional Liang-Zeger variance estimator is known to result in estimated standard errors that are biased low when the number of clusters is small. Alternative variance estimators have been proposed for use when the number of clusters is low. Previous studies focused on these alternative variance estimators in the context of cluster randomized trials, which are often characterized by a small number of clusters and by an outcomes regression model that often consists of a single cluster-level variable (the treatment/exposure variable). We addressed the following questions: (i) which estimator is preferred for estimating the standard errors of cluster-level covariates in logistic regression models with multiple binary and continuous cluster-level variables in addition to subject-level variables; and (ii) in such settings, how many clusters are required for the Liang-Zeger variance estimator to have acceptable performance for estimating the standard errors of cluster-level covariates. We suggest that when estimating standard errors: (i) when the number of clusters is < 15, use the Kauermann-Carroll estimator; (ii) when the number of clusters is between 15 and 40, use the Fay-Graubard estimator; and (iii) when the number of clusters exceeds 40, use the Liang-Zeger or Fay-Graubard estimator. When estimating confidence intervals, we suggest using the Mancl-DeRouen estimator with a t-distribution.
Statistics in Medicine, pp. 5548-5561. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11588976/pdf/
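Two of the covariance options compared above are available in statsmodels, which makes the small-cluster contrast easy to reproduce: the Liang-Zeger robust estimator and the Mancl-DeRouen bias-reduced estimator (the "bias_reduced" option). The Kauermann-Carroll and Fay-Graubard corrections recommended for intermediate numbers of clusters are not built in. The data and model below are simulated assumptions.

```python
# Logistic GEE with one cluster-level (z) and one subject-level (x) covariate,
# comparing Liang-Zeger ("robust") and Mancl-DeRouen ("bias_reduced") SEs
# when the number of clusters is small.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n_clusters, m = 12, 30
groups = np.repeat(np.arange(n_clusters), m)
z = np.repeat(rng.normal(size=n_clusters), m)              # cluster-level covariate
x = rng.normal(size=n_clusters * m)                        # subject-level covariate
u = np.repeat(rng.normal(scale=0.5, size=n_clusters), m)   # cluster random effect
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * z + 0.3 * x + u))))

X = sm.add_constant(np.column_stack([z, x]))
gee = sm.GEE(y, X, groups=groups, family=sm.families.Binomial(),
             cov_struct=sm.cov_struct.Exchangeable())
for cov_type in ("robust", "bias_reduced"):
    res = gee.fit(cov_type=cov_type)
    print(f"{cov_type:>12} SEs (const, z, x):", np.round(res.bse, 3))
```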
Bayesian Safety and Futility Monitoring in Phase II Trials Using One Utility-Based Rule
Pub Date: 2024-12-20 | Epub Date: 2024-11-05 | DOI: 10.1002/sim.10254
Juhee Lee, Peter F Thall
For phase II clinical trials that determine the acceptability of an experimental treatment based on ordinal toxicity and ordinal response, most monitoring methods require each ordinal outcome to be dichotomized using a selected cut-point. This allows two early stopping rules to be constructed that compare marginal probabilities of toxicity and response to respective upper and lower limits. Important problems with this approach are loss of information due to dichotomization, dependence of treatment acceptability decisions on precisely how each ordinal variable is dichotomized, and ignoring association between the two outcomes. To address these problems, we propose a new Bayesian method, which we call U-Bayes, that exploits elicited numerical utilities of the joint ordinal outcomes to construct one early stopping rule that compares the mean utility to a lower limit. U-Bayes avoids the problems noted above by using the entire joint distribution of the ordinal outcomes, and not dichotomizing the outcomes. A step-by-step algorithm is provided for constructing a U-Bayes rule based on elicited utilities and elicited limits on marginal outcome probabilities. A simulation study shows that U-Bayes greatly improves the probability of determining treatment acceptability compared to conventional designs that use two monitoring rules based on marginal probabilities.
Statistics in Medicine, pp. 5583-5595. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11781291/pdf/
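One way to see a single utility-based rule in action is a conjugate sketch: place a Dirichlet prior on the joint (toxicity, response) cell probabilities, form the posterior of the mean utility, and stop if the posterior probability that it falls below the elicited lower limit is high. The Dirichlet-multinomial choice and all numbers below are illustrative assumptions, not the paper's exact specification.

```python
# Utility-based stopping sketch: posterior of the mean utility over the joint
# ordinal (toxicity, response) distribution via a Dirichlet-multinomial model.
import numpy as np

utilities = np.array([[100.0, 60.0, 25.0],   # rows: toxicity none/moderate/severe
                      [ 75.0, 45.0, 10.0],   # cols: response good/stable/progression
                      [ 40.0, 20.0,  0.0]])  # elicited utilities of joint outcomes
counts = np.array([[3.0, 2.0, 1.0],          # interim counts of joint outcomes
                   [1.0, 2.0, 3.0],
                   [0.0, 1.0, 2.0]])
prior = np.full((3, 3), 1 / 9)               # weak Dirichlet prior
u_lower, cutoff = 50.0, 0.90                 # elicited lower utility limit, decision cutoff

rng = np.random.default_rng(11)
post = rng.dirichlet((prior + counts).ravel(), size=50_000)  # posterior cell probabilities
mean_utils = post @ utilities.ravel()                        # draws of the mean utility
p_low = np.mean(mean_utils < u_lower)
print(f"P(mean utility < {u_lower}) = {p_low:.3f} ->", "stop" if p_low > cutoff else "continue")
```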
Analysis of Longitudinal Lupus Data Using Multivariate t-Linear Models
Pub Date: 2024-12-19 | DOI: 10.1002/sim.10248
Eun Jin Jang, Anbin Rhee, Soo-Kyung Cho, Keunbaik Lee
Analysis of healthcare utilization, such as hospitalization duration and medical costs, is crucial for policymakers and doctors in experimental and epidemiological investigations. Herein, we examine the healthcare utilization data of patients with systemic lupus erythematosus (SLE). The SLE data were measured over a 10-year period and contain outliers. Multivariate linear models with multivariate normal error distributions are commonly used to evaluate long series of multivariate longitudinal data. However, when there are outliers or heavy tails in the data, such as in healthcare utilization data, the assumption of multivariate normality may be too strong, resulting in biased estimates. To address this, we propose multivariate t-linear models (MTLMs) with an autoregressive moving-average (ARMA) covariance matrix. Modeling the covariance matrix for multivariate longitudinal data is difficult since the covariance matrix is high dimensional and must be positive-definite. To address these challenges, we employ a modified ARMA Cholesky decomposition and a hypersphere decomposition. Several simulation studies are conducted to demonstrate the performance, robustness, and flexibility of the proposed models. The proposed MTLMs with ARMA-structured covariance matrix are applied to analyze the healthcare utilization data of patients with SLE.
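Two ingredients of the proposed MTLMs, heavy-tailed errors and an ARMA-structured correlation, can be sketched directly. The closed-form ARMA(1,1) autocorrelation below is standard; the paper's modified Cholesky and hypersphere decompositions, which keep the covariance positive-definite when its parameters are modeled, are not attempted here.

```python
# ARMA(1,1)-structured correlation for T repeated measures, with heavy-tailed
# multivariate-t errors in place of the usual multivariate normal.
import numpy as np
from scipy.stats import multivariate_t

T, phi, theta = 6, 0.7, 0.3        # measurement times, AR and MA parameters
# Standard ARMA(1,1) autocorrelation: rho_1 in closed form, then geometric decay.
rho1 = (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta**2)
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
R = np.where(lags == 0, 1.0, rho1 * phi ** (lags - 1.0))

sigma = 2.0
errors = multivariate_t.rvs(loc=np.zeros(T), shape=sigma**2 * R, df=5,
                            size=1000, random_state=1)   # heavy-tailed errors
print("lag-1 sample correlation:",
      round(float(np.corrcoef(errors[:, 0], errors[:, 1])[0, 1]), 2))
```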
Nonparametric Estimation for Propensity Scores With Misclassified Treatments
Pub Date: 2024-12-18 | DOI: 10.1002/sim.10306
Li-Pang Chen
In the framework of causal inference, the average treatment effect (ATE) is a crucial concern. To estimate it, propensity score based estimation methods and their variants have been widely adopted. However, most existing methods were developed under the assumption that binary treatments are precisely measured. In addition, propensity scores are usually formulated as parametric models in the confounders. In the presence of measurement error in binary treatments and nonlinear relationships between treatments and confounders, existing methods are no longer valid and may yield biased inference results if these features are ignored. In this paper, we first analytically examine the impact of treatment misclassification on estimation of the ATE and derive the bias of the ATE estimator when treatments are contaminated with measurement error. After that, we develop a valid method to address binary treatments with misclassification. Given the corrected treatments, we adopt the random forest method to estimate the propensity score with nonlinear confounders accommodated and then derive the estimator of the ATE. Asymptotic properties of the error-eliminated estimator are established. Numerical studies are also conducted to assess the finite sample performance of the proposed estimator, and the numerical results verify the importance of correcting for measurement error effects.
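The sketch below shows the shape of one piece of such a pipeline: fit a random-forest propensity model to the misclassified treatment, then correct the fitted probabilities by the classical matrix method under assumed-known sensitivity and specificity. This correction is a textbook stand-in, not the paper's derivation; the full ATE estimator requires the additional corrections developed there.

```python
# Random-forest propensity scores under treatment misclassification, with a
# matrix-method correction assuming known sensitivity (se) and specificity (sp).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
n = 5000
x = rng.normal(size=(n, 3))
true_ps = 1 / (1 + np.exp(-(x[:, 0] + 0.5 * x[:, 1] ** 2 - 0.5)))  # nonlinear in confounders
a = rng.binomial(1, true_ps)                                       # true treatment

se, sp = 0.90, 0.85                                   # assumed-known error rates
a_star = np.where(rng.uniform(size=n) < np.where(a == 1, se, 1 - sp), 1, 0)  # misclassified

rf = RandomForestClassifier(n_estimators=200, min_samples_leaf=50, random_state=0)
ps_star = rf.fit(x, a_star).predict_proba(x)[:, 1]    # estimates P(A* = 1 | X)

# P(A* = 1 | X) = se * e(X) + (1 - sp) * (1 - e(X))  =>  solve for e(X):
ps_corr = np.clip((ps_star - (1 - sp)) / (se + sp - 1), 0.01, 0.99)
print("mean |naive - true|:    ", round(float(np.mean(np.abs(ps_star - true_ps))), 3))
print("mean |corrected - true|:", round(float(np.mean(np.abs(ps_corr - true_ps))), 3))
```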