{"title":"bcmixed: A Package for Median Inference on Longitudinal Data with the Box-Cox Transformation","authors":"K. Maruo, Ryota Ishii, Y. Yamaguchi, M. Gosho","doi":"10.32614/rj-2021-083","DOIUrl":null,"url":null,"abstract":"This article illustrates the use of the bcmixed package and focuses on the two main functions: bcmarg and bcmmrm. The bcmarg function provides inference results for a marginal model of a mixed effect model using the Box–Cox transformation. The bcmmrm function provides model median inferences based on the mixed effect models for repeated measures analysis using the Box–Cox transformation for longitudinal randomized clinical trials. Using the bcmmrm function, analysis results with high power and high interpretability for treatment effects can be obtained for longitudinal randomized clinical trials with skewed outcomes. Further, the bcmixed package provides summarizing and visualization tools, which would be helpful to write clinical trial reports. Introduction Longitudinal data are often observed in medical or biological research. One of the most popular statistical models for studying longitudinal continuous outcomes is the linear mixed effect model. Several packages are available from CRAN that allow for the implementation of linear mixed effects models (e.g., nlme (Pinheiro et al., 2019), glme (Sam Weerahandi et al., 2021), lme4 (Bates et al., 2015), CLME (Jelsema and Peddada, 2016), PLmixed (Rockwood and Jeon, 2019), MCMCglmm (Hadfield, 2010)).The linear mixed effect models assume that longitudinal outcomes follow multivariate normal distribution. However, the distribution of the outcome is often right skewed in the medical and biological fields. Therefore, evaluating fixed effects based on the normal distribution theory may result in inefficient inferences such as power loss for some statistical tests. In addition, although a model-based mean for a certain level of the categorical exploratory variables is often estimated when applying the linear mixed effect model (e.g., the model-based mean for each treatment group of a last visit in a randomized clinical trial), the mean may be inadequate as a representative value for the skewed data. The Box–Cox transformation (Box and Cox, 1964) is often applied to skewed longitudinal data (Lipsitz et al., 2000) to reduce the skewness of a skewed outcome and apply existing statistical models based on a normal distribution. However, it is difficult to directly interpret the model mean estimated on the scale after applying some transformations to the outcome variable. For the sake of the interpretability of the analysis results, Maruo et al. (2015) propose a model median inference method on the original scale based on the Box–Cox transformation in the context of randomized clinical trials. Maruo et al. (2017) extend this method to the framework of mixed effects models for repeated measures (MMRM) analysis (Mallinckrodt et al., 2001) in the context of longitudinal randomized clinical trials. The bcmixed package (Maruo et al., 2020) contains functions to estimate model medians for longitudinal data proposed by Maruo et al. (2017) as well as a sample data set that is used in Maruo et al. (2017). In this package, the parameter estimation is conducted partially using the gls function in the nlme package. This paper illustrates the usage of the package with the sample data in several contexts. The remainder of this manuscript is organized as follows: In section Methods, we describe the methods proposed by Maruo et al. (2017). Then, we illustrate the usage of the bcmixed package with the example data in section Illustrations. Finally, our contributions are summarized in section Summary and discussion. Methods We briefly introduce the method proposed by in Maruo et al. (2017). The detailed expressions can be found in Maruo et al. (2017). The R Journal Vol. XX/YY, AAAA 20ZZ ISSN 2073-4859 CONTRIBUTED RESEARCH ARTICLE 2 Parameter inference for the Box–Cox model Let the outcome be a continuous and positive value. The outcomes are measured over time for each subject i = 1, . . . , n, and the number of planned measurement occasions is T (occasion index: t = 1, . . . , T). The outcome vector for the ith subject is denoted by yi = (yi1, . . . , yini ) T , where ni is the number of measurements for the ith subject. We have ni ≤ T because of missingness. Then, we consider applying the following model (Lipsitz et al., 2000; Box and Cox, 1964): zi = Xiβ + Wibi + ei, (1) where zi is the Box–Cox transformed outcome vector such that the jth element (j = 1, . . . , ni) is denoted by zij = { (yλ ij − 1)/λ, λ 6= 0, log yij, λ = 0, and λ is the transformation parameter. The parameter β is the p-dimensional vector of the fixed effect, bi is the q-dimensional vector of random effects distributed as MVNq(0q, D), ei is the ni-dimensional vector of random errors distributed independently as MVNni (0ni , Σi), and Xi and Wi are ni × p and ni × q design matrices relating the fixed and random effects, respectively. The random variables bi and ei are independent. From Formula (1), it is derived that zi|λ ∼ MVNni (Xiβ, Vi), where Vi = WiDW i + Σi. In this paper, we consider situations where researchers have little interest in random effects and are interested in assessing fixed effects. In such cases, a simple formulation of the linear mixed effect model (1) can be implemented wherein the random effects are not explicitly modeled, but are rather included as part of the covariance matrix Vi. We focus on such a “marginal” mean model. The covariance parameter vector in V = Vi for ni = T (i.e., subjects with no missing values) is denoted as α = (α1, . . . , αm) . The dimension of α, m, depends on T and the specified covariance structure. Let the model parameter vector be θ = (λ, βT , αT)T . The maximum likelihood estimate of θ is obtained through maximizing the profile likelihood for λ (Lipsitz et al., 2000; Maruo et al., 2017). The model-based and robust variance estimators of the maximum likelihood estimator θ̂ are given by V θ = {−Ĥ} −1 and V θ = {−Ĥ} −1 Ĵ{−Ĥ}−1, respectively, where H = ∂2 ∂θ∂θT l(θ), J = ∑n i=1 { ∂ ∂θ li(θ) }{ ∂ ∂θ li(θ) }T , l(θ) is the likelihood function for n subjects, and li(θ) is the likelihood function for the ith subject. The matrices Ĥ and Ĵ are obtained from the matrices H and J by replacing θ by θ̂, respectively. A robust variance estimator is an asymptotically valid estimator even when the model (1) is mis-specified (Cox, 1961). Model median inference We now focus on the continuous and positive outcomes of a certain disease and consider a situation in which the efficacy of some treatments (group index: g = 1, ..., G) is compared based on a randomized, parallel group clinical trial, where the total number of subjects is n. The explanatory variable matrix, Xi, and the fixed effect parameter, β in model (1) are denoted in detail as follows. Now, Xi is given by the ni× (GT +C) matrix that contains the intercept, G− 1 group variables, T− 1 occasion variables, groupby-occasion interaction variables, and C covariates, where the categorical covariates are converted into dummy variables. The fixed effect parameter vector is given by β = (β I , βg , βt , β T gt, β T c ) T , where β I , βg = (βg(1), . . . , βg(G−1)) T , βt = (βt(1), . . . , βt(T−1)) T , βgt = (βgt(1,1), βgt(1,2), . . . , βgt(G−1,T−1)) T , and βc = (βc(1), . . . , βc(C)) T are the intercept, group, occasion, group-by-occasion, and covariate parameter vectors, respectively. Although group G and occasion T is set to be at the reference level, we set βg(G) = βt(T) = βgt(G,t) = βgt(g,T) = 0 for the sake of description (g = 1, . . . , G, t = 1, . . . , T). Under these settings, the model median, ξ(g,t), for the treatment group g at the occasion t on the original scale is given by ξ(g,t) = { λ ( β I + βg(g) + βt(t) + βgt(g,t) + x T c̄ βc ) + 1 }1/λ , where xc̄ is the vector of the mean of each covariate for all subjects. The model median is the inverse Box–Cox transformation of the model means on the Box–Cox transformed scale. The model median can be easily interpreted because it is the estimator of the median on the original scale. Using the delta method, the variance estimator of the maximum likelihood estimator for the model median, ξ̂(g,t), is given by V (·) ξ(g,t) = ∆ T ξ(g,t)V (·) θ ∆ξ(g,t), where ∆ξ(g,t) = ∂ ∂θ ξ(g,t) ∣∣∣ θ=θ̂ . If we use V(·) θ = V (M) θ , we obtain the model-based variance estimator, V (M) ξ(g,t). On the other hand, the robust The R Journal Vol. XX/YY, AAAA 20ZZ ISSN 2073-4859 CONTRIBUTED RESEARCH ARTICLE 3 variance estimator, V xi(g,t), is obtained if we use V (·) θ = V (R) θ . Thus, the model median difference between groups g1 and g2 at occasion t is given by δ(g1;g2,t) = ξ(g1,t) − ξ(g2,t) and the variance estimators of the maximum likelihood estimator of the model median difference, δ̂(g1;g2,t), is given by Vδ(g1;g2,t) = (∆(g1,t) − ∆(g2,t)) TV(·) θ (∆(g1,t) − ∆(g2,t)). The model-based and robust variance estimators are obtained in the same way as the model median estimator. Using the asymptotic normality of the maximum likelihood estimator, the 100(1− α)% confidence interval of δ(g1;g2,t) is obtained as δ̂(g1;g2,t) ±Φ −1(1− α/2) √ V(·) δ(g1;g2,t) , where Φ−1(·) is the quantile function of the standard normal distribution. The Wald-type test for the null hypothesis, δ(g1;g2,t) = 0, is performed with the test statistic t = δ̂(g1;g2,t)/ √ V(·) δ(g1;g2,t) . The performances of above-mentioned inference procedures may be low for a small sample because these are based on the asymptotic properties of the maximum likelihood estimation. Therefore, Maruo et al. (2017) applied the following empirical small sample adjustment for the inferences of the model median differences by referring to the study in Schluchter and Elashoff (1990). They provide an adjusted standard error (SE) for the median difference as √ M/(M− p)V(·) δ(g1;g2,t) for the compound symmetry (CS) or the first-order auto regression (AR(1)) structure and √ n∗/(n∗ − T) × √ V(·) δ(g1;g2,t) for the unstructured (UN) structure. Further, we approximate the null distribution by the t distribution where the degree of freedom for the CS or the AR(1) struc","PeriodicalId":20974,"journal":{"name":"R J.","volume":"38 1","pages":"153"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"R J.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.32614/rj-2021-083","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
This article illustrates the use of the bcmixed package and focuses on the two main functions: bcmarg and bcmmrm. The bcmarg function provides inference results for a marginal model of a mixed effect model using the Box–Cox transformation. The bcmmrm function provides model median inferences based on the mixed effect models for repeated measures analysis using the Box–Cox transformation for longitudinal randomized clinical trials. Using the bcmmrm function, analysis results with high power and high interpretability for treatment effects can be obtained for longitudinal randomized clinical trials with skewed outcomes. Further, the bcmixed package provides summarizing and visualization tools, which would be helpful to write clinical trial reports. Introduction Longitudinal data are often observed in medical or biological research. One of the most popular statistical models for studying longitudinal continuous outcomes is the linear mixed effect model. Several packages are available from CRAN that allow for the implementation of linear mixed effects models (e.g., nlme (Pinheiro et al., 2019), glme (Sam Weerahandi et al., 2021), lme4 (Bates et al., 2015), CLME (Jelsema and Peddada, 2016), PLmixed (Rockwood and Jeon, 2019), MCMCglmm (Hadfield, 2010)).The linear mixed effect models assume that longitudinal outcomes follow multivariate normal distribution. However, the distribution of the outcome is often right skewed in the medical and biological fields. Therefore, evaluating fixed effects based on the normal distribution theory may result in inefficient inferences such as power loss for some statistical tests. In addition, although a model-based mean for a certain level of the categorical exploratory variables is often estimated when applying the linear mixed effect model (e.g., the model-based mean for each treatment group of a last visit in a randomized clinical trial), the mean may be inadequate as a representative value for the skewed data. The Box–Cox transformation (Box and Cox, 1964) is often applied to skewed longitudinal data (Lipsitz et al., 2000) to reduce the skewness of a skewed outcome and apply existing statistical models based on a normal distribution. However, it is difficult to directly interpret the model mean estimated on the scale after applying some transformations to the outcome variable. For the sake of the interpretability of the analysis results, Maruo et al. (2015) propose a model median inference method on the original scale based on the Box–Cox transformation in the context of randomized clinical trials. Maruo et al. (2017) extend this method to the framework of mixed effects models for repeated measures (MMRM) analysis (Mallinckrodt et al., 2001) in the context of longitudinal randomized clinical trials. The bcmixed package (Maruo et al., 2020) contains functions to estimate model medians for longitudinal data proposed by Maruo et al. (2017) as well as a sample data set that is used in Maruo et al. (2017). In this package, the parameter estimation is conducted partially using the gls function in the nlme package. This paper illustrates the usage of the package with the sample data in several contexts. The remainder of this manuscript is organized as follows: In section Methods, we describe the methods proposed by Maruo et al. (2017). Then, we illustrate the usage of the bcmixed package with the example data in section Illustrations. Finally, our contributions are summarized in section Summary and discussion. Methods We briefly introduce the method proposed by in Maruo et al. (2017). The detailed expressions can be found in Maruo et al. (2017). The R Journal Vol. XX/YY, AAAA 20ZZ ISSN 2073-4859 CONTRIBUTED RESEARCH ARTICLE 2 Parameter inference for the Box–Cox model Let the outcome be a continuous and positive value. The outcomes are measured over time for each subject i = 1, . . . , n, and the number of planned measurement occasions is T (occasion index: t = 1, . . . , T). The outcome vector for the ith subject is denoted by yi = (yi1, . . . , yini ) T , where ni is the number of measurements for the ith subject. We have ni ≤ T because of missingness. Then, we consider applying the following model (Lipsitz et al., 2000; Box and Cox, 1964): zi = Xiβ + Wibi + ei, (1) where zi is the Box–Cox transformed outcome vector such that the jth element (j = 1, . . . , ni) is denoted by zij = { (yλ ij − 1)/λ, λ 6= 0, log yij, λ = 0, and λ is the transformation parameter. The parameter β is the p-dimensional vector of the fixed effect, bi is the q-dimensional vector of random effects distributed as MVNq(0q, D), ei is the ni-dimensional vector of random errors distributed independently as MVNni (0ni , Σi), and Xi and Wi are ni × p and ni × q design matrices relating the fixed and random effects, respectively. The random variables bi and ei are independent. From Formula (1), it is derived that zi|λ ∼ MVNni (Xiβ, Vi), where Vi = WiDW i + Σi. In this paper, we consider situations where researchers have little interest in random effects and are interested in assessing fixed effects. In such cases, a simple formulation of the linear mixed effect model (1) can be implemented wherein the random effects are not explicitly modeled, but are rather included as part of the covariance matrix Vi. We focus on such a “marginal” mean model. The covariance parameter vector in V = Vi for ni = T (i.e., subjects with no missing values) is denoted as α = (α1, . . . , αm) . The dimension of α, m, depends on T and the specified covariance structure. Let the model parameter vector be θ = (λ, βT , αT)T . The maximum likelihood estimate of θ is obtained through maximizing the profile likelihood for λ (Lipsitz et al., 2000; Maruo et al., 2017). The model-based and robust variance estimators of the maximum likelihood estimator θ̂ are given by V θ = {−Ĥ} −1 and V θ = {−Ĥ} −1 Ĵ{−Ĥ}−1, respectively, where H = ∂2 ∂θ∂θT l(θ), J = ∑n i=1 { ∂ ∂θ li(θ) }{ ∂ ∂θ li(θ) }T , l(θ) is the likelihood function for n subjects, and li(θ) is the likelihood function for the ith subject. The matrices Ĥ and Ĵ are obtained from the matrices H and J by replacing θ by θ̂, respectively. A robust variance estimator is an asymptotically valid estimator even when the model (1) is mis-specified (Cox, 1961). Model median inference We now focus on the continuous and positive outcomes of a certain disease and consider a situation in which the efficacy of some treatments (group index: g = 1, ..., G) is compared based on a randomized, parallel group clinical trial, where the total number of subjects is n. The explanatory variable matrix, Xi, and the fixed effect parameter, β in model (1) are denoted in detail as follows. Now, Xi is given by the ni× (GT +C) matrix that contains the intercept, G− 1 group variables, T− 1 occasion variables, groupby-occasion interaction variables, and C covariates, where the categorical covariates are converted into dummy variables. The fixed effect parameter vector is given by β = (β I , βg , βt , β T gt, β T c ) T , where β I , βg = (βg(1), . . . , βg(G−1)) T , βt = (βt(1), . . . , βt(T−1)) T , βgt = (βgt(1,1), βgt(1,2), . . . , βgt(G−1,T−1)) T , and βc = (βc(1), . . . , βc(C)) T are the intercept, group, occasion, group-by-occasion, and covariate parameter vectors, respectively. Although group G and occasion T is set to be at the reference level, we set βg(G) = βt(T) = βgt(G,t) = βgt(g,T) = 0 for the sake of description (g = 1, . . . , G, t = 1, . . . , T). Under these settings, the model median, ξ(g,t), for the treatment group g at the occasion t on the original scale is given by ξ(g,t) = { λ ( β I + βg(g) + βt(t) + βgt(g,t) + x T c̄ βc ) + 1 }1/λ , where xc̄ is the vector of the mean of each covariate for all subjects. The model median is the inverse Box–Cox transformation of the model means on the Box–Cox transformed scale. The model median can be easily interpreted because it is the estimator of the median on the original scale. Using the delta method, the variance estimator of the maximum likelihood estimator for the model median, ξ̂(g,t), is given by V (·) ξ(g,t) = ∆ T ξ(g,t)V (·) θ ∆ξ(g,t), where ∆ξ(g,t) = ∂ ∂θ ξ(g,t) ∣∣∣ θ=θ̂ . If we use V(·) θ = V (M) θ , we obtain the model-based variance estimator, V (M) ξ(g,t). On the other hand, the robust The R Journal Vol. XX/YY, AAAA 20ZZ ISSN 2073-4859 CONTRIBUTED RESEARCH ARTICLE 3 variance estimator, V xi(g,t), is obtained if we use V (·) θ = V (R) θ . Thus, the model median difference between groups g1 and g2 at occasion t is given by δ(g1;g2,t) = ξ(g1,t) − ξ(g2,t) and the variance estimators of the maximum likelihood estimator of the model median difference, δ̂(g1;g2,t), is given by Vδ(g1;g2,t) = (∆(g1,t) − ∆(g2,t)) TV(·) θ (∆(g1,t) − ∆(g2,t)). The model-based and robust variance estimators are obtained in the same way as the model median estimator. Using the asymptotic normality of the maximum likelihood estimator, the 100(1− α)% confidence interval of δ(g1;g2,t) is obtained as δ̂(g1;g2,t) ±Φ −1(1− α/2) √ V(·) δ(g1;g2,t) , where Φ−1(·) is the quantile function of the standard normal distribution. The Wald-type test for the null hypothesis, δ(g1;g2,t) = 0, is performed with the test statistic t = δ̂(g1;g2,t)/ √ V(·) δ(g1;g2,t) . The performances of above-mentioned inference procedures may be low for a small sample because these are based on the asymptotic properties of the maximum likelihood estimation. Therefore, Maruo et al. (2017) applied the following empirical small sample adjustment for the inferences of the model median differences by referring to the study in Schluchter and Elashoff (1990). They provide an adjusted standard error (SE) for the median difference as √ M/(M− p)V(·) δ(g1;g2,t) for the compound symmetry (CS) or the first-order auto regression (AR(1)) structure and √ n∗/(n∗ − T) × √ V(·) δ(g1;g2,t) for the unstructured (UN) structure. Further, we approximate the null distribution by the t distribution where the degree of freedom for the CS or the AR(1) struc