A $C_p$ type criterion for model selection in the GEE method when both scale and correlation parameters are unknown
Authors: Tomoharu Sato, Yu Inatsu
DOI: 10.32917/hmj/1583805651 (https://doi.org/10.32917/hmj/1583805651)
Publication date: 2020-03-01
Publication type: Journal Article
Citations: 1
Abstract
In recent years, correlated data have been analyzed in many fields, such as medical science and economics. In particular, longitudinal data, that is, data measured repeatedly over time on the same subjects, are widely used in those fields. In general, observations from the same subject are correlated, whereas observations from different subjects are independent. Liang and Zeger (1986) introduced the generalized estimating equation (GEE), an extension of the generalized linear model (Nelder and Wedderburn, 1972). The GEE method is one of the methods for analyzing correlated data. Its defining feature is that it uses a working correlation matrix that the analyst can choose freely, and we obtain a consistent estimator of the parameters whether or not the working correlation matrix is correctly specified. Importantly, a full specification of the joint distribution is not needed. For these reasons, the GEE method is widely used in many fields.

Model selection is also an important problem, so we apply model selection to the GEE method. In general, in model selection we measure the goodness of fit of a model by a risk function and choose the model with the smallest risk. A model selection criterion is then obtained as an asymptotically unbiased estimator of the risk function. For example, the expected Kullback-Leibler information (Kullback and Leibler, 1951) is used as a risk function, and the best-known criterion based on it is Akaike’s information criterion (AIC) (Akaike, 1973, 1974). The AIC is calculated as AIC = −2 × (maximum log-likelihood) + 2 × (the number of parameters). Furthermore, the GIC, an extension of the AIC proposed by Nishii (1984) and Rao (1988), is also applied in many fields. However, likelihood-based model selection criteria such as the AIC and GIC cannot be used with the GEE method, because the joint distribution is not specified. Several model selection criteria analogous to the AIC and GIC have already been proposed for the GEE method.
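For reference, the AIC formula quoted above can be evaluated directly in a setting where the joint distribution is fully specified, unlike the GEE setting. The following is a minimal sketch, assuming a Gaussian linear model fitted by least squares on simulated data; the function name `gaussian_aic`, the simulated design, and the candidate subsets are illustrative assumptions and do not come from the paper.

```python
import itertools
import numpy as np

def gaussian_aic(y, X):
    """AIC = -2 * (maximum log-likelihood) + 2 * (number of parameters)
    for a Gaussian linear model y = X @ beta + noise (hypothetical helper)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    sigma2 = rss / n  # maximum likelihood estimate of the noise variance
    max_loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return -2.0 * max_loglik + 2.0 * (k + 1)  # k coefficients + 1 variance

# Simulated data (assumption): only covariate 1 affects the response.
rng = np.random.default_rng(0)
n = 200
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = 1.0 + 2.0 * X_full[:, 1] + rng.normal(size=n)

# Candidate models: the intercept plus any subset of the three covariates.
candidates = [(0,) + s
              for r in range(4)
              for s in itertools.combinations((1, 2, 3), r)]
aic = {cols: gaussian_aic(y, X_full[:, list(cols)]) for cols in candidates}
best = min(aic, key=aic.get)  # model with the smallest AIC
print(best, aic[best])
```

Choosing the subset with the smallest value is the variable selection step; a criterion for the GEE method has to replace the log-likelihood term, because no joint likelihood is available there.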
Among the criteria proposed for the GEE method, Pan (2001) proposed the QIC, which is based on the quasi-likelihood (defined by Wedderburn, 1974). Furthermore, the GCp proposed by Cantoni et al. (2005) is a generalization of Mallows’s Cp (Mallows, 1973). The CIC proposed by Hin and Wang (2009) and Gosho et al. (2011) is a criterion for selecting the correlation structure. Unfortunately, the above criteria are derived without considering the correlation structure, so we regard them as not reflecting the correlation. Against this background, Inatsu and Imori (2013) proposed a new model selection criterion, PMSEG (the prediction mean squared error in the GEE), using a risk function based on the prediction mean squared error (PMSE) normalized by the covariance matrix. They derived this criterion for the case where both the correlation and scale parameters are known. Since these parameters are generally unknown, we consider the criterion when both are unknown. The main purpose of this paper is to propose a model selection criterion that takes the correlation structure into account when both the correlation and scale parameters are unknown. To do so, we evaluate the asymptotic bias of the estimator of the risk function and consider the influence of estimating the correlation and scale parameters. We focus on variable selection, that is, selecting the optimal combination of variables.

The present paper is organized as follows. In Section 2, we introduce the GEE framework and propose an estimation method for the parameters, and then perform a stochastic expansion of the GEE estimator. In Section 3, we define the estimator of the risk function, evaluate its asymptotic bias, and propose the new model selection criterion. In Section 4, we perform a numerical study. In Section 5, we conclude our discussion. In the Appendix, we provide the calculation of the bias.
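The idea of a squared error "normalized by the covariance matrix" can be illustrated with a short sketch. It assumes, purely for illustration, that every subject has the same number of measurements and an exchangeable covariance with a correlation parameter `rho` and a scale parameter `phi`; the helper names `exchangeable_cov` and `normalized_sq_error` are hypothetical. The paper's actual risk function, its estimator, and the bias correction are not reproduced here.

```python
import numpy as np

def exchangeable_cov(m, rho, phi):
    """phi * R, where R is an m x m exchangeable correlation matrix
    (assumed structure; the working correlation is a free choice in GEE)."""
    return phi * ((1.0 - rho) * np.eye(m) + rho * np.ones((m, m)))

def normalized_sq_error(Y, M, cov):
    """Sum over subjects i of (y_i - mu_i)^T cov^{-1} (y_i - mu_i).

    Y, M: arrays of shape (n_subjects, m) with responses and fitted means.
    """
    resid = Y - M
    solved = np.linalg.solve(cov, resid.T)  # cov^{-1} r_i for all i, shape (m, n)
    return float(np.sum(resid.T * solved))

# Toy responses and fitted means for 2 subjects with 3 measurements each.
Y = np.array([[1.0, 2.0, 3.0], [0.0, 1.0, 1.0]])
M = np.array([[1.5, 2.0, 2.0], [0.0, 0.5, 1.5]])

independent = normalized_sq_error(Y, M, exchangeable_cov(3, 0.0, 1.0))
correlated = normalized_sq_error(Y, M, exchangeable_cov(3, 0.5, 1.0))
print(independent, correlated)
```

With `rho = 0` and `phi = 1` the quantity reduces to the ordinary residual sum of squares, which shows how the normalization lets the correlation and scale parameters enter the measure of fit.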