Statistical analysis of very high-dimensional data sets of hierarchically structured binary variables with missing data: An application to Marine Corps readiness evaluations
Authors: S. Zacks, W. Marlow, S. Brier
Journal: Naval Research Logistics Quarterly
Published: 1985-08-01
DOI: https://doi.org/10.1002/NAV.3800320310
Citations: 1
Abstract
This analysis deals with very high-dimensional data sets, each containing close to nine hundred binary variables. Each data set corresponds to an evaluation of one complex system. Large portions of each data set are missing, and the unobserved variables differ from one evaluation to another. The statistical problems are therefore those of multivariate binary data analysis in which the number of variables is much larger than the sample size and the pattern of missing data varies across sample elements. The variables, however, are hierarchically structured, so the problem of clustering variables into groups does not arise in the present study. To motivate the statistical problem, the Marine Corps Combat Readiness Evaluation System (MCCRES) is described for infantry battalions and then used for exposition. The paper provides a statistical model for MCCRES data and develops estimation and prediction procedures that exploit the dependence structure. The E-M (expectation-maximization) algorithm is applied to obtain maximum likelihood estimates of the parameters of interest. Numerical examples illustrate the proposed methods.
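To give a concrete feel for how an E-M iteration handles binary data with entries missing in different places, here is a minimal sketch in Python. It is not the model from the paper: it assumes a toy setting with only two dependent binary variables per evaluation, data missing at random, and an unrestricted 2x2 table of cell probabilities. The function names `em_two_binary` and `predict_x2_given_x1` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]
Record = Tuple[Optional[int], Optional[int]]  # None marks a missing value

CELLS: List[Cell] = [(0, 0), (0, 1), (1, 0), (1, 1)]


def em_two_binary(records: List[Record],
                  n_iter: int = 500,
                  tol: float = 1e-12) -> Dict[Cell, float]:
    """Toy E-M estimate of P(X1 = a, X2 = b) from partially observed pairs.

    Assumption (not from the paper): values are missing at random, so a
    record with both values missing carries no information and is dropped.
    """
    usable = [r for r in records if r[0] is not None or r[1] is not None]
    if not usable:
        raise ValueError("need at least one record with an observed value")
    n = len(usable)
    p = {c: 0.25 for c in CELLS}  # uniform starting point
    for _ in range(n_iter):
        # E-step: spread each record over the cells compatible with its
        # observed values, in proportion to the current probabilities.
        expected = {c: 0.0 for c in CELLS}
        for x1, x2 in usable:
            compatible = [c for c in CELLS
                          if (x1 is None or c[0] == x1)
                          and (x2 is None or c[1] == x2)]
            total = sum(p[c] for c in compatible)
            for c in compatible:
                expected[c] += p[c] / total
        # M-step: the expected counts, normalized, are the new estimates.
        new_p = {c: expected[c] / n for c in CELLS}
        change = max(abs(new_p[c] - p[c]) for c in CELLS)
        p = new_p
        if change < tol:
            break
    return p


def predict_x2_given_x1(p: Dict[Cell, float], x1: int) -> float:
    """Predicted P(X2 = 1 | X1 = x1) under the fitted cell probabilities."""
    return p[(x1, 1)] / (p[(x1, 0)] + p[(x1, 1)])


if __name__ == "__main__":
    data: List[Record] = [(1, 1), (1, 1), (0, 0), (1, 0), (1, None), (None, 0)]
    fitted = em_two_binary(data)
    print(fitted)
    print(predict_x2_given_x1(fitted, 1))
```

The prediction helper shows why the dependence structure matters: an unobserved variable is predicted from an observed one through the fitted joint probabilities rather than from its marginal frequency alone. With close to nine hundred variables, an unrestricted joint table like this one is infeasible, which is where a structured model such as the hierarchical one described in the abstract becomes necessary.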