{"title":"Statistical Models for Count Data","authors":"A. Muoka, Oscar Ngesa, A. Waititu","doi":"10.11648/J.SJAMS.20160406.12","DOIUrl":null,"url":null,"abstract":"Statistical analyses involving count data may take several forms depending on the context of use, that is; simple counts such as the number of plants in a particular field and categorical data in which counts represent the number of items falling in each of the several categories. The mostly adapted model for analyzing count data is the Poisson model. Other models that can be considered for modeling count data are the negative binomial and the hurdle models. It is of great importance that these models are systematically considered and compared before choosing one at the expense of others to handle count data. In real world situations count data sets may have zero counts which have an importance attached to them. In this work, statistical simulation technique was used to compare the performance of these count data models. Count data sets with different proportions of zero were simulated. Akaike Information Criterion (AIC) was used in the simulation study to compare how well several count data models fit the simulated datasets. 
From the results of the study it was concluded that negative binomial model fits better to over-dispersed data which has below 0.3 proportion of zeros and that hurdle model performs better in data with 0.3 and above proportion of zero.","PeriodicalId":422938,"journal":{"name":"Science Journal of Applied Mathematics and Statistics","volume":"204 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Science Journal of Applied Mathematics and Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.11648/J.SJAMS.20160406.12","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 8
Abstract
Statistical analyses of count data take several forms depending on the context: simple counts, such as the number of plants in a particular field, and categorical data, in which counts represent the number of items falling into each of several categories. The most widely adopted model for analyzing count data is the Poisson model. Other models that can be considered are the negative binomial and hurdle models. It is important that these models be systematically considered and compared before one is chosen over the others. In real-world situations, count data sets may contain zero counts that carry meaning of their own. In this work, statistical simulation was used to compare the performance of these count data models. Count data sets with different proportions of zeros were simulated, and the Akaike Information Criterion (AIC) was used to compare how well the models fit the simulated data sets. The study concluded that the negative binomial model fits over-dispersed data better when the proportion of zeros is below 0.3, and that the hurdle model performs better when the proportion of zeros is 0.3 or above.
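The comparison described in the abstract can be illustrated with a small, hedged sketch. It is not the authors' simulation design: the data generator (`simulate_counts`), the intercept-only models, the parameter values, and every function name below are assumptions made for illustration. The sketch fits Poisson, negative binomial, and hurdle Poisson models by maximum likelihood and scores each with AIC = 2k - 2 log L, where k is the number of estimated parameters.

```python
import math
import random


def rpois(rng, lam):
    """Draw one Poisson(lam) variate (Knuth's multiplication method)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_counts(rng, n, zero_prop, mu=3.0, size=1.5):
    """ASSUMED generator (not from the paper): a zero with probability
    zero_prop, otherwise a positive draw from a negative binomial."""
    y = []
    for _ in range(n):
        k = 0
        if rng.random() >= zero_prop:
            while k == 0:  # rejection step keeps only positive counts
                k = rpois(rng, rng.gammavariate(size, mu / size))
        y.append(k)
    return y


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


def poisson_loglik(y, lam):
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in y)


def negbin_loglik(y, mu, r):
    """Negative binomial with mean mu and size (dispersion) r."""
    p = r / (r + mu)
    return sum(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1 - p) for k in y)


def golden_max(f, lo, hi, tol=1e-6):
    """Golden-section search for the maximizer of f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2


def fit_poisson(y):
    """AIC of an intercept-only Poisson model (MLE: lam = sample mean)."""
    return aic(poisson_loglik(y, sum(y) / len(y)), 1)


def fit_negbin(y):
    """AIC of an intercept-only negative binomial; the MLE of the mean is
    the sample mean, and the size is searched on a log scale."""
    mu = sum(y) / len(y)
    t = golden_max(lambda s: negbin_loglik(y, mu, math.exp(s)),
                   math.log(1e-3), math.log(1e3))
    return aic(negbin_loglik(y, mu, math.exp(t)), 2)


def fit_hurdle_poisson(y):
    """AIC of a hurdle model: Bernoulli zero part plus zero-truncated
    Poisson for the positives. Assumes y has at least one positive count."""
    pos = [k for k in y if k > 0]
    n, n_pos = len(y), len(pos)
    n0, m = n - n_pos, sum(pos) / n_pos
    lo, hi = 1e-9, m  # bisection for lam / (1 - exp(-lam)) = m
    for _ in range(100):
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < m:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    ll = (n0 * math.log(n0 / n) if n0 else 0.0) + n_pos * math.log(n_pos / n)
    ll += poisson_loglik(pos, lam) - n_pos * math.log(-math.expm1(-lam))
    return aic(ll, 2)


if __name__ == "__main__":
    rng = random.Random(1)
    print("zeros  Poisson   NegBin   Hurdle")
    for zero_prop in (0.1, 0.3, 0.5, 0.7):
        y = simulate_counts(rng, 500, zero_prop)
        print(f"{zero_prop:5.1f} {fit_poisson(y):8.1f} "
              f"{fit_negbin(y):8.1f} {fit_hurdle_poisson(y):8.1f}")
```

The model with the lowest AIC in each row is the preferred fit for that simulated data set. Because the generator here is an assumption, the crossover point between models need not match the 0.3 threshold reported in the study.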