A comparison of zero-inflated and hurdle models for modeling zero-inflated count data.

Q2 Mathematics Journal of Statistical Distributions and Applications Pub Date : 2021-01-01 Epub Date: 2021-06-24 DOI:10.1186/s40488-021-00121-4

Cindy Xin Feng


Citations: 0

Abstract




Count data with excessive zeros are frequently encountered in practice. For example, the number of health services visits often includes many zeros, representing patients with no utilization during a follow-up period. A common feature of this type of data is that the count measure tends to have more zeros than a common count distribution, such as the Poisson or negative binomial, can accommodate. Zero-inflated or hurdle models are often used to fit such data. Despite the increasing popularity of zero-inflated and hurdle models, there is still a lack of investigation of the fundamental differences between these two types of models. In this article, we reviewed the zero-inflated and hurdle models and highlighted their differences in terms of their data generating processes. We also conducted simulation studies to evaluate the performance of both types of models. The final choice of regression model should be made after a careful assessment of goodness of fit and should be tailored to the particular data in question.
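The difference in data generating processes mentioned in the abstract can be illustrated with a minimal simulation sketch. The abstract itself gives no formulas, so what follows is the standard textbook formulation rather than the article's own code: a Poisson count component is assumed, and the names `pi`, `p0`, `lam` and all function names are hypothetical. In the zero-inflated model, zeros arise from two sources (a structural-zero state and the count distribution itself); in the hurdle model, all zeros come from a single binary part and positive counts come from a zero-truncated count distribution.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_zip(pi, lam, rng):
    """Zero-inflated Poisson (assumed form): structural zero with
    probability pi, otherwise an ordinary Poisson draw, which may
    itself be zero (a "sampling" zero)."""
    if rng.random() < pi:
        return 0
    return sample_poisson(lam, rng)


def sample_hurdle(p0, lam, rng):
    """Hurdle Poisson (assumed form): zero with probability p0, otherwise
    a draw from a zero-truncated Poisson, so every zero comes from the
    binary part."""
    if rng.random() < p0:
        return 0
    while True:  # rejection sampling for the zero-truncated Poisson
        y = sample_poisson(lam, rng)
        if y > 0:
            return y


def zip_zero_prob(pi, lam):
    """P(Y = 0) under ZIP: structural zeros plus Poisson zeros."""
    return pi + (1.0 - pi) * math.exp(-lam)


def hurdle_zero_prob(p0, lam):
    """P(Y = 0) under the hurdle model: the hurdle probability alone."""
    return p0


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20000
    pi, lam = 0.3, 2.0  # illustrative values, not taken from the article
    p0 = zip_zero_prob(pi, lam)  # match the hurdle zero mass to the ZIP one
    zip_draws = [sample_zip(pi, lam, rng) for _ in range(n)]
    hurdle_draws = [sample_hurdle(p0, lam, rng) for _ in range(n)]
    print("theoretical P(Y=0):", p0)
    print("ZIP empirical zero fraction:", zip_draws.count(0) / n)
    print("hurdle empirical zero fraction:", hurdle_draws.count(0) / n)
```

Under this formulation the zero probability of the zero-inflated Poisson can never fall below exp(-lam), whereas the hurdle probability `p0` is free to take any value in (0, 1), so a hurdle model can also describe data with fewer zeros than the Poisson implies.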


Source journal

Journal of Statistical Distributions and Applications (Decision Sciences: Statistics, Probability and Uncertainty)

Self-citation rate

0.00%

Articles published

Review time

13 weeks