Bayesian inference for nonprobability samples with nonignorable missingness
Zhan Liu, Xuesong Chen, Ruohan Li, Lanbao Hou
Statistical Analysis and Data Mining (journal article), published 2024-02-21
DOI: 10.1002/sam.11667 (https://doi.org/10.1002/sam.11667)
Citation count: 0
Abstract
Nonprobability samples, especially web survey data, have become available in many fields. However, nonprobability samples suffer from selection bias, which yields biased estimates if left uncorrected. Moreover, nonprobability samples may also contain missing values, and the missingness may be nonignorable. Making inference from nonprobability samples with nonignorable missingness is therefore challenging. In this article, we propose a Bayesian approach to population inference based on nonprobability samples with nonignorable missingness. In our method, separate logistic regression models are used to estimate the selection probabilities and the response probabilities, and a superpopulation model describes the relationship between the study variable and the covariates. A Bayesian method is proposed to estimate the response model parameters, and an approximate Bayesian method is proposed to estimate the superpopulation model parameters. Specifically, the estimating functions for the response model parameters and the superpopulation model parameters are used to derive the approximate posterior distribution of the superpopulation model parameters. Simulation studies are conducted to investigate the finite-sample performance of the proposed method. Data from the Pew Research Center and the Behavioral Risk Factor Surveillance System are used to show that the proposed method outperforms the other approaches considered.
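The abstract names the ingredients of the method but not their exact form. As a rough illustration only, and not the authors' implementation, the following Python/NumPy sketch shows one generic way an estimating function can be turned into an approximate posterior. It assumes a linear superpopulation model y = b0 + b1*x + e, a selection probability that depends on x, and a nonignorable response probability that depends on y. It also treats both probabilities as known, whereas in the paper they come from fitted logistic regression models; the Bayesian fit of the response model is not reproduced here. The approximate posterior is taken to be proportional to prior(beta) * exp{-1/2 U(beta)' V^{-1} U(beta)}, where U is the inverse-probability-weighted estimating function and V is a plug-in estimate of its variance. This Gaussian form is a common construction but is an assumption here. The simulation settings and the names U, log_quasi_post and metropolis are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2024)

def expit(z):
    """Logistic function, used for the selection and response models."""
    return 1.0 / (1.0 + np.exp(-z))

# --- HYPOTHETICAL finite population from a linear superpopulation model ---
N = 20_000
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)
X = np.column_stack([np.ones(N), x])

# --- HYPOTHETICAL selection (depends on x) and response (depends on y) ---
pi = expit(-3.0 + 0.5 * x)    # selection probability into the nonprobability sample
rho = expit(1.0 + 0.5 * y)    # response probability depends on y itself -> nonignorable
selected = rng.random(N) < pi
responded = rng.random(N) < rho
keep = selected & responded   # y is observed only for selected respondents

Xs, ys = X[keep], y[keep]
# ASSUMPTION: pi and rho are treated as known; in practice they would be estimated.
w = 1.0 / (pi[keep] * rho[keep])

def U(beta):
    """Weighted estimating function: sum_i w_i x_i (y_i - x_i' beta)."""
    return Xs.T @ (w * (ys - Xs @ beta))

# Root of U(beta) = 0 (weighted least squares) and a plug-in estimate of Var(U).
A = Xs.T @ (w[:, None] * Xs)
beta_hat = np.linalg.solve(A, Xs.T @ (w * ys))
scores = Xs * (w * (ys - Xs @ beta_hat))[:, None]  # rows: w_i r_i x_i
V_inv = np.linalg.inv(scores.T @ scores)

def log_quasi_post(beta, prior_sd=10.0):
    """ASSUMED Gaussian-form quasi-likelihood for U plus a vague normal prior."""
    u = U(beta)
    log_lik = -0.5 * u @ V_inv @ u
    log_prior = -0.5 * np.sum(beta ** 2) / prior_sd ** 2
    return log_lik + log_prior

def metropolis(n_iter=6000, burn=1000, step=0.05):
    """Random-walk Metropolis sampler targeting the approximate posterior."""
    beta = beta_hat.copy()
    lp = log_quasi_post(beta)
    draws = []
    for t in range(n_iter):
        prop = beta + step * rng.normal(size=beta.size)
        lp_prop = log_quasi_post(prop)
        if np.log(rng.random()) < lp_prop - lp:  # accept/reject step
            beta, lp = prop, lp_prop
        if t >= burn:
            draws.append(beta.copy())
    return np.array(draws)

draws = metropolis()
naive = np.linalg.lstsq(Xs, ys, rcond=None)[0]  # ignores selection and response
post_mean = draws.mean(axis=0)
print("naive OLS on respondents:", naive.round(3))
print("approx. posterior mean:  ", post_mean.round(3))
print("95% interval for slope:  ", np.percentile(draws[:, 1], [2.5, 97.5]).round(3))
```

Because U is linear in beta here, this quasi-posterior is Gaussian and centered at the weighted least-squares root, so the sampler is not strictly needed. The Metropolis loop is kept because the same recipe carries over to nonlinear estimating functions, such as those of a logistic response model. Comparing the two printed estimates should show how the naive fit on respondents is distorted when response depends on y.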
Journal description:
Statistical Analysis and Data Mining addresses the broad area of data analysis, including statistical approaches, machine learning, data mining, and applications. Topics include statistical and computational approaches for analyzing massive and complex datasets, novel statistical and/or machine learning methods and theory, and state-of-the-art applications with high impact. Of special interest are articles that describe innovative analytical techniques and discuss their application to real problems in a way that makes them accessible and beneficial to domain experts across science, engineering, and commerce.
The journal focuses on papers that satisfy one or more of the following criteria:
Solve data analysis problems associated with massive, complex datasets.
Develop innovative statistical approaches, machine learning algorithms, or methods integrating ideas across disciplines, e.g., statistics, computer science, electrical engineering, and operations research.
Formulate and solve high-impact real-world problems that challenge existing paradigms via new statistical and/or computational models.
Provide surveys of prominent research topics.