Title: Bayesian analysis of two-part nonlinear latent variable model: Semiparametric method
Authors: Jianwei Gou, Ye-mao Xia, De-Peng Jiang
DOI: 10.1177/1471082X211059233 (https://doi.org/10.1177/1471082X211059233)
Publication date: 2021-11-24
Publication type: Journal Article
Citations: 2
Abstract
The two-part model (TPM) is a widely appreciated statistical method for analyzing semi-continuous data. Semi-continuous data can be viewed as arising from two distinct stochastic processes: one governs the occurrence, or binary part, of the data, and the other determines the intensity, or continuous part. In the regression setting, where the semi-continuous outcome is modelled as a function of covariates, the binary part is commonly modelled via logistic regression and the continuous part via a log-normal model. The conventional TPM still imposes assumptions such as a log-normal distribution for the continuous part, no unobserved heterogeneity among the responses, and no collinearity among the covariates, which are often unrealistic in practical applications. In this article, we develop a two-part nonlinear latent variable model (TPNLVM) with multiple mixed semi-continuous and continuous variables. The semi-continuous variables are treated as indicators in the latent factor analysis, along with other manifest variables. This reduces the dimensionality of the regression model and alleviates potential multicollinearity problems. Our TPNLVM can accommodate nonlinear relationships among the latent variables extracted from the factor analysis. To downweight the influence of distributional deviations and extreme observations, we develop a Bayesian semiparametric analysis procedure. The conventional parametric assumptions on the related distributions are relaxed, and a Dirichlet process (DP) prior is used to improve model fit. By taking advantage of the discreteness of the DP, our method is effective in capturing the heterogeneity underlying the population. Within the Bayesian paradigm, posterior inference, including parameter estimation and model assessment, is carried out through Markov chain Monte Carlo (MCMC) sampling. To facilitate posterior sampling, we adapt the Pólya-Gamma stochastic representation for the logistic model. Using simulation studies, we examine the properties and merits of the proposed methods, and we illustrate the approach by evaluating the effect of treatment on cocaine use and examining whether the treatment effect is moderated by psychiatric problems.
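To make the conventional TPM described in the abstract concrete, the following is a minimal simulation sketch, not the authors' TPNLVM or their Bayesian semiparametric procedure. It assumes a single standard-normal covariate; the function names (`simulate_tpm`, `summarize`) and the parameter values for `alpha`, `beta`, and `sigma` are hypothetical choices made only for illustration.

```python
import math
import random


def simulate_tpm(n, alpha, beta, sigma, seed=0):
    """Draw n semi-continuous outcomes from a conventional two-part model.

    Binary part:     P(y > 0 | x) = logistic(alpha[0] + alpha[1] * x)
    Continuous part: log(y) | y > 0, x ~ Normal(beta[0] + beta[1] * x, sigma^2)
    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # single covariate (assumed standard normal)
        p = 1.0 / (1.0 + math.exp(-(alpha[0] + alpha[1] * x)))
        if rng.random() < p:
            # Occurrence: the outcome is positive, so draw a log-normal intensity.
            y = math.exp(rng.gauss(beta[0] + beta[1] * x, sigma))
        else:
            # No occurrence: the outcome is an exact zero.
            y = 0.0
        data.append((x, y))
    return data


def summarize(data):
    """Return the share of exact zeros and the mean of log(y) over positive outcomes."""
    positives = [y for _, y in data if y > 0]
    zero_share = 1.0 - len(positives) / len(data)
    mean_log = sum(math.log(y) for y in positives) / len(positives)
    return zero_share, mean_log


data = simulate_tpm(n=5000, alpha=(-0.5, 1.0), beta=(1.0, 0.5), sigma=0.8)
zero_share, mean_log = summarize(data)
print(f"share of zeros: {zero_share:.3f}, mean log(y) among positives: {mean_log:.3f}")
```

The point mass at zero comes entirely from the logistic part, while the spread of the positive values comes from the log-normal part. These are the two assumptions that the abstract says the semiparametric approach relaxes.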