A shared spatial model for multivariate extreme-valued binary data with non-random missingness

Authors: Xiaoyue Zhao, Lin Zhang, Dipankar Bandyopadhyay
Journal: Sankhya. Series B (2008), vol. 83, no. 2, pp. 374-396
Published: 2021-11-01 (Epub 2019-07-16)
Publication type: Journal Article
DOI: https://doi.org/10.1007/s13571-019-00198-7
Clinical studies and trials on periodontal disease (PD) generate a large volume of data collected at various tooth locations of a subject, but these data present a number of statistical complexities. When the focus is on understanding the extent of extreme PD progression, standard analysis under a generalized linear mixed model framework with a symmetric (logit) link may be inappropriate, as the binary split (extreme disease versus not) may be highly skewed. In addition, PD progression is often hypothesized to be spatially referenced, i.e., proximal teeth may have more similar PD status than teeth that are distally located. Furthermore, a non-negligible amount of data is missing, and the missingness is non-random, as it is informative about the periodontal health status of the subject. In this paper, we address all of the above concerns through a shared (spatial) latent factor model, where the latent factor jointly models the extreme binary responses via a generalized extreme value regression and the non-randomly missing teeth via a probit regression. Our approach is Bayesian, and inference is carried out with Hamiltonian Monte Carlo within Gibbs sampling. Through simulation studies and an application to a real dataset on PD, we demonstrate the potential advantages of our model, in terms of model fit and precision of parameter estimates, over alternatives that do not consider the aforementioned complexities.
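To make the two modeling ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes one common parameterization of a generalized extreme value (GEV) link, p = 1 - F_GEV(-eta; xi), and it replaces the paper's spatial latent factor with a single subject-level standard normal factor. The function names, coefficient values (intercept -1, slope 1), and tooth count are all hypothetical choices made only for illustration.

```python
import math
import random


def gev_cdf(x, xi):
    """CDF of a GEV distribution with location 0, scale 1, and shape xi."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-x))  # Gumbel limit as xi -> 0
    t = 1.0 + xi * x
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0) the CDF is 0,
        # above the upper bound (xi < 0) it is 1.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_link_prob(eta, xi):
    """Assumed GEV link: P(Y = 1) = 1 - F_GEV(-eta; xi). Asymmetric in eta."""
    return 1.0 - gev_cdf(-eta, xi)


def logistic_prob(eta):
    """Symmetric logit link, for comparison."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit_prob(eta):
    """Standard normal CDF, used here for the missingness model."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def simulate(n_subjects=5000, n_teeth=28, xi=0.2, seed=0):
    """Simulate teeth whose disease status and missingness share one factor.

    Simplification (assumption): the latent factor phi is one N(0, 1) draw
    per subject, not a spatial field over tooth locations.
    Returns (prevalence over all teeth, prevalence over observed teeth).
    """
    rng = random.Random(seed)
    n_all = dis_all = n_obs = dis_obs = 0
    for _ in range(n_subjects):
        phi = rng.gauss(0.0, 1.0)
        p_disease = gev_link_prob(-1.0 + phi, xi)  # hypothetical coefficients
        p_missing = probit_prob(-1.0 + phi)        # hypothetical coefficients
        for _ in range(n_teeth):
            diseased = rng.random() < p_disease
            missing = rng.random() < p_missing
            n_all += 1
            dis_all += diseased
            if not missing:
                n_obs += 1
                dis_obs += diseased
    return dis_all / n_all, dis_obs / n_obs


if __name__ == "__main__":
    for eta in (-1.0, 0.0, 1.0):
        print(eta, round(logistic_prob(eta), 3), round(gev_link_prob(eta, 0.2), 3))
    full, observed = simulate()
    print("prevalence, all teeth:     ", round(full, 3))
    print("prevalence, observed teeth:", round(observed, 3))
```

Under these assumptions, the logit probabilities at eta and -eta sum to one while the GEV-link probabilities do not, which is the skewness the abstract refers to. Because the same factor raises both the disease probability and the missingness probability, the prevalence computed from observed teeth alone understates the prevalence over all teeth, which illustrates why ignoring the missingness mechanism can bias estimates.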