贝叶斯金字塔:离散数据的可识别多层离散潜在结构模型

IF 3.1 1区 数学 Q1 STATISTICS & PROBABILITY Journal of the Royal Statistical Society Series B-Statistical Methodology Pub Date : 2023-03-09 DOI:10.1093/jrsssb/qkad010
Yuqi Gu, David B Dunson
{"title":"贝叶斯金字塔:离散数据的可识别多层离散潜在结构模型","authors":"Yuqi Gu, David B Dunson","doi":"10.1093/jrsssb/qkad010","DOIUrl":null,"url":null,"abstract":"Abstract High-dimensional categorical data are routinely collected in biomedical and social sciences. It is of great importance to build interpretable parsimonious models that perform dimension reduction and uncover meaningful latent structures from such discrete data. Identifiability is a fundamental requirement for valid modeling and inference in such scenarios, yet is challenging to address when there are complex latent structures. In this article, we propose a class of identifiable multilayer (potentially deep) discrete latent structure models for discrete data, termed Bayesian Pyramids. We establish the identifiability of Bayesian Pyramids by developing novel transparent conditions on the pyramid-shaped deep latent directed graph. The proposed identifiability conditions can ensure Bayesian posterior consistency under suitable priors. As an illustration, we consider the two-latent-layer model and propose a Bayesian shrinkage estimation approach. Simulation results for this model corroborate the identifiability and estimatability of model parameters. Applications of the methodology to DNA nucleotide sequence data uncover useful discrete latent features that are highly predictive of sequence types. The proposed framework provides a recipe for interpretable unsupervised learning of discrete data and can be a useful alternative to popular machine learning methods.","PeriodicalId":49982,"journal":{"name":"Journal of the Royal Statistical Society Series B-Statistical Methodology","volume":"43 1","pages":"0"},"PeriodicalIF":3.1000,"publicationDate":"2023-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Bayesian Pyramids: identifiable multilayer discrete latent structure models for discrete data\",\"authors\":\"Yuqi Gu, David B Dunson\",\"doi\":\"10.1093/jrsssb/qkad010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract High-dimensional categorical data are routinely collected in biomedical and social sciences. It is of great importance to build interpretable parsimonious models that perform dimension reduction and uncover meaningful latent structures from such discrete data. Identifiability is a fundamental requirement for valid modeling and inference in such scenarios, yet is challenging to address when there are complex latent structures. In this article, we propose a class of identifiable multilayer (potentially deep) discrete latent structure models for discrete data, termed Bayesian Pyramids. We establish the identifiability of Bayesian Pyramids by developing novel transparent conditions on the pyramid-shaped deep latent directed graph. The proposed identifiability conditions can ensure Bayesian posterior consistency under suitable priors. As an illustration, we consider the two-latent-layer model and propose a Bayesian shrinkage estimation approach. Simulation results for this model corroborate the identifiability and estimatability of model parameters. Applications of the methodology to DNA nucleotide sequence data uncover useful discrete latent features that are highly predictive of sequence types. The proposed framework provides a recipe for interpretable unsupervised learning of discrete data and can be a useful alternative to popular machine learning methods.\",\"PeriodicalId\":49982,\"journal\":{\"name\":\"Journal of the Royal Statistical Society Series B-Statistical Methodology\",\"volume\":\"43 1\",\"pages\":\"0\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2023-03-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the Royal Statistical Society Series B-Statistical Methodology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/jrsssb/qkad010\",\"RegionNum\":1,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the Royal Statistical Society Series B-Statistical Methodology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/jrsssb/qkad010","RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0

摘要

在生物医学和社会科学中,高维分类数据是常规收集的。建立可解释的简约模型,从这些离散数据中进行降维并发现有意义的潜在结构是非常重要的。在这种情况下,可识别性是有效建模和推理的基本要求,但当存在复杂的潜在结构时,解决这一问题具有挑战性。在本文中,我们提出了一类可识别的多层(潜在的深度)离散潜在结构模型,称为贝叶斯金字塔。通过在金字塔形深潜有向图上建立新的透明条件,建立了贝叶斯金字塔的可辨识性。所提出的可识别性条件可以保证在合适的先验条件下贝叶斯后验一致性。作为一个例子,我们考虑了两潜在层模型,并提出了贝叶斯收缩估计方法。该模型的仿真结果证实了模型参数的可辨识性和可估计性。该方法对DNA核苷酸序列数据的应用揭示了有用的离散潜在特征,这些特征对序列类型具有高度预测性。所提出的框架为离散数据的可解释无监督学习提供了一种方法,可以成为流行机器学习方法的有用替代方案。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Bayesian Pyramids: identifiable multilayer discrete latent structure models for discrete data
Abstract High-dimensional categorical data are routinely collected in biomedical and social sciences. It is of great importance to build interpretable parsimonious models that perform dimension reduction and uncover meaningful latent structures from such discrete data. Identifiability is a fundamental requirement for valid modeling and inference in such scenarios, yet is challenging to address when there are complex latent structures. In this article, we propose a class of identifiable multilayer (potentially deep) discrete latent structure models for discrete data, termed Bayesian Pyramids. We establish the identifiability of Bayesian Pyramids by developing novel transparent conditions on the pyramid-shaped deep latent directed graph. The proposed identifiability conditions can ensure Bayesian posterior consistency under suitable priors. As an illustration, we consider the two-latent-layer model and propose a Bayesian shrinkage estimation approach. Simulation results for this model corroborate the identifiability and estimatability of model parameters. Applications of the methodology to DNA nucleotide sequence data uncover useful discrete latent features that are highly predictive of sequence types. The proposed framework provides a recipe for interpretable unsupervised learning of discrete data and can be a useful alternative to popular machine learning methods.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
8.80
自引率
0.00%
发文量
83
审稿时长
>12 weeks
期刊介绍: Series B (Statistical Methodology) aims to publish high quality papers on the methodological aspects of statistics and data science more broadly. The objective of papers should be to contribute to the understanding of statistical methodology and/or to develop and improve statistical methods; any mathematical theory should be directed towards these aims. The kinds of contribution considered include descriptions of new methods of collecting or analysing data, with the underlying theory, an indication of the scope of application and preferably a real example. Also considered are comparisons, critical evaluations and new applications of existing methods, contributions to probability theory which have a clear practical bearing (including the formulation and analysis of stochastic models), statistical computation or simulation where original methodology is involved and original contributions to the foundations of statistical science. Reviews of methodological techniques are also considered. A paper, even if correct and well presented, is likely to be rejected if it only presents straightforward special cases of previously published work, if it is of mathematical interest only, if it is too long in relation to the importance of the new material that it contains or if it is dominated by computations or simulations of a routine nature.
期刊最新文献
Model-assisted sensitivity analysis for treatment effects under unmeasured confounding via regularized calibrated estimation. Interpretable discriminant analysis for functional data supported on random nonlinear domains with an application to Alzheimer's disease. GENIUS-MAWII: for robust Mendelian randomization with many weak invalid instruments. Doubly robust calibration of prediction sets under covariate shift. Gradient synchronization for multivariate functional data, with application to brain connectivity.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1