A Common Atoms Model for the Bayesian Nonparametric Analysis of Nested Data.

IF 3 | Region 1 (Mathematics) | Q1 STATISTICS & PROBABILITY | Journal of the American Statistical Association | Pub Date: 2023-01-01 | Epub Date: 2021-07-14 | DOI: 10.1080/01621459.2021.1933499
Francesco Denti, Federico Camerlenghi, Michele Guindani, Antonietta Mira
Volume 118, Issue 541, pages 405-416.
Citations: 16

Abstract

The use of large datasets for targeted therapeutic interventions requires new ways to characterize the heterogeneity observed across subgroups of a specific population. In particular, models for partially exchangeable data are needed for inference on nested datasets, where the observations are assumed to be organized in different units and some sharing of information is required to learn distinctive features of the units. In this manuscript, we propose a nested common atoms model (CAM) that is particularly suited for the analysis of nested datasets where the distributions of the units are expected to differ only over a small fraction of the observations sampled from each unit. The proposed CAM allows a two-layered clustering at the distributional and observational level and is amenable to scalable posterior inference through the use of a computationally efficient nested slice sampler algorithm. We further discuss how to extend the proposed modeling framework to handle discrete measurements, and we conduct posterior inference on a real microbiome dataset from a diet swap study to investigate how the alterations in intestinal microbiota composition are associated with different eating habits. We further investigate the performance of our model in capturing true distributional structures in the population by means of a simulation study.
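
The two-layered clustering described above can be illustrated with a small simulation. The abstract does not state the model's equations, so the following is only a minimal sketch under stated assumptions: truncated stick-breaking weights, Gaussian kernels with unit variance, and Gaussian-distributed atoms. The function names (`stick_breaking`, `simulate_cam`), parameters, and default values are hypothetical and are not taken from the paper. The sketch only generates data; it is not the authors' nested slice sampler.

```python
import random

def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights (they sum to 1 by construction)."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, concentration)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last weight absorbs the leftover mass
    return weights

def simulate_cam(n_units, n_obs, alpha=1.0, beta=1.0,
                 n_dist=10, n_atoms=15, seed=0):
    """Simulate nested data with atoms shared across all distributions.

    Assumed (hypothetical) setup: n_dist candidate distributions, each a
    mixture over the SAME n_atoms locations but with its own weights.
    """
    rng = random.Random(seed)
    # Atoms shared by every distributional cluster (the "common atoms").
    atoms = [rng.gauss(0.0, 5.0) for _ in range(n_atoms)]
    # Weights over distributional clusters.
    pi = stick_breaking(alpha, n_dist, rng)
    # Each distributional cluster has its own weights over the same atoms.
    omega = [stick_breaking(beta, n_atoms, rng) for _ in range(n_dist)]

    unit_cluster, obs_cluster, data = [], [], []
    for _ in range(n_units):
        # Distributional level: which distribution does this unit follow?
        k = rng.choices(range(n_dist), weights=pi)[0]
        # Observational level: which atom does each observation use?
        labels = rng.choices(range(n_atoms), weights=omega[k], k=n_obs)
        unit_cluster.append(k)
        obs_cluster.append(labels)
        data.append([rng.gauss(atoms[l], 1.0) for l in labels])
    return atoms, unit_cluster, obs_cluster, data
```

In this sketch, units with the same entry in `unit_cluster` share a distribution, while observations from different units can still share an atom label in `obs_cluster`, because every distribution is built on the same atoms.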

Source journal: CiteScore 7.50; self-citation rate 8.10%; articles published: 168; review time: 12 months
Journal description: Established in 1888 and published quarterly in March, June, September, and December, the Journal of the American Statistical Association (JASA) has long been considered the premier journal of statistical science. Articles focus on statistical applications, theory, and methods in economic, social, physical, engineering, and health sciences. Important books contributing to statistical advancement are reviewed in JASA. JASA is indexed in Current Index to Statistics and MathSci Online and reviewed in Mathematical Reviews. JASA is abstracted by Access Company and is indexed and abstracted in the SRM Database of Social Research Methodology.