Comparison of two independent populations of compositional data with positive correlations among components using a nested Dirichlet distribution.

IF 7.6 | Zone 1 (Psychology) | Q1 PSYCHOLOGY, MULTIDISCIPLINARY | Psychological Methods | Pub Date: 2025-01-16 | DOI: 10.1037/met0000702
Jacob A Turner,Bianca A Luedeker,Monnie McGee
Citations: 0

Abstract

Compositional data are multivariate data made up of components that sum to a fixed value. Often the data are presented as proportions of a whole, where each component is constrained to lie between 0 and 1 and the components sum to 1. Many applications in psychology and other disciplines yield compositional data sets, including Morris water maze experiments, psychological well-being scores, analyses of daily physical activity times, and components of household expenditures. Statistical methods for compositional data typically follow one of two approaches. The first uses transformation strategies, such as log ratios, which can lead to results that are challenging to interpret. The second uses an appropriate distribution, such as the Dirichlet distribution, that captures the key characteristics of compositional data and allows for ready interpretation of downstream analyses. Unfortunately, the Dirichlet distribution has constraints on variance and correlation that render it inappropriate for some applications. As a result, practicing researchers often resort to standard two-sample t tests or analysis of variance models for each variable in the composition to detect differences in means. We show that a recently published method using the Dirichlet distribution can drastically inflate Type I error rates, and we introduce a global two-sample test to detect differences in the mean proportions of components for two independent groups, where both groups are from either a Dirichlet or a more flexible nested Dirichlet distribution. We also derive confidence interval formulas for individual components for post hoc testing and further interpretation of results. We illustrate the utility of our methods using a recent Morris water maze experiment and human activity data. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
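The correlation constraint mentioned in the abstract can be illustrated with a small simulation. In a standard Dirichlet distribution, every pair of components has negative covariance, so it cannot represent components that tend to rise and fall together. The sketch below is not the authors' test or their code; it assumes one simple nesting structure (a three-part composition in which the second and third components share a nest), and the function names and parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def sample_dirichlet(alpha, n, rng):
    """Draw n compositions from a standard (flat) Dirichlet distribution."""
    return rng.dirichlet(alpha, size=n)


def sample_nested_dirichlet(top_alpha, nest_alpha, n, rng):
    """Draw n three-part compositions (x1, x2, x3) where x2 and x3 share a nest.

    The top level splits the whole into x1 and a nest total; the nest total
    is then split into x2 and x3 by an independent Dirichlet draw.
    This tree is an illustrative assumption, not the structure used in the paper.
    """
    top = rng.dirichlet(top_alpha, size=n)      # columns: x1, nest total
    split = rng.dirichlet(nest_alpha, size=n)   # columns: within-nest shares of x2, x3
    x1 = top[:, 0]
    x2 = top[:, 1] * split[:, 0]
    x3 = top[:, 1] * split[:, 1]
    return np.column_stack([x1, x2, x3])


rng = np.random.default_rng(0)
n_draws = 20000

# Flat Dirichlet: all pairwise correlations are negative by construction.
flat = sample_dirichlet([2.0, 3.0, 5.0], n_draws, rng)

# Nested Dirichlet: a highly variable nest total combined with a stable
# within-nest split makes x2 and x3 move together.
nested = sample_nested_dirichlet([1.0, 1.0], [50.0, 50.0], n_draws, rng)

flat_corr = np.corrcoef(flat, rowvar=False)
nested_corr = np.corrcoef(nested, rowvar=False)

print("Flat Dirichlet correlations:\n", np.round(flat_corr, 2))
print("Nested Dirichlet correlations:\n", np.round(nested_corr, 2))
```

In this sketch the positive correlation between the second and third components comes from the shared nest total: when the nest receives a large share of the whole, both of its members grow. Each simulated row still sums to 1, so the output remains a valid composition.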
Source journal
Psychological Methods
PSYCHOLOGY, MULTIDISCIPLINARY
CiteScore: 13.10
Self-citation rate: 7.10%
Articles published: 159
About the journal: Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues. The audience is expected to be diverse and to include those who develop new procedures, those who are responsible for undergraduate and graduate training in design, measurement, and statistics, as well as those who employ those procedures in research.