Data aggregation can lead to biased inferences in Bayesian linear mixed models and Bayesian analysis of variance.

IF 7.6 | Zone 1 (Psychology) | JCR Q1 (PSYCHOLOGY, MULTIDISCIPLINARY) | Psychological methods | Pub Date: 2024-01-25 | DOI: 10.1037/met0000621
Daniel J Schad, Bruno Nicenboim, Shravan Vasishth
Citations: 0

Abstract

Bayesian linear mixed-effects models (LMMs) and Bayesian analysis of variance (ANOVA) are increasingly being used in the cognitive sciences to perform null hypothesis tests, where a null hypothesis that an effect is zero is compared with an alternative hypothesis that the effect exists and is different from zero. While software tools for Bayes factor null hypothesis tests are easily accessible, how to specify the data and the model correctly is often not clear. In Bayesian approaches, many authors use data aggregation at the by-subject level and estimate Bayes factors on aggregated data. Here, we use simulation-based calibration for model inference applied to several example experimental designs to demonstrate that, as with frequentist analysis, such null hypothesis tests on aggregated data can be problematic in Bayesian analysis. Specifically, when random slope variances differ (i.e., violated sphericity assumption), Bayes factors are too conservative for contrasts where the variance is small and they are too liberal for contrasts where the variance is large. Running Bayesian ANOVA on aggregated data can, if the sphericity assumption is violated, likewise lead to biased Bayes factor results. Moreover, Bayes factors for by-subject aggregated data are biased (too liberal) when random item slope variance is present but ignored in the analysis. These problems can be circumvented or reduced by running Bayesian LMMs on nonaggregated data such as on individual trials, and by explicitly modeling the full random effects structure. Reproducible code is available from https://osf.io/mjf47/. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
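
As a rough illustration of what by-subject aggregation does, the following Python sketch simulates trial-level data for a three-condition within-subject design in which the two random slopes have different variances, then collapses the trials to one mean per subject and condition. It is not the authors' code (that is at the OSF link above) and it computes no Bayes factors; the function names, sample sizes, and standard deviations are all assumptions chosen for the example.

```python
import random
import statistics


def simulate_trials(n_subj=200, n_trials=20, slope_sd=(0.2, 2.0),
                    noise_sd=1.0, seed=1):
    """Simulate trial-level data as (subject, condition, response) rows.

    Conditions are 'a' (baseline), 'b', and 'c'. Each subject gets a random
    intercept plus random slopes for b-a and c-a whose standard deviations
    differ (slope_sd), i.e. sphericity is violated by construction.
    All values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for subj in range(n_subj):
        intercept = rng.gauss(0.0, 1.0)
        effects = {
            "a": 0.0,
            "b": rng.gauss(0.0, slope_sd[0]),  # small slope variance
            "c": rng.gauss(0.0, slope_sd[1]),  # large slope variance
        }
        for cond, effect in effects.items():
            for _ in range(n_trials):
                y = intercept + effect + rng.gauss(0.0, noise_sd)
                rows.append((subj, cond, y))
    return rows


def aggregate_by_subject(rows):
    """Collapse trials to one mean per (subject, condition) cell."""
    cells = {}
    for subj, cond, y in rows:
        cells.setdefault((subj, cond), []).append(y)
    return {key: statistics.fmean(ys) for key, ys in cells.items()}


def contrast_variances(means, n_subj):
    """Between-subject variance of the b-a and c-a difference scores."""
    d_b = [means[(s, "b")] - means[(s, "a")] for s in range(n_subj)]
    d_c = [means[(s, "c")] - means[(s, "a")] for s in range(n_subj)]
    return statistics.variance(d_b), statistics.variance(d_c)


rows = simulate_trials()
means = aggregate_by_subject(rows)
var_b, var_c = contrast_variances(means, n_subj=200)

print("trial-level rows:", len(rows))
print("aggregated cells:", len(means))
print("variance of b-a differences:", round(var_b, 3))
print("variance of c-a differences:", round(var_c, 3))
```

After aggregation, the only trace of the slope variances is the spread of the subject-level difference scores, and that spread differs sharply between the two contrasts. An analysis of the aggregated means that assumes a single common error variance for both contrasts therefore mis-states the uncertainty of each, which is the situation the abstract describes; a trial-level mixed model with the full random effects structure can estimate the two slope variances separately.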

Source journal
Psychological methods (PSYCHOLOGY, MULTIDISCIPLINARY)
CiteScore: 13.10
Self-citation rate: 7.10%
Articles published: 159
Journal description: Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues. The audience is expected to be diverse and to include those who develop new procedures, those who are responsible for undergraduate and graduate training in design, measurement, and statistics, as well as those who employ those procedures in research.
Latest articles in this journal
A guided tutorial on linear mixed-effects models for the analysis of accuracies and response times in experiments with fully crossed design.
Bayes factors for logistic (mixed-effect) models.
Better power by design: Permuted-subblock randomization boosts power in repeated-measures experiments.
Building a simpler moderated nonlinear factor analysis model with Markov Chain Monte Carlo estimation.
Definition and identification of causal ratio effects.