Sample size matters when estimating test-retest reliability of behaviour

Behavior Research Methods, 57(4), 123 (IF 3.9; CAS Zone 2, Psychology; JCR Q1, PSYCHOLOGY, EXPERIMENTAL). Pub Date: 2025-03-21. DOI: 10.3758/s13428-025-02599-1
Brendan Williams, Lily FitzGibbon, Daniel Brady, Anastasia Christakou
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11928395/pdf/
Citations: 0

Abstract

Intraclass correlation coefficients (ICCs) are a commonly used metric in test-retest reliability research to assess a measure's ability to quantify systematic between-subject differences. However, estimates of between-subject differences are also influenced by factors including within-subject variability, random errors, and measurement bias. Here, we use data collected from a large online sample (N = 150) to (1) quantify test-retest reliability of behavioural and computational measures of reversal learning using ICCs, and (2) use our dataset as the basis for a simulation study investigating the effects of sample size on variance component estimation and the association between estimates of variance components and ICC measures. In line with previously published work, we find reliable behavioural and computational measures of reversal learning, a commonly used assay of behavioural flexibility. Reliable estimates of between-subject, within-subject (across-session), and error variance components for behavioural and computational measures (with ± .05 precision and 80% confidence) required sample sizes ranging from 10 to over 300 (behavioural median N: between-subject = 167, within-subject = 34, error = 103; computational median N: between-subject = 68, within-subject = 20, error = 45). These sample sizes exceed those often used in reliability studies, suggesting that sample sizes larger than are commonly used for reliability studies (circa 30) are required to robustly estimate reliability of task performance measures. Additionally, we found that ICC estimates showed highly positive and highly negative correlations with between-subject and error variance components, respectively, as might be expected, which remained relatively stable across sample sizes. However, ICC estimates were weakly or not correlated with within-subject variance, providing evidence for the importance of variance decomposition for reliability studies.
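
The variance decomposition behind an ICC can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a complete subjects-by-sessions score matrix, a two-way ANOVA with method-of-moments variance components, and it treats the session main effect as the across-session component. The function names and the subsampling check (how often a subsample ICC lands within a tolerance of the full-sample ICC) are hypothetical stand-ins for the simulation the abstract describes.

```python
import numpy as np


def variance_components(scores):
    """Two-way ANOVA variance components for an (n_subjects, k_sessions) array.

    Returns (between_subject, across_session, error). Negative
    method-of-moments estimates are clipped to zero (a choice of this sketch).
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    subj_means = scores.mean(axis=1)
    sess_means = scores.mean(axis=0)

    # Mean squares for subjects (rows), sessions (columns), and residual.
    ms_subj = k * np.sum((subj_means - grand) ** 2) / (n - 1)
    ms_sess = n * np.sum((sess_means - grand) ** 2) / (k - 1)
    resid = scores - subj_means[:, None] - sess_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))

    var_between = max((ms_subj - ms_err) / k, 0.0)
    var_session = max((ms_sess - ms_err) / n, 0.0)
    return var_between, var_session, ms_err


def icc_agreement(scores):
    """Absolute-agreement ICC: between / (between + session + error)."""
    between, session, error = variance_components(scores)
    total = between + session + error
    return between / total if total > 0 else float("nan")


def icc_consistency(scores):
    """Consistency ICC: between / (between + error)."""
    between, _, error = variance_components(scores)
    total = between + error
    return between / total if total > 0 else float("nan")


def proportion_within(scores, n_sub, tol=0.05, n_draws=500, seed=0):
    """Share of random subsamples (without replacement) of n_sub subjects
    whose agreement ICC falls within +/- tol of the full-sample ICC."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    target = icc_agreement(scores)
    hits = 0
    for _ in range(n_draws):
        idx = rng.choice(scores.shape[0], size=n_sub, replace=False)
        if abs(icc_agreement(scores[idx]) - target) <= tol:
            hits += 1
    return hits / n_draws
```

Under these assumptions, calling `proportion_within(scores, n_sub)` over a range of `n_sub` values and noting where the result first reaches 0.8 gives a rough analogue of the "± .05 precision and 80% confidence" criterion. The abstract applies that criterion to the variance components themselves, so the same loop could be pointed at `variance_components` instead of `icc_agreement`.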

Source journal: Behavior Research Methods
CiteScore: 10.30; self-citation rate: 9.30%; articles published: 266
About the journal: Behavior Research Methods publishes articles concerned with the methods, techniques, and instrumentation of research in experimental psychology. The journal focuses particularly on the use of computer technology in psychological research. An annual special issue is devoted to this field.