Four hundred Greek idiomatic expressions: Ratings for subjective frequency, ambiguity, and decomposability
Behavior Research Methods, pp. 8181-8195. Pub Date: 2024-12-01. Epub Date: 2024-08-19. DOI: 10.3758/s13428-024-02450-z
Anastasia Lada, Philippe Paquier, Ifigenia Dosi, Christina Manouilidou, Simone Sprenger, Stefanie Keulen
Idioms differ from other forms of figurative language along the dimensions of subjective frequency, ambiguity (whether a literal interpretation is possible), and decomposability (the extent to which the idiom's individual words contribute to its figurative interpretation). This study focuses on the Greek language and provides the first corpus of 400 Greek idioms rated on these dimensions by 113 native Greek students aged 19 to 39 years. The study aimed to (1) rate all idioms for their degree of subjective frequency, ambiguity, and decomposability, and (2) investigate the relationships between these dimensions. Three separate assessments were conducted, in which participants evaluated the idioms' subjective frequency, ambiguity, and decomposability. The idioms were selected from the "Dictionary of Idioms in Modern Greek" (Vlaxopoulos, 2007). The study resulted in the first database of Greek idioms assessed on these dimensions. The intraclass correlation coefficient (ICC; two-way mixed, absolute agreement) demonstrated high consistency among the ratings given to the same idiom on each dimension by different raters. Correlational analyses showed that subjective frequency was moderately positively correlated with decomposability and weakly positively correlated with ambiguity, while decomposability was moderately positively correlated with ambiguity.
{"title":"Four hundred Greek idiomatic expressions: Ratings for subjective frequency, ambiguity, and decomposability.","authors":"Anastasia Lada, Philippe Paquier, Ifigenia Dosi, Christina Manouilidou, Simone Sprenger, Stefanie Keulen","doi":"10.3758/s13428-024-02450-z","DOIUrl":"10.3758/s13428-024-02450-z","url":null,"abstract":"<p><p>Idioms differ from other forms of figurative language because of their dimensions of subjective frequency, ambiguity (possibility of having a literal interpretation), and decomposability (possibility of the idiom's words to assist in its figurative interpretation). This study focuses on the Greek language and aims at providing the first corpus of 400 Greek idioms rated for their dimensions by 113 native Greek students, aged 19 to 39 years. The study aimed at (1) rating all idioms for their degree of subjective frequency, ambiguity, and decomposability, and (2) investigating the relationships between these dimensions. Three different assessments were conducted, during which the participants were asked to evaluate the degree of idioms' subjective frequency, ambiguity, and decomposability. The idioms were selected from a dictionary of Greek idioms titled \"Dictionary of Idioms in Modern Greek\" (Vlaxopoulos, 2007). This study resulted in the first database of Greek idioms assessed for their dimensions. The intraclass correlation coefficient (ICC) (two-way mixed, absolute agreement) demonstrated high internal consistency in the ratings given for each dimension, for the same idiom, by the different individual raters. Correlational analyses showed that subjective frequency was positively moderately correlated with decomposability, and positively weakly correlated with ambiguity, while decomposability was positively moderately correlated with ambiguity.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8181-8195"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142003504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A tutorial on open-source large language models for behavioral science
Behavior Research Methods, pp. 8214-8237. Pub Date: 2024-12-01. Epub Date: 2024-08-15. DOI: 10.3758/s13428-024-02455-8
Zak Hussain, Marcel Binz, Rui Mata, Dirk U Wulff
Large language models (LLMs) have the potential to revolutionize behavioral science by accelerating and improving the research cycle, from conceptualization to data analysis. Unlike closed-source solutions, open-source frameworks for LLMs can enable transparency, reproducibility, and adherence to data protection standards, which gives them a crucial advantage for use in behavioral science. To help researchers harness the promise of LLMs, this tutorial offers a primer on the open-source Hugging Face ecosystem and demonstrates several applications that advance conceptual and empirical work in behavioral science, including feature extraction, fine-tuning of models for prediction, and generation of behavioral responses. Executable code is made available at github.com/Zak-Hussain/LLM4BeSci.git. Finally, the tutorial discusses challenges faced by research with (open-source) LLMs related to interpretability and safety and offers a perspective on future research at the intersection of language modeling and behavioral science.
{"title":"A tutorial on open-source large language models for behavioral science.","authors":"Zak Hussain, Marcel Binz, Rui Mata, Dirk U Wulff","doi":"10.3758/s13428-024-02455-8","DOIUrl":"10.3758/s13428-024-02455-8","url":null,"abstract":"<p><p>Large language models (LLMs) have the potential to revolutionize behavioral science by accelerating and improving the research cycle, from conceptualization to data analysis. Unlike closed-source solutions, open-source frameworks for LLMs can enable transparency, reproducibility, and adherence to data protection standards, which gives them a crucial advantage for use in behavioral science. To help researchers harness the promise of LLMs, this tutorial offers a primer on the open-source Hugging Face ecosystem and demonstrates several applications that advance conceptual and empirical work in behavioral science, including feature extraction, fine-tuning of models for prediction, and generation of behavioral responses. Executable code is made available at github.com/Zak-Hussain/LLM4BeSci.git . Finally, the tutorial discusses challenges faced by research with (open-source) LLMs related to interpretability and safety and offers a perspective on future research at the intersection of language modeling and behavioral science.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8214-8237"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525391/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141987375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Linking essay-writing tests using many-facet models and neural automated essay scoring
Behavior Research Methods, pp. 8450-8479. Pub Date: 2024-12-01. Epub Date: 2024-08-20. DOI: 10.3758/s13428-024-02485-2
Masaki Uto, Kota Aramaki
For essay-writing tests, challenges arise when scores assigned to essays are influenced by the characteristics of raters, such as rater severity and consistency. Item response theory (IRT) models incorporating rater parameters have been developed to tackle this issue, exemplified by the many-facet Rasch models. These IRT models enable the estimation of examinees' abilities while accounting for the impact of rater characteristics, thereby enhancing the accuracy of ability measurement. However, difficulties can arise when different groups of examinees are evaluated by different sets of raters. In such cases, test linking is essential for unifying the scale of model parameters estimated for individual examinee-rater groups. Traditional test-linking methods typically require administrators to design groups in which either examinees or raters are partially shared. However, this is often impractical in real-world testing scenarios. To address this, we introduce a novel method for linking the parameters of IRT models with rater parameters that uses neural automated essay scoring technology. Our experimental results indicate that our method successfully accomplishes test linking with accuracy comparable to that of linear linking using few common examinees.
{"title":"Linking essay-writing tests using many-facet models and neural automated essay scoring.","authors":"Masaki Uto, Kota Aramaki","doi":"10.3758/s13428-024-02485-2","DOIUrl":"10.3758/s13428-024-02485-2","url":null,"abstract":"<p><p>For essay-writing tests, challenges arise when scores assigned to essays are influenced by the characteristics of raters, such as rater severity and consistency. Item response theory (IRT) models incorporating rater parameters have been developed to tackle this issue, exemplified by the many-facet Rasch models. These IRT models enable the estimation of examinees' abilities while accounting for the impact of rater characteristics, thereby enhancing the accuracy of ability measurement. However, difficulties can arise when different groups of examinees are evaluated by different sets of raters. In such cases, test linking is essential for unifying the scale of model parameters estimated for individual examinee-rater groups. Traditional test-linking methods typically require administrators to design groups in which either examinees or raters are partially shared. However, this is often impractical in real-world testing scenarios. To address this, we introduce a novel method for linking the parameters of IRT models with rater parameters that uses neural automated essay scoring technology. Our experimental results indicate that our method successfully accomplishes test linking with accuracy comparable to that of linear linking using few common examinees.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8450-8479"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525454/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142008164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessing computational reproducibility in Behavior Research Methods
Behavior Research Methods, pp. 8745-8760. Pub Date: 2024-12-01. Epub Date: 2024-09-25. DOI: 10.3758/s13428-024-02501-5
David A Ellis, John Towse, Olivia Brown, Alicia Cork, Brittany I Davidson, Sophie Devereux, Joanne Hinds, Matthew Ivory, Sophie Nightingale, Douglas A Parry, Lukasz Piwek, Heather Shaw, Andrea S Towse
Psychological science has thrived thanks to new methods and innovative practices. Journals, including Behavior Research Methods (BRM), continue to support the dissemination and evaluation of research assets, including data, software/hardware, statistical code, and databases of stimuli. However, such research assets rarely allow for computational reproducibility, meaning they are difficult to reuse. Therefore, in this preregistered report, we explore how BRM's authors and BRM structures shape the landscape of functional research assets. Our broad research questions concern: (1) how quickly methods and analytical techniques reported in BRM can be used and developed further by other scientists; (2) whether functionality has improved following changes to BRM journal policy in support of computational reproducibility; (3) whether we can disentangle such policy changes from changes in reproducibility over time. We randomly sampled equal numbers of papers (N = 204) published in BRM before and after the implementation of policy changes. Pairs of researchers recorded how long it took to ensure assets (data, software/hardware, statistical code, and materials) were fully operational. They also coded the completeness and reusability of the assets. While improvements were observed in all measures, only completeness improved significantly following the policy changes (d = .37). The effects varied between different types of research assets, with data sets from surveys/experiments showing the largest improvements in completeness and reusability. Perhaps more importantly, changes to policy do appear to have improved the life span of research products by reducing natural decline. We conclude with a discussion of how, in the future, research and policy might better support computational reproducibility within and beyond psychological science.
{"title":"Assessing computational reproducibility in Behavior Research Methods.","authors":"David A Ellis, John Towse, Olivia Brown, Alicia Cork, Brittany I Davidson, Sophie Devereux, Joanne Hinds, Matthew Ivory, Sophie Nightingale, Douglas A Parry, Lukasz Piwek, Heather Shaw, Andrea S Towse","doi":"10.3758/s13428-024-02501-5","DOIUrl":"10.3758/s13428-024-02501-5","url":null,"abstract":"<p><p>Psychological science has thrived thanks to new methods and innovative practices. Journals, including Behavior Research Methods (BRM), continue to support the dissemination and evaluation of research assets including data, software/hardware, statistical code, and databases of stimuli. However, such research assets rarely allow for computational reproducibility, meaning they are difficult to reuse. Therefore, in this preregistered report, we explore how BRM's authors and BRM structures shape the landscape of functional research assets. Our broad research questions concern: (1) How quickly methods and analytical techniques reported in BRM can be used and developed further by other scientists; (2) Whether functionality has improved following changes to BRM journal policy in support of computational reproducibility; (3) Whether we can disentangle such policy changes from changes in reproducibility over time. We randomly sampled equal numbers of papers (N = 204) published in BRM before and after the implementation of policy changes. Pairs of researchers recorded how long it took to ensure assets (data, software/hardware, statistical code, and materials) were fully operational. They also coded the completeness and reusability of the assets. While improvements were observed in all measures, only changes to completeness were altered significantly following the policy changes (d = .37). The effects varied between different types of research assets, with data sets from surveys/experiments showing the largest improvements in completeness and reusability. Perhaps more importantly, changes to policy do appear to have improved the life span of research products by reducing natural decline. We conclude with a discussion of how, in the future, research and policy might better support computational reproducibility within and beyond psychological science.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8745-8760"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525395/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142340233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Behavioral science labs: How to solve the multi-user problem
Behavior Research Methods, pp. 8238-8258. Pub Date: 2024-12-01. Epub Date: 2024-08-12. DOI: 10.3758/s13428-024-02467-4
Diederick C Niehorster, Marianne Gullberg, Marcus Nyström
When lab resources are shared among multiple research projects, issues such as experimental integrity, replicability, and data safety become important. Different research projects often need different software and settings that may well conflict with one another, and data collected for one project may not be safeguarded from exposure to researchers from other projects. In this paper we provide an infrastructure design and an open-source tool, labManager, that render multi-user lab facilities in the behavioral sciences accessible to research projects with widely varying needs. The solutions proposed ensure ease of management while simultaneously offering maximum flexibility by providing research projects with fully separated bare metal environments. This solution also ensures that collected data is kept separate, and compliant with relevant ethical standards and regulations such as General Data Protection Regulation (GDPR) legislation. Furthermore, we discuss preconditions for running shared lab facilities and provide practical advice.
{"title":"Behavioral science labs: How to solve the multi-user problem.","authors":"Diederick C Niehorster, Marianne Gullberg, Marcus Nyström","doi":"10.3758/s13428-024-02467-4","DOIUrl":"10.3758/s13428-024-02467-4","url":null,"abstract":"<p><p>When lab resources are shared among multiple research projects, issues such as experimental integrity, replicability, and data safety become important. Different research projects often need different software and settings that may well conflict with one another, and data collected for one project may not be safeguarded from exposure to researchers from other projects. In this paper we provide an infrastructure design and an open-source tool, labManager, that render multi-user lab facilities in the behavioral sciences accessible to research projects with widely varying needs. The solutions proposed ensure ease of management while simultaneously offering maximum flexibility by providing research projects with fully separated bare metal environments. This solution also ensures that collected data is kept separate, and compliant with relevant ethical standards and regulations such as General Data Protection Regulation (GDPR) legislation. Furthermore, we discuss preconditions for running shared lab facilities and provide practical advice.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8238-8258"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525434/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141970535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interactions between latent variables in count regression models
Behavior Research Methods, pp. 8932-8954. Pub Date: 2024-12-01. Epub Date: 2024-08-26. DOI: 10.3758/s13428-024-02483-4
Christoph Kiefer, Sarah Wilker, Axel Mayer
In psychology and the social sciences, researchers often model count outcome variables accounting for latent predictors and their interactions. Even though neglecting measurement error in such count regression models (e.g., Poisson or negative binomial regression) can have unfavorable consequences like attenuation bias, such analyses are often carried out in the generalized linear model (GLM) framework using fallible covariates such as sum scores. An alternative is count regression models based on structural equation modeling, which allow researchers to specify latent covariates and thereby account for measurement error. However, the issue of how and when to include interactions between latent covariates or between latent and manifest covariates is rarely discussed for count regression models. In this paper, we present a latent variable count regression model (LV-CRM) allowing for latent covariates as well as interactions among both latent and manifest covariates. We conducted three simulation studies, investigating the estimation accuracy of the LV-CRM and comparing it to GLM-based count regression models. Interestingly, we found that even in scenarios with high reliabilities, the regression coefficients from a GLM-based model can be severely biased. In contrast, even for moderate sample sizes, the LV-CRM provided virtually unbiased regression coefficients. Additionally, statistical inferences yielded mixed results for the GLM-based models (i.e., low coverage rates, but acceptable empirical detection rates), but were generally acceptable using the LV-CRM. We provide an applied example from clinical psychology illustrating how the LV-CRM framework can be used to model count regressions with latent interactions.
{"title":"Interactions between latent variables in count regression models.","authors":"Christoph Kiefer, Sarah Wilker, Axel Mayer","doi":"10.3758/s13428-024-02483-4","DOIUrl":"10.3758/s13428-024-02483-4","url":null,"abstract":"<p><p>In psychology and the social sciences, researchers often model count outcome variables accounting for latent predictors and their interactions. Even though neglecting measurement error in such count regression models (e.g., Poisson or negative binomial regression) can have unfavorable consequences like attenuation bias, such analyses are often carried out in the generalized linear model (GLM) framework using fallible covariates such as sum scores. An alternative is count regression models based on structural equation modeling, which allow to specify latent covariates and thereby account for measurement error. However, the issue of how and when to include interactions between latent covariates or between latent and manifest covariates is rarely discussed for count regression models. In this paper, we present a latent variable count regression model (LV-CRM) allowing for latent covariates as well as interactions among both latent and manifest covariates. We conducted three simulation studies, investigating the estimation accuracy of the LV-CRM and comparing it to GLM-based count regression models. Interestingly, we found that even in scenarios with high reliabilities, the regression coefficients from a GLM-based model can be severely biased. In contrast, even for moderate sample sizes, the LV-CRM provided virtually unbiased regression coefficients. Additionally, statistical inferences yielded mixed results for the GLM-based models (i.e., low coverage rates, but acceptable empirical detection rates), but were generally acceptable using the LV-CRM. We provide an applied example from clinical psychology illustrating how the LV-CRM framework can be used to model count regressions with latent interactions.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8932-8954"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525413/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142071898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Establishing the reliability of metrics extracted from long-form recordings using LENA and the ACLEW pipeline
Behavior Research Methods, pp. 8588-8607. Pub Date: 2024-12-01. Epub Date: 2024-09-20. DOI: 10.3758/s13428-024-02493-2
Alejandrina Cristia, Lucas Gautheron, Zixing Zhang, Björn Schuller, Camila Scaff, Caroline Rowland, Okko Räsänen, Loann Peurey, Marvin Lavechin, William Havard, Caitlin M Fausey, Margaret Cychosz, Elika Bergelson, Heather Anderson, Najla Al Futaisi, Melanie Soderstrom
Long-form audio recordings are increasingly used to study individual variation, group differences, and many other topics in theoretical and applied fields of developmental science, particularly for the description of children's language input (typically speech from adults) and children's language output (ranging from babble to sentences). The proprietary LENA software has been available for over a decade, and with it, users have come to rely on derived metrics like adult word count (AWC) and child vocalization counts (CVC), which have also more recently been derived using an open-source alternative, the ACLEW pipeline. Yet, there is relatively little work assessing the reliability of long-form metrics in terms of the stability of individual differences across time. Filling this gap, we analyzed eight spoken-language datasets: four from North American English-learning infants, and one each from British English-, French-, American English-/Spanish-, and Quechua-/Spanish-learning infants. The audio data were analyzed using two types of processing software: LENA and the ACLEW open-source pipeline. When all corpora were included, we found relatively low to moderate reliability (across multiple recordings, the intraclass correlation coefficient attributed to child identity [Child ICC] was < 50% for most metrics). There were few differences between the two pipelines. Exploratory analyses suggested some differences as a function of child age and corpora. These findings suggest that, while reliability is likely sufficient for various group-level analyses, caution is needed when using either LENA or ACLEW tools to study individual variation. We also encourage improvement of extant tools, specifically targeting accurate measurement of individual variation.
{"title":"Establishing the reliability of metrics extracted from long-form recordings using LENA and the ACLEW pipeline.","authors":"Alejandrina Cristia, Lucas Gautheron, Zixing Zhang, Björn Schuller, Camila Scaff, Caroline Rowland, Okko Räsänen, Loann Peurey, Marvin Lavechin, William Havard, Caitlin M Fausey, Margaret Cychosz, Elika Bergelson, Heather Anderson, Najla Al Futaisi, Melanie Soderstrom","doi":"10.3758/s13428-024-02493-2","DOIUrl":"10.3758/s13428-024-02493-2","url":null,"abstract":"<p><p>Long-form audio recordings are increasingly used to study individual variation, group differences, and many other topics in theoretical and applied fields of developmental science, particularly for the description of children's language input (typically speech from adults) and children's language output (ranging from babble to sentences). The proprietary LENA software has been available for over a decade, and with it, users have come to rely on derived metrics like adult word count (AWC) and child vocalization counts (CVC), which have also more recently been derived using an open-source alternative, the ACLEW pipeline. Yet, there is relatively little work assessing the reliability of long-form metrics in terms of the stability of individual differences across time. Filling this gap, we analyzed eight spoken-language datasets: four from North American English-learning infants, and one each from British English-, French-, American English-/Spanish-, and Quechua-/Spanish-learning infants. The audio data were analyzed using two types of processing software: LENA and the ACLEW open-source pipeline. When all corpora were included, we found relatively low to moderate reliability (across multiple recordings, intraclass correlation coefficient attributed to the child identity [Child ICC], was < 50% for most metrics). There were few differences between the two pipelines. Exploratory analyses suggested some differences as a function of child age and corpora. These findings suggest that, while reliability is likely sufficient for various group-level analyses, caution is needed when using either LENA or ACLEW tools to study individual variation. We also encourage improvement of extant tools, specifically targeting accurate measurement of individual variation.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8588-8607"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142279941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimating power in complex nonlinear structural equation modeling including moderation effects: The powerNLSEM R-package
Behavior Research Methods, pp. 8897-8931. Pub Date: 2024-12-01. Epub Date: 2024-09-20. DOI: 10.3758/s13428-024-02476-3
Julien P Irmer, Andreas G Klein, Karin Schermelleh-Engel
The model-implied simulation-based power estimation (MSPE) approach is a new general method for power estimation (Irmer et al., 2024). MSPE was developed especially for power estimation in non-linear structural equation models (SEM), but it can also be applied to linear SEM and manifest models using the R package powerNLSEM. After providing background on MSPE and the new adaptive algorithm that automatically selects sample sizes for the best prediction of power via simulation, we present a tutorial on how to conduct MSPE for quadratic and interaction SEM (QISEM) using the powerNLSEM package. Power estimation is demonstrated for four methods: latent moderated structural equations (LMS), the unconstrained product indicator (UPI) approach, a simple factor score regression (FSR), and a scale regression (SR) approach to QISEM. In two simulation studies, we highlight the performance of the MSPE for all four methods applied to two QISEM of varying complexity and reliability. Further, we justify the settings of the newly developed adaptive search algorithm via simulation-based performance evaluations. Overall, the MSPE using the adaptive approach performs well in terms of bias and Type I error rates.
{"title":"Estimating power in complex nonlinear structural equation modeling including moderation effects: The powerNLSEM R-package.","authors":"Julien P Irmer, Andreas G Klein, Karin Schermelleh-Engel","doi":"10.3758/s13428-024-02476-3","DOIUrl":"10.3758/s13428-024-02476-3","url":null,"abstract":"<p><p>The model-implied simulation-based power estimation (MSPE) approach is a new general method for power estimation (Irmer et al., 2024). MSPE was developed especially for power estimation of non-linear structural equation models (SEM), but it also can be applied to linear SEM and manifest models using the R package powerNLSEM. After first providing some information about MSPE and the new adaptive algorithm that automatically selects sample sizes for the best prediction of power using simulation, a tutorial on how to conduct the MSPE for quadratic and interaction SEM (QISEM) using the powerNLSEM package is provided. Power estimation is demonstrated for four methods, latent moderated structural equations (LMS), the unconstrained product indicator (UPI), a simple factor score regression (FSR), and a scale regression (SR) approach to QISEM. In two simulation studies, we highlight the performance of the MSPE for all four methods applied to two QISEM with varying complexity and reliability. Further, we justify the settings of the newly developed adaptive search algorithm via performance evaluations using simulation. Overall, the MSPE using the adaptive approach performs well in terms of bias and Type I error rates.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8897-8931"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525415/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142279942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A template and tutorial for preregistering studies using passive smartphone measures
Behavior Research Methods, pp. 8289-8307. Pub Date: 2024-12-01. Epub Date: 2024-08-07. DOI: 10.3758/s13428-024-02474-5
Anna M Langener, Björn S Siepe, Mahmoud Elsherif, Koen Niemeijer, Pia K Andresen, Samir Akre, Laura F Bringmann, Zachary D Cohen, Nathaniel R Choukas, Konstantin Drexl, Luisa Fassi, James Green, Tabea Hoffmann, Raj R Jagesar, Martien J H Kas, Sebastian Kurten, Ramona Schoedel, Gert Stulp, Georgia Turner, Nicholas C Jacobson
Passive smartphone measures hold significant potential and are increasingly employed in psychological and biomedical research to capture an individual's behavior. These measures involve the near-continuous and unobtrusive collection of data from smartphones without requiring active input from participants. For example, GPS sensors are used to determine the (social) context of a person, and accelerometers to measure movement. However, utilizing passive smartphone measures presents methodological challenges during data collection and analysis. Researchers must make multiple decisions when working with such measures, which can result in different conclusions. Unfortunately, the transparency of these decision-making processes is often lacking. The implementation of open science practices is only beginning to emerge in digital phenotyping studies and varies widely across studies. Well-intentioned researchers may fail to report on some decisions due to the variety of choices that must be made. To address this issue and enhance reproducibility in digital phenotyping studies, we propose the adoption of preregistration as a way forward. Although there have been some attempts to preregister digital phenotyping studies, a template for registering such studies is currently missing. This could be problematic due to the high level of complexity that requires a well-structured template. Therefore, our objective was to develop a preregistration template that is easy to use and understandable for researchers. Additionally, we explain this template and provide resources to assist researchers in making informed decisions regarding data collection, cleaning, and analysis. Overall, we aim to make researchers' choices explicit, enhance transparency, and elevate the standards for studies utilizing passive smartphone measures.
{"title":"A template and tutorial for preregistering studies using passive smartphone measures.","authors":"Anna M Langener, Björn S Siepe, Mahmoud Elsherif, Koen Niemeijer, Pia K Andresen, Samir Akre, Laura F Bringmann, Zachary D Cohen, Nathaniel R Choukas, Konstantin Drexl, Luisa Fassi, James Green, Tabea Hoffmann, Raj R Jagesar, Martien J H Kas, Sebastian Kurten, Ramona Schoedel, Gert Stulp, Georgia Turner, Nicholas C Jacobson","doi":"10.3758/s13428-024-02474-5","DOIUrl":"10.3758/s13428-024-02474-5","url":null,"abstract":"<p><p>Passive smartphone measures hold significant potential and are increasingly employed in psychological and biomedical research to capture an individual's behavior. These measures involve the near-continuous and unobtrusive collection of data from smartphones without requiring active input from participants. For example, GPS sensors are used to determine the (social) context of a person, and accelerometers to measure movement. However, utilizing passive smartphone measures presents methodological challenges during data collection and analysis. Researchers must make multiple decisions when working with such measures, which can result in different conclusions. Unfortunately, the transparency of these decision-making processes is often lacking. The implementation of open science practices is only beginning to emerge in digital phenotyping studies and varies widely across studies. Well-intentioned researchers may fail to report on some decisions due to the variety of choices that must be made. To address this issue and enhance reproducibility in digital phenotyping studies, we propose the adoption of preregistration as a way forward. Although there have been some attempts to preregister digital phenotyping studies, a template for registering such studies is currently missing. This could be problematic due to the high level of complexity that requires a well-structured template. Therefore, our objective was to develop a preregistration template that is easy to use and understandable for researchers. Additionally, we explain this template and provide resources to assist researchers in making informed decisions regarding data collection, cleaning, and analysis. Overall, we aim to make researchers' choices explicit, enhance transparency, and elevate the standards for studies utilizing passive smartphone measures.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8289-8307"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525430/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141900815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On aggregation invariance of multinomial processing tree models
Behavior Research Methods, pp. 8677-8694. Pub Date: 2024-12-01. Epub Date: 2024-10-14. DOI: 10.3758/s13428-024-02497-y
Edgar Erdfelder, Julian Quevedo Pütter, Martin Schnuerch
Multinomial processing tree (MPT) models are prominent and frequently used tools to model and measure cognitive processes underlying responses in many experimental paradigms. Although MPT models typically refer to cognitive processes within single individuals, they have often been applied to group data aggregated across individuals. We investigate the conditions under which MPT analyses of aggregate data make sense. After introducing the notions of structural and empirical aggregation invariance of MPT models, we show that any MPT model that holds at the level of single individuals must also hold at the aggregate level when it is both structurally and empirically aggregation invariant. Moreover, group-level parameters of aggregation-invariant MPT models are equivalent to the expected values (i.e., means) of the corresponding individual parameters. To investigate the robustness of MPT results for aggregate data when one or both invariance conditions are violated, we additionally performed a series of simulation studies, systematically manipulating (1) the sample sizes in different trees of the model, (2) model parameterization, (3) means and variances of crucial model parameters, and (4) their correlations with other parameters of the respective MPT model. Overall, our results show that MPT parameter estimates based on aggregate data are trustworthy under rather general conditions, provided that a few preconditions are met.
{"title":"On aggregation invariance of multinomial processing tree models.","authors":"Edgar Erdfelder, Julian Quevedo Pütter, Martin Schnuerch","doi":"10.3758/s13428-024-02497-y","DOIUrl":"10.3758/s13428-024-02497-y","url":null,"abstract":"<p><p>Multinomial processing tree (MPT) models are prominent and frequently used tools to model and measure cognitive processes underlying responses in many experimental paradigms. Although MPT models typically refer to cognitive processes within single individuals, they have often been applied to group data aggregated across individuals. We investigate the conditions under which MPT analyses of aggregate data make sense. After introducing the notions of structural and empirical aggregation invariance of MPT models, we show that any MPT model that holds at the level of single individuals must also hold at the aggregate level when it is both structurally and empirically aggregation invariant. Moreover, group-level parameters of aggregation-invariant MPT models are equivalent to the expected values (i.e., means) of the corresponding individual parameters. To investigate the robustness of MPT results for aggregate data when one or both invariance conditions are violated, we additionally performed a series of simulation studies, systematically manipulating (1) the sample sizes in different trees of the model, (2) model parameterization, (3) means and variances of crucial model parameters, and (4) their correlations with other parameters of the respective MPT model. Overall, our results show that MPT parameter estimates based on aggregate data are trustworthy under rather general conditions, provided that a few preconditions are met.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8677-8694"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525265/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142456954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}