Forming bootstrap confidence intervals and examining bootstrap distributions of standardized coefficients in structural equation modelling: A simplified workflow using the R package semboottools
Behavior Research Methods, 58(2), 38 | Pub Date: 2026-01-16 | DOI: 10.3758/s13428-025-02911-z
Wendie Yang, Shu Fai Cheung
Standardized coefficients - including factor loadings, correlations, and indirect effects - are fundamental to interpreting structural equation modeling (SEM) results in psychology. However, they often exhibit skewed sampling distributions in finite samples, which conventional symmetric confidence intervals (CIs) do not capture. Methods that do not impose symmetry, such as bootstrap CIs, are more appropriate for these coefficients. The widely used R package lavaan (version 0.6-19 or earlier), however, provides limited bootstrap support for standardized coefficients: its function standardizedSolution() uses the delta method for CIs and does not provide bootstrap p values. lavaan does offer a flexible and powerful bootstrapping function, bootstrapLavaan(), which can be used to form bootstrap CIs for standardized coefficients, but doing so requires a certain level of R coding skill. Moreover, no built-in functions are available for inspecting bootstrap distributions, which is recommended for assessing the stability of bootstrap estimates. To address these limitations, we developed the semboottools R package, which provides a simple SEM workflow for forming bootstrap confidence intervals for unstandardized and standardized estimates of model and user-defined parameters. It allows researchers to generate percentile or bias-corrected bootstrap CIs, standard errors, and asymmetric p values; compare the bootstrap CIs with other CI methods (e.g., the delta method); and visualize the distributions of bootstrap estimates - all with minimal coding effort. We believe the tool can help researchers easily form bootstrap CIs, compare different CI methods to assess the need for bootstrapping, and examine the distributions of bootstrap estimates to assess their stability.
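To make the coding burden concrete, the following is a minimal sketch of the manual, lavaan-only route described above, combining the documented bootstrapLavaan() and standardizedSolution() functions to form percentile bootstrap CIs for standardized coefficients. The CFA model and built-in dataset are illustrative assumptions; this is not the semboottools workflow itself.

    library(lavaan)

    # Illustrative CFA on lavaan's built-in HolzingerSwineford1939 data
    model <- ' visual  =~ x1 + x2 + x3
               textual =~ x4 + x5 + x6 '
    fit <- cfa(model, data = HolzingerSwineford1939)

    # Re-estimate the standardized solution in every bootstrap sample
    std_fun <- function(object) standardizedSolution(object)$est.std

    boot_std <- bootstrapLavaan(fit, R = 1000, FUN = std_fun)

    # Percentile bootstrap CI (2.5th and 97.5th percentiles) for each coefficient
    boot_ci <- t(apply(boot_std, 2, quantile, probs = c(0.025, 0.975), na.rm = TRUE))

semboottools is described as wrapping this kind of workflow - plus bias-corrected CIs, bootstrap p values, comparisons with other CI methods, and diagnostic plots - behind simpler calls.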
{"title":"Forming bootstrap confidence intervals and examining bootstrap distributions of standardized coefficients in structural equation modelling: A simplified workflow using the R package semboottools.","authors":"Wendie Yang, Shu Fai Cheung","doi":"10.3758/s13428-025-02911-z","DOIUrl":"https://doi.org/10.3758/s13428-025-02911-z","url":null,"abstract":"<p><p>Standardized coefficients - including factor loadings, correlations, and indirect effects - are fundamental to interpreting structural equation modeling (SEM) results in psychology. However, they often exhibit skewed sampling distributions in finite samples, which are not captured by conventional symmetric confidence intervals (CIs). Methods such as bootstrap CI that do not impose symmetry are more appropriate for these coefficients. Despite its popularity, the widely used R package lavaan (version 0.6-19 or earlier) provides limited bootstrap support for standardized coefficients. Specifically, its function standardizedSolution() uses the delta method for CIs and lacks bootstrap p values. It provides a flexible and powerful function, bootstrapLavaan(), for bootstrapping, and it can be used to form bootstrap CIs for the standardized coefficients. However, this function requires a certain level of R coding skills. Moreover, no built-in functions are available to inspect bootstrap distributions, which are recommended for assessing the stability of the bootstrap estimates. To address these limitations, we developed the semboottools R package, which provides a simple workflow in SEM to form bootstrap confidence intervals for unstandardized and standardized estimates of model and user-defined parameters. It allows researchers to generate percentile or bias-corrected bootstrap CIs, standard errors, asymmetric p values, compare the bootstrap CIs with other CI methods (e.g., delta method), and visualize the distributions of bootstrap estimates - with minimal coding effort. We believe the tool can facilitate researchers in easily forming bootstrap CIs, comparing different CI methods to assess the need for bootstrapping, and examining the distribution of bootstrap estimates to assess their stability.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"58 2","pages":"38"},"PeriodicalIF":3.9,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145987886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Virtual agents as a scalable tool for diverse, robust gesture recognition
Behavior Research Methods, 58(2), 41 | Pub Date: 2026-01-16 | DOI: 10.3758/s13428-025-02914-w | Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12811268/pdf/
Lisa Loy, James P Trujillo, Floris Roelofsen
Gesture recognition technology is a popular area of research, offering applications in many fields, including behaviour research, human-computer interaction (HCI), medical research, and surveillance culture, among others. However, the large quantity of data needed to train a recognition algorithm is not always available, and differences between the training set and one's own research data in factors such as recording conditions and participant characteristics may hinder transferability. To address these issues, we propose training and testing recognition algorithms on virtual agents, a tool that has not yet been used for this purpose in multimodal communication research. We provide an example use case with step-by-step instructions, using motion capture (mocap) data to animate a virtual agent and create customised lighting conditions, backgrounds, and camera angles, yielding a virtual agent-only dataset to train and test a gesture recognition algorithm. This approach also allows us to assess the impact of particular features, such as background and lighting. Our best-performing model in optimal background and lighting conditions achieved an accuracy of 85.9%. When background clutter and reduced lighting were introduced, accuracy dropped to 71.6%. When the virtual agent-trained model was tested on images of humans, the accuracy of target handshape classification ranged from 72% to 95%. The results suggest that training an algorithm on artificial data (1) is a resourceful, convenient, and effective way to customise algorithms, (2) potentially addresses issues of data sparsity, and (3) can be used to assess the impact of many contextual and environmental factors that would not be feasible to assess systematically using human data.
{"title":"Virtual agents as a scalable tool for diverse, robust gesture recognition.","authors":"Lisa Loy, James P Trujillo, Floris Roelofsen","doi":"10.3758/s13428-025-02914-w","DOIUrl":"10.3758/s13428-025-02914-w","url":null,"abstract":"<p><p>Gesture recognition technology is a popular area of research, offering applications in many fields, including behaviour research, human-computer interaction (HCI), medical research, and surveillance culture, among others. However, the large quantity of data needed to train a recognition algorithm is not always available, and differences between the training set and one's own research data in factors such as recording conditions and participant characteristics may hinder transferability. To address these issues, we propose training and testing recognition algorithms on virtual agents, a tool that has not yet been used for this purpose in multimodal communication research. We provide an example use case with step-by-step instructions, using mocap data to animate a virtual agent and create customised lighting conditions, backgrounds, and camera angles, creating a virtual agent-only dataset to train and test a gesture recognition algorithm. This approach also allows us to assess the impact of particular features, such as background and lighting. Our best-performing model in optimal background and lighting conditions achieved accuracy of 85.9%. When introducing background clutter and reduced lighting, the accuracy dropped to 71.6%. When testing the virtual agent-trained model on images of humans, the accuracy of target handshape classification ranged from 72% to 95%. The results suggest that training an algorithm on artificial data (1) is a resourceful, convenient, and effective way to customise algorithms, (2) potentially addresses issues of data sparsity, and (3) can be used to assess the impact of many contextual and environmental factors that would not be feasible to systematically assess using human data.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"58 2","pages":"41"},"PeriodicalIF":3.9,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12811268/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145987841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A mouse-tracking classification task to measure the unhealthy = tasty intuition
Behavior Research Methods, 58(2), 37 | Pub Date: 2026-01-16 | DOI: 10.3758/s13428-025-02927-5 | Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12811306/pdf/
Jonathan D'hondt, Barbara Briers
Understanding food preferences plays a crucial role in addressing both health concerns, such as obesity, and environmental concerns, such as climate change. Recognizing the impact of lay beliefs on food preferences is essential in addressing these challenges. One prevalent belief is the "unhealthy = tasty intuition" (UTI), the belief that taste and health in food do not go together. While self-report scales and behavioral tasks are commonly used to measure such beliefs, they serve distinct methodological purposes: scales are better suited for assessing stable, trait-like constructs, whereas tasks capture more dynamic processes and are well suited for experimental manipulation. This paper introduces a mouse-tracking classification task that yields a process-based behavioral index of UTI, offering a novel approach for assessing implicit beliefs about the relationship between taste and health in food. Three studies validate the task, demonstrating correlations between explicit UTI scores and task performance. Additionally, the task predicts actual food consumption and, importantly, is sensitive to contextual manipulations. Because the task can be adapted to measure other beliefs, it is a valuable tool for researchers working on individual lay beliefs and decision-making processes. To that end, a template of the task is provided to help other researchers build on this work.
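For readers unfamiliar with process-based mouse-tracking indices, the sketch below computes one generic example: the maximum absolute deviation of the cursor path from the straight line between its start and end points. This is only an illustration of the kind of measure such tasks yield, not necessarily the index used in this paper.

    # Maximum absolute deviation (MAD) of a cursor trajectory from the straight line
    # between its start and end points; a generic mouse-tracking index, not necessarily
    # the one used in this paper.
    max_abs_deviation <- function(x, y) {
      x0 <- x[1]; y0 <- y[1]
      x1 <- x[length(x)]; y1 <- y[length(y)]
      # perpendicular distance of each sample from the start-end line
      d <- abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) /
        sqrt((y1 - y0)^2 + (x1 - x0)^2)
      max(d)
    }

    # Example: a trajectory curving from (0, 0) toward (1, 1)
    s <- seq(0, 1, length.out = 101)
    max_abs_deviation(x = s, y = s^3)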
{"title":"A mouse-tracking classification task to measure the unhealthy = tasty intuition.","authors":"Jonathan D'hondt, Barbara Briers","doi":"10.3758/s13428-025-02927-5","DOIUrl":"10.3758/s13428-025-02927-5","url":null,"abstract":"<p><p>Understanding food preferences plays a crucial role in addressing both health concerns, such as obesity, and environmental concerns, such as climate change. Recognizing the impact of lay beliefs on food preferences is essential in addressing these challenges. One prevalent belief is the \"unhealthy = tasty intuition\" (UTI), the belief that taste and health in food do not go together. While self-report scales and behavioral tasks are commonly used to measure such beliefs, they have distinct methodological purposes: scales are better suited for assessing stable, trait-like constructs, whereas tasks capture more dynamic processes and are well suited for experimental manipulation. This paper introduces a mouse-tracking classification task that provides a process-based behavioral index of UTI, providing a novel approach for assessing implicit beliefs about the relationship between taste and health in food. Three studies validate the task, demonstrating correlations between explicit UTI scores and task performance. Additionally, the task predicts actual food consumption and, importantly, exhibits sensitivity to contextual manipulations. Because this task can be adapted to measure other beliefs, it is a valuable tool for researchers working on individual lay beliefs and decision-making processes. To that end, a template of the task is provided to help other researchers build on this work.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"58 2","pages":"37"},"PeriodicalIF":3.9,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12811306/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145987864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian hierarchical cognitive modeling with the EMC2 package
Behavior Research Methods, 58(1), 35 | Pub Date: 2026-01-12 | DOI: 10.3758/s13428-025-02869-y
Niek Stevenson, Michelle C Donzallaz, Reilly J Innes, Birte U Forstmann, Dora Matzke, Andrew Heathcote
EMC2 is an R package that provides a comprehensive five-phase workflow for Bayesian hierarchical analysis of cognitive models of choice. In the design phase, EMC2 bridges the gap between standard regression analyses and cognitive modeling through linear-model specifications for cognitive-model parameters. In the Bayesian specification and sampling phases, the package provides flexible priors, hierarchical structures, and efficient sampling algorithms, enabling fast, user-friendly estimation of computationally intensive cognitive models. In the final two phases, EMC2 provides a suite of functions for model criticism and inference. Using two leading evidence-accumulation models for illustration, we provide a tutorial on the EMC2-based workflow that eases and guides the process of specifying, evaluating, refining, comparing, and interpreting Bayesian hierarchical cognitive models.
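As a conceptual illustration of what a linear-model specification for a cognitive-model parameter means - written in plain base R rather than the EMC2 API, whose function names are not reproduced here - a drift-rate parameter can be mapped onto conditions through an R formula and design matrix:

    # Plain base R illustration (not EMC2 code): a drift-rate parameter v is given a
    # regression-style specification v ~ condition, so each condition's drift rate is a
    # linear combination of coefficients via a design matrix.
    conditions <- data.frame(condition = factor(c("easy", "hard")))
    X <- model.matrix(~ condition, data = conditions)      # design matrix for v ~ condition
    beta <- c(intercept = 2.0, conditionhard = -0.8)       # illustrative coefficients
    v_per_condition <- as.vector(X %*% beta)               # drift rate per condition: 2.0, 1.2

In EMC2 this kind of mapping is specified in the design phase and then estimated hierarchically across participants; consult the package documentation for the actual interface.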
{"title":"Bayesian hierarchical cognitive modeling with the EMC2 package.","authors":"Niek Stevenson, Michelle C Donzallaz, Reilly J Innes, Birte U Forstmann, Dora Matzke, Andrew Heathcote","doi":"10.3758/s13428-025-02869-y","DOIUrl":"https://doi.org/10.3758/s13428-025-02869-y","url":null,"abstract":"<p><p>EMC2 is an R package that provides a comprehensive five-phase workflow for Bayesian hierarchical analysis of cognitive models of choice. In the design phase, EMC2 bridges the gap between standard regression analyses and cognitive modeling through linear-model specifications for cognitive-model parameters. In the Bayesian specification and sampling phases, the package provides flexible priors, hierarchical structures, and efficient sampling algorithms, enabling fast, user-friendly estimation of computationally intensive cognitive models. In the final two phases, EMC2 provides a suite of functions for model criticism and inference. Using two leading evidence-accumulation models for illustration, we provide a tutorial on the EMC2-based workflow that eases and guides the process of specifying, evaluating, refining, comparing, and interpreting Bayesian hierarchical cognitive models.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"58 1","pages":"35"},"PeriodicalIF":3.9,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145958351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Collecting, detecting, and handling non-wear intervals in longitudinal light exposure data
Behavior Research Methods, 58(1), 36 | Pub Date: 2026-01-12 | DOI: 10.3758/s13428-025-02823-y | Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12795912/pdf/
Carolina Guidolin, Johannes Zauner, Steffen Lutz Hartmeyer, Manuel Spitschan
In field studies using wearable light loggers, participants often need to remove the devices, resulting in non-wear intervals of varying and unknown duration. Accurate detection of these intervals is an essential step during data pre-processing. Here, we deployed a multi-modal approach to collect non-wear time during a longitudinal light exposure collection campaign and systematically compared non-wear detection strategies. Healthy participants (n = 26; mean age 28 ± 5 years, 14 female) wore a near-corneal-plane light logger for 1 week and reported non-wear events in three ways: pressing an "event marker" button on the light logger, placing it in a black bag, and using an app-based Wear log. Wear log entries, checked twice daily, served as ground truth for non-wear detection, showing that non-wear time constituted 5.4 ± 3.8% (mean ± SD) of total participation time. Button presses at the start and end of non-wear intervals were identified in >85.4% of cases when detection windows longer than 1 min were considered. To detect non-wear intervals based on black bag use and lack of motion, we employed an algorithm that detects clusters of low illuminance and clusters of low activity. Performance was higher for illuminance (F1 = 0.78) than for activity (F1 = 0.52). Light exposure metrics derived from the full dataset, a dataset filtered for non-wear based on self-reports, and a dataset filtered for non-wear using the low-illuminance cluster detection algorithm showed minimal differences. Our results highlight that while non-wear detection may be less critical in high-compliance cohorts, systematically collecting and detecting non-wear intervals is feasible and important for ensuring robust data pre-processing.
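The low-illuminance cluster idea can be sketched in a few lines of base R: flag runs of consecutive epochs whose illuminance stays below a threshold for at least a minimum duration. The threshold and minimum run length below are illustrative assumptions, not the values used in the study.

    # Flag runs of consecutive epochs with illuminance below a threshold that last at
    # least min_len epochs. Threshold (1 lux) and min_len (10 epochs) are illustrative.
    detect_low_clusters <- function(lux, threshold = 1, min_len = 10) {
      runs <- rle(lux < threshold)
      ends <- cumsum(runs$lengths)
      starts <- ends - runs$lengths + 1
      keep <- runs$values & runs$lengths >= min_len
      data.frame(start = starts[keep], end = ends[keep])
    }

    # Example: a stream with one 15-epoch low-illuminance interval
    set.seed(1)
    lux <- c(runif(50, 10, 500), rep(0, 15), runif(35, 10, 500))
    detect_low_clusters(lux)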
{"title":"Collecting, detecting, and handling non-wear intervals in longitudinal light exposure data.","authors":"Carolina Guidolin, Johannes Zauner, Steffen Lutz Hartmeyer, Manuel Spitschan","doi":"10.3758/s13428-025-02823-y","DOIUrl":"10.3758/s13428-025-02823-y","url":null,"abstract":"<p><p>In field studies using wearable light loggers, participants often need to remove the devices, resulting in non-wear intervals of varying and unknown duration. Accurate detection of these intervals is an essential step during data pre-processing. Here, we deployed a multi-modal approach to collect non-wear time during a longitudinal light exposure collection campaign and systematically compare non-wear detection strategies. Healthy participants (n = 26; mean age 28 ± 5 years, 14F) wore a near-corneal plane light logger for 1 week and reported non-wear events in three ways: pressing an \"event marker\" button on the light logger, placing it in a black bag, and using an app-based Wear log. Wear log entries, checked twice daily, served as ground truth for non-wear detection, showing that non-wear time constituted 5.4 ± 3.8% (mean ± SD) of total participation time. Button presses at the start and end of non-wear intervals were identified in >85.4% of cases when considering time windows beyond 1 min for detection. To detect non-wear intervals based on black bag use and lack of motion, we employed an algorithm that detects clusters of low illuminance and clusters of low activity. Performance was higher for illuminance (F1 = 0.78) than for activity (F1 = 0.52). Light exposure metrics derived from the full dataset, a dataset filtered for non-wear based on self-reports, and a dataset filtered for non-wear using the low illuminance clusters detection algorithm showed minimal differences. Our results highlight that while non-wear detection may be less critical in high-compliance cohorts, systematically collecting and detecting non-wear intervals is feasible and important for ensuring robust data pre-processing.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"58 1","pages":"36"},"PeriodicalIF":3.9,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12795912/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145958356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HR-ACT (Human-Robot Action) Database: Communicative and noncommunicative action videos featuring a human and a humanoid robot
Behavior Research Methods, 58(1), 34 | Pub Date: 2026-01-12 | DOI: 10.3758/s13428-025-02910-0 | Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12795972/pdf/
Tuǧçe Nur Pekçetin, Gaye Aşkın, Şeyda Evsen, Tuvana Dilan Karaduman, Badel Barinal, Jana Tunç, Cengiz Acarturk, Burcu A Urgen
We present the HR-ACT (Human-Robot Action) Database, a comprehensive collection of 80 standardized videos featuring matched communicative and noncommunicative actions performed by both a humanoid robot (Pepper) and a human actor. We describe the creation of 40 action exemplars per agent, with actions matched in manner of execution, timing, and number of repetitions. The database includes detailed normative data collected from 438 participants, providing metrics on action identification, confidence ratings, communicativeness ratings, meaning clusters, and H values (an entropy-based measure reflecting response homogeneity). We provide researchers with controlled yet naturalistic stimuli in multiple formats: videos, image frames, and raw animation files (.qanim). These materials support diverse research applications in human-robot interaction, cognitive psychology, and neuroscience. The database enables systematic investigation of action perception across human and robotic agents, and the inclusion of raw animation files allows researchers using Pepper robots to implement these actions in real-time experiments. The full set of stimuli, along with comprehensive normative data and documentation, is publicly available at https://osf.io/8vsxq/ .
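As an illustration of an entropy-based homogeneity measure of the kind the H values represent, the sketch below computes the Shannon entropy (in bits) of the labels given to a single video; the database's exact computation (e.g., how idiosyncratic responses are handled) may differ in its details.

    # Shannon entropy (bits) of the responses given to one item: 0 means every
    # participant used the same label; larger values mean more heterogeneous naming.
    h_value <- function(responses) {
      p <- table(responses) / length(responses)
      -sum(p * log2(p))
    }

    h_value(c("wave", "wave", "wave", "wave"))       # 0: perfect agreement
    h_value(c("wave", "greet", "hello", "salute"))   # 2: four different labels from four raters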
{"title":"HR-ACT (Human-Robot Action) Database: Communicative and noncommunicative action videos featuring a human and a humanoid robot.","authors":"Tuǧçe Nur Pekçetin, Gaye Aşkın, Şeyda Evsen, Tuvana Dilan Karaduman, Badel Barinal, Jana Tunç, Cengiz Acarturk, Burcu A Urgen","doi":"10.3758/s13428-025-02910-0","DOIUrl":"10.3758/s13428-025-02910-0","url":null,"abstract":"<p><p>We present the HR-ACT (Human-Robot Action) Database, a comprehensive collection of 80 standardized videos featuring matched communicative and noncommunicative actions performed by both a humanoid robot (Pepper) and a human actor. We describe the creation of 40 action exemplars per agent, with actions executed in a similar manner, timing, and number of repetitions. The database includes detailed normative data collected from 438 participants, providing metrics on action identification, confidence ratings, communicativeness ratings, meaning clusters, and H values (an entropy-based measure reflecting response homogeneity). We provide researchers with controlled yet naturalistic stimuli in multiple formats: videos, image frames, and raw animation files (.qanim). These materials support diverse research applications in human-robot interaction, cognitive psychology, and neuroscience. The database enables systematic investigation of action perception across human and robotic agents, while the inclusion of raw animation files allows researchers using Pepper robots to implement these actions for real-time experiments. The full set of stimuli, along with comprehensive normative data and documentation, is publicly available at https://osf.io/8vsxq/ .</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"58 1","pages":"34"},"PeriodicalIF":3.9,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12795972/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145958338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Speech onset time at home or in the lab: The role of testing environment and experimenter presence
Behavior Research Methods, 58(1), 33 | Pub Date: 2026-01-09 | DOI: 10.3758/s13428-025-02918-6
Giorgio Piazza, Natalia Kartushina, Christoforos Souganidis, James E Flege, Clara D Martin
Psycholinguistic research has become increasingly reliant on online experimentation, making it an attractive approach for studying speech production. However, concerns remain about data quality and participant engagement in online settings. In this preregistered study, we used two tasks - picture naming and reading aloud - to test whether the lexical frequency effect (high-frequency words having shorter speech onset times than low-frequency words) could be reliably detected in the online environment (run at home), both with and without experimenter supervision. Participants completed the same two tasks at home and in the lab. Half of the participants performed both tasks with supervision, and the other half performed them unsupervised. In the naming task, all conditions yielded consistent frequency effects (~27-41 ms), comparable to previous online and lab findings. In the reading aloud task, the lexical frequency effect emerged in all conditions except the home-supervised condition, where the effect was in the expected direction but nonsignificant (~12 ms). Notably, participants were overall faster at home than in the lab (~10 ms), and unsupervised settings yielded the largest effect sizes. This suggests that experimenter presence may inadvertently dampen subtle effects, possibly due to increased self-monitoring or reduced comfort. These findings support the reliability of online platforms for speech production research in psycholinguistics and highlight the nuanced influence of supervision on speech outcomes.
{"title":"Speech onset time at home or in the lab: The role of testing environment and experimenter presence.","authors":"Giorgio Piazza, Natalia Kartushina, Christoforos Souganidis, James E Flege, Clara D Martin","doi":"10.3758/s13428-025-02918-6","DOIUrl":"https://doi.org/10.3758/s13428-025-02918-6","url":null,"abstract":"<p><p>Psycholinguistic research has become increasingly reliant on online experimentation, making it an attractive approach for studying speech production. However, concerns remain about data quality and participant engagement in online settings. In this preregistered study, we used two tasks-picture naming and reading aloud-to test whether the lexical frequency effect (low-frequency words having shorter speech onset times than high-frequency words) could be reliably detected in the online environment (run at home), both with and without experimenter supervision. Participants completed the same two tasks at home and in the lab. Half of the participants performed both tasks with supervision and the other half unsupervised. In the naming task, all conditions yielded consistent frequency effects (~27-41 ms), comparable to previous online and lab findings. In the reading aloud task, lexical frequency effect emerged in all conditions except for the home-supervised, where the effect was in the expected direction but nonsignificant (~12 ms). Notably, participants were overall faster at home than in the lab (~10 ms), and unsupervised settings yielded the largest effect sizes. This suggests that experimenter presence may inadvertently dampen subtle effects, possibly due to increased self-monitoring or reduced comfort. Such findings indicate the reliability of online platforms for speech production research in psycholinguistics and highlight the nuanced influence of supervision on speech outcomes.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"58 1","pages":"33"},"PeriodicalIF":3.9,"publicationDate":"2026-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145942407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparing aperiodic activity in consumer-grade and research-grade EEG: Reliability and association with mathematical ability
Behavior Research Methods, 58(1), 32 | Pub Date: 2026-01-06 | DOI: 10.3758/s13428-025-02905-x | Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12775036/pdf/
Nienke E R van Bueren, Anne H van Hoogmoed, Sanne H G van der Ven, Lisa M Jonkman
Electroencephalography (EEG) provides valuable insights into brain development, but collecting high-quality data can be tedious, limiting its usability with children. This study evaluates the feasibility and reliability of EEG data acquisition in children with a wireless consumer-grade EEG headset (EMOTIV EPOC X) by comparing it to a research-grade system (BioSemi ActiveTwo), with a focus on aperiodic brain activity. The portability of the EMOTIV headset allows for EEG data collection in ecologically valid, real-world settings such as schools, enabling novel insights into brain activity during learning. We recorded EEG from 93 children (aged 9-10 years) using the EMOTIV headset in a classroom environment, beginning with a 4-min resting-state measurement, followed by assessments of mathematical ability, visuospatial working memory, and verbal working memory. Aperiodic activity, thought to reflect fundamental aspects of neural excitability and cognitive processing, was extracted, and its reliability was compared across the two EEG systems. We further tested whether aperiodic activity recorded with EMOTIV predicts mathematical ability, replicating earlier research that used research-grade EEG equipment. Consistent with earlier findings, lower aperiodic activity was associated with higher math performance, supporting its role as a neural marker of cognitive ability. These results demonstrate the feasibility and reliability of using a consumer-grade mobile EEG headset to investigate individual differences in cognitive development in naturalistic contexts. This work opens up new opportunities for large-scale, school-based neurocognitive assessments and paves the way for personalized educational approaches based on neural profiles.
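For readers unfamiliar with aperiodic activity, the sketch below illustrates the underlying idea on synthetic data: the 1/f-like trend of the power spectrum can be summarized by the slope of a straight-line fit in log-log space. The study's actual extraction pipeline is not specified in the abstract, so this simple linear fit is a conceptual illustration only.

    # Conceptual sketch only: summarize a 1/f-like power spectrum by the slope of a
    # log-log linear fit (the "aperiodic exponent"); synthetic data, not the study's
    # actual extraction pipeline.
    set.seed(1)
    freqs <- seq(2, 40, by = 0.5)                                   # Hz
    power <- 10 / freqs^1.2 * exp(rnorm(length(freqs), sd = 0.1))   # synthetic spectrum
    fit <- lm(log10(power) ~ log10(freqs))
    aperiodic_offset   <- coef(fit)[1]
    aperiodic_exponent <- -coef(fit)[2]                             # about 1.2 here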
{"title":"Comparing aperiodic activity in consumer-grade and research-grade EEG: Reliability and association with mathematical ability.","authors":"Nienke E R van Bueren, Anne H van Hoogmoed, Sanne H G van der Ven, Lisa M Jonkman","doi":"10.3758/s13428-025-02905-x","DOIUrl":"10.3758/s13428-025-02905-x","url":null,"abstract":"<p><p>Electroencephalography (EEG) provides valuable insights into brain development, but collecting high-quality data can be tedious, limiting its usability with children. This study evaluates the feasibility and reliability of EEG data acquisition in children with a wireless consumer-grade EEG headset (EMOTIV EPOC X), by comparing it to a research-grade system (BioSemi ActiveTwo), with a focus on aperiodic brain activity. The portability of the EMOTIV headset allows for EEG data collection in ecologically valid, real-world settings such as schools, enabling novel insights into brain activity during learning. We recorded EEG from 93 children (aged 9-10 years) using the EMOTIV headset, beginning with a 4-min resting-state measurement, followed by assessments of mathematical ability, visuospatial working memory, and verbal working memory, in a classroom environment. Aperiodic activity, thought to reflect fundamental aspects of neural excitability and cognitive processing, was extracted and its reliability compared across the two EEG systems. We further tested whether aperiodic activity recorded with EMOTIV predicts mathematical ability, replicating earlier research using research-grade EEG equipment. Our findings reveal that, similar to earlier findings, lower aperiodic activity was associated with higher math performance, supporting its role as a neural marker of cognitive ability. These results demonstrate the feasibility and reliability of using a consumer-grade mobile EEG headset to investigate individual differences in cognitive development in naturalistic contexts. This work opens up new opportunities for large-scale, school-based neurocognitive assessments and paves the way for personalized educational approaches based on neural profiles.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"58 1","pages":"32"},"PeriodicalIF":3.9,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12775036/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145910324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PupEyes: An interactive Python library for eye movement data processing
Behavior Research Methods, 58(1), 29 | Pub Date: 2026-01-05 | DOI: 10.3758/s13428-025-02830-z | Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12769653/pdf/
Han Zhang, John Jonides
We present PupEyes, an open-source Python package for preprocessing and visualizing pupil size and fixation data. PupEyes supports data collected from EyeLink and Tobii eye-trackers as well as any generic dataset that conforms to minimal formatting standards. Developed with current best practices, PupEyes provides a comprehensive pupil preprocessing pipeline and interactive tools for data exploration and diagnosis. In addition to pupil size data, PupEyes provides interactive tools for visualizing fixation data, drawing areas of interest (AOIs), and computing AOI-based metrics. PupEyes uses the pandas data structure and can work seamlessly with other data analysis packages within the Python ecosystem. Overall, PupEyes (1) ensures that pupil size data are preprocessed in a principled, transparent, and reproducible manner, (2) helps researchers better understand their data through interactive visualizations, and (3) enables flexible extensions for further analysis tailored to specific research goals. To ensure computational reproducibility, we provide detailed, executable tutorials ( https://pupeyes.readthedocs.io/ ) that allow users to reproduce and modify the code examples in a virtual environment.
{"title":"PupEyes: An interactive Python library for eye movement data processing.","authors":"Han Zhang, John Jonides","doi":"10.3758/s13428-025-02830-z","DOIUrl":"10.3758/s13428-025-02830-z","url":null,"abstract":"<p><p>We present PupEyes, an open-source Python package for preprocessing and visualizing pupil size and fixation data. PupEyes supports data collected from EyeLink and Tobii eye-trackers as well as any generic dataset that conforms to minimal formatting standards. Developed with current best practices, PupEyes provides a comprehensive pupil preprocessing pipeline and interactive tools for data exploration and diagnosis. In addition to pupil size data, PupEyes provides interactive tools for visualizing fixation data, drawing areas of interest (AOIs), and computing AOI-based metrics. PupEyes uses the pandas data structure and can work seamlessly with other data analysis packages within the Python ecosystem. Overall, PupEyes (1) ensures that pupil size data are preprocessed in a principled, transparent, and reproducible manner, (2) helps researchers better understand their data through interactive visualizations, and (3) enables flexible extensions for further analysis tailored to specific research goals. To ensure computational reproducibility, we provide detailed, executable tutorials ( https://pupeyes.readthedocs.io/ ) that allow users to reproduce and modify the code examples in a virtual environment.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"58 1","pages":"29"},"PeriodicalIF":3.9,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12769653/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145905501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design and analysis of individually randomized multiple baseline factorial trials
Behavior Research Methods, 58(1), 30 | Pub Date: 2026-01-05 | DOI: 10.3758/s13428-025-02874-1 | Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12769596/pdf/
Yongdong Ouyang, Maria Laura Avila, Anna Heath
Assessing the effectiveness of behavioral interventions in rare diseases is challenging due to extremely limited sample sizes and the ethical challenges of withholding an intervention when few treatment options are available. The multiple baseline design (MBD) is commonly used in behavioral science to assess interventions while allowing all individuals to receive the intervention. However, the MBD is primarily used to evaluate a single intervention, so an alternative strategy is needed when evaluating more than one intervention. In that case, a factorial design may be recommended, but a standard factorial design may not be feasible in rare diseases because of the extremely limited sample sizes. To address this challenge, we propose the individually randomized multiple baseline factorial design (MBFD), which requires fewer participants yet can attain sufficient statistical power for evaluating at least two interventions and their combinations. Furthermore, by incorporating randomization, we enhance the internal validity of the design. This study describes the design characteristics of a standard MBFD, clarifies estimands, and introduces three statistical models under different assumptions. Through simulations, we analyze MBFD data using linear mixed-effects models (LMM) and generalized estimating equations (GEE) to compare the bias, test size, and power for detecting main effects across the models. We recommend using GEE to mitigate potential random-effects misspecification and suggest small-sample corrections, such as the Mancl and DeRouen variance estimator, for sample sizes below 120.
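A minimal sketch of the recommended GEE analysis route is given below, using the geepack package on simulated long-format data. The variable names, toy design, and exchangeable working correlation are illustrative assumptions, and the Mancl and DeRouen small-sample correction mentioned above is not part of geepack and would need to be applied separately.

    library(geepack)

    # Toy long-format data: one row per participant-measurement. This is only enough
    # structure to show the GEE call; it does not reproduce an actual MBFD
    # randomization scheme.
    set.seed(1)
    n_id <- 24; n_t <- 8
    dat <- expand.grid(id = factor(1:n_id), time = 1:n_t)
    dat$A <- as.integer(as.integer(dat$id) %% 2 == 0)   # indicator for intervention A
    dat$B <- as.integer(dat$time > 4)                   # indicator for intervention B
    dat$y <- 0.5 * dat$A + 0.3 * dat$B + 0.05 * dat$time + rnorm(nrow(dat))
    dat <- dat[order(dat$id, dat$time), ]               # geeglm expects contiguous clusters

    # Main effects and interaction of the two interventions, clustering by participant
    fit <- geeglm(y ~ A * B + time, id = id, data = dat,
                  family = gaussian, corstr = "exchangeable")
    summary(fit)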
{"title":"Design and analysis of individually randomized multiple baseline factorial trials.","authors":"Yongdong Ouyang, Maria Laura Avila, Anna Heath","doi":"10.3758/s13428-025-02874-1","DOIUrl":"10.3758/s13428-025-02874-1","url":null,"abstract":"<p><p>Assessing the effectiveness of behavioral interventions in rare diseases is challenging due to extremely limited sample sizes and ethical challenges with withholding intervention when limited treatment options are available. The multiple baseline design (MBD) is commonly used in behavioral science to assess interventions, while allowing all individuals to receive the intervention. MBD is primarily used to evaluate a single intervention so an alternative strategy is needed when evaluating more than one intervention. In this case, a factorial design may be recommended, but a standard factorial design may not be feasible in rare diseases due to extremely limited sample sizes. To address this challenge, we propose the individually randomized multiple baseline factorial design (MBFD), which requires fewer participants but can attain sufficient statistical power for evaluating at least two interventions and their combinations. Furthermore, by incorporating randomization, we enhance the internal validity of the design. This study describes the design characteristics of a standard MBFD, clarifies estimands, and introduces three statistical models under different assumptions. Through simulations, we analyze data from MBFD using linear mixed effect models (LMM) and generalized estimating equations (GEE) to compare biases, sizes, and power of detecting the main effects from the models. We recommend using GEE to mitigate potential random effect misspecifications and suggest small sample corrections, such as Mancl and DeRouen variance estimator, for sample sizes below 120.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"58 1","pages":"30"},"PeriodicalIF":3.9,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12769596/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145905522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}