Four hundred Greek idiomatic expressions: Ratings for subjective frequency, ambiguity, and decomposability
Behavior Research Methods, pp. 8181-8195. Pub Date: 2024-12-01. Epub Date: 2024-08-19. DOI: 10.3758/s13428-024-02450-z
Anastasia Lada, Philippe Paquier, Ifigenia Dosi, Christina Manouilidou, Simone Sprenger, Stefanie Keulen
Idioms differ from other forms of figurative language along the dimensions of subjective frequency, ambiguity (whether a literal interpretation is possible), and decomposability (the extent to which the idiom's individual words contribute to its figurative interpretation). This study focuses on the Greek language and provides the first corpus of 400 Greek idioms rated on these dimensions by 113 native Greek students aged 19 to 39 years. The study aimed to (1) rate all idioms for their degree of subjective frequency, ambiguity, and decomposability, and (2) investigate the relationships between these dimensions. Three separate assessments were conducted, in which participants evaluated the idioms' subjective frequency, ambiguity, and decomposability. The idioms were selected from the "Dictionary of Idioms in Modern Greek" (Vlaxopoulos, 2007). The study resulted in the first database of Greek idioms assessed on these dimensions. The intraclass correlation coefficient (ICC; two-way mixed, absolute agreement) demonstrated high consistency among the ratings given to the same idiom on each dimension by different raters. Correlational analyses showed that subjective frequency was moderately positively correlated with decomposability and weakly positively correlated with ambiguity, while decomposability was moderately positively correlated with ambiguity.
{"title":"Four hundred Greek idiomatic expressions: Ratings for subjective frequency, ambiguity, and decomposability.","authors":"Anastasia Lada, Philippe Paquier, Ifigenia Dosi, Christina Manouilidou, Simone Sprenger, Stefanie Keulen","doi":"10.3758/s13428-024-02450-z","DOIUrl":"10.3758/s13428-024-02450-z","url":null,"abstract":"<p><p>Idioms differ from other forms of figurative language because of their dimensions of subjective frequency, ambiguity (possibility of having a literal interpretation), and decomposability (possibility of the idiom's words to assist in its figurative interpretation). This study focuses on the Greek language and aims at providing the first corpus of 400 Greek idioms rated for their dimensions by 113 native Greek students, aged 19 to 39 years. The study aimed at (1) rating all idioms for their degree of subjective frequency, ambiguity, and decomposability, and (2) investigating the relationships between these dimensions. Three different assessments were conducted, during which the participants were asked to evaluate the degree of idioms' subjective frequency, ambiguity, and decomposability. The idioms were selected from a dictionary of Greek idioms titled \"Dictionary of Idioms in Modern Greek\" (Vlaxopoulos, 2007). This study resulted in the first database of Greek idioms assessed for their dimensions. The intraclass correlation coefficient (ICC) (two-way mixed, absolute agreement) demonstrated high internal consistency in the ratings given for each dimension, for the same idiom, by the different individual raters. Correlational analyses showed that subjective frequency was positively moderately correlated with decomposability, and positively weakly correlated with ambiguity, while decomposability was positively moderately correlated with ambiguity.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8181-8195"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142003504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A tutorial on open-source large language models for behavioral science
Behavior Research Methods, pp. 8214-8237. Pub Date: 2024-12-01. Epub Date: 2024-08-15. DOI: 10.3758/s13428-024-02455-8
Zak Hussain, Marcel Binz, Rui Mata, Dirk U Wulff
Large language models (LLMs) have the potential to revolutionize behavioral science by accelerating and improving the research cycle, from conceptualization to data analysis. Unlike closed-source solutions, open-source frameworks for LLMs can enable transparency, reproducibility, and adherence to data protection standards, which gives them a crucial advantage for use in behavioral science. To help researchers harness the promise of LLMs, this tutorial offers a primer on the open-source Hugging Face ecosystem and demonstrates several applications that advance conceptual and empirical work in behavioral science, including feature extraction, fine-tuning of models for prediction, and generation of behavioral responses. Executable code is made available at github.com/Zak-Hussain/LLM4BeSci.git. Finally, the tutorial discusses challenges faced by research with (open-source) LLMs related to interpretability and safety and offers a perspective on future research at the intersection of language modeling and behavioral science.
{"title":"A tutorial on open-source large language models for behavioral science.","authors":"Zak Hussain, Marcel Binz, Rui Mata, Dirk U Wulff","doi":"10.3758/s13428-024-02455-8","DOIUrl":"10.3758/s13428-024-02455-8","url":null,"abstract":"<p><p>Large language models (LLMs) have the potential to revolutionize behavioral science by accelerating and improving the research cycle, from conceptualization to data analysis. Unlike closed-source solutions, open-source frameworks for LLMs can enable transparency, reproducibility, and adherence to data protection standards, which gives them a crucial advantage for use in behavioral science. To help researchers harness the promise of LLMs, this tutorial offers a primer on the open-source Hugging Face ecosystem and demonstrates several applications that advance conceptual and empirical work in behavioral science, including feature extraction, fine-tuning of models for prediction, and generation of behavioral responses. Executable code is made available at github.com/Zak-Hussain/LLM4BeSci.git . Finally, the tutorial discusses challenges faced by research with (open-source) LLMs related to interpretability and safety and offers a perspective on future research at the intersection of language modeling and behavioral science.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8214-8237"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525391/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141987375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Linking essay-writing tests using many-facet models and neural automated essay scoring
Behavior Research Methods, pp. 8450-8479. Pub Date: 2024-12-01. Epub Date: 2024-08-20. DOI: 10.3758/s13428-024-02485-2
Masaki Uto, Kota Aramaki
For essay-writing tests, challenges arise when scores assigned to essays are influenced by the characteristics of raters, such as rater severity and consistency. Item response theory (IRT) models incorporating rater parameters have been developed to tackle this issue, exemplified by the many-facet Rasch models. These IRT models enable the estimation of examinees' abilities while accounting for the impact of rater characteristics, thereby enhancing the accuracy of ability measurement. However, difficulties can arise when different groups of examinees are evaluated by different sets of raters. In such cases, test linking is essential for unifying the scale of model parameters estimated for individual examinee-rater groups. Traditional test-linking methods typically require administrators to design groups in which either examinees or raters are partially shared. However, this is often impractical in real-world testing scenarios. To address this, we introduce a novel method for linking the parameters of IRT models with rater parameters that uses neural automated essay scoring technology. Our experimental results indicate that our method successfully accomplishes test linking with accuracy comparable to that of linear linking using few common examinees.
{"title":"Linking essay-writing tests using many-facet models and neural automated essay scoring.","authors":"Masaki Uto, Kota Aramaki","doi":"10.3758/s13428-024-02485-2","DOIUrl":"10.3758/s13428-024-02485-2","url":null,"abstract":"<p><p>For essay-writing tests, challenges arise when scores assigned to essays are influenced by the characteristics of raters, such as rater severity and consistency. Item response theory (IRT) models incorporating rater parameters have been developed to tackle this issue, exemplified by the many-facet Rasch models. These IRT models enable the estimation of examinees' abilities while accounting for the impact of rater characteristics, thereby enhancing the accuracy of ability measurement. However, difficulties can arise when different groups of examinees are evaluated by different sets of raters. In such cases, test linking is essential for unifying the scale of model parameters estimated for individual examinee-rater groups. Traditional test-linking methods typically require administrators to design groups in which either examinees or raters are partially shared. However, this is often impractical in real-world testing scenarios. To address this, we introduce a novel method for linking the parameters of IRT models with rater parameters that uses neural automated essay scoring technology. Our experimental results indicate that our method successfully accomplishes test linking with accuracy comparable to that of linear linking using few common examinees.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8450-8479"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525454/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142008164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessing computational reproducibility in Behavior Research Methods
Behavior Research Methods, pp. 8745-8760. Pub Date: 2024-12-01. Epub Date: 2024-09-25. DOI: 10.3758/s13428-024-02501-5
David A Ellis, John Towse, Olivia Brown, Alicia Cork, Brittany I Davidson, Sophie Devereux, Joanne Hinds, Matthew Ivory, Sophie Nightingale, Douglas A Parry, Lukasz Piwek, Heather Shaw, Andrea S Towse
Psychological science has thrived thanks to new methods and innovative practices. Journals, including Behavior Research Methods (BRM), continue to support the dissemination and evaluation of research assets, including data, software/hardware, statistical code, and databases of stimuli. However, such research assets rarely allow for computational reproducibility, meaning they are difficult to reuse. Therefore, in this preregistered report, we explore how BRM's authors and BRM structures shape the landscape of functional research assets. Our broad research questions concern: (1) how quickly methods and analytical techniques reported in BRM can be used and developed further by other scientists; (2) whether functionality has improved following changes to BRM journal policy in support of computational reproducibility; (3) whether we can disentangle such policy changes from changes in reproducibility over time. We randomly sampled equal numbers of papers (N = 204) published in BRM before and after the implementation of policy changes. Pairs of researchers recorded how long it took to ensure assets (data, software/hardware, statistical code, and materials) were fully operational. They also coded the completeness and reusability of the assets. While improvements were observed in all measures, only completeness improved significantly following the policy changes (d = .37). The effects varied between different types of research assets, with data sets from surveys/experiments showing the largest improvements in completeness and reusability. Perhaps more importantly, changes to policy do appear to have improved the life span of research products by reducing natural decline. We conclude with a discussion of how, in the future, research and policy might better support computational reproducibility within and beyond psychological science.
{"title":"Assessing computational reproducibility in Behavior Research Methods.","authors":"David A Ellis, John Towse, Olivia Brown, Alicia Cork, Brittany I Davidson, Sophie Devereux, Joanne Hinds, Matthew Ivory, Sophie Nightingale, Douglas A Parry, Lukasz Piwek, Heather Shaw, Andrea S Towse","doi":"10.3758/s13428-024-02501-5","DOIUrl":"10.3758/s13428-024-02501-5","url":null,"abstract":"<p><p>Psychological science has thrived thanks to new methods and innovative practices. Journals, including Behavior Research Methods (BRM), continue to support the dissemination and evaluation of research assets including data, software/hardware, statistical code, and databases of stimuli. However, such research assets rarely allow for computational reproducibility, meaning they are difficult to reuse. Therefore, in this preregistered report, we explore how BRM's authors and BRM structures shape the landscape of functional research assets. Our broad research questions concern: (1) How quickly methods and analytical techniques reported in BRM can be used and developed further by other scientists; (2) Whether functionality has improved following changes to BRM journal policy in support of computational reproducibility; (3) Whether we can disentangle such policy changes from changes in reproducibility over time. We randomly sampled equal numbers of papers (N = 204) published in BRM before and after the implementation of policy changes. Pairs of researchers recorded how long it took to ensure assets (data, software/hardware, statistical code, and materials) were fully operational. They also coded the completeness and reusability of the assets. While improvements were observed in all measures, only changes to completeness were altered significantly following the policy changes (d = .37). The effects varied between different types of research assets, with data sets from surveys/experiments showing the largest improvements in completeness and reusability. Perhaps more importantly, changes to policy do appear to have improved the life span of research products by reducing natural decline. We conclude with a discussion of how, in the future, research and policy might better support computational reproducibility within and beyond psychological science.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8745-8760"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525395/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142340233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Behavioral science labs: How to solve the multi-user problem
Behavior Research Methods, pp. 8238-8258. Pub Date: 2024-12-01. Epub Date: 2024-08-12. DOI: 10.3758/s13428-024-02467-4
Diederick C Niehorster, Marianne Gullberg, Marcus Nyström
When lab resources are shared among multiple research projects, issues such as experimental integrity, replicability, and data safety become important. Different research projects often need different software and settings that may well conflict with one another, and data collected for one project may not be safeguarded from exposure to researchers from other projects. In this paper we provide an infrastructure design and an open-source tool, labManager, that render multi-user lab facilities in the behavioral sciences accessible to research projects with widely varying needs. The solutions proposed ensure ease of management while simultaneously offering maximum flexibility by providing research projects with fully separated bare metal environments. This solution also ensures that collected data is kept separate, and compliant with relevant ethical standards and regulations such as General Data Protection Regulation (GDPR) legislation. Furthermore, we discuss preconditions for running shared lab facilities and provide practical advice.
{"title":"Behavioral science labs: How to solve the multi-user problem.","authors":"Diederick C Niehorster, Marianne Gullberg, Marcus Nyström","doi":"10.3758/s13428-024-02467-4","DOIUrl":"10.3758/s13428-024-02467-4","url":null,"abstract":"<p><p>When lab resources are shared among multiple research projects, issues such as experimental integrity, replicability, and data safety become important. Different research projects often need different software and settings that may well conflict with one another, and data collected for one project may not be safeguarded from exposure to researchers from other projects. In this paper we provide an infrastructure design and an open-source tool, labManager, that render multi-user lab facilities in the behavioral sciences accessible to research projects with widely varying needs. The solutions proposed ensure ease of management while simultaneously offering maximum flexibility by providing research projects with fully separated bare metal environments. This solution also ensures that collected data is kept separate, and compliant with relevant ethical standards and regulations such as General Data Protection Regulation (GDPR) legislation. Furthermore, we discuss preconditions for running shared lab facilities and provide practical advice.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8238-8258"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525434/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141970535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interactions between latent variables in count regression models
Behavior Research Methods, pp. 8932-8954. Pub Date: 2024-12-01. Epub Date: 2024-08-26. DOI: 10.3758/s13428-024-02483-4
Christoph Kiefer, Sarah Wilker, Axel Mayer
In psychology and the social sciences, researchers often model count outcome variables accounting for latent predictors and their interactions. Even though neglecting measurement error in such count regression models (e.g., Poisson or negative binomial regression) can have unfavorable consequences like attenuation bias, such analyses are often carried out in the generalized linear model (GLM) framework using fallible covariates such as sum scores. An alternative is count regression models based on structural equation modeling, which allow researchers to specify latent covariates and thereby account for measurement error. However, the issue of how and when to include interactions between latent covariates or between latent and manifest covariates is rarely discussed for count regression models. In this paper, we present a latent variable count regression model (LV-CRM) allowing for latent covariates as well as interactions among both latent and manifest covariates. We conducted three simulation studies, investigating the estimation accuracy of the LV-CRM and comparing it to GLM-based count regression models. Interestingly, we found that even in scenarios with high reliabilities, the regression coefficients from a GLM-based model can be severely biased. In contrast, even for moderate sample sizes, the LV-CRM provided virtually unbiased regression coefficients. Additionally, statistical inferences yielded mixed results for the GLM-based models (i.e., low coverage rates, but acceptable empirical detection rates), but were generally acceptable using the LV-CRM. We provide an applied example from clinical psychology illustrating how the LV-CRM framework can be used to model count regressions with latent interactions.
{"title":"Interactions between latent variables in count regression models.","authors":"Christoph Kiefer, Sarah Wilker, Axel Mayer","doi":"10.3758/s13428-024-02483-4","DOIUrl":"10.3758/s13428-024-02483-4","url":null,"abstract":"<p><p>In psychology and the social sciences, researchers often model count outcome variables accounting for latent predictors and their interactions. Even though neglecting measurement error in such count regression models (e.g., Poisson or negative binomial regression) can have unfavorable consequences like attenuation bias, such analyses are often carried out in the generalized linear model (GLM) framework using fallible covariates such as sum scores. An alternative is count regression models based on structural equation modeling, which allow to specify latent covariates and thereby account for measurement error. However, the issue of how and when to include interactions between latent covariates or between latent and manifest covariates is rarely discussed for count regression models. In this paper, we present a latent variable count regression model (LV-CRM) allowing for latent covariates as well as interactions among both latent and manifest covariates. We conducted three simulation studies, investigating the estimation accuracy of the LV-CRM and comparing it to GLM-based count regression models. Interestingly, we found that even in scenarios with high reliabilities, the regression coefficients from a GLM-based model can be severely biased. In contrast, even for moderate sample sizes, the LV-CRM provided virtually unbiased regression coefficients. Additionally, statistical inferences yielded mixed results for the GLM-based models (i.e., low coverage rates, but acceptable empirical detection rates), but were generally acceptable using the LV-CRM. We provide an applied example from clinical psychology illustrating how the LV-CRM framework can be used to model count regressions with latent interactions.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8932-8954"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525413/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142071898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Establishing the reliability of metrics extracted from long-form recordings using LENA and the ACLEW pipeline
Behavior Research Methods, pp. 8588-8607. Pub Date: 2024-12-01. Epub Date: 2024-09-20. DOI: 10.3758/s13428-024-02493-2
Alejandrina Cristia, Lucas Gautheron, Zixing Zhang, Björn Schuller, Camila Scaff, Caroline Rowland, Okko Räsänen, Loann Peurey, Marvin Lavechin, William Havard, Caitlin M Fausey, Margaret Cychosz, Elika Bergelson, Heather Anderson, Najla Al Futaisi, Melanie Soderstrom
Long-form audio recordings are increasingly used to study individual variation, group differences, and many other topics in theoretical and applied fields of developmental science, particularly for the description of children's language input (typically speech from adults) and children's language output (ranging from babble to sentences). The proprietary LENA software has been available for over a decade, and with it, users have come to rely on derived metrics like adult word count (AWC) and child vocalization counts (CVC), which have also more recently been derived using an open-source alternative, the ACLEW pipeline. Yet, there is relatively little work assessing the reliability of long-form metrics in terms of the stability of individual differences across time. Filling this gap, we analyzed eight spoken-language datasets: four from North American English-learning infants, and one each from British English-, French-, American English-/Spanish-, and Quechua-/Spanish-learning infants. The audio data were analyzed using two types of processing software: LENA and the ACLEW open-source pipeline. When all corpora were included, we found relatively low to moderate reliability (across multiple recordings, the intraclass correlation coefficient attributed to child identity [Child ICC] was < 50% for most metrics). There were few differences between the two pipelines. Exploratory analyses suggested some differences as a function of child age and corpora. These findings suggest that, while reliability is likely sufficient for various group-level analyses, caution is needed when using either LENA or ACLEW tools to study individual variation. We also encourage improvement of extant tools, specifically targeting accurate measurement of individual variation.
{"title":"Establishing the reliability of metrics extracted from long-form recordings using LENA and the ACLEW pipeline.","authors":"Alejandrina Cristia, Lucas Gautheron, Zixing Zhang, Björn Schuller, Camila Scaff, Caroline Rowland, Okko Räsänen, Loann Peurey, Marvin Lavechin, William Havard, Caitlin M Fausey, Margaret Cychosz, Elika Bergelson, Heather Anderson, Najla Al Futaisi, Melanie Soderstrom","doi":"10.3758/s13428-024-02493-2","DOIUrl":"10.3758/s13428-024-02493-2","url":null,"abstract":"<p><p>Long-form audio recordings are increasingly used to study individual variation, group differences, and many other topics in theoretical and applied fields of developmental science, particularly for the description of children's language input (typically speech from adults) and children's language output (ranging from babble to sentences). The proprietary LENA software has been available for over a decade, and with it, users have come to rely on derived metrics like adult word count (AWC) and child vocalization counts (CVC), which have also more recently been derived using an open-source alternative, the ACLEW pipeline. Yet, there is relatively little work assessing the reliability of long-form metrics in terms of the stability of individual differences across time. Filling this gap, we analyzed eight spoken-language datasets: four from North American English-learning infants, and one each from British English-, French-, American English-/Spanish-, and Quechua-/Spanish-learning infants. The audio data were analyzed using two types of processing software: LENA and the ACLEW open-source pipeline. When all corpora were included, we found relatively low to moderate reliability (across multiple recordings, intraclass correlation coefficient attributed to the child identity [Child ICC], was < 50% for most metrics). There were few differences between the two pipelines. Exploratory analyses suggested some differences as a function of child age and corpora. These findings suggest that, while reliability is likely sufficient for various group-level analyses, caution is needed when using either LENA or ACLEW tools to study individual variation. We also encourage improvement of extant tools, specifically targeting accurate measurement of individual variation.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8588-8607"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142279941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimating power in complex nonlinear structural equation modeling including moderation effects: The powerNLSEM R-package
Behavior Research Methods, pp. 8897-8931. Pub Date: 2024-12-01. Epub Date: 2024-09-20. DOI: 10.3758/s13428-024-02476-3
Julien P Irmer, Andreas G Klein, Karin Schermelleh-Engel
The model-implied simulation-based power estimation (MSPE) approach is a new general method for power estimation (Irmer et al., 2024). MSPE was developed especially for power estimation in non-linear structural equation models (SEM), but it can also be applied to linear SEM and manifest models using the R package powerNLSEM. After providing background on MSPE and the new adaptive algorithm that automatically selects sample sizes for the best prediction of power via simulation, we present a tutorial on how to conduct MSPE for quadratic and interaction SEM (QISEM) using the powerNLSEM package. Power estimation is demonstrated for four methods: latent moderated structural equations (LMS), the unconstrained product indicator (UPI) approach, a simple factor score regression (FSR), and a scale regression (SR) approach to QISEM. In two simulation studies, we highlight the performance of the MSPE for all four methods applied to two QISEM of varying complexity and reliability. Further, we justify the settings of the newly developed adaptive search algorithm via simulation-based performance evaluations. Overall, the MSPE using the adaptive approach performs well in terms of bias and Type I error rates.
{"title":"Estimating power in complex nonlinear structural equation modeling including moderation effects: The powerNLSEM R-package.","authors":"Julien P Irmer, Andreas G Klein, Karin Schermelleh-Engel","doi":"10.3758/s13428-024-02476-3","DOIUrl":"10.3758/s13428-024-02476-3","url":null,"abstract":"<p><p>The model-implied simulation-based power estimation (MSPE) approach is a new general method for power estimation (Irmer et al., 2024). MSPE was developed especially for power estimation of non-linear structural equation models (SEM), but it also can be applied to linear SEM and manifest models using the R package powerNLSEM. After first providing some information about MSPE and the new adaptive algorithm that automatically selects sample sizes for the best prediction of power using simulation, a tutorial on how to conduct the MSPE for quadratic and interaction SEM (QISEM) using the powerNLSEM package is provided. Power estimation is demonstrated for four methods, latent moderated structural equations (LMS), the unconstrained product indicator (UPI), a simple factor score regression (FSR), and a scale regression (SR) approach to QISEM. In two simulation studies, we highlight the performance of the MSPE for all four methods applied to two QISEM with varying complexity and reliability. Further, we justify the settings of the newly developed adaptive search algorithm via performance evaluations using simulation. Overall, the MSPE using the adaptive approach performs well in terms of bias and Type I error rates.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8897-8931"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525415/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142279942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A template and tutorial for preregistering studies using passive smartphone measures
Behavior Research Methods, pp. 8289-8307. Pub Date: 2024-12-01. Epub Date: 2024-08-07. DOI: 10.3758/s13428-024-02474-5
Anna M Langener, Björn S Siepe, Mahmoud Elsherif, Koen Niemeijer, Pia K Andresen, Samir Akre, Laura F Bringmann, Zachary D Cohen, Nathaniel R Choukas, Konstantin Drexl, Luisa Fassi, James Green, Tabea Hoffmann, Raj R Jagesar, Martien J H Kas, Sebastian Kurten, Ramona Schoedel, Gert Stulp, Georgia Turner, Nicholas C Jacobson
Passive smartphone measures hold significant potential and are increasingly employed in psychological and biomedical research to capture an individual's behavior. These measures involve the near-continuous and unobtrusive collection of data from smartphones without requiring active input from participants. For example, GPS sensors are used to determine the (social) context of a person, and accelerometers to measure movement. However, utilizing passive smartphone measures presents methodological challenges during data collection and analysis. Researchers must make multiple decisions when working with such measures, which can result in different conclusions. Unfortunately, the transparency of these decision-making processes is often lacking. The implementation of open science practices is only beginning to emerge in digital phenotyping studies and varies widely across studies. Well-intentioned researchers may fail to report on some decisions due to the variety of choices that must be made. To address this issue and enhance reproducibility in digital phenotyping studies, we propose the adoption of preregistration as a way forward. Although there have been some attempts to preregister digital phenotyping studies, a template for registering such studies is currently missing. This could be problematic due to the high level of complexity that requires a well-structured template. Therefore, our objective was to develop a preregistration template that is easy to use and understandable for researchers. Additionally, we explain this template and provide resources to assist researchers in making informed decisions regarding data collection, cleaning, and analysis. Overall, we aim to make researchers' choices explicit, enhance transparency, and elevate the standards for studies utilizing passive smartphone measures.
{"title":"A template and tutorial for preregistering studies using passive smartphone measures.","authors":"Anna M Langener, Björn S Siepe, Mahmoud Elsherif, Koen Niemeijer, Pia K Andresen, Samir Akre, Laura F Bringmann, Zachary D Cohen, Nathaniel R Choukas, Konstantin Drexl, Luisa Fassi, James Green, Tabea Hoffmann, Raj R Jagesar, Martien J H Kas, Sebastian Kurten, Ramona Schoedel, Gert Stulp, Georgia Turner, Nicholas C Jacobson","doi":"10.3758/s13428-024-02474-5","DOIUrl":"10.3758/s13428-024-02474-5","url":null,"abstract":"<p><p>Passive smartphone measures hold significant potential and are increasingly employed in psychological and biomedical research to capture an individual's behavior. These measures involve the near-continuous and unobtrusive collection of data from smartphones without requiring active input from participants. For example, GPS sensors are used to determine the (social) context of a person, and accelerometers to measure movement. However, utilizing passive smartphone measures presents methodological challenges during data collection and analysis. Researchers must make multiple decisions when working with such measures, which can result in different conclusions. Unfortunately, the transparency of these decision-making processes is often lacking. The implementation of open science practices is only beginning to emerge in digital phenotyping studies and varies widely across studies. Well-intentioned researchers may fail to report on some decisions due to the variety of choices that must be made. To address this issue and enhance reproducibility in digital phenotyping studies, we propose the adoption of preregistration as a way forward. Although there have been some attempts to preregister digital phenotyping studies, a template for registering such studies is currently missing. This could be problematic due to the high level of complexity that requires a well-structured template. Therefore, our objective was to develop a preregistration template that is easy to use and understandable for researchers. Additionally, we explain this template and provide resources to assist researchers in making informed decisions regarding data collection, cleaning, and analysis. Overall, we aim to make researchers' choices explicit, enhance transparency, and elevate the standards for studies utilizing passive smartphone measures.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8289-8307"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525430/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141900815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On aggregation invariance of multinomial processing tree models
Behavior Research Methods, pp. 8677-8694. Pub Date: 2024-12-01. Epub Date: 2024-10-14. DOI: 10.3758/s13428-024-02497-y
Edgar Erdfelder, Julian Quevedo Pütter, Martin Schnuerch
Multinomial processing tree (MPT) models are prominent and frequently used tools to model and measure cognitive processes underlying responses in many experimental paradigms. Although MPT models typically refer to cognitive processes within single individuals, they have often been applied to group data aggregated across individuals. We investigate the conditions under which MPT analyses of aggregate data make sense. After introducing the notions of structural and empirical aggregation invariance of MPT models, we show that any MPT model that holds at the level of single individuals must also hold at the aggregate level when it is both structurally and empirically aggregation invariant. Moreover, group-level parameters of aggregation-invariant MPT models are equivalent to the expected values (i.e., means) of the corresponding individual parameters. To investigate the robustness of MPT results for aggregate data when one or both invariance conditions are violated, we additionally performed a series of simulation studies, systematically manipulating (1) the sample sizes in different trees of the model, (2) model parameterization, (3) means and variances of crucial model parameters, and (4) their correlations with other parameters of the respective MPT model. Overall, our results show that MPT parameter estimates based on aggregate data are trustworthy under rather general conditions, provided that a few preconditions are met.
{"title":"On aggregation invariance of multinomial processing tree models.","authors":"Edgar Erdfelder, Julian Quevedo Pütter, Martin Schnuerch","doi":"10.3758/s13428-024-02497-y","DOIUrl":"10.3758/s13428-024-02497-y","url":null,"abstract":"<p><p>Multinomial processing tree (MPT) models are prominent and frequently used tools to model and measure cognitive processes underlying responses in many experimental paradigms. Although MPT models typically refer to cognitive processes within single individuals, they have often been applied to group data aggregated across individuals. We investigate the conditions under which MPT analyses of aggregate data make sense. After introducing the notions of structural and empirical aggregation invariance of MPT models, we show that any MPT model that holds at the level of single individuals must also hold at the aggregate level when it is both structurally and empirically aggregation invariant. Moreover, group-level parameters of aggregation-invariant MPT models are equivalent to the expected values (i.e., means) of the corresponding individual parameters. To investigate the robustness of MPT results for aggregate data when one or both invariance conditions are violated, we additionally performed a series of simulation studies, systematically manipulating (1) the sample sizes in different trees of the model, (2) model parameterization, (3) means and variances of crucial model parameters, and (4) their correlations with other parameters of the respective MPT model. Overall, our results show that MPT parameter estimates based on aggregate data are trustworthy under rather general conditions, provided that a few preconditions are met.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":" ","pages":"8677-8694"},"PeriodicalIF":4.6,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525265/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142456954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}