Pub Date: 2026-01-30 | DOI: 10.3758/s13428-025-02887-w
Kushin Mukherjee, Holly Huey, Laura M Stoinski, Martin N Hebart, Judith E Fan, Wilma A Bainbridge
The development of large datasets of natural images has galvanized progress in psychology, neuroscience, and computer science. Notably, the THINGS database constitutes a collective effort toward understanding human visual knowledge by accumulating rich data on a shared set of visual object concepts across several studies. In this paper, we introduce Drawings of THINGS (DoT), a novel dataset of 28,627 human drawings of 1854 diverse object concepts, sampled systematically from concrete picturable and nameable nouns in American English, mirroring the structure of the THINGS image database. In addition to data on drawings' stroke history, we collected fine-grained recognition data for each drawing, along with metadata on participant demographics, drawing ability, and mental imagery. We characterize people's ability to communicate and recognize semantic information encoded in drawings, and we compare it with their ability to recognize real-world images of the same visual objects. We also explore the relationship between drawing understanding and the memorability and typicality of the objects contained in THINGS. In sum, we envision DoT as a powerful tool that builds on the THINGS database to advance understanding of how humans express knowledge about visual concepts.
Drawings of THINGS: A large-scale drawing dataset of 1854 object concepts. Behavior Research Methods, 58(2), 57. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12858628/pdf/
Pub Date: 2026-01-29 | DOI: 10.3758/s13428-025-02913-x
Wenshuo Li, Okan Bulut, Mark J Gierl, Sheng Zhang
Scale abbreviation is a crucial task for researchers aiming to reduce response burden and optimize data collection when using self-report instruments such as online surveys and questionnaires. Among the various data-driven strategies available for scale abbreviation, supervised machine learning (SML) algorithms have emerged as a prominent approach due to their accuracy in predicting total scores from the original instrument. However, previous studies offer limited insights into how SML-abbreviated scales can be evaluated using both SML and psychometric metrics across different feature selection techniques. To address this gap, the current study evaluates the effectiveness of seven feature selection methods: item-total-correlation-based filters (ITC), Minimum-Redundancy-Maximum-Relevance (MRMR), Lasso, Sequential Forward Selection (SFS), Sequential Backward Selection (SBS), Genetic Algorithms (GA), and the Non-dominated Sorting Genetic Algorithm II (NSGA-II), all used in conjunction with SML. Additionally, the psychometric properties of these SML methods are compared with those of two non-SML approaches. Using simulated datasets varying in sample size, model error, and factorial correlations, the study examines predictive accuracy, reliability, and the ability to recover both inter-subscale correlations and external criterion correlations. The findings indicate that no single method consistently excels across all conditions, with specific feature selection techniques performing better under certain circumstances. Key insights are provided to guide researchers in selecting appropriate feature selection methods based on their specific dataset characteristics and goals.
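As a toy illustration of the data-driven idea behind these methods, the sketch below runs a greedy sequential forward selection (one of the SFS-style techniques the study compares) over simulated Likert-like items, scoring candidate short forms by how well their sum tracks the full-scale total. The simulated data, loadings, and 5-item target length are all illustrative assumptions, not the study's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 500 respondents on a 20-item scale driven by one latent trait.
n_people, n_items, k_short = 500, 20, 5
theta = rng.normal(size=(n_people, 1))
loadings = rng.uniform(0.3, 0.9, size=n_items)
items = theta * loadings + rng.normal(scale=0.6, size=(n_people, n_items))
total = items.sum(axis=1)  # criterion: total score of the full scale

# Sequential forward selection: greedily add the item that most improves
# the correlation between the short-form sum and the full-scale total.
selected = []
for _ in range(k_short):
    best_item, best_r = None, -np.inf
    for j in range(n_items):
        if j in selected:
            continue
        r = np.corrcoef(items[:, selected + [j]].sum(axis=1), total)[0, 1]
        if r > best_r:
            best_item, best_r = j, r
    selected.append(best_item)

print("selected items:", sorted(selected), f"r = {best_r:.3f}")
```

An SML wrapper approach replaces the simple correlation criterion with a fitted model's cross-validated prediction of the total score, but the greedy search loop is the same.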
Scale abbreviation with supervised machine learning: A comparison of feature selection techniques. Behavior Research Methods, 58(2), 55.
Pub Date: 2026-01-29 | DOI: 10.3758/s13428-025-02897-8
Matthew K Robison, Stephen Campbell, Lauren D Garner, Ciara Sibley, Joseph Coyne
The present study examined individual differences in 24 measures of cognitive ability in a sample of young adults (N = 255). Each measure was completed twice, separated by a period of 2 weeks, to assess test-retest reliability and retesting (i.e., practice) effects. Latent variable modeling was used to assess the convergent and discriminant validity of the measures, as they were selected to measure seven different cognitive constructs (attention control, processing speed, working memory, primary memory, secondary memory, fluid intelligence, and spatial ability). The measures showed adequate to high intrasession and intersession reliability. Construct-level estimates were highly reliable, and the measurement structure was invariant across the two testing occasions. In several instances, correlations among latent variables warranted further testing to ensure adequate discriminability. Finally, latent state-trait modeling indicated that the majority of systematic variance in cognitive measures is due to latent traits, rather than state-specific or task-specific factors. We discuss the practical and theoretical implications of these findings.
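For readers unfamiliar with the two core quantities, here is a minimal simulated sketch of how test-retest reliability and a practice effect can be scored for a single measure taken twice. The trait and noise parameters are invented for illustration and are not estimates from this study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate one cognitive measure taken twice by 255 people: a stable
# trait plus occasion-specific noise, with a small practice gain at T2.
n = 255
trait = rng.normal(100, 15, size=n)
t1 = trait + rng.normal(0, 7, size=n)
t2 = trait + rng.normal(0, 7, size=n) + 3.0  # +3 points of practice

# Test-retest reliability: Pearson correlation across the two sessions.
r_retest = np.corrcoef(t1, t2)[0, 1]

# Practice effect: standardized mean gain (Cohen's d of the difference).
diff = t2 - t1
d_practice = diff.mean() / diff.std(ddof=1)

print(f"test-retest r = {r_retest:.2f}, practice d = {d_practice:.2f}")
```

Latent state-trait modeling then partitions the variance behind `r_retest` into trait, state, and task-specific components rather than treating it as a single coefficient.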
A comprehensive psychometrics of cognitive ability measures: Reliability, practice effects, and the stability of latent factor structures across retesting. Behavior Research Methods, 58(2), 56. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12855318/pdf/
Pub Date: 2026-01-26 | DOI: 10.3758/s13428-025-02894-x
Dirk U Wulff, Pascal J Kieslich, Felix Henninger, Jonas M B Haslbeck, Michael Schulte-Mecklenbeck
Publisher Correction: Movement tracking of psychological processes: A tutorial using mousetrap. Behavior Research Methods, 58(2), 53. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12835042/pdf/
Pub Date: 2026-01-26 | DOI: 10.3758/s13428-025-02935-5
Linjieqiong Huang, Chenxi Li, Xingshan Li
In the absence of inter-word spaces, Chinese text sometimes presents word boundary ambiguity. One common case is the overlapping ambiguous string (OAS), a three-character string (ABC) where the middle character can form distinct words with both the character to its left (AB) and the character to its right (BC), creating segmentation ambiguity between AB-C and A-BC. This structure makes OASs a valuable tool for investigating the cognitive mechanisms of Chinese word segmentation. We introduce a comprehensive OAS database consisting of 952,497 OASs, each with 43 types of linguistic information at the character, word, and OAS levels. To illustrate how to use the database, we conducted an eye-tracking reading experiment manipulating whether the first character of the OAS (i.e., character A) could stand alone in sentences. Results showed that when character A could not stand alone, readers were more likely to group it with the next character B, leading to an AB-C segmentation. These findings validate the utility of the OAS database in understanding word segmentation during Chinese reading. The potential applications of the database in artificial intelligence, education, and writing system reform are discussed.
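The segmentation logic behind an OAS can be made concrete in a few lines: a three-character string ABC is an OAS whenever both AB and BC are lexicon entries. The mini-lexicon below is purely illustrative (not drawn from the database); real entries would come from a full dictionary of Chinese words.

```python
# Hypothetical mini-lexicon for illustration only.
lexicon = {"研究", "究生", "研究生", "生物", "物理"}

def is_oas(abc: str, words: set) -> bool:
    """Return True if ABC segments ambiguously as both AB-C and A-BC."""
    if len(abc) != 3:
        return False
    ab, bc = abc[:2], abc[1:]  # left pairing AB, right pairing BC
    return ab in words and bc in words

print(is_oas("研究生", lexicon))  # both AB and BC are listed -> True
print(is_oas("物理学", lexicon))  # BC is not listed -> False
```

Run over a large lexicon and a corpus of character trigrams, this check is the kind of procedure that could enumerate candidate OASs at scale.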
A database of overlapping ambiguous strings in Chinese reading. Behavior Research Methods, 58(2), 51.
Pub Date: 2026-01-26 | DOI: 10.3758/s13428-025-02931-9
Chen Feng, Song Wang, Su Li
To facilitate research into the language development of preschool children, this article presents the Chinese Preschool Children's Spoken Lexical Database (CPCSLD), which is a lexical database built from a corpus of spontaneous speech production by 648 Chinese children aged between 3 and 6 years. The corpus comprises 1,199,851 word tokens, which include 21,372 unique words, 1,147 unique tonal syllables, and 400 unique atonal syllables. CPCSLD provides multiple distributional characteristics of both word-level and syllable-level information, including word frequency, token frequency, word length, word syntactic categories, tonal syllable frequency, and atonal syllable frequency, for words in the entire corpus as well as in the three grade-level sub-corpora (K1, K2, and K3). Using CPCSLD, we describe the developmental changes in word syntactic categories and frequency across K1, K2, and K3. Validation analyses showed an advantage for CPCSLD over existing databases in predicting children's picture-naming performance, but not in the semantic decision task. Correlation analyses further revealed distinct developmental patterns in lexical properties: CPCSLD's syllable frequencies were highly correlated with those from other child-based databases, whereas its word frequencies showed low correlations. These findings highlight CPCSLD's sensitivity to early lexical development and its value for research on language production in 3-6-year-old children.
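The kinds of word- and syllable-level counts the database provides can be sketched from a tiny hypothetical fragment of segmented, pinyin-annotated child speech. The data structure and example utterances here are assumptions for illustration, not CPCSLD's actual format.

```python
from collections import Counter

# Each utterance is a list of (word, tonal-syllable list) pairs.
utterances = [
    [("妈妈", ["ma1", "ma1"]), ("看", ["kan4"])],
    [("我", ["wo3"]), ("看", ["kan4"]), ("书", ["shu1"])],
]

word_freq = Counter()
tonal_syll_freq = Counter()
atonal_syll_freq = Counter()
for utt in utterances:
    for word, sylls in utt:
        word_freq[word] += 1
        for s in sylls:
            tonal_syll_freq[s] += 1
            # Atonal form: drop the trailing tone digit (e.g., ma1 -> ma).
            atonal_syll_freq[s.rstrip("12345")] += 1

print(word_freq["看"], tonal_syll_freq["kan4"], atonal_syll_freq["ma"])
```

Scaled up to 1,199,851 tokens and split by grade level, counts like these yield the frequency norms the database reports.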
CPCSLD: A lexical database of Chinese preschool children's spoken words. Behavior Research Methods, 58(2), 54.
The dynamic nature of our environment allows us to anticipate the onset of relevant events, enhancing our responses to them. Temporal preparation can be assessed in the laboratory using various tasks, including foreperiod tasks, temporal orienting tasks, and rhythmic tasks. However, the existing literature lacks a unified task to measure the most common temporal preparation effects (i.e., foreperiod, sequential, temporal orienting, and rhythmic effects) in a single session. The main goal of the present study was to fill this gap by devising the temporal preparation task (TEP-Task) to measure temporal preparation effects in a single 35-min testing session. Besides its utility in single-session assessments, the TEP-Task may also serve for future research across diverse populations and experimental demands.
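As a sketch of how one of these effects is typically scored, the snippet below computes a foreperiod effect (faster responses after long than after short foreperiods) from simulated reaction times. The RT distributions and foreperiod durations are illustrative assumptions, not TEP-Task parameters.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated RTs from a variable-foreperiod block (100 trials per cell).
n = 100
rt_short_fp = rng.normal(420, 40, size=n)  # e.g., 400-ms foreperiod
rt_long_fp = rng.normal(380, 40, size=n)   # e.g., 1,400-ms foreperiod

# Foreperiod effect: mean RT advantage for long over short foreperiods.
foreperiod_effect = rt_short_fp.mean() - rt_long_fp.mean()
print(f"foreperiod effect: {foreperiod_effect:.1f} ms")
```

The sequential, temporal-orienting, and rhythmic effects are scored analogously, as RT differences between the relevant trial types.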
Unifying temporal preparation: The temporal preparation task (TEP-Task). Mariagrazia Capizzi, Lucie Attout, Giovanna Mioni, Pom Charras. Behavior Research Methods, 58(2), 52. Pub Date: 2026-01-26 | DOI: 10.3758/s13428-025-02908-8
Pub Date: 2026-01-23 | DOI: 10.3758/s13428-025-02925-7
Anna Gowenlock, Jennifer Rodd, Beth Malory, Courtenay Norbury
A growing number of psycholinguistic studies use methods from corpus linguistics to examine the language that children encounter in their environment, to understand how they might acquire different aspects of linguistic knowledge. Many of these studies focus on child-directed speech or children's literature, while there is a paucity of work on children's television and video media. We describe the creation and contents of the Corpus of Children's Video Media (CCVM), a specialised corpus designed to represent the spoken language in television and online videos popular among 3-5-year-old children in the UK. The CCVM was designed to be comparable to an existing corpus of child-directed speech (CDS). We used a dual sampling approach: inclusion decisions were guided by (a) a survey of parents with children in our target age group, and (b) a survey of programmes available on popular streaming platforms. The corpus consists of 233,471 tokens across 161 transcripts (43.12 h of video) and is available on the Open Science Framework (OSF) as a scrambled database of tokens (including gloss, stem, and lemma forms, and part-of-speech tags), organised within transcripts, together with relevant metadata for each transcript. We discuss the challenges of creating a corpus that is comparable to existing datasets and highlight the importance of transparency in this process. We take an open science approach, sharing a detailed data collection and processing protocol, code, and data so that the corpus can be evaluated, extended, and used appropriately by other research teams.
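The "scrambled database of tokens" release format can be sketched as follows; the field names are assumptions rather than the CCVM's actual schema. Shuffling token order within each transcript preserves token frequencies and per-transcript metadata while making the original text unrecoverable (a common way to share copyrighted material lawfully).

```python
import random

# One transcript as a list of annotated tokens (hypothetical schema).
transcript = [
    {"gloss": "the", "stem": "the", "pos": "det"},
    {"gloss": "dogs", "stem": "dog", "pos": "n"},
    {"gloss": "ran", "stem": "run", "pos": "v"},
]

def scramble(tokens, seed=0):
    """Return the tokens of one transcript in randomized order."""
    out = list(tokens)               # copy; leave the source intact
    random.Random(seed).shuffle(out)
    return out

scrambled = scramble(transcript)
# The multiset of tokens survives; the word order does not.
print(sorted(t["gloss"] for t in scrambled))
```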
Constructing the Corpus of Children's Video Media (CCVM): A new resource and guidelines for constructing comparable and reusable corpora. Behavior Research Methods, 58(2), 49. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12830416/pdf/
Pub Date: 2026-01-23 | DOI: 10.3758/s13428-025-02860-7
Erin M Buchanan, Mahmoud M Elsherif, Jason Geller, Chris L Aberson, Necdet Gurkan, Ettore Ambrosini, Tom Heyman, Maria Montefinese, Wolf Vanpaemel, Krystian Barzykowski, Carlota Batres, Katharina Fellnhofer, Guanxiong Huang, Joseph McFall, Gianni Ribeiro, Jan P Röer, José L Ulloa, Timo B Roettger, K D Valentine, Antonino Visalli, Kathleen Schmidt, Martin R Vasilev, Giada Viviani, Jacob F Miranda, Savannah C Lewis
The planning of sample size for research studies often focuses on obtaining a significant result given a specified level of power, significance, and an anticipated effect size. This planning requires prior knowledge of the study design and a statistical analysis to calculate the proposed sample size. However, there may not be one specific testable analysis from which to derive power (Silberzahn et al., Advances in Methods and Practices in Psychological Science, 1(3), 337-356, 2018) or a hypothesis to test for the project (e.g., creation of a stimuli database). Modern power and sample size planning suggestions include accuracy in parameter estimation (AIPE; Kelley, Behavior Research Methods, 39(4), 755-766, 2007; Maxwell et al., Annual Review of Psychology, 59, 537-563, 2008) and simulation of proposed analyses (Chalmers & Adkins, The Quantitative Methods for Psychology, 16(4), 248-280, 2020). These toolkits offer flexibility beyond traditional power analyses that focus on the if-this, then-that approach. However, both AIPE and simulation require either a specific parameter (e.g., mean, effect size) or a statistical test for planning sample size. In this tutorial, we explore how AIPE and simulation approaches can be combined to accommodate studies that may not have a specific hypothesis test or that wish to account for the potential of a multiverse of analyses. Specifically, we focus on studies that use multiple items and suggest that sample sizes can be planned to measure those items adequately and precisely, regardless of the statistical test. This tutorial also provides multiple code vignettes and package functionality that researchers can adapt and apply to their own measures.
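A minimal sketch of the combined AIPE-plus-simulation idea: rather than powering one test, simulate data at increasing sample sizes until every item's mean is estimated to a target precision. The item SDs, the 0.10 half-width cutoff, the 50-person step size, and the normal data model are all illustrative assumptions, not the tutorial's actual code.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-item SDs for a four-item measure, and a precision
# target: 95% CI half-width no larger than 0.10 for the worst item.
item_sds = np.array([0.8, 1.0, 1.2, 1.5])
cutoff = 0.10
n_sims = 200

def max_halfwidth(n):
    """Average (over simulations) worst-item 95% CI half-width at size n."""
    hw = []
    for _ in range(n_sims):
        sample = rng.normal(0, item_sds, size=(n, len(item_sds)))
        se = sample.std(axis=0, ddof=1) / np.sqrt(n)
        hw.append((1.96 * se).max())
    return float(np.mean(hw))

# Increase N until all item means meet the precision target.
n = 50
while max_halfwidth(n) > cutoff:
    n += 50

print(f"planned sample size: N = {n}")
```

Because the criterion is estimation precision rather than a p-value, the same planned N serves every analysis in a multiverse that uses these items.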
Accuracy in parameter estimation and simulation approaches for sample-size planning accounting for item effects. Behavior Research Methods, 58(2), 48. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12830498/pdf/
Pub Date : 2026-01-23 DOI: 10.3758/s13428-025-02920-y
Theodoros A Kyriazos, Mary Poga
Analytical flexibility is an inherent feature of quantitative research that, when exercised without constraint, transparency, or strong theoretical justification, produces systematic bias and undermines inferential validity. This article presents a conceptual and computational framework identifying 10 particularly impactful and prevalent questionable research practices (QRPs) that exemplify how hidden flexibility distorts scientific conclusions across four stages of the research workflow. Rather than proposing a new taxonomy, we operationalize a targeted subset of QRPs into a conceptual framework that links each practice to its underlying bias mechanism. We further map these mechanisms to 10 evidence-based corrective strategies designed to mitigate the specific inferential violations each practice produces. To support education and diagnostic exploration, we present a reproducible R-based simulation suite that allows researchers to examine the impact of QRPs and prevention strategies across context-specific design parameters. This framework contributes to research integrity by offering a theory-based, stage-specific, and simulation-supported approach to identifying, understanding, and preventing the most consequential forms of hidden analytical flexibility in quantitative research.
{"title":"Ten particularly frequent and consequential questionable research practices in quantitative research: Bias mechanisms, preventive strategies, and a simulation-based framework.","authors":"Theodoros A Kyriazos, Mary Poga","doi":"10.3758/s13428-025-02920-y","DOIUrl":"https://doi.org/10.3758/s13428-025-02920-y","url":null,"abstract":"<p><p>Analytical flexibility is an inherent feature of quantitative research that, when exercised without constraint, transparency, or strong theoretical justification, produces systematic bias and undermines inferential validity. This article presents a conceptual and computational framework identifying 10 particularly impactful and prevalent questionable research practices (QRPs) that exemplify how hidden flexibility distorts scientific conclusions across four stages of the research workflow. Rather than proposing a new taxonomy, we operationalize a targeted subset of QRPs into a conceptual framework that links each practice to its underlying bias mechanism. We further map these mechanisms to 10 evidence-based corrective strategies designed to mitigate the specific inferential violations each practice produces. To support education and diagnostic exploration, we present a reproducible R-based simulation suite that allows researchers to examine the impact of QRPs and prevention strategies across context-specific design parameters. 
This framework contributes to research integrity by offering a theory-based, stage-specific, and simulation-supported approach to identifying, understanding, and preventing the most consequential forms of hidden analytical flexibility in quantitative research.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"58 2","pages":"46"},"PeriodicalIF":3.9,"publicationDate":"2026-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146040174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
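One of the best-known QRPs of the kind this framework targets, optional stopping ("peek" at the data after every batch of participants and stop as soon as p < .05), has a bias mechanism that is easy to make concrete in simulation. The paper's suite is R-based; the sketch below is a hypothetical Python analogue with illustrative settings (batch sizes, number of looks, critical value) that are assumptions, not parameters from the paper. It contrasts the false-positive rate of peeking with a single planned test on the same simulated null data.

```python
import random
import statistics

def peeking_vs_fixed(n_start=10, n_max=50, step=5, crit=1.96,
                     n_sims=2000, seed=7):
    """Under a true null effect, compare the false-positive rate of
    optional stopping (test after every batch, stop at the first
    'significant' result) with a single fixed-n test at n_max."""
    rng = random.Random(seed)

    def significant(xs):
        se = statistics.stdev(xs) / len(xs) ** 0.5
        return abs(statistics.mean(xs) / se) > crit

    peek_hits = fixed_hits = 0
    for _ in range(n_sims):
        scores = [rng.gauss(0.0, 1.0) for _ in range(n_max)]  # null data
        if significant(scores):          # planned analysis: one look at n_max
            fixed_hits += 1
        n = n_start                       # QRP: repeated interim looks
        while n <= n_max:
            if significant(scores[:n]):
                peek_hits += 1
                break
            n += step
    return peek_hits / n_sims, fixed_hits / n_sims

peek_rate, fixed_rate = peeking_vs_fixed()
```

With these settings the peeking rate lands well above the nominal 5% while the single planned test stays near it, which is the kind of context-specific inflation the simulation suite lets researchers explore before fieldwork.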