Mixed-format tests, which typically include dichotomous items and polytomously scored tasks, are employed to assess a wider range of knowledge and skills. Recent behavioral and educational studies have highlighted their practical importance and methodological developments, particularly within the context of multivariate generalizability theory. However, the diverse response types and complex designs of these tests pose significant analytical challenges when modeling the data simultaneously. Current methods often struggle to yield reliable results, either because they inappropriately treat different types of response data separately or because they impose identical covariates across response types. Moreover, few software packages or programs offer customized solutions for modeling mixed-format tests that address these limitations. This tutorial provides a detailed example of using a Bayesian approach to model data collected from a mixed-format test comprising multiple-choice questions and free-response tasks. The modeling was conducted using the Stan software within the R programming system, with Stan code tailored to the structure of the test design, following the principles of multivariate generalizability theory. By further examining the effects of prior distributions in this example, this study demonstrates how the adaptability of Bayesian models to diverse test formats, coupled with their potential for nuanced analysis, can significantly advance the field of psychometric modeling.
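As a rough illustration of the generalizability-theory quantities the tutorial models, the sketch below simulates a hypothetical persons-by-items crossed design (continuous scores for simplicity) and recovers variance components and a generalizability coefficient via classical expected mean squares. The paper itself fits a Bayesian model in Stan; the sample sizes and effect magnitudes here are assumptions for illustration only.

```python
import random
import statistics

random.seed(1)

# Hypothetical p x i crossed design: n_p persons respond to n_i tasks.
n_p, n_i = 200, 20
person_eff = [random.gauss(0, 0.8) for _ in range(n_p)]  # universe scores
item_eff = [random.gauss(0, 0.5) for _ in range(n_i)]    # item difficulty
scores = [[person_eff[p] + item_eff[i] + random.gauss(0, 1.0)
           for i in range(n_i)] for p in range(n_p)]

grand = statistics.mean(x for row in scores for x in row)
p_means = [statistics.mean(row) for row in scores]
i_means = [statistics.mean(scores[p][i] for p in range(n_p)) for i in range(n_i)]

# Mean squares from the two-way ANOVA decomposition.
ms_p = n_i * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
ss_res = sum((scores[p][i] - p_means[p] - i_means[i] + grand) ** 2
             for p in range(n_p) for i in range(n_i))
ms_res = ss_res / ((n_p - 1) * (n_i - 1))

# Expected-mean-square estimators of the variance components.
var_p = (ms_p - ms_res) / n_i   # person (universe-score) variance
var_res = ms_res                # residual (person-by-item + error) variance

# Generalizability coefficient for a relative decision over n_i items.
g_coef = var_p / (var_p + var_res / n_i)
print(round(var_p, 3), round(g_coef, 3))
```

A Bayesian treatment replaces the moment estimators above with priors on each variance component, which is what makes the prior-sensitivity analysis in the tutorial possible.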
Title: Customizing Bayesian multivariate generalizability theory to mixed-format tests. Authors: Zhehan Jiang, Jinying Ouyang, Dingjing Shi, Dexin Shi, Jihong Zhang, Lingling Xu, Fen Cai. Behavior Research Methods. DOI: 10.3758/s13428-024-02472-7
Pub Date: 2024-10-01 | Epub Date: 2023-10-16 | DOI: 10.3758/s13428-023-02243-w
Noah S Okada, Katherine L McNeely-White, Anne M Cleary, Brooke N Carlaw, Daniel L Drane, Thomas D Parsons, Timothy McMahan, Joseph Neisser, Nigel P Pedersen
Episodic memory may essentially be memory for one's place within a temporally unfolding scene from a first-person perspective. Given this, pervasively used static stimuli may only capture one small part of episodic memory. A promising approach for advancing the study of episodic memory is immersing participants within varying scenes from a first-person perspective. We present a pool of distinct scene stimuli for use in virtual environments and a paradigm that is implementable across varying levels of immersion on multiple virtual reality (VR) platforms and adaptable to studying various aspects of scene and episodic memory. In our task, participants are placed within a series of virtual environments from a first-person perspective and guided through a virtual tour of scenes during a study phase and a test phase. In the test phase, some scenes share a spatial layout with studied scenes; others are completely novel. In three experiments with varying degrees of immersion, we measure scene recall, scene familiarity-detection during recall failure, the subjective experience of déjà vu, the ability to predict the next turn on a tour, the subjective sense of being able to predict the next turn on a tour, and the factors that influence memory search and the inclination to generate candidate recollective information. The level of first-person immersion mattered to multiple facets of episodic memory. The paradigm presents a useful means of advancing mechanistic understanding of how memory operates in realistic dynamic scene environments, including in combination with cognitive neuroscience methods such as functional magnetic resonance imaging and electrophysiology.
Title: A virtual reality paradigm with dynamic scene stimuli for use in memory research. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11018716/pdf/
Pub Date: 2024-10-01 | Epub Date: 2024-03-04 | DOI: 10.3758/s13428-024-02368-6
Alexander Mielke, Gal Badihi, Kirsty E Graham, Charlotte Grund, Chie Hashimoto, Alex K Piel, Alexandra Safryghin, Katie E Slocombe, Fiona Stewart, Claudia Wilke, Klaus Zuberbühler, Catherine Hobaiter
Parsing signals from noise is a general problem for signallers and recipients, and for researchers studying communicative systems. Substantial efforts have been invested in comparing how other species encode information and meaning, and how signalling is structured. However, research depends on identifying and discriminating signals that represent meaningful units of analysis. Early approaches to defining signal repertoires applied top-down classification, sorting cases into predefined signal types. Recently, more labour-intensive methods have taken a bottom-up approach, describing detailed features of each signal and clustering cases based on previously undetectable patterns of similarity in multi-dimensional feature-space. Nevertheless, it remains essential to assess whether the resulting repertoires are composed of relevant units from the perspective of the species using them, and to redefine repertoires when additional data become available. In this paper we provide a framework that takes data from the largest set of wild chimpanzee (Pan troglodytes) gestures currently available, splits gesture types at a fine scale based on modifying features of gesture expression using latent class analysis (a model-based cluster-detection algorithm for categorical variables), and then determines whether this splitting process reduces uncertainty about the goal or community of the gesture. Our method allows different features of interest to be incorporated into the splitting process, providing substantial future flexibility across, for example, species, populations, and levels of signal granularity. In doing so, we provide a powerful tool allowing researchers interested in gestural communication to establish repertoires of relevant units for subsequent analyses within and between systems of communication.
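The final splitting criterion — whether finer morphs reduce uncertainty about the gesture's goal — can be illustrated with Shannon entropy. The sketch below uses made-up morph and goal labels; it is not the authors' latent class analysis, only the uncertainty-reduction check in miniature:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of categorical labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Hypothetical records: (morph cluster, goal of the gesture).
cases = ([("arm-raise-A", "play")] * 30 + [("arm-raise-A", "travel")] * 5
         + [("arm-raise-B", "travel")] * 25 + [("arm-raise-B", "play")] * 4)

goals = [g for _, g in cases]
h_before = entropy(goals)  # uncertainty with one merged gesture type

# Conditional entropy H(goal | morph): weighted within-cluster entropy.
n = len(cases)
h_after = sum(
    (sum(1 for m, _ in cases if m == morph) / n)
    * entropy([g for m, g in cases if m == morph])
    for morph in {m for m, _ in cases}
)
print(h_before > h_after)  # a split is worth keeping only if it reduces uncertainty
```

With these toy counts the split is informative: knowing the morph makes the goal substantially more predictable than the merged type does.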
Title: Many morphs: Parsing gesture signals from the noise. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11362259/pdf/
Pub Date: 2024-10-01 | Epub Date: 2024-05-28 | DOI: 10.3758/s13428-024-02396-2
Elisa S Buchberger, Chi T Ngo, Aaron Peikert, Andreas M Brandmaier, Markus Werkle-Bergner
Determining the compositional structure and dimensionality of psychological constructs lies at the heart of many research questions in developmental science. Structural equation modeling (SEM) provides a versatile framework for formalizing and estimating the relationships among multiple latent constructs. While the flexibility of SEM can accommodate many complex assumptions on the underlying structure of psychological constructs, it makes a priori estimation of statistical power and required sample size challenging. This difficulty is magnified when comparing non-nested SEMs, which prevents the use of traditional likelihood-ratio tests. Sample size estimates for SEM model fit comparisons typically rely on generic rules of thumb. Such heuristics can be misleading because statistical power in SEM depends on a variety of model properties. Here, we demonstrate a Monte Carlo simulation approach for estimating a priori statistical power for model selection when comparing non-nested models in an SEM framework. We provide a step-by-step guide to this approach based on an example from our memory development research in children.
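A minimal version of such a Monte Carlo power analysis, stripped down from SEM to two non-nested one-predictor regressions compared by AIC (the effect size of 0.3, the sample sizes, and the replication counts are all assumptions for illustration, not values from the tutorial):

```python
import math
import random

random.seed(7)

def aic_simple_reg(x, y):
    """AIC of a one-predictor linear regression fit by least squares."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return n * math.log(rss / n) + 2 * 3  # 3 params: intercept, slope, sigma

def power(n_obs, n_rep=500):
    """Proportion of replications in which AIC selects the true model."""
    wins = 0
    for _ in range(n_rep):
        x1 = [random.gauss(0, 1) for _ in range(n_obs)]
        x2 = [random.gauss(0, 1) for _ in range(n_obs)]  # competing predictor
        y = [0.3 * a + random.gauss(0, 1) for a in x1]   # x1 is the true model
        if aic_simple_reg(x1, y) < aic_simple_reg(x2, y):
            wins += 1
    return wins / n_rep

print(power(50), power(200))  # selection rate grows with sample size
```

The same logic scales up to non-nested SEMs: simulate from the hypothesized model, fit both candidates, and count how often the model-selection criterion picks the data-generating model at each candidate sample size.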
Title: Estimating statistical power for structural equation models in developmental cognitive science: A tutorial in R. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11362481/pdf/
Pub Date: 2024-10-01 | Epub Date: 2024-05-29 | DOI: 10.3758/s13428-024-02419-y
Jeff Miller
A methodological problem in most reaction time (RT) studies is that some measured RTs may be outliers-that is, they may be very fast or very slow for reasons unconnected to the task-related processing of interest. Numerous ad hoc methods have been suggested to discriminate between such outliers and the valid RTs of interest, but it is extremely difficult to determine how well these methods work in practice because virtually nothing is known about the actual characteristics of outliers in real RT datasets. This article proposes a new method of pooling cumulative distribution function values for examining empirical RT distributions to assess both the proportions of outliers and their latencies relative to those of the valid RTs. As the method is developed, its strengths and weaknesses are examined using simulations based on previously suggested ad hoc models for RT outliers with particular assumed proportions and distributions of valid RTs and outliers. The method is then applied to several large RT datasets from lexical decision tasks, and the results provide the first empirically based description of outlier RTs. For these datasets, fewer than 1% of the RTs seem to be outliers, and the median outlier latency appears to be approximately 4-6 standard deviations of RT above the mean of the valid RT distribution.
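The mixture conception of RTs — valid responses contaminated by a small fraction of task-unrelated outliers — can be sketched as follows. This is a generic iterative trimming heuristic on simulated data, not the pooled-CDF method the article develops, and every distributional choice below is an assumption:

```python
import random
import statistics

random.seed(3)

# Hypothetical mixture: 99% "valid" RTs plus 1% slow, task-unrelated outliers.
def one_rt():
    if random.random() < 0.01:
        return random.uniform(2000, 6000)      # outlier: unrelated delay (ms)
    return 400 + random.expovariate(1 / 120)   # valid: shifted exponential (ms)

rts = [one_rt() for _ in range(20000)]

# Trim-and-flag heuristic: iterate so the cutoff settles on the valid bulk.
kept = rts
for _ in range(5):
    m, sd = statistics.mean(kept), statistics.stdev(kept)
    kept = [t for t in kept if t <= m + 4 * sd]

flagged = 1 - len(kept) / len(rts)
print(round(flagged, 4))  # near, and slightly above, the true 1% contamination
```

The overshoot — the flagged rate exceeding the true 1% because some valid slow responses are trimmed too — is exactly the kind of ambiguity that motivates estimating outlier proportions and latencies from the data rather than assuming a cutoff.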
Title: Estimating the proportions and latencies of reaction time outliers: A pooling method and case study of lexical decision tasks. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11362516/pdf/
Pub Date: 2024-10-01 | Epub Date: 2024-06-05 | DOI: 10.3758/s13428-024-02422-3
Kendall A Mather, Sara J Weston, David M Condon
The assessment of creativity as an individual difference has historically focused on divergent thinking, which is increasingly viewed as involving the associative processes that are also understood to be a key component of creative potential. Research on associative processes has proliferated in many sub-fields, often using Compound Remote Associates (CRA) tasks with an open response format and relatively small participant samples. In the present work, we introduce a new format that is more amenable to large-scale data collection in survey designs, and present evidence for the reliability and validity of CRA measures in general using multiple large samples. Study 1 uses a large, representative dataset (N = 1,323,480) to demonstrate strong unidimensionality and internal consistency (α = .97; ω_t = .87), as well as links to individual differences in temperament, cognitive ability, occupation, and job characteristics. Study 2 uses an undergraduate sample (N = 685) to validate the use of a multiple-choice format relative to the traditional approach. Study 3 uses a crowdsourced sample (N = 357) to demonstrate high test-retest reliability of the items (r = .74). Finally, Study 4 uses a sample that overlaps with Study 1 (N = 1,502,922) to provide item response theory (IRT) parameters for a large set of high-quality CRA items that use a multiple-choice response mode, thus facilitating their use in future research on creativity, insight, and related topics.
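The IRT parameterization in Study 4 can be illustrated with the standard two-parameter logistic item response function; the item parameters below are hypothetical, not values from the paper:

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic IRT model: probability of a correct response
    given ability theta, item discrimination a, and item difficulty b."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# Hypothetical CRA item: fairly discriminating (a = 1.4), moderately hard (b = 0.5).
for theta in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(theta, round(p_correct(theta, 1.4, 0.5), 3))
```

By construction the probability is exactly .5 when ability equals item difficulty (theta = b), and the discrimination parameter controls how steeply the curve rises around that point.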
Title: Scaling a common assessment of associative ability: Development and validation of a multiple-choice compound remote associates task.
Pub Date: 2024-10-01 | Epub Date: 2024-06-24 | DOI: 10.3758/s13428-024-02439-8
Rajath Shenoy, Lyndsey Nickels, Gopee Krishnan
There have been many published picture corpora. However, more than half of the world's population speaks more than one language and, as language and culture are intertwined, some of the items from a picture corpus designed for a given language in a particular culture may not fit another culture (with the same or different language). There is also an awareness that language research can gain from the study of bi-/multilingual individuals who are immersed in multilingual contexts that foster inter-language interactions. Consequently, we developed a relatively large corpus of pictures (663 nouns, 96 verbs) and collected normative data from multilingual speakers of Kannada (a southern Indian language) on two picture-related measures (name agreement, image agreement) and three word-related measures (familiarity, subjective frequency, age of acquisition), and report objective visual complexity and syllable count of the words. Naming labels were classified into words from the target language (i.e., Kannada), cognates (borrowed from/shared with another language), translation equivalents, and elaborations. The picture corpus had > 85% mean concept agreement with multiple acceptable names (1-7 naming labels) for each concept. The mean percentage name agreement for the modal name was > 70%, with H-statistics of 0.89 for nouns and 0.52 for verbs. We also analyse the variability of responses highlighting the influence of bi-/multilingualism on (picture) naming. The picture corpus is freely accessible to researchers and clinicians. It may be used for future standardization with other languages of similar cultural contexts, and relevant items can be used in languages from different cultures, following suitable standardization.
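The H statistic reported for name agreement is the Shannon entropy of the naming-response distribution; a minimal computation on made-up responses (the labels and counts below are hypothetical, not corpus items):

```python
import math
from collections import Counter

def h_statistic(names):
    """Name-agreement H statistic: Shannon entropy (bits) of naming responses.
    H = 0 means perfect agreement; larger H means more diverse labels."""
    n = len(names)
    return sum((c / n) * math.log2(n / c) for c in Counter(names).values())

# Hypothetical responses for one picture from 20 multilingual speakers.
responses = ["mara"] * 14 + ["tree"] * 4 + ["gida"] * 2
print(round(h_statistic(responses), 2))
modal_agreement = responses.count("mara") / len(responses)  # modal name agreement = .70
```

This is why the corpus can show high modal name agreement (> 70%) alongside a non-trivial H: H penalizes every alternative label, including cognates and translation equivalents, which multilingual speakers produce more often.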
Title: Naming in a multilingual context: Norms for the ICMR-Manipal colour picture corpus in Kannada from the Indian context. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11362232/pdf/
Pub Date: 2024-10-01 | Epub Date: 2024-07-03 | DOI: 10.3758/s13428-024-02447-8
Yiyang Chen, Heather R Daly, Mark A Pitt, Trisha Van Zandt
The discriminability measure d' is widely used in psychology to estimate sensitivity independently of response bias. The conventional approach to estimating d' involves a transformation from the hit rate and the false-alarm rate. When performance is perfect, correction methods must be applied to calculate d', but these corrections distort the estimate. In three simulation studies, we show that distortion in d' estimation can arise from other properties of the experimental design (number of trials, sample size, sample variance, task difficulty) that, when combined with application of the correction method, make d' distortion in any specific experimental design complex and can mislead statistical inference in the worst cases (Type I and Type II errors). To address this problem, we propose that researchers simulate d' estimation to explore the impact of design choices, given anticipated or observed data. An R Shiny application is introduced that estimates d' distortion, providing researchers the means to identify distortion and take steps to minimize its impact.
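The source of the distortion can be seen in a small sketch: any correction that keeps perfect hit and false-alarm rates finite necessarily caps d', and the cap depends on the number of trials. This uses the common log-linear (add 0.5 to each cell) correction as one example, not necessarily the correction analyzed in the paper:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse standard-normal CDF (the z transform)

def d_prime(hits, misses, fas, crs):
    """d' from raw counts, with the log-linear (add 0.5 per cell) correction
    so perfect rates of 0 or 1 stay finite; the correction biases the estimate."""
    hr = (hits + 0.5) / (hits + misses + 1)
    far = (fas + 0.5) / (fas + crs + 1)
    return z(hr) - z(far)

# Perfect performance on 20 signal and 20 noise trials:
print(round(d_prime(20, 0, 0, 20), 2))    # finite, but capped by the correction
# Perfect performance with 100 trials per type yields a larger cap:
print(round(d_prime(100, 0, 0, 100), 2))
```

Two participants with identical (perfect) accuracy thus receive different d' estimates purely because of trial count, which is one route by which design properties leak into the corrected estimate.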
Title: Assessing the distortions introduced when calculating d': A simulation approach.
Pub Date: 2024-10-01 · Epub Date: 2024-07-15 · DOI: 10.3758/s13428-024-02465-6
Brian Miller, Marcia Simmering, Elizabeth Ragland
This research extends recent scale development efforts for the marker variable Attitude Toward the Color Blue (ATCB) by examining the efficacy of multiple shorter permutations of the scale. The purpose of this study is to develop a shorter version of an ideal marker variable scale used to detect common method variance (CMV). Potential uses of a shorter ATCB include intensive longitudinal studies, experience sampling methodology, or any brief survey for which the original version might be cumbersome to administer repeatedly or appear very odd to the respondent when paired with only a few other substantive items. Study 1 uses all six-, five-, and four-item versions of ATCB in confirmatory factor analysis (CFA) marker technique tests on a bivariate relationship. Study 2 analyzes the best- and worst-performing reduced-length versions of the ATCB scale from Study 1 on another bivariate relationship. Study 3 compares the original seven-item version, as well as randomly selected reduced-length versions, in a data set with 15 model relationships. Study 4 uses an experiment to determine the efficacy of providing respondents with one of three shorter ATCB scales in a model of three substantive variables. Our findings indicate that ATCB scales of different permutations and lengths can detect CMV successfully, and that researchers should choose scale length based on their survey length. We conclude that ATCB is adaptable to a variety of research situations, making it a valuable tool for high-quality research.
"Effective and adaptable: Four studies on the shortened attitude toward the color blue marker variable scale."
Pub Date: 2024-10-01 · Epub Date: 2024-07-25 · DOI: 10.3758/s13428-024-02446-9
Anthony P Zanesco, Nicholas T Van Dam, Ekaterina Denkova, Amishi P Jha
The tendency for individuals to mind wander is often measured using experience sampling methods, in which probe questions embedded within computerized cognitive tasks attempt to catch episodes of off-task thought at random intervals during task performance. However, mind-wandering probe questions and response options are often chosen ad hoc and vary between studies, with little extant guidance as to the psychometric consequences of these methodological decisions. In the present study, we examined the psychometric properties of several common approaches for assessing mind wandering using methods from item response theory (IRT). IRT latent modeling demonstrated that measurement information was generally distributed across the range of trait estimates according to when probes were presented in time. Probes presented earlier in time provided more information about individuals with a greater tendency to mind wander than probes presented later. Furthermore, mind-wandering ratings made on a continuous scale or with multiple categorical rating options provided more information about individuals' latent mind-wandering tendency - across a broader range of the trait continuum - than ratings dichotomized into on-task and off-task categories. In addition, IRT provided evidence that reports of "task-related thoughts" contribute to the task-focused dimension of the construct continuum, justifying studies that conceptualize these responses as a kind of task-related focus. Together, we hope these findings will help guide researchers hoping to maximize the measurement precision of their mind-wandering assessment procedures.
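The IRT notion of "measurement information" invoked above can be illustrated with the textbook two-parameter logistic (2PL) item information function, I(θ) = a²·P(θ)·(1 − P(θ)). This is a standard formula for dichotomous items, sketched here in Python for illustration; it is not the (more elaborate) latent model the authors fit to probe ratings.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of endorsement at trait level theta,
    with discrimination a and difficulty b."""
    return 1 / (1 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item: I(theta) = a^2 * P * (1 - P).

    Information peaks at theta = b, so an item is most precise for
    respondents whose trait level matches its difficulty.
    """
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1 - p)

# Two hypothetical probes: one "easy" (b = -1) and one "hard" (b = +1).
# Each is most informative near its own difficulty, so where probes sit
# on the trait continuum determines whom they measure precisely.
for theta in (-2, -1, 0, 1, 2):
    print(theta,
          round(item_information(theta, a=1.5, b=-1.0), 3),
          round(item_information(theta, a=1.5, b=1.0), 3))
```

This is the sense in which design choices (which probes, which response formats) redistribute information across the trait continuum, as the abstract describes for dichotomized versus multi-category ratings.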
"Measuring mind wandering with experience sampling during task performance: An item response theory investigation."
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11362314/pdf/