Over the past few decades, Swahili-English and Lithuanian-English word pair databases have been extensively utilized in research on learning and memory. However, these normative databases are specifically designed for generating study stimuli in learning and memory research involving native (or fluent) English speakers. Consequently, they are not suitable for investigations of populations whose first language is not English, such as Chinese individuals. Notably, native Chinese speakers constitute a substantial proportion, approximately 18%, of the global population. The current study aims to establish a new database of translation equivalents, specifically tailored to facilitate research on learning, memory, and metacognition among the Chinese population. We present a comprehensive set of normative measures for 200 Swahili-Chinese paired associates, including recall accuracy, recall latency, error patterns, confidence ratings, perceived learning difficulty, judgments of learning, and perceived learning interestingness for all word pairs. Additionally, we include word-likeness ratings and word length for the Swahili words, and concreteness ratings, familiarity ratings, word frequency, and number of strokes for the Chinese words. This diverse array of measures, gathered across a substantial number of Swahili-Chinese word pairs, is poised to effectively support future research investigating the intricate processes of learning, memory, and metacognition within the Chinese population.
A normative database of Swahili-Chinese paired associates. Tian Fan, Wenbo Zhao, Bukuan Sun, Shaohang Liu, Yue Yin, Muzi Xu, Xiao Hu, Chunliang Yang, Liang Luo. Behavior Research Methods, 57(1), 40. Published 2025-01-03. DOI: 10.3758/s13428-024-02531-z
Pub Date: 2025-01-02. DOI: 10.3758/s13428-024-02538-6
Zheng Liu, Mengzhen Hu, Yuanrui Zheng, Jie Sui, Hu Chuan-Peng
The self-matching task (SMT) is widely used to investigate the cognitive mechanisms underlying the self-prioritization effect (SPE), wherein performance is enhanced for self-associated stimuli compared to other-associated ones. Although the SMT robustly elicits the SPE, there is a lack of data quantifying the reliability of this paradigm. This is problematic, given the prevalence of the reliability paradox in cognitive tasks: many well-established cognitive tasks demonstrate relatively low reliability when used to evaluate individual differences, despite exhibiting replicable effects at the group level. To fill this gap, this preregistered study investigated the reliability of the SPE derived from the SMT using a multiverse approach, combining all possible indicators and baselines reported in the literature. We first examined the robustness of 24 SPE measures across 42 datasets (N = 2250) using a meta-analytical approach. We then calculated the split-half reliability (r) and intraclass correlation coefficient (ICC2) for each SPE measure. Our findings revealed a robust group-level SPE across datasets. However, when evaluating individual differences, SPE indices derived from reaction time (RT) and efficiency exhibited higher split-half reliability than other SPE indices, but it remained unsatisfactory (approximately 0.5). Test-retest reliability across multiple time points, as assessed by ICC2, was likewise moderate for the RT and efficiency indices (close to 0.5). These findings reveal a reliability paradox in the context of SMT-based SPE assessment. We discuss implications for enhancing the individual-level reliability of this paradigm in future study designs.
A multiverse assessment of the reliability of the self-matching task as a measurement of the self-prioritization effect. Behavior Research Methods, 57(1), 37. DOI: 10.3758/s13428-024-02538-6
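The split-half reliability used in this assessment can be sketched in a few lines. This is a generic illustration (a single random split plus the Spearman-Brown correction), not the authors' multiverse pipeline, and the `trial_matrix` layout is an assumption:

```python
import numpy as np

def split_half_reliability(trial_matrix, seed=0):
    """Split-half reliability of a per-trial measure.

    trial_matrix: participants x trials array (e.g., per-trial RTs or
    difference scores). Trials are randomly split in two, each half is
    averaged per participant, and the between-half correlation is
    Spearman-Brown corrected for the halved test length.
    """
    rng = np.random.default_rng(seed)
    n_trials = trial_matrix.shape[1]
    order = rng.permutation(n_trials)
    half_a = trial_matrix[:, order[: n_trials // 2]].mean(axis=1)
    half_b = trial_matrix[:, order[n_trials // 2 :]].mean(axis=1)
    r = np.corrcoef(half_a, half_b)[0, 1]
    return 2 * r / (1 + r)  # Spearman-Brown correction
```

In practice the corrected correlation is usually averaged over many random splits rather than taken from a single permutation.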
Pub Date: 2025-01-02. DOI: 10.3758/s13428-024-02546-6
Atesh Koul, Giacomo Novembre
Estimating how the human body moves in space and time (body kinematics) has important applications for industry, healthcare, and several research fields. Gold-standard methodologies for capturing body kinematics are expensive and impractical for naturalistic recordings, as they rely on infrared-reflective wearables and bulky instrumentation. To overcome these limitations, several algorithms have been developed to extract body kinematics from plain video recordings. This comes with a drop in accuracy, which, however, has not been clearly quantified. To fill this knowledge gap, we analysed a dataset comprising 46 human participants exhibiting spontaneous movements of varying amplitude. Body kinematics were estimated using OpenPose (video-based) and Vicon (infrared-based) motion capture systems simultaneously. OpenPose accuracy was assessed using Vicon estimates as ground truth. We report that OpenPose accuracy is overall moderate and varies substantially across participants and body parts. This is explained by variability in movement amplitude. OpenPose estimates are weak for low-amplitude movements. Conversely, large-amplitude movements (i.e., > ~ 10 cm) yield highly accurate estimates. The relationship between accuracy and movement amplitude is not linear (but mostly exponential or power) and relatively robust to camera-body distance. Together, these results dissect the limits of video-based motion capture and provide useful guidelines for future studies.
How accurately can we estimate spontaneous body kinematics from video recordings? Effect of movement amplitude on OpenPose accuracy. Behavior Research Methods, 57(1), 38. DOI: 10.3758/s13428-024-02546-6. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695451/pdf/
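A power relationship between accuracy and movement amplitude, as described above, is conventionally fit by linear regression in log-log space. The sketch below uses synthetic data and a hypothetical function name; it is not the authors' analysis:

```python
import numpy as np

def fit_power_law(amplitude_cm, accuracy):
    """Fit accuracy ~ a * amplitude**b via least squares on log-log data.

    A power law y = a * x**b becomes linear after taking logs:
    log(y) = log(a) + b * log(x).
    """
    b, log_a = np.polyfit(np.log(amplitude_cm), np.log(accuracy), deg=1)
    return np.exp(log_a), b

# Synthetic example data (not from the paper): exact power law with b = 0.5
amp = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
acc = 0.2 * amp ** 0.5
a, b = fit_power_law(amp, acc)
```

An exponential fit would proceed the same way, but with the log taken only of the dependent variable.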
Pub Date: 2024-12-30. DOI: 10.3758/s13428-024-02579-x
Madison A Hooper, Andrew Tomarken, Isabel Gauthier
Measurement of object recognition (OR) ability could predict learning and success in real-world settings, and there is hope that it may reduce bias often observed in cognitive tests. Although the measurement of visual OR is not expected to be influenced by the language of participants or the language of instructions, these assumptions remain largely untested. Here, we address the challenges of measuring OR abilities across linguistically diverse populations. In Study 1, we find that English-Spanish bilinguals, when randomly assigned to the English or Spanish version of the novel object memory test (NOMT), exhibit highly similar overall performance. Study 2 extends this by assessing psychometric equivalence using an approach grounded in item response theory (IRT). We examined whether groups fluent in English or Spanish differed in (a) latent OR ability as assessed by a three-parameter logistic IRT model, and (b) the mapping of observed item responses onto the latent OR construct, as assessed by differential item functioning (DIF) analyses. Spanish speakers performed better than English speakers, a difference we suggest is due to motivational differences between groups of vastly different size on the Prolific platform. That we found no substantial DIF between the groups tested in English or Spanish on the NOMT indicates measurement invariance. The feasibility of increasing diversity by combining groups tested in different languages remains unexplored. Adopting this approach could enable visual scientists to enhance diversity, equity, and inclusion in their research, and potentially in the broader application of their work in society.
Measuring visual ability in linguistically diverse populations. Behavior Research Methods, 57(1), 36. DOI: 10.3758/s13428-024-02579-x. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11685244/pdf/
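The three-parameter logistic (3PL) IRT model referred to above has a standard closed form: the probability of a correct response is a logistic function of latent ability, raised off zero by a guessing parameter. A minimal sketch (the parameter values used below are illustrative only):

```python
import math

def irt_3pl(theta, a, b, c):
    """Three-parameter logistic IRT item response function.

    theta: latent ability of the respondent
    a: item discrimination (slope)
    b: item difficulty (location)
    c: guessing parameter (lower asymptote)
    Returns P(correct response | theta).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

At theta = b the predicted probability is exactly midway between the guessing floor c and 1, which is one way DIF analyses detect items that behave differently across groups: the fitted a, b, or c parameters diverge.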
Pub Date: 2024-12-30. DOI: 10.3758/s13428-024-02573-3
Yuhwa Han, Wooyeol Lee
This study investigates the performance of mediation analyses that include manipulation check variables in experimental studies where manipulated psychological attributes serve as independent variables. We simulated levels of manipulation intensity and of measurement error in the manipulation check variable to test the validity of this analytic practice. Our results showed that when manipulation was successful and measurement error was low, mediation analyses with the manipulation check variable yielded unstable path coefficients and standard errors. Moreover, many of the detected indirect effects reflected inconsistent mediation. However, when individual differences in psychological attributes remained within conditions (low manipulation intensity) and the manipulation check variable contained little measurement error, the indirect effect indicated the validity of the manipulation. We discuss the implications of our findings for the use of manipulation checks in experimental research.
How do manipulation checks interfere with the inference of causal relationships? Behavior Research Methods, 57(1), 33. DOI: 10.3758/s13428-024-02573-3. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11685263/pdf/
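The indirect effect at the core of such mediation analyses is conventionally the product of the X-to-M coefficient and the M-to-Y coefficient controlling for X. A minimal OLS sketch under that convention (function and variable names are ours, not the authors'):

```python
import numpy as np

def indirect_effect(x, m, y):
    """Indirect effect a*b from two OLS regressions:
    M = i1 + a*X  (a path), and  Y = i2 + b*M + c'*X  (b path).
    """
    # a path: regress M on X (with intercept); slope is coefficient 1
    design_a = np.column_stack([np.ones_like(x), x])
    a = np.linalg.lstsq(design_a, m, rcond=None)[0][1]
    # b path: regress Y on M and X; the M coefficient is index 1
    design_b = np.column_stack([np.ones_like(x), m, x])
    b = np.linalg.lstsq(design_b, y, rcond=None)[0][1]
    return a * b
```

Inconsistent mediation, as discussed in the abstract, corresponds to an indirect effect a*b whose sign opposes the direct effect c'.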
Pub Date: 2024-12-30. DOI: 10.3758/s13428-024-02513-1
Bing Li, Ziyi Ding, Simon De Deyne, Qing Cai
Word associations are among the most direct ways to measure word meaning in human minds, capturing various relationships, even those formed by non-linguistic experiences. Although large-scale word association datasets exist for Dutch, English, and Spanish, there is a lack of data for Mandarin Chinese, the most widely spoken language from a distinct language family. Here we present the Small World of Words-Zhongwen (Chinese) (SWOW-ZH), a word association dataset of Mandarin Chinese derived from a three-response word association task. This dataset covers responses for over 10,000 cue words from more than 40,000 participants. We constructed a semantic network based on this dataset and evaluated the concurrent validity of association-based measures by predicting human processing latencies and comparing them with text-based measures and word embeddings. Our results show that word centrality significantly predicts lexical decision and word naming speed. Furthermore, SWOW-ZH notably outperforms text-based embeddings and transformer-based large language models in predicting human-rated word relationships across varying sample sizes. We also highlight the unique characteristics of Chinese word associations, particularly focusing on word formation. Combined, our findings underscore the critical importance of large-scale human experimental data and its unique contribution to understanding the complexity and richness of language.
A large-scale database of Mandarin Chinese word associations from the Small World of Words Project. Behavior Research Methods, 57(1), 34. DOI: 10.3758/s13428-024-02513-1
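Association-based centrality of the kind used above to predict lexical decision speed can be illustrated on a toy cue-response list (not SWOW-ZH data), using weighted in-degree (the number of times a word is produced as a response) as a simple centrality proxy:

```python
from collections import Counter

# Toy cue -> response pairs, as a word association task might collect them.
# These example words are ours, not drawn from the SWOW-ZH dataset.
associations = [
    ("dog", "cat"), ("dog", "bone"), ("pet", "cat"),
    ("mouse", "cat"), ("bird", "wing"),
]

# Weighted in-degree: how often each word occurs as a response.
centrality = Counter(response for _cue, response in associations)
most_central = centrality.most_common(1)[0][0]
```

Richer network measures (e.g., random-walk or eigenvector centrality over the full cue-response graph) follow the same principle: words reachable from many cues sit at the core of the semantic network.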
Pub Date: 2024-12-30. DOI: 10.3758/s13428-024-02526-w
Tomke Trußner, Thorsten Albrecht, Uwe Mattler
Most vision labs have had to replace the formerly dominant CRT screens with LCDs, and several studies have investigated whether changing the display type alters perceptual phenomena, since fundamental properties of the stimulation (e.g., the transition time between frames) differ between the two display technologies. While many phenomena have proven robust, Kihara et al. (2010) reported different metacontrast masking functions on LCDs compared with CRTs. This difference poses a challenge for integrating new LCD-based findings with the established knowledge from CRT studies and requires theoretical accounts that consider the effects of display type. However, before further conclusions can be drawn, the basic findings should be corroborated. We therefore attempted to reproduce the display-type effect by comparing metacontrast masking on an LCD and a CRT in two experiments. Our approach differs from the previous study in that we increased the power and reliability of the measurements and carefully matched the two display types. In addition to display type, we varied target-mask stimulus-onset asynchrony (SOA) and stimulus-background polarity. Regardless of display type and polarity, we found the typical type-B masking functions. Evidence for an SOA-dependent display-type effect in the black-on-white polarity condition in Experiment 1 was not replicated in Experiment 2. Overall, the results indicate that metacontrast masking effects on objective and subjective measurements, i.e., discriminatory sensitivity and phenomenological reports, do not vary substantially with display technology. This lack of display effects is discussed in the context of current theories of metacontrast masking.
Metacontrast masking does not change with different display technologies: A comparison of CRT and LCD monitors. Behavior Research Methods, 57(1), 30. DOI: 10.3758/s13428-024-02526-w. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11685275/pdf/
Pub Date: 2024-12-30. DOI: 10.3758/s13428-024-02584-0
Zian Hu, Zhenglin Zhang, Hai Li, Li-Zhuang Yang
In recent years, there has been growing interest in remote speech assessment through automated speech acoustic analysis. While the reliability of widely used features has been validated in professional recording settings, it remains unclear how the heterogeneity of consumer-grade recording devices, commonly used in nonclinical settings, impacts the reliability of these measurements. To address this issue, we systematically investigated the cross-device and test-retest reliability of classical speech acoustic measurements in a sample of healthy Chinese adults using consumer-grade equipment across three popular speech tasks: sustained phonation (SP), diadochokinesis (DDK), and picture description (PicD). A total of 51 participants completed two recording sessions spaced at least 24 hours apart. Speech outputs were recorded simultaneously using four devices: a voice recorder, laptop, tablet, and smartphone. Our results demonstrated good reliability for fundamental frequency and cepstral peak prominence in the SP task across testing sessions and devices. Other features from the SP and PicD tasks exhibited acceptable test-retest reliability, except for the period perturbation quotient from the tablet and formant frequency from the smartphone. However, measures from the DDK task showed a significant decrease in reliability on consumer-grade recording devices compared to professional devices. These findings indicate that the lower recording quality of consumer-grade equipment may compromise the reproducibility of syllable rate estimation, which is critical for DDK analysis. This study underscores the need for standardization of remote speech monitoring methodologies to ensure that remote home assessment provides accurate and reliable results for early screening.
Cross-device and test-retest reliability of speech acoustic measurements derived from consumer-grade mobile recording devices. Behavior Research Methods, 57(1), 35. DOI: 10.3758/s13428-024-02584-0
Pub Date : 2024-12-30DOI: 10.3758/s13428-024-02562-6
Tomohiro Inoue, Yucan Chen, Toshio Ohyanagi
Online language and literacy assessments have become prevalent in research and practice across settings. However, a notable exception is the assessment of handwriting and spelling, which has traditionally been conducted in person with paper and pencil. In light of this, we developed an automated, browser-based handwriting test application (Online Assessment of Handwriting and Spelling: OAHaS) for Japanese Kanji (Study 1) and examined its psychometric properties (Study 2). The automated scoring function using convolutional neural network (CNN) models achieved high recall (98.7%) and specificity (84.4%), as well as high agreement with manual scoring (95.4%). Additionally, behavioral validation with data from primary school children (N = 261, 49.0% female, age range = 6-12 years) indicated the high reliability and validity of our online test application, with a strong correlation between children's scores on the online and paper-based tests (r = .86). Moreover, our analysis indicated the potential utility of writing fluency measures (latency and duration) that are automatically recorded by OAHaS. Taken together, our browser-based application demonstrated the feasibility and viability of remote and automated assessment of handwriting skills, providing a streamlined approach to research and practice on handwriting. The source code of the application and supporting materials are available on Open Science Framework ( https://osf.io/gver2/ ).
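The recall and specificity figures reported for the CNN scorer follow the standard confusion-matrix definitions. The study's models and data are not reproduced here; the sketch below only shows how those two metrics are computed from binary correct/incorrect labels, with made-up example labels:

```python
def recall_specificity(y_true, y_pred):
    """Recall = TP / (TP + FN); specificity = TN / (TN + FP).

    y_true, y_pred: sequences of 0/1 labels (1 = response scored correct).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hits
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # misses
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct rejections
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)

# Made-up manual scores vs. automated scores for five responses.
recall, specificity = recall_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```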
{"title":"Assessing handwriting skills in a web browser: Development and validation of an automated online test in Japanese Kanji.","authors":"Tomohiro Inoue, Yucan Chen, Toshio Ohyanagi","doi":"10.3758/s13428-024-02562-6","DOIUrl":"10.3758/s13428-024-02562-6","url":null,"abstract":"<p><p>Online language and literacy assessments have become prevalent in research and practice across settings. However, a notable exception is the assessment of handwriting and spelling, which has traditionally been conducted in person with paper and pencil. In light of this, we developed an automated, browser-based handwriting test application (Online Assessment of Handwriting and Spelling: OAHaS) for Japanese Kanji (Study 1) and examined its psychometric properties (Study 2). The automated scoring function using convolutional neural network (CNN) models achieved high recall (98.7%) and specificity (84.4%), as well as high agreement with manual scoring (95.4%). Additionally, behavioral validation with data from primary school children (N = 261, 49.0% female, age range = 6-12 years) indicated the high reliability and validity of our online test application, with a strong correlation between children's scores on the online and paper-based tests (r = .86). Moreover, our analysis indicated the potential utility of writing fluency measures (latency and duration) that are automatically recorded by OAHaS. Taken together, our browser-based application demonstrated the feasibility and viability of remote and automated assessment of handwriting skills, providing a streamlined approach to research and practice on handwriting. 
The source code of the application and supporting materials are available on Open Science Framework ( https://osf.io/gver2/ ).</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"57 1","pages":"32"},"PeriodicalIF":4.6,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11685258/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142909181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-30DOI: 10.3758/s13428-024-02547-5
Haijiang Qin, Lei Guo
The Q-matrix is one of the core components of cognitive diagnostic assessment, which is a matrix describing the relationship between items and the attributes being assessed. Numerous studies have shown that inaccuracies in defining the Q-matrix can degrade parameter estimation and model fitting results. Currently, Q-matrix validation often involves exhaustive search algorithms (ESA), which traverse through all possible q-vectors and determine the optimal q-vector for each item based on indicators or criteria corresponding to different validation methods. However, ESA methods are time-consuming, especially when the number of attributes is large, as the search complexity grows exponentially. This study proposes a more efficient search algorithm, the priority attribute algorithm (PAA), which conducts searches one by one according to the priority of attributes, greatly simplifying the search process. Simulation studies indicate that PAA can significantly enhance search efficiency while maintaining the same or even higher accuracy than ESA, particularly when dealing with a large number of attributes. Moreover, the Q-matrix validation method employing PAA demonstrates better applicability to small samples. A real-data analysis indicates that applying the PAA-based Q-matrix validation method may yield suggested Q-matrices with higher model-data fit and greater practical utility.
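The efficiency gap between the two search strategies can be sketched with a toy criterion: ESA evaluates all 2^K - 1 nonzero q-vectors, while a priority-ordered greedy pass evaluates only K candidates. The fit function and priority weights below are hypothetical stand-ins for the validation indices the actual methods use:

```python
from itertools import product

def esa_search(K, fit):
    """Exhaustive search: score every nonzero binary q-vector."""
    best_f, best_q, evals = float("-inf"), None, 0
    for q in product([0, 1], repeat=K):
        if not any(q):
            continue  # skip the all-zero vector (not a valid q-vector)
        evals += 1
        f = fit(q)
        if f > best_f:
            best_f, best_q = f, q
    return best_q, evals

def paa_search(K, fit, priority):
    """Priority attribute algorithm (sketch): try adding attributes in
    descending priority order, keeping each only if the criterion improves."""
    q, best_f, evals = [0] * K, float("-inf"), 0
    for a in sorted(range(K), key=lambda i: -priority[i]):
        cand = q.copy()
        cand[a] = 1
        evals += 1
        f = fit(tuple(cand))
        if f > best_f:
            best_f, q = f, cand
    return tuple(q), evals

# Toy criterion: negative Hamming distance to a known "true" q-vector.
true_q = (1, 0, 1, 0, 0)
fit = lambda q: -sum(qi != ti for qi, ti in zip(q, true_q))
# Hypothetical priorities (in practice derived from data, not assumed).
priority = [0.9, 0.1, 0.8, 0.2, 0.3]
```

On this toy problem both searches recover `true_q`, but ESA scores 31 candidates while PAA scores 5; with K = 10 attributes the gap is 1023 vs. 10 evaluations.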
{"title":"Priority attribute algorithm for Q-matrix validation: A didactic.","authors":"Haijiang Qin, Lei Guo","doi":"10.3758/s13428-024-02547-5","DOIUrl":"10.3758/s13428-024-02547-5","url":null,"abstract":"<p><p>The Q-matrix is one of the core components of cognitive diagnostic assessment, which is a matrix describing the relationship between items and the attributes being assessed. Numerous studies have shown that inaccuracies in defining the Q-matrix can degrade parameter estimation and model fitting results. Currently, Q-matrix validation often involves exhaustive search algorithms (ESA), which traverse through all possible <math><mi>q</mi></math> -vectors and determine the optimal <math><mi>q</mi></math> -vector for items based on indicators or criteria corresponding to different validation methods. However, ESA methods are time-consuming, especially when the number of attributes is large, as the search complexity grows exponentially. This study proposes a more efficient search algorithm, the priority attribute algorithm (PAA), which conducts searches one by one according to the priority of attributes, greatly simplifying the search process. Simulation studies indicate that PAA can significantly enhance search efficiency while maintaining the same or even higher accuracy than ESA, particularly when dealing with a large number of attributes. Moreover, the Q-matrix validation method employing PAA demonstrates better applicability to small samples. 
A real-data analysis indicates that applying the PAA-based Q-matrix validation method may yield suggested Q-matrices with higher model-data fit and greater practical utility.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"57 1","pages":"31"},"PeriodicalIF":4.6,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142909186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}