Over the past few decades, Swahili-English and Lithuanian-English word pair databases have been extensively utilized in research on learning and memory. However, these normative databases are specifically designed for generating study stimuli in learning and memory research involving native (or fluent) English speakers. Consequently, they are not suitable for investigations of populations whose first language is not English, such as Chinese individuals. Notably, native Chinese speakers constitute a substantial proportion, approximately 18%, of the global population. The current study aims to establish a new database of translation equivalents, specifically tailored to facilitate research on learning, memory, and metacognition among the Chinese population. We present a comprehensive set of normative measures for 200 Swahili-Chinese paired associates, including recall accuracy, recall latency, error patterns, confidence ratings, perceived learning difficulty, judgments of learning, and perceived learning interestingness for all word pairs. Additionally, we include word-likeness ratings and word length for the Swahili words, and concreteness ratings, familiarity ratings, word frequency, and number of strokes for the Chinese words. This diverse array of measures, gathered across a substantial number of Swahili-Chinese word pairs, is poised to effectively support future research investigating the intricate processes of learning, memory, and metacognition within the Chinese population.
A normative database of Swahili-Chinese paired associates. Tian Fan, Wenbo Zhao, Bukuan Sun, Shaohang Liu, Yue Yin, Muzi Xu, Xiao Hu, Chunliang Yang, Liang Luo. Behavior Research Methods, 57(1), 40. Published 2025-01-03. DOI: 10.3758/s13428-024-02531-z
Pub Date: 2025-01-02. DOI: 10.3758/s13428-024-02538-6
Zheng Liu, Mengzhen Hu, Yuanrui Zheng, Jie Sui, Hu Chuan-Peng
The self-matching task (SMT) is widely used to investigate the cognitive mechanisms underlying the self-prioritization effect (SPE), wherein performance is enhanced for self-associated stimuli compared to other-associated ones. Although the SMT robustly elicits the SPE, there is a lack of data quantifying the reliability of this paradigm. This is problematic, given the prevalence of the reliability paradox in cognitive tasks: many well-established cognitive tasks demonstrate relatively low reliability when used to evaluate individual differences, despite exhibiting replicable effects at the group level. To fill this gap, this preregistered study investigated the reliability of the SPE derived from the SMT using a multiverse approach, combining all possible indicators and baselines reported in the literature. We first examined the robustness of 24 SPE measures across 42 datasets (N = 2250) using a meta-analytical approach. We then calculated the split-half reliability (r) and intraclass correlation coefficient (ICC2) for each SPE measure. Our findings revealed a robust group-level SPE across datasets. However, when evaluating individual differences, SPE indices derived from reaction time (RT) and efficiency exhibited higher split-half reliability than other SPE indices, but it remained unsatisfactory (approximately 0.5). Test-retest reliability across multiple time points, as assessed by ICC2, was likewise moderate for the RT and efficiency indices (close to 0.5). These findings reveal a reliability paradox in the context of SMT-based SPE assessment. We discuss implications for enhancing the individual-level reliability of this paradigm in future study designs.
A multiverse assessment of the reliability of the self-matching task as a measurement of the self-prioritization effect. Behavior Research Methods, 57(1), 37. DOI: 10.3758/s13428-024-02538-6
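The split-half reliability used in this assessment can be sketched in a few lines. This is a generic illustration (a single random split plus the Spearman-Brown correction), not the authors' multiverse pipeline, and the `trial_matrix` layout is an assumption:

```python
import numpy as np

def split_half_reliability(trial_matrix, seed=0):
    """Split-half reliability of a per-trial measure.

    trial_matrix: participants x trials array (e.g., per-trial RTs or
    difference scores). Trials are randomly split in two, each half is
    averaged per participant, and the between-half correlation is
    Spearman-Brown corrected for the halved test length.
    """
    rng = np.random.default_rng(seed)
    n_trials = trial_matrix.shape[1]
    order = rng.permutation(n_trials)
    half_a = trial_matrix[:, order[: n_trials // 2]].mean(axis=1)
    half_b = trial_matrix[:, order[n_trials // 2 :]].mean(axis=1)
    r = np.corrcoef(half_a, half_b)[0, 1]
    return 2 * r / (1 + r)  # Spearman-Brown correction
```

In practice the corrected correlation is usually averaged over many random splits rather than taken from a single permutation.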
Pub Date: 2025-01-02. DOI: 10.3758/s13428-024-02546-6
Atesh Koul, Giacomo Novembre
Estimating how the human body moves in space and time (body kinematics) has important applications for industry, healthcare, and several research fields. Gold-standard methodologies for capturing body kinematics are expensive and impractical for naturalistic recordings, as they rely on infrared-reflective wearables and bulky instrumentation. To overcome these limitations, several algorithms have been developed to extract body kinematics from plain video recordings. This comes with a drop in accuracy, which, however, has not been clearly quantified. To fill this knowledge gap, we analysed a dataset comprising 46 human participants exhibiting spontaneous movements of varying amplitude. Body kinematics were estimated using OpenPose (video-based) and Vicon (infrared-based) motion capture systems simultaneously. OpenPose accuracy was assessed using Vicon estimates as ground truth. We report that OpenPose accuracy is overall moderate and varies substantially across participants and body parts. This is explained by variability in movement amplitude. OpenPose estimates are weak for low-amplitude movements. Conversely, large-amplitude movements (i.e., > ~ 10 cm) yield highly accurate estimates. The relationship between accuracy and movement amplitude is not linear (but mostly exponential or power) and relatively robust to camera-body distance. Together, these results dissect the limits of video-based motion capture and provide useful guidelines for future studies.
How accurately can we estimate spontaneous body kinematics from video recordings? Effect of movement amplitude on OpenPose accuracy. Behavior Research Methods, 57(1), 38. DOI: 10.3758/s13428-024-02546-6. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695451/pdf/
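A power relationship between accuracy and movement amplitude, as described above, is conventionally fit by linear regression in log-log space. The sketch below uses synthetic data and a hypothetical function name; it is not the authors' analysis:

```python
import numpy as np

def fit_power_law(amplitude_cm, accuracy):
    """Fit accuracy ~ a * amplitude**b via least squares on log-log data.

    A power law y = a * x**b becomes linear after taking logs:
    log(y) = log(a) + b * log(x).
    """
    b, log_a = np.polyfit(np.log(amplitude_cm), np.log(accuracy), deg=1)
    return np.exp(log_a), b

# Synthetic example data (not from the paper): exact power law with b = 0.5
amp = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
acc = 0.2 * amp ** 0.5
a, b = fit_power_law(amp, acc)
```

An exponential fit would proceed the same way, but with the log taken only of the dependent variable.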
Pub Date: 2024-12-30. DOI: 10.3758/s13428-024-02579-x
Madison A Hooper, Andrew Tomarken, Isabel Gauthier
Measurement of object recognition (OR) ability could predict learning and success in real-world settings, and there is hope that it may reduce bias often observed in cognitive tests. Although the measurement of visual OR is not expected to be influenced by the language of participants or the language of instructions, these assumptions remain largely untested. Here, we address the challenges of measuring OR abilities across linguistically diverse populations. In Study 1, we find that English-Spanish bilinguals, when randomly assigned to the English or Spanish version of the novel object memory test (NOMT), exhibit highly similar overall performance. Study 2 extends this by assessing psychometric equivalence using an approach grounded in item response theory (IRT). We examined whether groups fluent in English or Spanish differed in (a) latent OR ability as assessed by a three-parameter logistic IRT model, and (b) the mapping of observed item responses onto the latent OR construct, as assessed by differential item functioning (DIF) analyses. Spanish speakers performed better than English speakers, a difference we suggest is due to motivational differences between groups of vastly different size on the Prolific platform. That we found no substantial DIF between the groups tested in English or Spanish on the NOMT indicates measurement invariance. The feasibility of increasing diversity by combining groups tested in different languages remains unexplored. Adopting this approach could enable visual scientists to enhance diversity, equity, and inclusion in their research, and potentially in the broader application of their work in society.
Measuring visual ability in linguistically diverse populations. Behavior Research Methods, 57(1), 36. DOI: 10.3758/s13428-024-02579-x. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11685244/pdf/
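The three-parameter logistic (3PL) IRT model referred to above has a standard closed form: the probability of a correct response is a logistic function of latent ability, raised off zero by a guessing parameter. A minimal sketch (the parameter values used below are illustrative only):

```python
import math

def irt_3pl(theta, a, b, c):
    """Three-parameter logistic IRT item response function.

    theta: latent ability of the respondent
    a: item discrimination (slope)
    b: item difficulty (location)
    c: guessing parameter (lower asymptote)
    Returns P(correct response | theta).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

At theta = b the predicted probability is exactly midway between the guessing floor c and 1, which is one way DIF analyses detect items that behave differently across groups: the fitted a, b, or c parameters diverge.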
Pub Date: 2024-12-30. DOI: 10.3758/s13428-024-02573-3
Yuhwa Han, Wooyeol Lee
This study investigates the performance of mediation analyses that include manipulation check variables in experimental studies where manipulated psychological attributes serve as independent variables. We simulated levels of manipulation intensity and of measurement error in the manipulation check variable to test the validity of this analytic practice. Our results showed that when manipulation was successful and measurement error was low, mediation analyses with the manipulation check variable yielded unstable path coefficients and standard errors. Moreover, many of the detected indirect effects reflected inconsistent mediation. However, when individual differences in psychological attributes remained within conditions (low manipulation intensity) and the manipulation check variable contained little measurement error, the indirect effect indicated the validity of the manipulation. We discuss the implications of our findings for the use of manipulation checks in experimental research.
How do manipulation checks interfere with the inference of causal relationships? Behavior Research Methods, 57(1), 33. DOI: 10.3758/s13428-024-02573-3. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11685263/pdf/
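The indirect effect at the core of such mediation analyses is conventionally the product of the X-to-M coefficient and the M-to-Y coefficient controlling for X. A minimal OLS sketch under that convention (function and variable names are ours, not the authors'):

```python
import numpy as np

def indirect_effect(x, m, y):
    """Indirect effect a*b from two OLS regressions:
    M = i1 + a*X  (a path), and  Y = i2 + b*M + c'*X  (b path).
    """
    # a path: regress M on X (with intercept); slope is coefficient 1
    design_a = np.column_stack([np.ones_like(x), x])
    a = np.linalg.lstsq(design_a, m, rcond=None)[0][1]
    # b path: regress Y on M and X; the M coefficient is index 1
    design_b = np.column_stack([np.ones_like(x), m, x])
    b = np.linalg.lstsq(design_b, y, rcond=None)[0][1]
    return a * b
```

Inconsistent mediation, as discussed in the abstract, corresponds to an indirect effect a*b whose sign opposes the direct effect c'.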
Pub Date: 2024-12-30. DOI: 10.3758/s13428-024-02513-1
Bing Li, Ziyi Ding, Simon De Deyne, Qing Cai
Word associations are among the most direct ways to measure word meaning in human minds, capturing various relationships, even those formed by non-linguistic experiences. Although large-scale word association datasets exist for Dutch, English, and Spanish, there is a lack of data for Mandarin Chinese, the most widely spoken language from a distinct language family. Here we present the Small World of Words-Zhongwen (Chinese) (SWOW-ZH), a word association dataset of Mandarin Chinese derived from a three-response word association task. This dataset covers responses for over 10,000 cue words from more than 40,000 participants. We constructed a semantic network based on this dataset and evaluated the concurrent validity of association-based measures by predicting human processing latencies and comparing them with text-based measures and word embeddings. Our results show that word centrality significantly predicts lexical decision and word naming speed. Furthermore, SWOW-ZH notably outperforms text-based embeddings and transformer-based large language models in predicting human-rated word relationships across varying sample sizes. We also highlight the unique characteristics of Chinese word associations, particularly focusing on word formation. Combined, our findings underscore the critical importance of large-scale human experimental data and its unique contribution to understanding the complexity and richness of language.
A large-scale database of Mandarin Chinese word associations from the Small World of Words Project. Behavior Research Methods, 57(1), 34. DOI: 10.3758/s13428-024-02513-1
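Association-based centrality of the kind used above to predict lexical decision speed can be illustrated on a toy cue-response list (not SWOW-ZH data), using weighted in-degree (the number of times a word is produced as a response) as a simple centrality proxy:

```python
from collections import Counter

# Toy cue -> response pairs, as a word association task might collect them.
# These example words are ours, not drawn from the SWOW-ZH dataset.
associations = [
    ("dog", "cat"), ("dog", "bone"), ("pet", "cat"),
    ("mouse", "cat"), ("bird", "wing"),
]

# Weighted in-degree: how often each word occurs as a response.
centrality = Counter(response for _cue, response in associations)
most_central = centrality.most_common(1)[0][0]
```

Richer network measures (e.g., random-walk or eigenvector centrality over the full cue-response graph) follow the same principle: words reachable from many cues sit at the core of the semantic network.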
Pub Date: 2024-12-30. DOI: 10.3758/s13428-024-02526-w
Tomke Trußner, Thorsten Albrecht, Uwe Mattler
Most vision labs have had to replace the formerly dominant CRT screens with LCDs, and several studies have investigated whether changing the display type alters perceptual phenomena, since fundamental properties of the stimulation (e.g., the transition time between frames) differ between the two display technologies. While many phenomena have proven robust, Kihara et al. (2010) reported different metacontrast masking functions on LCDs compared with CRTs. This difference poses a challenge for integrating new LCD-based findings with the established knowledge from CRT studies and requires theoretical accounts that consider the effects of display type. However, before further conclusions can be drawn, the basic findings should be corroborated. We therefore attempted to reproduce the display-type effect by comparing metacontrast masking on an LCD and a CRT in two experiments. Our approach differs from the previous study in that we increased the power and reliability of the measurements and carefully matched the two display types. In addition to display type, we varied target-mask stimulus-onset asynchrony (SOA) and stimulus-background polarity. Regardless of display type and polarity, we found the typical type-B masking functions. Evidence for an SOA-dependent display-type effect in the black-on-white polarity condition in Experiment 1 was not replicated in Experiment 2. Overall, the results indicate that metacontrast masking effects on objective and subjective measurements, i.e., discriminatory sensitivity and phenomenological reports, do not vary substantially with display technology. This lack of display effects is discussed in the context of current theories of metacontrast masking.
Metacontrast masking does not change with different display technologies: A comparison of CRT and LCD monitors. Behavior Research Methods, 57(1), 30. DOI: 10.3758/s13428-024-02526-w. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11685275/pdf/
Pub Date: 2024-12-30. DOI: 10.3758/s13428-024-02584-0
Zian Hu, Zhenglin Zhang, Hai Li, Li-Zhuang Yang
In recent years, there has been growing interest in remote speech assessment through automated speech acoustic analysis. While the reliability of widely used features has been validated in professional recording settings, it remains unclear how the heterogeneity of consumer-grade recording devices, commonly used in nonclinical settings, impacts the reliability of these measurements. To address this issue, we systematically investigated the cross-device and test-retest reliability of classical speech acoustic measurements in a sample of healthy Chinese adults using consumer-grade equipment across three popular speech tasks: sustained phonation (SP), diadochokinesis (DDK), and picture description (PicD). A total of 51 participants completed two recording sessions spaced at least 24 hours apart. Speech outputs were recorded simultaneously using four devices: a voice recorder, laptop, tablet, and smartphone. Our results demonstrated good reliability for fundamental frequency and cepstral peak prominence in the SP task across testing sessions and devices. Other features from the SP and PicD tasks exhibited acceptable test-retest reliability, except for the period perturbation quotient from the tablet and formant frequency from the smartphone. However, measures from the DDK task showed a significant decrease in reliability on consumer-grade recording devices compared to professional devices. These findings indicate that the lower recording quality of consumer-grade equipment may compromise the reproducibility of syllable rate estimation, which is critical for DDK analysis. This study underscores the need for standardization of remote speech monitoring methodologies to ensure that remote home assessment provides accurate and reliable results for early screening.
Cross-device and test-retest reliability of speech acoustic measurements derived from consumer-grade mobile recording devices. Behavior Research Methods, 57(1), 35. DOI: 10.3758/s13428-024-02584-0
Pub Date : 2024-12-30DOI: 10.3758/s13428-024-02562-6
Tomohiro Inoue, Yucan Chen, Toshio Ohyanagi
Online language and literacy assessments have become prevalent in research and practice across settings. However, a notable exception is the assessment of handwriting and spelling, which has traditionally been conducted in person with paper and pencil. In light of this, we developed an automated, browser-based handwriting test application (Online Assessment of Handwriting and Spelling: OAHaS) for Japanese Kanji (Study 1) and examined its psychometric properties (Study 2). The automated scoring function using convolutional neural network (CNN) models achieved high recall (98.7%) and specificity (84.4%), as well as high agreement with manual scoring (95.4%). Additionally, behavioral validation with data from primary school children (N = 261, 49.0% female, age range = 6-12 years) indicated the high reliability and validity of our online test application, with a strong correlation between children's scores on the online and paper-based tests (r = .86). Moreover, our analysis indicated the potential utility of writing fluency measures (latency and duration) that are automatically recorded by OAHaS. Taken together, our browser-based application demonstrated the feasibility and viability of remote and automated assessment of handwriting skills, providing a streamlined approach to research and practice on handwriting. The source code of the application and supporting materials are available on Open Science Framework ( https://osf.io/gver2/ ).
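The recall and specificity figures reported for the CNN scorer follow the standard confusion-matrix definitions. The study's models and data are not reproduced here; the sketch below only shows how those two metrics are computed from binary correct/incorrect labels, with made-up example labels:

```python
def recall_specificity(y_true, y_pred):
    """Recall = TP / (TP + FN); specificity = TN / (TN + FP).

    y_true, y_pred: sequences of 0/1 labels (1 = response scored correct).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hits
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # misses
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correct rejections
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)

# Made-up manual scores vs. automated scores for five responses.
recall, specificity = recall_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```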
{"title":"Assessing handwriting skills in a web browser: Development and validation of an automated online test in Japanese Kanji.","authors":"Tomohiro Inoue, Yucan Chen, Toshio Ohyanagi","doi":"10.3758/s13428-024-02562-6","DOIUrl":"10.3758/s13428-024-02562-6","url":null,"abstract":"<p><p>Online language and literacy assessments have become prevalent in research and practice across settings. However, a notable exception is the assessment of handwriting and spelling, which has traditionally been conducted in person with paper and pencil. In light of this, we developed an automated, browser-based handwriting test application (Online Assessment of Handwriting and Spelling: OAHaS) for Japanese Kanji (Study 1) and examined its psychometric properties (Study 2). The automated scoring function using convolutional neural network (CNN) models achieved high recall (98.7%) and specificity (84.4%), as well as high agreement with manual scoring (95.4%). Additionally, behavioral validation with data from primary school children (N = 261, 49.0% female, age range = 6-12 years) indicated the high reliability and validity of our online test application, with a strong correlation between children's scores on the online and paper-based tests (r = .86). Moreover, our analysis indicated the potential utility of writing fluency measures (latency and duration) that are automatically recorded by OAHaS. Taken together, our browser-based application demonstrated the feasibility and viability of remote and automated assessment of handwriting skills, providing a streamlined approach to research and practice on handwriting. 
The source code of the application and supporting materials are available on Open Science Framework ( https://osf.io/gver2/ ).</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"57 1","pages":"32"},"PeriodicalIF":4.6,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11685258/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142909181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-30DOI: 10.3758/s13428-024-02547-5
Haijiang Qin, Lei Guo
The Q-matrix is one of the core components of cognitive diagnostic assessment, which is a matrix describing the relationship between items and the attributes being assessed. Numerous studies have shown that inaccuracies in defining the Q-matrix can degrade parameter estimation and model fitting results. Currently, Q-matrix validation often involves exhaustive search algorithms (ESA), which traverse through all possible q-vectors and determine the optimal q-vector for each item based on indicators or criteria corresponding to different validation methods. However, ESA methods are time-consuming, especially when the number of attributes is large, as the search complexity grows exponentially. This study proposes a more efficient search algorithm, the priority attribute algorithm (PAA), which conducts searches one by one according to the priority of attributes, greatly simplifying the search process. Simulation studies indicate that PAA can significantly enhance search efficiency while maintaining the same or even higher accuracy than ESA, particularly when dealing with a large number of attributes. Moreover, the Q-matrix validation method employing PAA demonstrates better applicability to small samples. A real-data analysis indicates that applying the PAA-based Q-matrix validation method may yield suggested Q-matrices with higher model-data fit and greater practical utility.
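The efficiency gap between the two search strategies can be sketched with a toy criterion: ESA evaluates all 2^K - 1 nonzero q-vectors, while a priority-ordered greedy pass evaluates only K candidates. The fit function and priority weights below are hypothetical stand-ins for the validation indices the actual methods use:

```python
from itertools import product

def esa_search(K, fit):
    """Exhaustive search: score every nonzero binary q-vector."""
    best_f, best_q, evals = float("-inf"), None, 0
    for q in product([0, 1], repeat=K):
        if not any(q):
            continue  # skip the all-zero vector (not a valid q-vector)
        evals += 1
        f = fit(q)
        if f > best_f:
            best_f, best_q = f, q
    return best_q, evals

def paa_search(K, fit, priority):
    """Priority attribute algorithm (sketch): try adding attributes in
    descending priority order, keeping each only if the criterion improves."""
    q, best_f, evals = [0] * K, float("-inf"), 0
    for a in sorted(range(K), key=lambda i: -priority[i]):
        cand = q.copy()
        cand[a] = 1
        evals += 1
        f = fit(tuple(cand))
        if f > best_f:
            best_f, q = f, cand
    return tuple(q), evals

# Toy criterion: negative Hamming distance to a known "true" q-vector.
true_q = (1, 0, 1, 0, 0)
fit = lambda q: -sum(qi != ti for qi, ti in zip(q, true_q))
# Hypothetical priorities (in practice derived from data, not assumed).
priority = [0.9, 0.1, 0.8, 0.2, 0.3]
```

On this toy problem both searches recover `true_q`, but ESA scores 31 candidates while PAA scores 5; with K = 10 attributes the gap is 1023 vs. 10 evaluations.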
{"title":"Priority attribute algorithm for Q-matrix validation: A didactic.","authors":"Haijiang Qin, Lei Guo","doi":"10.3758/s13428-024-02547-5","DOIUrl":"10.3758/s13428-024-02547-5","url":null,"abstract":"<p><p>The Q-matrix is one of the core components of cognitive diagnostic assessment, which is a matrix describing the relationship between items and the attributes being assessed. Numerous studies have shown that inaccuracies in defining the Q-matrix can degrade parameter estimation and model fitting results. Currently, Q-matrix validation often involves exhaustive search algorithms (ESA), which traverse through all possible <math><mi>q</mi></math> -vectors and determine the optimal <math><mi>q</mi></math> -vector for items based on indicators or criteria corresponding to different validation methods. However, ESA methods are time-consuming, especially when the number of attributes is large, as the search complexity grows exponentially. This study proposes a more efficient search algorithm, the priority attribute algorithm (PAA), which conducts searches one by one according to the priority of attributes, greatly simplifying the search process. Simulation studies indicate that PAA can significantly enhance search efficiency while maintaining the same or even higher accuracy than ESA, particularly when dealing with a large number of attributes. Moreover, the Q-matrix validation method employing PAA demonstrates better applicability to small samples. 
A real-data analysis indicates that applying the PAA-based Q-matrix validation method may yield suggested Q-matrices with higher model-data fit and greater practical utility.</p>","PeriodicalId":8717,"journal":{"name":"Behavior Research Methods","volume":"57 1","pages":"31"},"PeriodicalIF":4.6,"publicationDate":"2024-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142909186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}