Testing a Multidimensional Factor Structure of the Self-Control Scale.
Pub Date: 2026-01-01 | Epub Date: 2025-01-24 | DOI: 10.1177/10731911241301473 | Assessment, pp. 115-130
Katherine L Collison, Donald R Lynam, Tianwei V Du, Susan C South
The Self-Control Scale (SCS) is one of the most widely used measures in the clinical, personality, and social psychology fields. It is often treated as unidimensional, even though no research supports such a factor structure. We tested the factor structure in an undergraduate sample and in a community sample used for additional confirmatory analyses. Factors from the best-fitting confirmatory models were correlated with putatively related and distinct constructs to assess their (dis)similarities. Consistent with hypotheses, the best-fitting factor structure consisted of multiple, correlated factors; however, none of the factor solutions met pre-specified fit criteria. Several analyses beyond those preregistered were conducted to find a reasonably fitting factor solution. Ultimately, the findings support a two-factor solution using the items of the Brief Self-Control Scale. Results are discussed for the full 36-item scale as well as the brief, 13-item scale. We conclude with lessons learned from a Registered Report focused on factor analysis.
{"title":"Testing a Multidimensional Factor Structure of the Self-Control Scale.","authors":"Katherine L Collison, Donald R Lynam, Tianwei V Du, Susan C South","doi":"10.1177/10731911241301473","DOIUrl":"10.1177/10731911241301473","url":null,"abstract":"<p><p>The Self-Control Scale (SCS) is one of the most widely used measures in the clinical, personality, and social psychology fields. It is often treated as unidimensional, even though no research supports such a unidimensional factor structure. We tested the factor structure in an undergraduate sample as well as a community sample used for additional confirmatory analyses. Factors from the best-fitting confirmatory models were correlated with putatively related and distinct constructs to assess their (dis)similarities. Consistent with hypotheses, the best-fitting factor structure consisted of multiple, correlated factors; however, none of the factor solutions met pre-specified fit criteria. Several additional analyses were conducted beyond the preregistered analyses to find a reasonably fitting factor solution. Ultimately, study findings support a two-factor solution using the items of the Brief Self-Control Scale. Results are discussed for the full 36-item scale as well as the brief, 13-item scale. We conclude with lessons learned from a Registered Report focused on factor analysis.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"115-130"},"PeriodicalIF":3.4,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143031976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Psychometric Properties of the Flourish Index and the Secure Flourish Index in Healthcare Settings.
Pub Date: 2026-01-01 | Epub Date: 2025-02-03 | DOI: 10.1177/10731911241310312 | Assessment, pp. 143-159
Xitao Liu, Christopher Falco, Gregory Guldner, Jason T Siegel
Research on the construct of flourishing spans many fields of study. This study extends previous work by VanderWeele by investigating the measurement of flourishing, focusing on the structure and convergent validity of the Flourish Index (FI) and the Secure Flourish Index (SFI) within a national, multi-site sample of resident physicians. Through exploratory and confirmatory factor analyses (EFAs and CFAs), we assessed whether the FI and the SFI aligned with the theoretical flourishing models that VanderWeele proposed. We examined the convergent validity of both indices by testing whether they exhibited the expected correlations with six different scales. The factor analyses and scale validation showed that data collected with the FI and the SFI fit VanderWeele's structural model of flourishing. Although prior studies have reliably found that CFA results align with VanderWeele's model, this is one of the few studies in which the EFA results also yielded a structure consistent with his framework. Both scales exhibited strong convergent validity, producing data that correlated with all six measures in the predicted directions. Although convergent validity has been shown previously, this study replicated and expanded the evidence for the construct validity of data provided by the FI and the SFI.
{"title":"Psychometric Properties of the Flourish Index and the Secure Flourish Index in Healthcare Settings.","authors":"Xitao Liu, Christopher Falco, Gregory Guldner, Jason T Siegel","doi":"10.1177/10731911241310312","DOIUrl":"10.1177/10731911241310312","url":null,"abstract":"<p><p>Research on the construct of flourishing spans many fields of study. This study extends previous work by VanderWeele by investigating the measurement of flourishing, focusing on the structure and convergent validity of the Flourish Index (FI) and the Secure Flourish Index (SFI) within a national, multi-site sample of resident physicians. Through exploratory and confirmatory factor analyses (EFAs and CFAs), we assessed whether the FI and the SFI aligned with the theoretical flourishing models that VanderWeele suggested. We examined the convergent validity of both indices by testing whether they exhibited expected correlations with six different scales. The results of factor analyses and scale validation showed that data collected by the FI and the SFI fit the structural model of flourishing proposed by VanderWeele. Although prior studies reliably indicate that CFA results align with VanderWeele's model, this is a rare study where the EFA results also demonstrated a structure that aligns with his framework. Both scales exhibited strong convergent validity, producing data correlated with all six measures in the predicted directions. Although convergent validity has been previously shown, this study replicated and expanded evidence of the construct validity of data provided by the FI and the SFI.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"143-159"},"PeriodicalIF":3.4,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143078500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Psychometric and Normative Information on the Child and Adolescent Behavior Inventory With Parent Ratings in a Nationally Representative Sample of Spanish Youth.
Pub Date: 2026-01-01 | Epub Date: 2025-02-24 | DOI: 10.1177/10731911251317785 | Assessment, pp. 77-88
G Leonard Burns, Juan José Montaño, Stephen P Becker, Mateu Servera
Psychometric and normative information is provided for the Child and Adolescent Behavior Inventory (CABI) cognitive disengagement syndrome, anxiety, depression, attention-deficit/hyperactivity disorder (ADHD)-inattention, ADHD-hyperactivity/impulsivity, oppositional defiant disorder, social impairment, peer rejection, withdrawal from peer interactions, and academic impairment scales in a nationally representative sample of Spanish youth. Parents of 5,525 Spanish youth (ages 5-16, 56.1% male) completed the CABI scales for their sons and daughters. Scores on the 10 CABI scales demonstrated excellent reliability, invariance, and validity for males and females within early childhood (ages 5-8), middle childhood (ages 9-12), and adolescence (ages 13-16). Normative information (T-scores) is provided for females and males within each age group for the 10 CABI scales. This new psychometric and normative information increases the usefulness of the CABI scale scores for research and clinical activities. Copies of the CABI and the norms are available at no cost to professionals.
{"title":"Psychometric and Normative Information on the Child and Adolescent Behavior Inventory With Parent Ratings in a Nationally Representative Sample of Spanish Youth.","authors":"G Leonard Burns, Juan José Montaño, Stephen P Becker, Mateu Servera","doi":"10.1177/10731911251317785","DOIUrl":"10.1177/10731911251317785","url":null,"abstract":"<p><p>Psychometric and normative information is provided for the Child and Adolescent Behavior Inventory (CABI) cognitive disengagement syndrome, anxiety, depression, attention-deficit/hyperactivity disorder (ADHD)-inattention, ADHD-hyperactivity/impulsivity, oppositional defiant disorder, social impairment, peer rejection, withdrawal from peer interactions, and academic impairment scales with a nationally representative sample of Spanish youth. Parents of 5,525 Spanish youth (ages 5-16, 56.1% males) completed the CABI scales on their sons and daughters. Scores on the 10 CABI scales demonstrated excellent reliability, invariance, and validity for males and females within early childhood (ages 5-8), middle childhood (ages 9-12), and adolescence (ages 13-16). Normative information (<i>T</i>-scores) is provided for females and males within each age group for the 10 CABI scales. The new psychometric and normative information increase the usefulness of the CABI scale scores for research and clinical activities. Copies of the CABI and the norms are available at no cost to professionals.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"77-88"},"PeriodicalIF":3.4,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143481990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward Digital Assessment of Developmental Dyslexia in Mainland China: Establishing Nationwide Norms With a GAMLSS Approach.
Pub Date: 2025-12-31 | DOI: 10.1177/10731911251406404
Wenjuan Liu, Jiuju Wang, Hanwen Zhang, Yuping Zhang, Hongyun Liu, Hua Shu, Yufeng Wang, Yueqin Hu, Hong Li
Existing diagnostic instruments for developmental dyslexia (DD) in mainland China are limited in generalizability and typically rely on traditional norming approaches, which require large sample sizes to achieve precision. This study aimed to develop and validate the Beijing Normal University Diagnostic Tool for Chinese Mandarin Developmental Dyslexia (BNU-DTCMDD), a DD diagnostic tool with regression-based norms for elementary school students in mainland China. A nationally representative sample of 3,782 first- to sixth-grade students and a clinical sample of 84 first- to sixth-grade students diagnosed with specific learning disabilities (SLD) were administered the BNU-DTCMDD, which comprises six tasks measuring reading abilities and related cognitive skills. The tool demonstrated high internal consistency (Cronbach's α = .73-.99), good test-retest reliability (Pearson's r = .68-.99), good structural validity, and reasonable criterion validity (Cohen's d = 0.27-0.63). Norms were established using generalized additive models for location, scale, and shape (GAMLSS), yielding percentile curves and Z-scores. Based on these norms, the prevalence of DD was 6.08% in the normative sample and 73.81% in the clinical sample with SLD. The BNU-DTCMDD can diagnose DD in elementary school students in mainland China with good reliability and validity, and its regression-based norms overcome the statistical constraints of traditional norming and support timely diagnosis of and intervention for DD.
{"title":"Toward Digital Assessment of Developmental Dyslexia in Mainland China: Establishing Nationwide Norms With a GAMLSS Approach.","authors":"Wenjuan Liu, Jiuju Wang, Hanwen Zhang, Yuping Zhang, Hongyun Liu, Hua Shu, Yufeng Wang, Yueqin Hu, Hong Li","doi":"10.1177/10731911251406404","DOIUrl":"https://doi.org/10.1177/10731911251406404","url":null,"abstract":"<p><p>Existing diagnosis instruments for developmental dyslexia (DD) in mainland China are limited in generalizability and typically rely on traditional norming approaches, which require large sample sizes to achieve precision. This study aims to develop and validate the Beijing Normal University Diagnostic Tool for Chinese Mandarin Developmental Dyslexia (BNU-DTCMDD), a DD diagnostic tool with regression-based norms for elementary school students in mainland China. A nationally representative sample of 3,782 first-to-sixth-grade students and a clinical sample of 84 first-to-sixth-grade students diagnosed with specific learning disabilities (SLD) were administered the BNU-DTCMDD, comprising six tasks that measure reading abilities and related cognitive skills. The tool demonstrated high internal consistency (Cronbach's α .73-.99), good test-retest reliability (Pearson's <i>r</i> .68-.99), good structural validity, and reasonable criterion validity (Cohen's <i>d</i> 0.27-0.63). Norms were established using generalized additive models for location, scale, and shape (GAMLSS), yielding percentile curves and Z-scores. Based on the norms, the prevalence of DD was 6.08% in the normative sample and 73.81% in the clinical sample with SLD. The BNU-DTCMDD can diagnose DD in elementary school students in mainland China with good reliability and validity, and its regression-based norms overcome the statistical constraints of traditional norming and support timely diagnosis and intervention for DD.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251406404"},"PeriodicalIF":3.4,"publicationDate":"2025-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145861912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detecting Suicidal Ideation in Adolescence Using Self-Reported Emotional and Behavioral Patterns: Comparing Machine Learning and Large Language Model Predictions.
Pub Date: 2025-12-31 | DOI: 10.1177/10731911251406405
Davide Marengo, Claudio Longobardi
Suicidal ideation in adolescents is a critical public health issue requiring early detection. This study examined whether machine learning (ML) and large language models (LLMs) can detect ideation in 1,197 students (ages 10-15) using self-reported Strengths and Difficulties Questionnaire (SDQ) data. Clinically relevant ideation was defined using Suicidal Ideation Questionnaire-Junior (SIQ-JR) cut-offs. Gemini 1.5 Pro and GPT-4o were prompted to estimate SIQ-JR scores from SDQ responses and demographics; Logistic Regression, Naive Bayes, and Random Forest models were trained on either SDQ data or LLM predictions. LLM predictions correlated with SIQ-JR scores (ρ = .61) and showed good discrimination across thresholds (area under the curve [AUC] ≥ .83), with item-level associations paralleling self-reports and revealing strong links to emotional symptoms and peer problems. In cross-validated analyses, the best SDQ-based ML model reached sensitivity = .85 and specificity = .72; the best LLM-based model achieved .80 and .74. Notably, ML models trained directly on SDQ responses consistently outperformed those incorporating LLM predictions across all SIQ-JR thresholds. Nonetheless, LLMs demonstrated promising accuracy in identifying suicidal ideation from SDQ and demographic data. Further refinement and validation are required before these approaches can be considered viable for clinical implementation.
{"title":"Detecting Suicidal Ideation in Adolescence Using Self-Reported Emotional and Behavioral Patterns: Comparing Machine Learning and Large Language Model Predictions.","authors":"Davide Marengo, Claudio Longobardi","doi":"10.1177/10731911251406405","DOIUrl":"https://doi.org/10.1177/10731911251406405","url":null,"abstract":"<p><p>Suicidal ideation in adolescents is a critical public health issue requiring early detection. This study examined whether machine learning (ML) and large language models (LLMs) can detect ideation in 1,197 students (ages 10-15) using self-reported Strengths and Difficulties Questionnaire (SDQ) data. Clinically relevant ideation was defined using Suicidal Ideation Questionnaire-Junior (SIQ-JR) cut-offs. Gemini 1.5 Pro and GPT-4o were prompted to estimate SIQ-JR scores from SDQ responses and demographics; Logistic Regression, Naive Bayes, and Random Forest models were trained on either SDQ data or LLM predictions. LLM predictions correlated with SIQ-JR (ρ = .61) and showed good discrimination across thresholds (area under the curve (AUC) ≥ .83), with item-level associations paralleling self-reports, revealing strong associations with emotional symptoms and peer problems. In cross-validated analyses, the best SDQ-based ML model reached sensitivity = .85 and specificity = .72; the best LLM-based model achieved .80 and .74. Notably, ML models trained directly on SDQ responses consistently outperformed those incorporating LLM predictions across all SIQ-JR thresholds. Nonetheless, LLMs demonstrated promising accuracy in identifying suicidal ideation based on SDQ and demographic data. Further refinement and validation are required before these approaches can be considered viable for clinical implementation.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251406405"},"PeriodicalIF":3.4,"publicationDate":"2025-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145861955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detecting Cry in Daylong Audio Recordings Using Machine Learning: The Development and Evaluation of Binary Classifiers.
Pub Date: 2025-12-30 | DOI: 10.1177/10731911251395993
Lauren M Henry, Kyunghun Lee, Eleanor Hansen, Elizabeth Tandilashvili, James Rozsypal, Trinity Erjo, Julia G Raven, Haley M Reynolds, Philip Curtis, Simone P Haller, Daniel S Pine, Elizabeth S Norton, Lauren S Wakschlag, Francisco Pereira, Melissa A Brotman
Atypical cry in infants/toddlers may serve as an early, ecologically valid, and scalable indicator of irritability, a transdiagnostic mental health risk marker. Machine learning may identify cry in daylong audio recordings, a step toward predicting such outcomes. We developed a novel cry detection algorithm and evaluated its performance against our reimplementation of an existing algorithm. In PyTorch, we reimplemented a support vector machine classifier that uses acoustic and deep spectral features from a modified AlexNet. We then developed a novel classifier combining wav2vec 2.0 representations with conventional audio features and gradient boosting machines. Both classifiers were trained and evaluated using a previously annotated open-source data set (N = 21). In a new data set (N = 100), we annotated cry and examined the performance of both classifiers in identifying this ground truth. The existing and novel algorithms performed well in identifying ground-truth cry both in the data set in which they were developed (AUCs = 0.897 and 0.936, respectively) and in the new data set (AUCs = 0.841 and 0.902), underscoring generalization to unseen data. Bayesian comparison demonstrated that the novel algorithm outperformed the existing algorithm, which can be attributed to the novel algorithm's feature space and use of gradient boosting machines. This research provides a foundation for efficient detection of atypical cry patterns, with implications for earlier identification of the dysregulated irritability that presages psychopathology.
{"title":"Detecting Cry in Daylong Audio Recordings Using Machine Learning: The Development and Evaluation of Binary Classifiers.","authors":"Lauren M Henry, Kyunghun Lee, Eleanor Hansen, Elizabeth Tandilashvili, James Rozsypal, Trinity Erjo, Julia G Raven, Haley M Reynolds, Philip Curtis, Simone P Haller, Daniel S Pine, Elizabeth S Norton, Lauren S Wakschlag, Francisco Pereira, Melissa A Brotman","doi":"10.1177/10731911251395993","DOIUrl":"https://doi.org/10.1177/10731911251395993","url":null,"abstract":"<p><p>Atypical cry in infants/toddlers may serve as early, ecologically valid, and scalable indicators of irritability, a transdiagnostic mental health risk marker. Machine learning may identify cry in daylong audio recordings toward predicting outcomes. We developed a novel cry detection algorithm and evaluated performance against our reimplementation of an existing algorithm. In PyTorch, we reimplemented a support vector machine classifier that uses acoustic and deep spectral features from a modified AlexNet. We developed a novel classifier combining wav2vec 2.0 with conventional audio features and gradient boosting machines. Both classifiers were trained and evaluated using a previously annotated open-source data set (<i>N</i> = 21). In a new data set (<i>N</i> = 100), we annotated cry and examined the performance of both classifiers in identifying this ground truth. The existing and novel algorithms performed well in identifying ground truth cry in both the data set in which they were developed (AUCs = 0.897, 0.936) and the new data set (AUCs = 0.841, 0.902), underscoring generalization to unseen data. Bayesian comparison demonstrated that the novel algorithm outperformed the existing algorithm, which can be attributed to the novel algorithm's feature space and use of gradient boosting machines. This research provides a foundation for efficient detection of atypical cry patterns, with implications for earlier identification of dysregulated irritability presaging psychopathology.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251395993"},"PeriodicalIF":3.4,"publicationDate":"2025-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145861944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Conceptualization and Measurement of Anxious Freezing.
Pub Date: 2025-12-30 | DOI: 10.1177/10731911251401405
Maya A Marder, Corey Richier, Gregory A Miller, Wendy Heller
Studies of passive freeze behavior, an innate reaction to perceived or actual threat, have largely been concerned with its physical manifestations in the face of imminent danger (e.g., tonic immobility). Relatively little work has examined psychological aspects of the freezing phenomenon (e.g., cognitive freezing and threat evaluation) that may contribute significantly to the freezing episode. The present research considers dimensions of freezing, a set of contexts that may elicit freezing, and ways freezing relates to other internalizing symptoms and to previous experiences of traumatic life events. The Anxious Freezing Questionnaire (AFQ) was developed using three university samples (N = 653, N = 447, N = 590). Following scale-development best practices, analyses yielded a three-factor solution (physical freezing, cognitive freezing, and threat evaluation) with good reliability and validity; the factors were moderately correlated with, yet distinguishable from, other anxiety scales. Findings indicate that social-evaluative and performance contexts are relevant for freezing episodes. Previous experiences of traumatic events were significantly associated with higher levels of anxious freezing across all factors. This instrument has promise for identifying individual differences in profiles of anxiety-related freezing, with consideration of dimensional symptoms and a range of freezing-related contexts that may occur in everyday life.
{"title":"Conceptualization and Measurement of Anxious Freezing.","authors":"Maya A Marder, Corey Richier, Gregory A Miller, Wendy Heller","doi":"10.1177/10731911251401405","DOIUrl":"https://doi.org/10.1177/10731911251401405","url":null,"abstract":"<p><p>Studies of passive freeze behavior, an innate reaction to perceived or actual threat, have largely been concerned with its physical manifestations in the face of imminent danger (e.g., tonic immobility). Relatively little work has examined psychological aspects of the freezing phenomenon (e.g., cognitive freezing and threat evaluation) that may contribute significantly to the freezing episode. The present research considers dimensions of freezing, a set of contexts that may elicit freezing, and ways freezing relates to other internalizing symptoms or previous experiences of traumatic life events. The Anxious Freezing Questionnaire (AFQ) was developed using university samples (<i>N</i> = 653, <i>N</i> = 447, <i>N</i> = 590). Scale development best practices characterized a three-factor solution yielding physical freezing, cognitive freezing, and threat evaluation factors with good reliability and validity that were moderately correlated with, yet distinguishable from, other anxiety scales. Findings indicate that social-evaluative and performance contexts are relevant for freezing episodes. Results showed that previous experiences of traumatic events were significantly associated with higher levels of anxious freezing across all factors. This instrument has promise for identifying individual differences in profiles of anxiety-related freezing, with consideration of dimensional symptoms and a range of freezing-related contexts that may occur in everyday life.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251401405"},"PeriodicalIF":3.4,"publicationDate":"2025-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145861982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using Generative Artificial Intelligence to Advance Hypothesis-Driven Scale Validation: Identifying Criterion Measures and Generating Precise a Priori Hypotheses.
Pub Date: 2025-12-29 | DOI: 10.1177/10731911251401321
Kyle D Austin, Hannah K Crawley, William Fleeson, R Michael Furr
We propose, illustrate, and evaluate the use of artificial intelligence (AI) to advance rigorous hypothesis-driven scale validation. Using a qualitative approach, we found that AI provided useful suggestions for measures to serve as criteria in scale validation research. Using data and expert predictions previously used to validate nine scales/subscales, we evaluated AI's ability to produce precise, psychologically reasonable validity hypotheses. ChatGPT and Gemini produced hypotheses with "inter-trial consistency" similar to experts' "inter-rater consistency," and their hypotheses agreed strongly with the experts'. Importantly, their hypothesized validity correlations were roughly as accurate (in terms of correspondence with the actual validity correlations) as the experts' hypotheses. Results replicated across nine scales/subscales and are encouraging regarding the use of AI to facilitate a precise, hypothesis-driven approach to convergent and discriminant validity in a way that saves time at little to no cost in psychological or psychometric quality.
{"title":"Using Generative Artificial Intelligence to Advance Hypothesis-Driven Scale Validation: Identifying Criterion Measures and Generating Precise a Priori Hypotheses.","authors":"Kyle D Austin, Hannah K Crawley, William Fleeson, R Michael Furr","doi":"10.1177/10731911251401321","DOIUrl":"https://doi.org/10.1177/10731911251401321","url":null,"abstract":"<p><p>We propose, illustrate, and evaluate the use of artificial intelligence (AI) to advance rigorous hypothesis-driven scale validation. Using a qualitative approach, we found that AI provided useful suggestions for measures to be used as criteria in scale validation research. Using data and expert predictions previously used to validate nine scales/subscales, we evaluated AI's ability to produce precise, psychologically reasonable validity hypotheses. ChatGPT and Gemini produced hypotheses with \"inter-trial consistency\" similar to experts' \"inter-rater consistency,\" and their hypotheses agreed strongly with experts' hypotheses. Importantly, their hypothesized validity correlations were roughly as accurate (in terms of corresponding with actual validity correlations) as were experts' hypotheses. Replicating across nine scales/subscales, results are encouraging regarding the use of AI to facilitate a precise hypothesis-driven approach to convergent and discriminant validity in a way that saves time with little-to-no cost in psychological or psychometric quality.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251401321"},"PeriodicalIF":3.4,"publicationDate":"2025-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145853536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-Esteem Assessment Based on Self-Introduction: A Multimodal Approach to Personality Computing.
Pub Date: 2025-12-29 | DOI: 10.1177/10731911251403907
Xinlei Zang, Juan Yang
The present study aimed to develop and validate a multimodal self-esteem recognition method based on a self-introduction task, with the goal of achieving automated self-esteem evaluation. We recruited two independent samples of undergraduate students (N = 211 and N = 63) and collected 40-second self-introduction videos along with Rosenberg Self-Esteem Scale (RSES) scores. Features were extracted from three modalities (visual, audio, and text), and three-class models were trained on the data set of 211 participants. Results indicated that the late-fusion multimodal model achieved the highest performance (accuracy [ACC] = 0.447 ± 0.019; macro-averaged F1 [Macro-F1] = 0.438 ± 0.020) and demonstrated cross-sample generalizability when validated on the independent sample of 63 participants (ACC = 0.381, Macro-F1 = 0.379). Reliability testing showed good interrater consistency (Fleiss' κ = 0.723; intraclass correlation coefficient [ICC] = 0.745). Criterion-related validity analyses indicated that the proposed method's scores correlated significantly with life satisfaction, subjective happiness, positive and negative affect, depression, anxiety, stress, relational self-esteem, and collective self-esteem. Moreover, incremental validity analyses indicated that the multimodal model provided additional predictive value for positive affect beyond the RSES. Taken together, these findings provide preliminary evidence that multimodal behavioral features can support automated self-esteem evaluation, offering a feasible, low-burden complement to traditional self-report.
{"title":"Self-Esteem Assessment Based on Self-Introduction: A Multimodal Approach to Personality Computing.","authors":"Xinlei Zang, Juan Yang","doi":"10.1177/10731911251403907","DOIUrl":"https://doi.org/10.1177/10731911251403907","url":null,"abstract":"<p><p>The present study aimed to develop and validate a multimodal self-esteem recognition method based on a self-introduction task, with the goal of achieving automated self-esteem evaluation. We recruited two independent samples of undergraduate students (<i>N</i> = 211 and <i>N</i> = 63) and collected 40-second self-introduction videos along with Rosenberg Self-Esteem Scale (RSES) scores. Features were extracted from three modalities-visual, audio, and text-and three-class models were trained using the dataset of 211 participants. Results indicated that the late-fusion multimodal model achieved the highest performance (Accuracy, ACC = 0.447 ± 0.019; Macro-averaged F1, Macro-F1 = 0.438 ± 0.020) and further demonstrated cross-sample generalizability when validated on the independent sample of 63 participants (ACC = 0.381, Macro-F1 = 0.379). Reliability testing showed good interrater consistency (Fleiss' κ = 0.723, Intraclass Correlation Coefficient, ICC = 0.745). Criterion-related validity analyses indicated that the proposed method was significantly correlated with life satisfaction, subjective happiness, positive and negative affect, depression, anxiety, stress, relational self-esteem, and collective self-esteem. Moreover, incremental validity analyses indicated that the multimodal model provided additional predictive value for positive affect beyond the RSES. Taken together, these findings provide preliminary evidence that multimodal behavioral features can assist in achieving automated self-esteem evaluation, offering a feasible, low-burden complement to traditional self-report.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251403907"},"PeriodicalIF":3.4,"publicationDate":"2025-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145848769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating Continuous Performance Tests as Embedded Measures of Performance Validity in ADHD Assessments: A Systematic Review and Meta-Analysis.
Pub Date: 2025-12-28 | DOI: 10.1177/10731911251401306
Pinar Toptas, Tycho J Dekkers, Annabeth P Groenman, Geraldina F Gaastra, Dick de Waard, Anselm B M Fuermaier
Assessing the credibility of presented problems is an essential part of the clinical evaluation of attention-deficit/hyperactivity disorder (ADHD) in adulthood. We conducted a systematic review and meta-analysis examining Continuous Performance Tests (CPTs) as embedded validity indicators. Eighteen studies (n = 3,021; 67 effect sizes) were analyzed: eight simulation studies and ten analogue studies. Moderating variables included study design (simulation vs. criterion group) and sample type (student vs. nonstudent). CPTs effectively distinguished between credible and noncredible performance (g = 0.73). Effect sizes were nearly twice as large in simulation studies (g = 0.94) as in criterion-group studies (g = 0.55), underscoring the influence of study design on the interpretation of research findings. Student and nonstudent groups did not differ significantly. CPTs are valuable as embedded validity indicators; however, given the moderate effects, clinical decisions should rely not on a single CPT but on a variety of measures.
{"title":"Evaluating Continuous Performance Tests as Embedded Measures of Performance Validity in ADHD Assessments: A Systematic Review and Meta-Analysis.","authors":"Pinar Toptas, Tycho J Dekkers, Annabeth P Groenman, Geraldina F Gaastra, Dick de Waard, Anselm B M Fuermaier","doi":"10.1177/10731911251401306","DOIUrl":"https://doi.org/10.1177/10731911251401306","url":null,"abstract":"<p><p>Assessing the credibility of presented problems is an essential part of the clinical evaluation of attention-deficit/hyperactivity disorder (ADHD) in adulthood. We conducted a systematic review and meta-analysis to examine Continuous Performance Tests (CPTs) as embedded validity indicators. Eighteen studies (<i>n</i> = 3,021; 67 effect sizes) were analyzed: eight simulation studies and ten analogue studies. Moderating variables included study design (simulation vs. criterion) and sample type (student vs. nonstudent). CPTs effectively distinguish between credible and noncredible performance (<i>g</i> = 0.73). Effect sizes were nearly twice as large in simulation studies (<i>g</i> = 0.94) compared to criterion group studies (<i>g</i> = 0.55), underscoring the influence of study design on the interpretation of research findings. Student and nonstudent groups did not differ significantly. CPTs are valuable as embedded validity indicators. Given the moderate effects, clinical decisions should not rely on a single CPT but on a variety of measures.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251401306"},"PeriodicalIF":3.4,"publicationDate":"2025-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145848785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}