Response Rates, Reliability, and Convergent Validity in Experience Sampling Data.
Pub Date: 2025-12-01 | Epub Date: 2024-12-10 | DOI: 10.1177/10731911241300002 | Assessment, pp. 1195-1210
Jeffrey S Simons, Stephen A Maisto, Raluca M Simons, Jessica A Keith, Tibor P Palfai, Kyle J Walters, Surabhi Swaminath, Kawon Kim, Patrick J Ronan
This study examined associations of compliance rate with the reliability and convergent validity of intoxication and negative affect assessments in experience sampling method (ESM) data in three samples (Veterans, Sexual Minority Men, and College Students). Convergent validity was operationalized as within-person associations between daily aggregates of random in situ assessments and retrospective daily assessments or transdermal alcohol assessments. Measures with a lower intraclass correlation (ICC) require more assessments for a reliable aggregate (e.g., a daily mean). In this regard, the number of completed assessments and the ICC, rather than compliance with the protocol per se, determine reliability. Although convergent validity was correlated with compliance rate, the relatively weak associations reflect that there are individuals with excellent compliance yet poor convergent validity, as well as individuals with poor compliance and excellent convergent validity. The pattern of results does not show a clear compliance threshold (e.g., 80%) that differentiates good from poor validity.
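Because the key point is that aggregate reliability depends on the number of completed assessments and the ICC rather than on the compliance rate itself, a brief illustration of the underlying Spearman-Brown relation may help; the ICC values and prompt counts in this sketch are hypothetical and are not taken from the study.

```python
# Reliability of a k-assessment aggregate (e.g., a daily mean) from the
# single-assessment ICC, via the Spearman-Brown prophecy formula.
# ICC values and prompt counts below are illustrative, not from the article.

def aggregate_reliability(icc: float, k: int) -> float:
    """Reliability of the mean of k assessments with single-assessment ICC."""
    return k * icc / (1 + (k - 1) * icc)

def assessments_needed(icc: float, target: float = 0.80) -> float:
    """Number of completed assessments needed to reach a target reliability."""
    return target * (1 - icc) / (icc * (1 - target))

for icc in (0.10, 0.25, 0.40):
    print(f"ICC={icc:.2f}: mean of 6 prompts -> reliability "
          f"{aggregate_reliability(icc, 6):.2f}; "
          f"~{assessments_needed(icc):.1f} prompts needed for .80")
```

The sketch shows why a lower-ICC measure needs many more completed prompts to reach the same aggregate reliability, independent of what fraction of the scheduled prompts those completions represent.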
{"title":"Response Rates, Reliability, and Convergent Validity in Experience Sampling Data.","authors":"Jeffrey S Simons, Stephen A Maisto, Raluca M Simons, Jessica A Keith, Tibor P Palfai, Kyle J Walters, Surabhi Swaminath, Kawon Kim, Patrick J Ronan","doi":"10.1177/10731911241300002","DOIUrl":"10.1177/10731911241300002","url":null,"abstract":"<p><p>This study examined associations of compliance rate with the reliability and convergent validity of intoxication and negative affect assessments in experience sampling method (ESM) data in three samples (Veterans, Sexual Minority Men, and College Students). Convergent validity was operationalized as within-person associations between daily aggregates of random in situ assessments and retrospective daily assessments or transdermal alcohol assessments. Measures with lower ICC require more assessments for a reliable aggregate (e.g., daily mean). In this regard, the number of completed assessments and intraclass correlation (ICC), rather than compliance with the protocol per se, determines reliability. Although convergent validity was correlated with compliance rate, the relatively weak associations reflect that there are individuals with excellent compliance yet poor convergent validity as well as individuals with poor compliance and excellent convergent validity. The pattern of results does not show a clear threshold for compliance (e.g., 80%) that differentiates good versus poor validity.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"1195-1210"},"PeriodicalIF":3.4,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142799333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Measurement Equivalence of Family Functioning and Psychosis Risk Measures in the Adolescent Brain Cognitive Development℠ Study.
Pub Date: 2025-12-01 | Epub Date: 2024-11-28 | DOI: 10.1177/10731911241298079 | Assessment, pp. 1250-1264
Charlie C Su, Camilo J Ruggero, Craig S Neumann, David C Cicero
Decades of research show a clear link between family factors and psychopathology. Family functioning varies across cultures, suggesting potential cultural differences in the association between family factors and psychopathology. In addition, assessing family functioning generally involves tools that have not been systematically validated for diverse cultural backgrounds. Using Adolescent Brain Cognitive Development℠ Study data (N = 11,864), this study found that (a) full scalar invariance was tenable for the Children's Reports of Parental Behavior Inventory (CRPBI) and the Prodromal Questionnaire-Brief Child Version (PQ-BC) across race/ethnicity, but not for the Family Environment Scale (FES) and the Parental Monitoring Survey (PMQ); (b) the CRPBI and PMQ were significantly associated with the PQ-BC; and (c) all three family scales had equivalent relations with the PQ-BC across groups. These findings highlight the importance of evaluating scales for measurement invariance across race/ethnicity. Results also help to connect specific family factors to the etiology of psychosis risk among U.S. children and adolescents.
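For readers less familiar with the invariance terminology used here, the nested constraints tested in a multigroup confirmatory factor analysis can be summarized generically as follows; the notation is standard and is not tied to the CRPBI, FES, PMQ, or PQ-BC.

```latex
\[
  \mathbf{y}_{g} = \boldsymbol{\nu}_{g} + \boldsymbol{\Lambda}_{g}\,\boldsymbol{\eta}_{g} + \boldsymbol{\varepsilon}_{g},
  \qquad \boldsymbol{\varepsilon}_{g} \sim N(\mathbf{0}, \boldsymbol{\Theta}_{g}), \qquad g = 1,\dots,G.
\]
Configural invariance constrains only the pattern of free loadings in $\boldsymbol{\Lambda}_{g}$;
metric (weak) invariance adds $\boldsymbol{\Lambda}_{1}=\cdots=\boldsymbol{\Lambda}_{G}$;
scalar (strong) invariance adds $\boldsymbol{\nu}_{1}=\cdots=\boldsymbol{\nu}_{G}$;
strict invariance adds $\boldsymbol{\Theta}_{1}=\cdots=\boldsymbol{\Theta}_{G}$.
```

Scalar invariance is the level typically required before latent mean differences across groups can be interpreted, which is why its tenability (or failure) is the focal result here.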
{"title":"Measurement Equivalence of Family Functioning and Psychosis Risk Measures in the Adolescent Brain Cognitive Development<sup>SM</sup> Study.","authors":"Charlie C Su, Camilo J Ruggero, Craig S Neumann, David C Cicero","doi":"10.1177/10731911241298079","DOIUrl":"10.1177/10731911241298079","url":null,"abstract":"<p><p>Decades of research show a clear link between family factors and psychopathology. Family functioning varies across cultures, suggesting potential cultural differences in the association between family factors and psychopathology. In addition, assessing family functioning generally involves tools not systematically validated for diverse cultural backgrounds. Using the Adolescent Brain Cognitive Development<sup>SM</sup> data (<i>N</i> = 11,864), this study found: (a) full scalar invariance was tenable for the Children's Reports of Parental Behavior Inventory (CRPBI) and Prodromal Questionnaire-Brief Child Version (PQ-BC) across race/ethnicity, but not for the Family Environment Scale (FES) and Parental Monitoring Survey (PMQ); (b) the CRPBI and PMQ were significantly associated with the PQ-BC, and (c) all three family scales had equivalent relations with the PQ-BC across groups. This highlights the importance of evaluating scales for measurement invariance across race/ethnicity. Results also help to connect specific family factors to the etiology of psychosis risk among U.S. children and adolescents.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"1250-1264"},"PeriodicalIF":3.4,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142738201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring the Impact of Deleting (or Retaining) a Biased Item: A Procedure Based on Classification Accuracy.
Pub Date: 2025-12-01 | Epub Date: 2024-12-10 | DOI: 10.1177/10731911241298081 | Assessment, pp. 1211-1225
Meltem Ozcan, Mark H C Lai
Psychological test scores are commonly used in high-stakes settings to classify individuals. While measurement invariance across groups is necessary for valid and meaningful inferences of group differences, full measurement invariance rarely holds in practice. The classification accuracy analysis framework aims to quantify the degree and practical impact of noninvariance. However, how to best navigate the next steps remains unclear, and methods devised to account for noninvariance at the group level may be insufficient when the goal is classification. Furthermore, deleting a biased item may improve fairness but negatively affect performance, and replacing the test can be costly. We propose item-level effect size indices that allow test users to make more informed decisions by quantifying the impact of deleting (or retaining) an item on test performance and fairness, provide an illustrative example, and introduce unbiasr, an R package implementing the proposed methods.
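The classification-accuracy logic can be illustrated with a small simulation of how intercept bias on one item shifts selection decisions, and what deleting that item does; all parameters and cut points below are hypothetical, and the sketch is not the procedure implemented in the unbiasr package.

```python
# Illustrative simulation (not the unbiasr procedure): how intercept bias on
# one item changes selection sensitivity/specificity, with and without that item.
import numpy as np

rng = np.random.default_rng(0)
n, loading, bias = 20_000, 0.7, 0.5      # hypothetical sample size, loading, DIF size
latent_cut = 1.0                         # "truly qualified" = top ~16% on the latent trait

def simulate(intercept_shift):
    eta = rng.normal(size=n)                                        # latent trait
    items = loading * eta[:, None] + rng.normal(scale=0.6, size=(n, 6))
    items[:, 0] += intercept_shift                                  # item 1 carries the bias
    return eta, items

def rates(eta, total, obs_cut):
    selected, qualified = total >= obs_cut, eta >= latent_cut
    sens = (selected & qualified).sum() / qualified.sum()
    spec = (~selected & ~qualified).sum() / (~qualified).sum()
    return sens, spec

for label, keep in (("all 6 items", slice(None)), ("biased item deleted", slice(1, None))):
    eta_r, items_r = simulate(0.0)       # reference group
    eta_f, items_f = simulate(-bias)     # focal group: item 1 intercept is lower
    obs_cut = np.quantile(items_r[:, keep].sum(axis=1), 0.84)       # one common cut score
    for grp, eta, items in (("reference", eta_r, items_r), ("focal", eta_f, items_f)):
        sens, spec = rates(eta, items[:, keep].sum(axis=1), obs_cut)
        print(f"{label:>20} | {grp:>9}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Under these assumed values, the focal group's sensitivity drops when the biased item is retained and recovers once it is deleted, while overall accuracy shifts slightly because the test is one item shorter, which is the fairness-versus-performance trade-off the proposed indices are meant to quantify.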
{"title":"Exploring the Impact of Deleting (or Retaining) a Biased Item: A Procedure Based on Classification Accuracy.","authors":"Meltem Ozcan, Mark H C Lai","doi":"10.1177/10731911241298081","DOIUrl":"10.1177/10731911241298081","url":null,"abstract":"<p><p>Psychological test scores are commonly used in high-stakes settings to classify individuals. While measurement invariance across groups is necessary for valid and meaningful inferences of group differences, full measurement invariance rarely holds in practice. The classification accuracy analysis framework aims to quantify the degree and practical impact of noninvariance. However, how to best navigate the next steps remains unclear, and methods devised to account for noninvariance at the group level may be insufficient when the goal is classification. Furthermore, deleting a biased item may improve fairness but negatively affect performance, and replacing the test can be costly. We propose item-level effect size indices that allow test users to make more informed decisions by quantifying the impact of deleting (or retaining) an item on test performance and fairness, provide an illustrative example, and introduce <i>unbiasr</i>, an R package implementing the proposed methods.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"1211-1225"},"PeriodicalIF":3.4,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142799309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Highly Elevated Scores on the Beck Depression Inventory-Second Edition as an Indicator of Noncredible Symptom Report.
Pub Date: 2025-12-01 | Epub Date: 2024-12-29 | DOI: 10.1177/10731911241304214 | Assessment, 32(8), pp. 1226-1234
Thomas Merten
Some recent studies have revived the approach of investigating extreme levels of self-reported depressive symptoms as indicative of gross exaggeration. Whereas scores above 40 on the Beck Depression Inventory-Second Edition (BDI-II) were initially discussed as indicating exaggerated symptom claims, different cut scores for identifying noncredible responding are now under discussion. A consecutive sample of 242 patients referred for forensic psychological assessment (mean age: 46.0 years, 47.7% women) with full data sets on the BDI-II and the Structured Inventory of Malingered Symptomatology (SIMS) was assessed. Of all patients, 13.2% scored above 40, and BDI-II scores correlated with SIMS total scores at .62. For different SIMS cutoffs (>14, >16, >19, >23) used as the criterion standard, optimal cut scores for the BDI-II were computed. When specificity was set at a minimum of 90%, sensitivity estimates were below 50% for all four SIMS levels. Extreme scores on the BDI-II should raise concern about the credibility of self-reported depressive symptom load. Diagnosis as well as severity estimates should not be based primarily on self-report instruments. To avoid significant risks of bias, the development of reliable cut scores for BDI-II elevations should be based on more studies with samples from diverse contexts.
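A minimal sketch of the cut-score logic (the sensitivity achieved by the BDI-II cutoff that keeps specificity at or above 90% against a SIMS-based criterion) is shown below on simulated scores; none of the numbers reproduce the study's data.

```python
# Sensitivity of a BDI-II cut score chosen to keep specificity >= .90,
# against a binary "noncredible" criterion (e.g., SIMS above a cutoff).
# All scores below are simulated for illustration only.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(1)
credible = rng.normal(22, 9, 200).clip(0, 63).round()       # hypothetical BDI-II scores
noncredible = rng.normal(38, 10, 40).clip(0, 63).round()
scores = np.concatenate([credible, noncredible])
labels = np.concatenate([np.zeros(200), np.ones(40)])        # 1 = noncredible per SIMS

fpr, tpr, thresholds = roc_curve(labels, scores)
ok = fpr <= 0.10                                             # specificity >= .90
best = np.argmax(tpr[ok])                                    # highest sensitivity among those
print(f"AUC = {roc_auc_score(labels, scores):.2f}")
print(f"cut score >= {thresholds[ok][best]:.0f}: "
      f"sensitivity = {tpr[ok][best]:.2f}, specificity = {1 - fpr[ok][best]:.2f}")
```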
{"title":"Highly Elevated Scores on the Beck Depression Inventory-Second Edition as an Indicator of Noncredible Symptom Report.","authors":"Thomas Merten","doi":"10.1177/10731911241304214","DOIUrl":"https://doi.org/10.1177/10731911241304214","url":null,"abstract":"<p><p>Some recent studies have revived the approach of investigating extreme levels of self-reported depressive symptoms as indicative of gross exaggeration. While scores above 40 on the Beck Depression Inventory-Second Edition (BDI-II) were discussed as indicating exaggerated symptom claims, different cut scores for identifying noncredible responding are now being discussed. A consecutive sample of 242 patients referred for forensic psychological assessment (mean age: 46.0 years, 47.7% women) with full data sets on the BDI-II and the Structured Inventory of Malingered Symptomatology (SIMS) were assessed. Of all patients, 13.2% scored above 40 and BDI-II scores correlated with SIMS total scores at .62. For different SIMS cutoffs (>14, >16, >19, >23) used as criterion standard, optimal cut scores for the BDI-II were computed. When specificity was set at a minimum of 90%, sensitivity estimates were below 50% for all four SIMS levels. Extreme scores on the BDI-II should raise concern about the credibility of self-reported depressive symptom load. Diagnosis as well as severity estimates should not be based primarily on self-report instruments. To avoid significant risks of bias, the development of reliable cut scores for BDI-II elevations should be based on more studies with samples from diverse contexts.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":"32 8","pages":"1226-1234"},"PeriodicalIF":3.4,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145443764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Measurement Invariance of the First Years Inventory (FYIv3.1) Across Age and Sex for Early Detection of Autism in a Community Sample of Infants.
Pub Date: 2025-12-01 | Epub Date: 2025-01-09 | DOI: 10.1177/10731911241306360 | Assessment, pp. 1306-1318
Yun-Ju Chen, John Sideris, Linda R Watson, Elizabeth R Crais, Grace T Baranek
The use of parent-report screeners for early detection of autism is time- and cost-efficient in clinical settings, but their utility may vary by respondent characteristics. This study aimed to examine the degree to which infants' age and sex impacted parental reports of early behavioral signs of autism captured by the First Years Inventory Version 3.1 (FYIv3.1). The current sample included 6,454 caregivers of infants aged 6 to 16 months recruited through the North Carolina vital records. Using moderated nonlinear factor analysis for each of the seven FYIv3.1 constructs, we identified differential item functioning with small to medium effect sizes in 18 of 69 items, with the majority of biases associated with infants' age (e.g., object mouthing, walking, pretend, and imitation), while sex-related biases were minimal. This indicates that differential scoring algorithms by infants' age and more closely spaced monitoring may be needed for these constructs for more accurate identification of autism in infancy.
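For readers unfamiliar with moderated nonlinear factor analysis, the core idea of letting item parameters vary smoothly with a covariate such as age can be written generically for a binary item as below; the notation is standard and not tied to specific FYIv3.1 items.

```latex
\[
  P(y_{ij} = 1 \mid \eta_j) = \operatorname{logit}^{-1}\!\bigl(\tau_i(x_j) + \lambda_i(x_j)\,\eta_j\bigr),
  \qquad
  \tau_i(x_j) = \tau_{0i} + \tau_{1i}\,x_j, \qquad
  \lambda_i(x_j) = \lambda_{0i} + \lambda_{1i}\,x_j,
\]
where $x_j$ is a covariate such as infant age (or sex); a nonzero $\tau_{1i}$ or $\lambda_{1i}$
indicates differential item functioning for item $i$, and the latent factor mean and variance
may likewise be modeled as functions of $x_j$.
```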
{"title":"Measurement Invariance of the First Years Inventory (FYIv3.1) Across Age and Sex for Early Detection of Autism in a Community Sample of Infants.","authors":"Yun-Ju Chen, John Sideris, Linda R Watson, Elizabeth R Crais, Grace T Baranek","doi":"10.1177/10731911241306360","DOIUrl":"10.1177/10731911241306360","url":null,"abstract":"<p><p>The use of parent-report screeners for early detection of autism is time- and cost-efficient in clinical settings but their utility may vary by respondent characteristics. This study aimed to examine the degree to which infants' age and sex impacted parental reports of early behavioral signs of autism captured by the First Years Inventory Version 3.1 (FYIv3.1). The current sample included 6,454 caregivers of infants aged 6 to 16 months recruited through the North Carolina vital records. Using moderated nonlinear factor analysis for each of the seven FYIv3.1, we identified differential item functioning in small to medium effect sizes across 18 out of 69 items, with the majority of biases associated with infants' age (e.g., object mouthing, walking, pretend, and imitation), while sex-related biases were minimal. This indicates that differential scoring algorithms by infants' age and more closely spaced monitoring may be needed for these constructs for more accurate identification of autism in infancy.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"1306-1318"},"PeriodicalIF":3.4,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142943374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Development and Validation of a Brief Warzone Stressor Exposure Index.
Pub Date: 2025-12-01 | Epub Date: 2024-12-05 | DOI: 10.1177/10731911241298083 | Assessment, pp. 1235-1249 | Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12579719/pdf/
Frederick Anyan, Andreas Espetvedt Nordstrand, Odin Hjemdal, Line Rønning, Ann Hergatt Huffman, Laura K Noll, Christer Lunde Gjerstad, Robert E Wickham, Hans Jakob Bøe
Existing scales mainly focus on danger-based threats of death and bodily harm to assess exposure to traumatic events in war zones. However, major provocations and transgressions of deeply held values and moral beliefs, as well as witnessing the suffering of others, can be as traumatic as fear-inducing danger-based events. This raises the need for scales that assess both danger- and nondanger-based events among soldiers operating in modern war zones. Norwegian military personnel deployed to Afghanistan between late 2001 and the end of 2020 were invited to participate in a cross-sectional survey, yielding a final sample of 6,205 (males: n = 5,693; 91.7%; mean age = 41.93 years). We applied data reduction techniques (exploratory factor analysis [EFA] and exploratory graph analysis [EGA] via a community detection algorithm) to develop a 12-item, three-factor model (personal threat, traumatic witnessing, and moral injury) of the Warzone Stressor Exposure Index (WarZEI). Confirmatory factor analysis supported the factor model, with evidence of concurrent, discriminant, and incremental validity. These results indicate that the WarZEI is a reliable and valid measure of exposure to warzone stressors that accommodates the heterogeneity and multidimensional nature of such exposure.
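To make the exploratory graph analysis step concrete, the sketch below estimates a regularized partial-correlation network and applies a modularity-based community detection algorithm to recover item clusters; it uses simulated two-factor data and generic scikit-learn and NetworkX tools rather than the authors' actual pipeline.

```python
# Exploratory-graph-analysis-style sketch (not the authors' pipeline):
# estimate a regularized partial-correlation network, then detect item communities.
import numpy as np
import networkx as nx
from sklearn.covariance import GraphicalLassoCV
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(2)
n = 1000
f1, f2 = rng.normal(size=(2, n))                       # two simulated latent factors
items = np.column_stack(
    [0.7 * f1 + rng.normal(scale=0.6, size=n) for _ in range(6)] +
    [0.7 * f2 + rng.normal(scale=0.6, size=n) for _ in range(6)]
)

prec = GraphicalLassoCV().fit(items).precision_        # sparse inverse covariance
d = np.sqrt(np.diag(prec))
pcor = -prec / np.outer(d, d)                          # partial correlations
np.fill_diagonal(pcor, 0.0)

G = nx.from_numpy_array(np.abs(pcor))                  # weighted item network
for k, comm in enumerate(greedy_modularity_communities(G, weight="weight"), 1):
    print(f"community {k}: items {sorted(comm)}")
```

In this toy example the detected communities recover the two simulated factors; in the WarZEI development the analogous step suggested how many stressor dimensions the items form before confirmatory modeling.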
{"title":"Development and Validation of a Brief Warzone Stressor Exposure Index.","authors":"Frederick Anyan, Andreas Espetvedt Nordstrand, Odin Hjemdal, Line Rønning, Ann Hergatt Huffman, Laura K Noll, Christer Lunde Gjerstad, Robert E Wickham, Hans Jakob Bøe","doi":"10.1177/10731911241298083","DOIUrl":"10.1177/10731911241298083","url":null,"abstract":"<p><p>Existing scales mainly focus on danger-based threats of death and bodily harm to assess exposure to traumatic events in war zone. However, major provocations and transgression of deeply held values and moral beliefs, as well as witnessing the suffering of others can be as traumatic as fear-inducing danger-based events. This raises the need for scales that assess both danger- and nondanger-based events among soldiers operating in modern war zones. Norwegian military personnel deployed to Afghanistan between late 2001 and end of 2020 were invited to participate in a cross-sectional survey with a final sample size of 6,205 (males: <i>n</i> = 5,693; 91.7%; mean age = 41.93 years). We applied data reduction techniques (e.g., exploratory factor analysis, EFA, and exploratory graph analysis, EGA, through a community detection algorithm) to develop a 12-item, three-factor model (personal threat, traumatic witnessing, and moral injury) of the Warzone Stressor Exposure Index (WarZEI). Confirmatory factor analysis showed support for the factor model, with evidence of concurrent, discriminant, and incremental validity. These results indicate the WarZEI is a reliable and valid measure for assessing exposure to warzone stressors that allows for heterogeneity and the multidimensional nature of exposure to warzone stressors.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"1235-1249"},"PeriodicalIF":3.4,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12579719/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142784064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Maladaptive Schema Scale (MSS): Development and Validation of a Comprehensive Questionnaire for Beliefs Related to Psychopathology.
Pub Date: 2025-11-30 | DOI: 10.1177/10731911251390083
Ben Buchanan, Emerson Bartholomew, Carla Smyth, David Hegarty
The Maladaptive Schema Scale (MSS) was developed to assess dysfunctional cognitive frameworks linked to psychopathology (including personality disorders, trauma, and relational issues), drawing on contemporary theory and addressing limitations in existing schema measures. This study aimed to validate the MSS, evaluate newly proposed schemas, and establish its psychometric properties using Rasch methodology. The scale was assessed in clinical and nonclinical respondents (n = 2,182) for overall and item fit, dimensionality, reliability, and measurement invariance. All 27 MSS schemas showed acceptable overall fit to the Rasch model, no item misfit, no local dependence, evidence of strict unidimensionality, measurement invariance by sex, age, time taken, and clinical group, and convergent validity with the Young Schema Questionnaire (YSQ). The MSS is a valid, reliable, and comprehensive tool for assessing maladaptive schemas in clinical and research settings, offering advantages in both brevity and breadth over traditional schema measures.
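For reference, the dichotomous Rasch model underlying this kind of fit and invariance testing can be written as follows (the MSS items themselves may be polytomous, in which case a partial-credit extension applies):

```latex
\[
  P(X_{vi} = 1 \mid \theta_v, b_i) = \frac{\exp(\theta_v - b_i)}{1 + \exp(\theta_v - b_i)},
\]
where $\theta_v$ is person $v$'s location and $b_i$ is item $i$'s difficulty. Fit is judged by how
well observed responses match these model-implied probabilities, and invariance (e.g., by sex,
age, or clinical status) by whether item locations remain stable across subgroups.
```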
{"title":"The Maladaptive Schema Scale (MSS): Development and Validation of a Comprehensive Questionnaire for Beliefs Related to Psychopathology.","authors":"Ben Buchanan, Emerson Bartholomew, Carla Smyth, David Hegarty","doi":"10.1177/10731911251390083","DOIUrl":"https://doi.org/10.1177/10731911251390083","url":null,"abstract":"<p><p>The Maladaptive Schema Scale (MSS) was developed to assess dysfunctional cognitive frameworks linked to psychopathology, including personality disorders, trauma, and relational issues, using contemporary theoretical frameworks, addressing limitations in existing schema measures. This study aimed to validate the MSS, evaluate newly proposed schemas, and establish its psychometric properties using Rasch methodology. The scale was assessed in clinical and nonclinical respondents (<i>n</i> = 2,182) for overall and item fit, dimensionality, reliability, and measurement invariance. All 27 MSS schemas had an acceptable overall fit to the Rasch model, no item misfit, no local dependence, evidence of strict unidimensionality, measurement invariance by sex, age, time taken and clinical group, and convergent validity with the Young Schema Questionnaire (YSQ). The MSS is a valid, reliable, and comprehensive tool for assessing maladaptive schemas in clinical and research settings, offering advantages in both brevity and breadth over traditional schema measures.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251390083"},"PeriodicalIF":3.4,"publicationDate":"2025-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145628221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A New Instrument for Assessing Cognitive Decline and Dementia: Results on the Classification Accuracy of the MODEMM in a Romanian Clinical and Community Sample.
Pub Date: 2025-11-29 | DOI: 10.1177/10731911251389197
Cătălina Şandru, Iulia Crișan, Daniela Reisz, Florin Alin Sava
This study addresses the need for screening tests that can discriminate between dementia, mild cognitive impairment (MCI), and normal age-related memory functioning in understudied populations. One hundred sixty-four Romanian participants (patients with dementia, patients with MCI, and community members) were assessed with the Memory of Objects and Digits and Examination of Memory Malingering (MODEMM), the MMSE-2 Standard Version (MMSE-2-SV), and the Quick Mild Cognitive Impairment (QMCI) screen to determine each instrument's ability to distinguish between diagnostic groups and controls. The integral version of the MODEMM (MODEMM-I) classified diagnostic groups with outstanding accuracy (area under the curve [AUC] = .91-.99, p < .001), similar to the QMCI (AUCs = .92-.98, p < .001) and the MMSE-2-SV (AUCs = .89-.99, p < .001). Cutoffs were adjusted for each diagnostic condition according to level of education. Despite high accuracy values, the MODEMM subscales were less sensitive to MCI than the integral version. Results support the MODEMM-I as an accurate screening tool for cognitive impairment in the understudied Romanian population.
{"title":"A New Instrument for Assessing Cognitive Decline and Dementia: Results on the Classification Accuracy of the MODEMM in a Romanian Clinical and Community Sample.","authors":"Cătălina Şandru, Iulia Crișan, Daniela Reisz, Florin Alin Sava","doi":"10.1177/10731911251389197","DOIUrl":"https://doi.org/10.1177/10731911251389197","url":null,"abstract":"<p><p>This study addresses the need for screening tests that can discriminate between dementia, mild cognitive impairment (MCI), and normal age-related memory functioning in understudied populations. One hundred sixty-four Romanian patients with dementia, MCI, and community members were assessed with the Memory of Objects and Digits and Examination of Memory Malingering (MODEMM), the MMSE-2 standard version (MMSE-2-SV), and quick mild cognitive impairment (QMCI) screen to determine each instrument's ability to distinguish between diagnostic groups and controls. The integral version of the MODEMM (MODEMM-I) classified diagnostic groups with outstanding accuracies (area under the curve [AUC] = .91-.99, <i>p</i> < .001), similar to QMCI (AUCs = .92-.98, <i>p</i> < .001) and the MMSE-2-SV (AUCs = .89-.99, <i>p</i> < .001). Cutoffs were adjusted for each diagnostic condition according to levels of education. Despite high-accuracy values, the MODEMM subscales were less sensitive to MCI than the integral version. Results support the MODEMM-I as an accurate screening tool for cognitive impairment in the understudied Romanian population.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251389197"},"PeriodicalIF":3.4,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145628135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Factor Structure and Factorial Invariance of Scores on the PCL-5, PHQ-8, and GAD-7 Across Veterans by Sexual Orientation and Gender Identity.
Pub Date: 2025-11-26 | DOI: 10.1177/10731911251389202
Cory J Cascalheira, Kim DeFiori, Michael T Kalkbrenner, Kristine Beaver, Cindy J Chang, Michelle Upham, Nicholas A Livingston, Jillian C Shipherd, Michael R Kauth, Debra Kaysen, Tracy L Simpson
This study examined whether the 20-item Posttraumatic Stress Disorder Checklist (PCL-5), eight-item Patient Health Questionnaire (PHQ-8), and seven-item Generalized Anxiety Disorder Scale (GAD-7) exhibited factorial invariance across sexual orientation identity (heterosexual vs. sexual minority) and gender identity (cisgender men vs. cisgender women vs. transgender/gender diverse) in veterans. Data from a cohort study of veterans (N = 1,062; 20.9% transgender and gender diverse; 66.3%-66.7% sexual minority) were used to conduct multigroup confirmatory factor analysis. Findings indicated that levels of factorial invariance were met for all measures, but partial residual invariance was required for the PHQ-8 and GAD-7. Thus, these instruments are appropriate for routine clinical assessment among veterans and in research with questions that can be answered with scale-level information, but item-level research questions involving the PHQ-8 and GAD-7 (e.g., daily diary studies) require caution when the studies involve sexually and gender diverse veterans.
{"title":"Factor Structure and Factorial Invariance of Scores on the PCL-5, PHQ-8, and GAD-7 Across Veterans by Sexual Orientation and Gender Identity.","authors":"Cory J Cascalheira, Kim DeFiori, Michael T Kalkbrenner, Kristine Beaver, Cindy J Chang, Michelle Upham, Nicholas A Livingston, Jillian C Shipherd, Michael R Kauth, Debra Kaysen, Tracy L Simpson","doi":"10.1177/10731911251389202","DOIUrl":"https://doi.org/10.1177/10731911251389202","url":null,"abstract":"<p><p>This study examined whether the 20-item Posttraumatic Stress Disorder Checklist (PCL-5), eight-item Patient Health Questionnaire (PHQ-8), and seven-item Generalized Anxiety Disorder Scale (GAD-7) exhibited factorial invariance across sexual orientation identity (heterosexual vs. sexual minority) and gender identity (cisgender men vs. cisgender women vs. transgender/gender diverse) in veterans. Data from a cohort study of veterans (<i>N</i> = 1,062; 20.9% transgender and gender diverse; 66.3%-66.7% sexual minority) were used to conduct multigroup confirmatory factor analysis. Findings indicated that levels of factorial invariance were met for all measures, but partial residual invariance was required for the PHQ-8 and GAD-7. Thus, these instruments are appropriate for routine clinical assessment among veterans and in research with questions that can be answered with scale-level information, but item-level research questions involving the PHQ-8 and GAD-7 (e.g., daily diary studies) require caution when the studies involve sexually and gender diverse veterans.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251389202"},"PeriodicalIF":3.4,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145601706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Invariance and Construct Validity of HiTOP Dimensions Across Race and Ethnicity in the Adolescent Brain and Cognitive Development (ABCD) Study.
Pub Date: 2025-11-24 | DOI: 10.1177/10731911251391567
James J Li, Quanfa He, Irwin D Waldman, Craig Rodriguez-Seijas
The Hierarchical Taxonomy of Psychopathology (HiTOP) has gained significant traction in clinical psychological science. However, HiTOP has not been extensively validated across diverse populations. This study tested measurement invariance (the degree to which latent constructs are measured equivalently across groups) in HiTOP across racial and ethnic groups using the Child Behavior Checklist (CBCL) in the Adolescent Brain Cognitive Development (ABCD) Study. These models were followed by rigorous tests of construct validity (i.e., convergent, discriminant, and concurrent) on the latent factors using a Multitrait-Multimethod (MTMM) framework. Comparing across non-Hispanic White (n = 7,166), Hispanic (n = 2,411), and non-Hispanic Black (n = 1,862) youths, the five-factor model comprising Externalizing, Neurodevelopmental, Internalizing, Somatoform, and Detachment factors demonstrated configural, metric, scalar, and strict measurement invariance. While each of the five factors demonstrated good evidence of concurrent and convergent validity, evidence for their discriminant validity was not as robust. Establishing measurement invariance and construct validity of the HiTOP model has critical scientific and clinical implications, particularly if dimensions are to be used in addressing mental health disparities in minoritized populations.
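As a reminder of the multitrait-multimethod logic behind the validity checks, the sketch below computes Campbell-and-Fiske-style comparisons (same-trait/different-method versus different-trait correlations) on a toy trait-by-method data set; the variable names and values are hypothetical and do not represent the ABCD/CBCL analysis.

```python
# Campbell & Fiske-style MTMM check on a toy trait x method data set:
# convergent validity   -> same trait measured by different methods correlates highly;
# discriminant validity -> different traits correlate lower than that.
# All data below are simulated; this is not the ABCD/CBCL analysis itself.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 500
traits = {"internalizing": rng.normal(size=n), "externalizing": rng.normal(size=n)}
methods = {"parent": 0.5, "teacher": 0.7}            # hypothetical method noise levels

data = pd.DataFrame({
    f"{t}_{m}": score + rng.normal(scale=sd, size=n)
    for t, score in traits.items() for m, sd in methods.items()
})
r = data.corr()

convergent = [r.loc[f"{t}_parent", f"{t}_teacher"] for t in traits]         # monotrait-heteromethod
discriminant = [r.loc[f"internalizing_{m1}", f"externalizing_{m2}"]
                for m1 in methods for m2 in methods]                        # heterotrait
print("convergent (same trait, different method):", np.round(convergent, 2))
print("discriminant (different traits):          ", np.round(discriminant, 2))
```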
{"title":"Invariance and Construct Validity of HiTOP Dimensions Across Race and Ethnicity in the Adolescent Brain and Cognitive Development (ABCD) Study.","authors":"James J Li, Quanfa He, Irwin D Waldman, Craig Rodriguez-Seijas","doi":"10.1177/10731911251391567","DOIUrl":"https://doi.org/10.1177/10731911251391567","url":null,"abstract":"<p><p>The Hierarchical Taxonomy of Psychopathology (HiTOP) has gained significant traction in clinical psychological science. However, HiTOP has not been extensively validated across diverse populations. This study tested measurement invariance-the degree to which latent constructs are measured with equivalence across groups-in HiTOP across racial and ethnic groups using the Child Behavior Checklist (CBCL) in the Adolescent Brain Cognitive Development (ABCD) Study. These models were followed with rigorous tests of construct validation (i.e., convergent, discriminant, and concurrent) on the latent factors using a Multitrait-Multimethod (MTMM) framework. Comparing across non-Hispanic White (<i>n =</i> 7,166), Hispanic (<i>n =</i> 2,411), and non-Hispanic Black (<i>n =</i> 1,862) youths, the five-factor model comprising <i>Externalizing, Neurodevelopmental, Internalizing, Somatoform</i>, and <i>Detachment</i> factors demonstrated configural, metric, scalar, and strict measurement invariance. While each of the five factors demonstrated good evidence of concurrent and convergent validity, evidence for their discriminant validity was not as robust. Establishing measurement invariance and construct validity of the HiTOP model has critical scientific and clinical implications, particularly if dimensions are to be used in addressing mental health disparities in minoritized populations.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251391567"},"PeriodicalIF":3.4,"publicationDate":"2025-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145595529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}