Pub Date: 2025-12-31 | DOI: 10.1177/10731911251406405
Detecting Suicidal Ideation in Adolescence Using Self-Reported Emotional and Behavioral Patterns: Comparing Machine Learning and Large Language Model Predictions.
Davide Marengo, Claudio Longobardi
Suicidal ideation in adolescents is a critical public health issue requiring early detection. This study examined whether machine learning (ML) and large language models (LLMs) can detect ideation in 1,197 students (ages 10-15) using self-reported Strengths and Difficulties Questionnaire (SDQ) data. Clinically relevant ideation was defined using Suicidal Ideation Questionnaire-Junior (SIQ-JR) cut-offs. Gemini 1.5 Pro and GPT-4o were prompted to estimate SIQ-JR scores from SDQ responses and demographics; Logistic Regression, Naive Bayes, and Random Forest models were trained on either SDQ data or LLM predictions. LLM predictions correlated with SIQ-JR scores (ρ = .61) and showed good discrimination across thresholds (area under the curve [AUC] ≥ .83), with item-level patterns paralleling self-reports and strong associations with emotional symptoms and peer problems. In cross-validated analyses, the best SDQ-based ML model reached sensitivity = .85 and specificity = .72; the best LLM-based model achieved .80 and .74. Notably, ML models trained directly on SDQ responses consistently outperformed those incorporating LLM predictions across all SIQ-JR thresholds. Nonetheless, LLMs demonstrated promising accuracy in identifying suicidal ideation from SDQ and demographic data. Further refinement and validation are required before these approaches can be considered viable for clinical implementation.
{"title":"Detecting Suicidal Ideation in Adolescence Using Self-Reported Emotional and Behavioral Patterns: Comparing Machine Learning and Large Language Model Predictions.","authors":"Davide Marengo, Claudio Longobardi","doi":"10.1177/10731911251406405","DOIUrl":"https://doi.org/10.1177/10731911251406405","url":null,"abstract":"<p><p>Suicidal ideation in adolescents is a critical public health issue requiring early detection. This study examined whether machine learning (ML) and large language models (LLMs) can detect ideation in 1,197 students (ages 10-15) using self-reported Strengths and Difficulties Questionnaire (SDQ) data. Clinically relevant ideation was defined using Suicidal Ideation Questionnaire-Junior (SIQ-JR) cut-offs. Gemini 1.5 Pro and GPT-4o were prompted to estimate SIQ-JR scores from SDQ responses and demographics; Logistic Regression, Naive Bayes, and Random Forest models were trained on either SDQ data or LLM predictions. LLM predictions correlated with SIQ-JR (ρ = .61) and showed good discrimination across thresholds (area under the curve (AUC) ≥ .83), with item-level associations paralleling self-reports, revealing strong associations with emotional symptoms and peer problems. In cross-validated analyses, the best SDQ-based ML model reached sensitivity = .85 and specificity = .72; the best LLM-based model achieved .80 and .74. Notably, ML models trained directly on SDQ responses consistently outperformed those incorporating LLM predictions across all SIQ-JR thresholds. Nonetheless, LLMs demonstrated promising accuracy in identifying suicidal ideation based on SDQ and demographic data. Further refinement and validation are required before these approaches can be considered viable for clinical implementation.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251406405"},"PeriodicalIF":3.4,"publicationDate":"2025-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145861955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-30 | DOI: 10.1177/10731911251395993
Detecting Cry in Daylong Audio Recordings Using Machine Learning: The Development and Evaluation of Binary Classifiers.
Lauren M Henry, Kyunghun Lee, Eleanor Hansen, Elizabeth Tandilashvili, James Rozsypal, Trinity Erjo, Julia G Raven, Haley M Reynolds, Philip Curtis, Simone P Haller, Daniel S Pine, Elizabeth S Norton, Lauren S Wakschlag, Francisco Pereira, Melissa A Brotman
Atypical cry in infants/toddlers may serve as an early, ecologically valid, and scalable indicator of irritability, a transdiagnostic mental health risk marker. Machine learning may identify cry in daylong audio recordings, a step toward predicting such outcomes. We developed a novel cry detection algorithm and evaluated its performance against our reimplementation of an existing algorithm. In PyTorch, we reimplemented a support vector machine classifier that uses acoustic and deep spectral features from a modified AlexNet. We developed a novel classifier combining wav2vec 2.0 with conventional audio features and gradient boosting machines. Both classifiers were trained and evaluated on a previously annotated open-source data set (N = 21). In a new data set (N = 100), we annotated cry and examined the performance of both classifiers in identifying this ground truth. The existing and novel algorithms performed well in identifying ground-truth cry both in the data set in which they were developed (AUCs = 0.897 and 0.936) and in the new data set (AUCs = 0.841 and 0.902), underscoring generalization to unseen data. Bayesian comparison demonstrated that the novel algorithm outperformed the existing algorithm, which can be attributed to the novel algorithm's feature space and use of gradient boosting machines. This research provides a foundation for efficient detection of atypical cry patterns, with implications for earlier identification of dysregulated irritability presaging psychopathology.
{"title":"Detecting Cry in Daylong Audio Recordings Using Machine Learning: The Development and Evaluation of Binary Classifiers.","authors":"Lauren M Henry, Kyunghun Lee, Eleanor Hansen, Elizabeth Tandilashvili, James Rozsypal, Trinity Erjo, Julia G Raven, Haley M Reynolds, Philip Curtis, Simone P Haller, Daniel S Pine, Elizabeth S Norton, Lauren S Wakschlag, Francisco Pereira, Melissa A Brotman","doi":"10.1177/10731911251395993","DOIUrl":"https://doi.org/10.1177/10731911251395993","url":null,"abstract":"<p><p>Atypical cry in infants/toddlers may serve as early, ecologically valid, and scalable indicators of irritability, a transdiagnostic mental health risk marker. Machine learning may identify cry in daylong audio recordings toward predicting outcomes. We developed a novel cry detection algorithm and evaluated performance against our reimplementation of an existing algorithm. In PyTorch, we reimplemented a support vector machine classifier that uses acoustic and deep spectral features from a modified AlexNet. We developed a novel classifier combining wav2vec 2.0 with conventional audio features and gradient boosting machines. Both classifiers were trained and evaluated using a previously annotated open-source data set (<i>N</i> = 21). In a new data set (<i>N</i> = 100), we annotated cry and examined the performance of both classifiers in identifying this ground truth. The existing and novel algorithms performed well in identifying ground truth cry in both the data set in which they were developed (AUCs = 0.897, 0.936) and the new data set (AUCs = 0.841, 0.902), underscoring generalization to unseen data. Bayesian comparison demonstrated that the novel algorithm outperformed the existing algorithm, which can be attributed to the novel algorithm's feature space and use of gradient boosting machines. This research provides a foundation for efficient detection of atypical cry patterns, with implications for earlier identification of dysregulated irritability presaging psychopathology.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251395993"},"PeriodicalIF":3.4,"publicationDate":"2025-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145861944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-30 | DOI: 10.1177/10731911251401405
Conceptualization and Measurement of Anxious Freezing.
Maya A Marder, Corey Richier, Gregory A Miller, Wendy Heller
Studies of passive freeze behavior, an innate reaction to perceived or actual threat, have largely been concerned with its physical manifestations in the face of imminent danger (e.g., tonic immobility). Relatively little work has examined psychological aspects of the freezing phenomenon (e.g., cognitive freezing and threat evaluation) that may contribute significantly to the freezing episode. The present research considers dimensions of freezing, a set of contexts that may elicit freezing, and ways freezing relates to other internalizing symptoms and previous experiences of traumatic life events. The Anxious Freezing Questionnaire (AFQ) was developed using university samples (N = 653, N = 447, N = 590). Following scale-development best practices, analyses supported a three-factor solution (physical freezing, cognitive freezing, and threat evaluation) with good reliability and validity; the factors were moderately correlated with, yet distinguishable from, other anxiety scales. Findings indicate that social-evaluative and performance contexts are relevant for freezing episodes. Previous experiences of traumatic events were significantly associated with higher levels of anxious freezing across all factors. This instrument shows promise for identifying individual differences in profiles of anxiety-related freezing, with consideration of dimensional symptoms and a range of freezing-related contexts that may occur in everyday life.
{"title":"Conceptualization and Measurement of Anxious Freezing.","authors":"Maya A Marder, Corey Richier, Gregory A Miller, Wendy Heller","doi":"10.1177/10731911251401405","DOIUrl":"https://doi.org/10.1177/10731911251401405","url":null,"abstract":"<p><p>Studies of passive freeze behavior, an innate reaction to perceived or actual threat, have largely been concerned with its physical manifestations in the face of imminent danger (e.g., tonic immobility). Relatively little work has examined psychological aspects of the freezing phenomenon (e.g., cognitive freezing and threat evaluation) that may contribute significantly to the freezing episode. The present research considers dimensions of freezing, a set of contexts that may elicit freezing, and ways freezing relates to other internalizing symptoms or previous experiences of traumatic life events. The Anxious Freezing Questionnaire (AFQ) was developed using university samples (<i>N</i> = 653, <i>N</i> = 447, <i>N</i> = 590). Scale development best practices characterized a three-factor solution yielding physical freezing, cognitive freezing, and threat evaluation factors with good reliability and validity that were moderately correlated with, yet distinguishable from, other anxiety scales. Findings indicate that social-evaluative and performance contexts are relevant for freezing episodes. Results showed that previous experiences of traumatic events were significantly associated with higher levels of anxious freezing across all factors. This instrument has promise for identifying individual differences in profiles of anxiety-related freezing, with consideration of dimensional symptoms and a range of freezing-related contexts that may occur in everyday life.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251401405"},"PeriodicalIF":3.4,"publicationDate":"2025-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145861982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-29 | DOI: 10.1177/10731911251401321
Using Generative Artificial Intelligence to Advance Hypothesis-Driven Scale Validation: Identifying Criterion Measures and Generating Precise a Priori Hypotheses.
Kyle D Austin, Hannah K Crawley, William Fleeson, R Michael Furr
We propose, illustrate, and evaluate the use of artificial intelligence (AI) to advance rigorous hypothesis-driven scale validation. Using a qualitative approach, we found that AI provided useful suggestions for measures to be used as criteria in scale validation research. Using data and expert predictions previously used to validate nine scales/subscales, we evaluated AI's ability to produce precise, psychologically reasonable validity hypotheses. ChatGPT and Gemini produced hypotheses with "inter-trial consistency" similar to experts' "inter-rater consistency," and their hypotheses agreed strongly with experts' hypotheses. Importantly, their hypothesized validity correlations were roughly as accurate (in terms of corresponding with actual validity correlations) as were experts' hypotheses. With results replicating across nine scales/subscales, the findings are encouraging regarding the use of AI to facilitate a precise hypothesis-driven approach to convergent and discriminant validity in a way that saves time with little-to-no cost in psychological or psychometric quality.
{"title":"Using Generative Artificial Intelligence to Advance Hypothesis-Driven Scale Validation: Identifying Criterion Measures and Generating Precise a Priori Hypotheses.","authors":"Kyle D Austin, Hannah K Crawley, William Fleeson, R Michael Furr","doi":"10.1177/10731911251401321","DOIUrl":"https://doi.org/10.1177/10731911251401321","url":null,"abstract":"<p><p>We propose, illustrate, and evaluate the use of artificial intelligence (AI) to advance rigorous hypothesis-driven scale validation. Using a qualitative approach, we found that AI provided useful suggestions for measures to be used as criteria in scale validation research. Using data and expert predictions previously used to validate nine scales/subscales, we evaluated AI's ability to produce precise, psychologically reasonable validity hypotheses. ChatGPT and Gemini produced hypotheses with \"inter-trial consistency\" similar to experts' \"inter-rater consistency,\" and their hypotheses agreed strongly with experts' hypotheses. Importantly, their hypothesized validity correlations were roughly as accurate (in terms of corresponding with actual validity correlations) as were experts' hypotheses. Replicating across nine scales/subscales, results are encouraging regarding the use of AI to facilitate a precise hypothesis-driven approach to convergent and discriminant validity in a way that saves time with little-to-no cost in psychological or psychometric quality.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251401321"},"PeriodicalIF":3.4,"publicationDate":"2025-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145853536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-29 | DOI: 10.1177/10731911251403907
Self-Esteem Assessment Based on Self-Introduction: A Multimodal Approach to Personality Computing.
Xinlei Zang, Juan Yang
The present study aimed to develop and validate a multimodal self-esteem recognition method based on a self-introduction task, with the goal of achieving automated self-esteem evaluation. We recruited two independent samples of undergraduate students (N = 211 and N = 63) and collected 40-second self-introduction videos along with Rosenberg Self-Esteem Scale (RSES) scores. Features were extracted from three modalities (visual, audio, and text), and three-class models were trained on the sample of 211 participants. Results indicated that the late-fusion multimodal model achieved the highest performance (accuracy [ACC] = 0.447 ± 0.019; macro-averaged F1 [Macro-F1] = 0.438 ± 0.020) and demonstrated cross-sample generalizability when validated on the independent sample of 63 participants (ACC = 0.381, Macro-F1 = 0.379). Reliability testing showed good interrater consistency (Fleiss' κ = 0.723; intraclass correlation coefficient [ICC] = 0.745). Criterion-related validity analyses indicated that the proposed method was significantly correlated with life satisfaction, subjective happiness, positive and negative affect, depression, anxiety, stress, relational self-esteem, and collective self-esteem. Moreover, incremental validity analyses indicated that the multimodal model provided additional predictive value for positive affect beyond the RSES. Taken together, these findings provide preliminary evidence that multimodal behavioral features can support automated self-esteem evaluation, offering a feasible, low-burden complement to traditional self-report.
{"title":"Self-Esteem Assessment Based on Self-Introduction: A Multimodal Approach to Personality Computing.","authors":"Xinlei Zang, Juan Yang","doi":"10.1177/10731911251403907","DOIUrl":"https://doi.org/10.1177/10731911251403907","url":null,"abstract":"<p><p>The present study aimed to develop and validate a multimodal self-esteem recognition method based on a self-introduction task, with the goal of achieving automated self-esteem evaluation. We recruited two independent samples of undergraduate students (<i>N</i> = 211 and <i>N</i> = 63) and collected 40-second self-introduction videos along with Rosenberg Self-Esteem Scale (RSES) scores. Features were extracted from three modalities-visual, audio, and text-and three-class models were trained using the dataset of 211 participants. Results indicated that the late-fusion multimodal model achieved the highest performance (Accuracy, ACC = 0.447 ± 0.019; Macro-averaged F1, Macro-F1 = 0.438 ± 0.020) and further demonstrated cross-sample generalizability when validated on the independent sample of 63 participants (ACC = 0.381, Macro-F1 = 0.379). Reliability testing showed good interrater consistency (Fleiss' κ = 0.723, Intraclass Correlation Coefficient, ICC = 0.745). Criterion-related validity analyses indicated that the proposed method was significantly correlated with life satisfaction, subjective happiness, positive and negative affect, depression, anxiety, stress, relational self-esteem, and collective self-esteem. Moreover, incremental validity analyses indicated that the multimodal model provided additional predictive value for positive affect beyond the RSES. Taken together, these findings provide preliminary evidence that multimodal behavioral features can assist in achieving automated self-esteem evaluation, offering a feasible, low-burden complement to traditional self-report.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251403907"},"PeriodicalIF":3.4,"publicationDate":"2025-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145848769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-28 | DOI: 10.1177/10731911251401306
Evaluating Continuous Performance Tests as Embedded Measures of Performance Validity in ADHD Assessments: A Systematic Review and Meta-Analysis.
Pinar Toptas, Tycho J Dekkers, Annabeth P Groenman, Geraldina F Gaastra, Dick de Waard, Anselm B M Fuermaier
Assessing the credibility of presented problems is an essential part of the clinical evaluation of attention-deficit/hyperactivity disorder (ADHD) in adulthood. We conducted a systematic review and meta-analysis to examine Continuous Performance Tests (CPTs) as embedded validity indicators. Eighteen studies (n = 3,021; 67 effect sizes) were analyzed: eight simulation studies and ten criterion studies. Moderating variables included study design (simulation vs. criterion) and sample type (student vs. nonstudent). CPTs effectively distinguish between credible and noncredible performance (g = 0.73). Effect sizes were nearly twice as large in simulation studies (g = 0.94) compared to criterion group studies (g = 0.55), underscoring the influence of study design on the interpretation of research findings. Student and nonstudent groups did not differ significantly. CPTs are valuable as embedded validity indicators. Given the moderate effects, clinical decisions should not rely on a single CPT but on a variety of measures.
{"title":"Evaluating Continuous Performance Tests as Embedded Measures of Performance Validity in ADHD Assessments: A Systematic Review and Meta-Analysis.","authors":"Pinar Toptas, Tycho J Dekkers, Annabeth P Groenman, Geraldina F Gaastra, Dick de Waard, Anselm B M Fuermaier","doi":"10.1177/10731911251401306","DOIUrl":"10.1177/10731911251401306","url":null,"abstract":"<p><p>Assessing the credibility of presented problems is an essential part of the clinical evaluation of attention-deficit/hyperactivity disorder (ADHD) in adulthood. We conducted a systematic review and meta-analysis to examine Continuous Performance Tests (CPTs) as embedded validity indicators. Eighteen studies (<i>n</i> = 3,021; 67 effect sizes) were analyzed: eight simulation studies and ten criterion studies. Moderating variables included study design (simulation vs. criterion) and sample type (student vs. nonstudent). CPTs effectively distinguish between credible and noncredible performance (<i>g</i> = 0.73). Effect sizes were nearly twice as large in simulation studies (<i>g</i> = 0.94) compared to criterion group studies (<i>g</i> = 0.55), underscoring the influence of study design on the interpretation of research findings. Student and nonstudent groups did not differ significantly. CPTs are valuable as embedded validity indicators. Given the moderate effects, clinical decisions should not rely on a single CPT but on a variety of measures.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251401306"},"PeriodicalIF":3.4,"publicationDate":"2025-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145848785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-26 | DOI: 10.1177/10731911251403897
Development and Validation of the Motives for Using Substances for Trauma Coping (MUST-Cope) Questionnaire: A Novel Multidimensional Scale to Assess Trauma-Specific Substance Use Coping Motives.
Kelly E Dixon, Andrew Lac
The present study sought to develop and validate a novel multidimensional assessment of substance use (SU) coping motives to manage trauma symptoms. In Study 1 (N = 326 trauma-exposed adults recruited from several online platforms), a set of questionnaire items was created and administered, and exploratory factor analysis was performed. A correlated four-factor structure represented by cognitive-affective motives, physiological motives, sleep motives, and social motives emerged. In Study 2 (N = 261 trauma-exposed adults recruited from ResearchMatch), confirmatory factor analysis cross-validated the correlated four-factor structure and additionally tested a five-factor higher-order structure. In tests of convergent, discriminant, and criterion validities, the subscales demonstrated differential correlations with previously validated measures of SU motives and correlated positively with PTSD symptom severity, functional impairment, and alcohol and drug use severity. The final 31-item Motives for Using Substances for Trauma Coping (MUST-Cope) Questionnaire offers a novel multifactorial measurement instrument to help researchers and clinicians assess and identify functional coping motives for SU that can be targeted in psychosocial treatment.
{"title":"Development and Validation of the Motives for Using Substances for Trauma Coping (MUST-Cope) Questionnaire: A Novel Multidimensional Scale to Assess Trauma-Specific Substance Use Coping Motives.","authors":"Kelly E Dixon, Andrew Lac","doi":"10.1177/10731911251403897","DOIUrl":"https://doi.org/10.1177/10731911251403897","url":null,"abstract":"<p><p>The present study sought to develop and validate a novel multidimensional assessment of substance use (SU) coping motives to manage trauma symptoms. In Study 1 (<i>N</i> = 326 trauma-exposed adults recruited from several online platforms), a set of questionnaire items was created and administered, and exploratory factor analysis was performed. A correlated four-factor structure represented by cognitive-affective motives, physiological motives, sleep motives, and social motives emerged. In Study 2 (<i>N</i> = 261 trauma-exposed adults recruited from ResearchMatch), confirmatory factor analysis cross-validated the correlated four-factor structure and additionally tested a five-factor higher-order structure. In tests of convergent, discriminant, and criterion validities, the subscales demonstrated differential correlations with previously validated measures of SU motives and positively correlated with higher PTSD symptom severity, functional impairment, and alcohol and drug use severity. The final 31-item Motives for Using Substances for Trauma Coping (MUST-Cope) Questionnaire offers a novel multifactorial measurement instrument to help researchers and clinicians assess and identify functional coping motives for SU that can be targeted in psychosocial treatment.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251403897"},"PeriodicalIF":3.4,"publicationDate":"2025-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145832998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-26 | DOI: 10.1177/10731911251399030
One Construct or Many? Clarifying the Structure and Meaning of Measures of Psychological and Cognitive Flexibility and Their Components in a Community and Chronic Pain Sample.
Jayden Lucas, Jeffery M Lackner, Gregory Gudleski, Andrew H Rogers, Rodrigo Becerra, Kristin Naragon-Gainey
There is a plethora of "flexibility" constructs and measures in psychology, but the extent to which they assess the same or different constructs, and whether flexibility and inflexibility are separate constructs (vs. extremes of the same bipolar continuum), remains underexplored. We examined the distinctiveness of seven self-report measures of psychological (in)flexibility and cognitive flexibility in an online community sample (N = 465) and a chronic pain sample (N = 445). We analyzed the latent structure of these questionnaires using item-level exploratory structural equation modeling that controlled for measure-specific variance, and we tested the resulting factors in relation to a range of mental health outcomes (concurrent validity) and discriminant validity measures. Findings indicate that psychological and cognitive flexibility questionnaires can be characterized at multiple levels, including six lower-order components that span individual measures and global factors that account for their shared variance. The six factors were broadly and uniquely associated with clinically relevant variables, including symptoms and well-being. We also found support for the notion that flexibility and inflexibility lie on a single bipolar continuum rather than constituting separate constructs. Implications for clinical assessment in research and intervention settings are discussed.
{"title":"One Construct or Many? Clarifying the Structure and Meaning of Measures of Psychological and Cognitive Flexibility and Their Components in a Community and Chronic Pain Sample.","authors":"Jayden Lucas, Jeffery M Lackner, Gregory Gudleski, Andrew H Rogers, Rodrigo Becerra, Kristin Naragon-Gainey","doi":"10.1177/10731911251399030","DOIUrl":"https://doi.org/10.1177/10731911251399030","url":null,"abstract":"<p><p>There are a plethora of \"flexibility\" constructs and measures in psychology, but the extent to which they assess the same or different constructs, and whether flexibility and inflexibility are separate constructs (vs. extremes of the same bipolar continuum), remains underexplored. We examined the distinctiveness of seven different self-report measures of psychological (in)flexibility and cognitive flexibility using an online community (<i>N</i> = 465) and a chronic pain sample (<i>N</i> = 445). We analyzed the latent structure of these questionnaires using item-level exploratory structural equation modeling that controlled for measure-specific variance, and we tested these factors in relation to a range of mental health outcomes (concurrent validity) and discriminant validity measures. Findings indicate that psychological and cognitive flexibility questionnaires can be characterized at multiple levels, including six lower-order components that span individual measures and global factors that account for their shared variance. The six factors were broadly and uniquely associated with clinically relevant variables, including symptoms and well-being. We also found support for the notion that flexibility and inflexibility exist on a single bipolar continuum, rather than being characterized as separate. Implications for clinical assessment in research and intervention settings are discussed.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251399030"},"PeriodicalIF":3.4,"publicationDate":"2025-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145833058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-25 | DOI: 10.1177/10731911251398039
A Psychometric Analysis of the Military Stigma Scale.
Samantha Cacace, Robert J Cramer, Max Stivers, Raymond P Tucker, Marcus VanSickle
Compared with their civilian counterparts, U.S. military populations experience elevated rates of mental health concerns, including post-traumatic stress disorder, clinical depression, and suicide, yet tend to access mental health services at a lower rate. Military health scholars have noted that stigma against mental health help-seeking has multiple sources, including professional, personal, and social components, though these components are rarely separated when examining why military service members avoid clinical help. Valid measurement of these factors is necessary to address the sources of rising clinical need. The current study replicates and extends prior work applying a bifactor model to the Military Stigma Scale (MSS). In a sample of n = 1,832 Army National Guard members, a bifactor model showed acceptable fit, though invariance testing by rank and education indicated that disparate experiences of military service act as deviating influences. Specifically, Private Stigma was significantly lower, and Public Stigma higher, among service members in higher paygrades and among those with a college degree. Results call into question the theoretical viability of a bifactor model of the MSS, especially with respect to Explained Common Variance and specific-factor reliability.
{"title":"A Psychometric Analysis of the Military Stigma Scale.","authors":"Samantha Cacace, Robert J Cramer, Max Stivers, Raymond P Tucker, Marcus VanSickle","doi":"10.1177/10731911251398039","DOIUrl":"https://doi.org/10.1177/10731911251398039","url":null,"abstract":"<p><p>U.S. military populations experience a high level of mental health concerns, including post-traumatic stress disorder, clinical depression, and suicide, when compared with their civilian counterparts, and tend to access mental health services at a lower rate. Military health scholars have noted that stigma against mental health help-seeking has multiple sources, including professional, personal, and social components, though these components are rarely separated in examining why military service members avoid clinical help. Valid measurement of these factors is necessary to examine the heart of rising clinical needs. The current study replicates and extends prior work applying a bifactor model to the Military Stigma Scale (MSS). In a sample of <i>n</i> = 1,832 Army National Guard members, a bifactor model presented acceptable fit, though invariance testing by rank and education indicates disparate experiences with military service as deviating influences. Specifically, <i>Private Stigma</i> was significantly lower in higher paygrade service members and those with a college degree, while <i>Public Stigma</i> was higher. Results call into question the theoretical viability of a bifactor model of the MSS, especially in the evaluation of Expected Common Variance and specific factor reliability.</p>","PeriodicalId":8577,"journal":{"name":"Assessment","volume":" ","pages":"10731911251398039"},"PeriodicalIF":3.4,"publicationDate":"2025-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145832978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-22 | DOI: 10.1177/10731911251391563
Embedded Validity Scales to Examine Caregiver Response Styles When Measuring Infant/Toddler Developmental Status.
Renee Lajiness-O'Neill, Michelle Lobermeier, Angela D Staples, Annette Richard, Alissa Huth-Bocks, Seth Warschausky, H Gerry Taylor, Natasha Lang, Angela Lukomski, Laszlo Erdodi
Validity of caregiver response to a developmental screening instrument was examined in 571 caregivers (51.7% identifying as ethnic minority) of infants/toddlers (48% female) assessed longitudinally from birth to 18 months. Three embedded validity scales were designed to detect atypical (ATP), negative (NRS), and positive (PRS) response styles. We examined rates of responding on the ATP, NRS, and PRS scales relative to established validity measures, temporal stability including test-retest reliability of the scales, and relations between response styles and maternal education. Response bias was low; however, significant differences due to maternal education were evident. More variable scores (ATP) and more advanced development (PRS) were consistently reported by caregivers with lower education, whereas caregivers with higher education reported their infants' development as less advanced (NRS). Base rates of uncommon responding ranged from 11.6% to 14.4% at liberal cut scores and from 5.8% to 9.1% at conservative cut scores. Preliminary analysis of additional social-contextual sources of variation in response styles (e.g., caregiver mental health) suggests the need for complex modeling of multiple sources of bias in caregiver-reported developmental outcomes. These are the first embedded validity scales designed within a caregiver-reported instrument of infant/toddler development.