Development of an artificial intelligence-based measure of therapists' skills: A multimodal proof of concept.
Katie Aafjes-van Doorn, Marcelo Cicconet, Jordan Bate, Jeffrey F Cohn, Marc Aafjes
Psychotherapy. Published 2025-02-03. DOI: 10.1037/pst0000561 (https://doi.org/10.1037/pst0000561)
Abstract
The facilitative interpersonal skills (FIS) task is a performance-based task designed to assess clinicians' capacity for facilitating a collaborative relationship. Performance on the FIS task is a robust clinician-level predictor of treatment outcomes. However, the task has limited scalability because human rating of FIS requires specialized training and is time-intensive. We aimed to catalyze a "big needle jump" by developing an artificial intelligence (AI)-based automated FIS measurement that captures all behavioral audiovisual markers available to human FIS raters. A total of 956 response clips were collected from 78 mental health clinicians. Three human raters rated the eight FIS subscales and reached sufficient interrater reliability (intraclass correlation based on three raters [ICC3k] for overall FIS = 0.85). We extracted text-, audio-, and video-based features and applied multimodal modeling (a multilayer perceptron with a single hidden layer) to predict overall FIS and the eight FIS subscales, each rated on a continuous 1-5 scale. We evaluated the model with 10-fold cross-validation. For overall FIS, model predictions showed a moderate association with the human ratings (Spearman's ρ = .50). Performance for the subscales was variable (Spearman's ρ from .30 to .61). Including the audio and video modalities improved the accuracy of the model, especially for the Emotional Expression and Verbal Fluency subscales. All three modalities contributed to prediction performance, with text-based features contributing the most. Our multimodal model performed better than previously published unimodal models on the overall FIS and some FIS subscales. If confirmed in external validation studies, this AI-based FIS measurement may be used to develop feedback tools for more targeted training, supervision, and deliberate practice. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
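To make the evaluation metric concrete, here is a minimal sketch of how predictions from 10-fold cross-validation can be pooled and compared with human ratings using Spearman's ρ. It is not the authors' pipeline: the data are synthetic, the single `features` value per clip is hypothetical, the folds are split randomly over clips (the abstract does not say how folds were formed), and a least-squares line stands in for the multilayer perceptron so that the example stays dependency-free.

```python
import random


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the tie group while the next sorted value is equal.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 1-based average rank of the group
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares line. ASSUMPTION: a stand-in for the paper's MLP."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def cross_validated_predictions(features, ratings, k=10):
    """Predict each clip's rating from a model fitted on the other folds."""
    preds = [None] * len(ratings)
    for fold in k_fold_indices(len(ratings), k):
        held_out = set(fold)
        train = [i for i in range(len(ratings)) if i not in held_out]
        slope, intercept = fit_line(
            [features[i] for i in train], [ratings[i] for i in train]
        )
        for i in fold:
            preds[i] = slope * features[i] + intercept
    return preds


# Synthetic demo data (NOT from the study): one feature, ratings clipped to 1-5.
rng = random.Random(42)
features = [rng.uniform(0, 1) for _ in range(200)]
ratings = [min(5.0, max(1.0, 1 + 4 * f + rng.gauss(0, 1.0))) for f in features]

predictions = cross_validated_predictions(features, ratings, k=10)
rho = spearman_rho(predictions, ratings)
print(f"Pooled out-of-fold Spearman's rho: {rho:.2f}")
```

Because Spearman's ρ depends only on rank order, it rewards a model that orders clips the way the human raters do, even if the predicted values are offset or compressed on the 1-5 scale. If clips from the same clinician can land in both the training and held-out folds, as in this random split, the estimate may be optimistic; grouping folds by clinician is a common alternative.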
About the journal:
Psychotherapy: Theory, Research, Practice, Training publishes a wide variety of articles relevant to the field of psychotherapy. The journal strives to foster interactions among individuals involved with training, practice, theory, and research, since all of these areas are essential to psychotherapy. It is a resource for practicing clinical and counseling psychologists, social workers, and mental health professionals.