{"title":"自动视频面试够智能吗?机器学习认知能力评估的行为模式、可靠性、有效性和偏差。","authors":"Louis Hickman,Louis Tay,Sang Eun Woo","doi":"10.1037/apl0001236","DOIUrl":null,"url":null,"abstract":"Automated video interviews (AVIs) that use machine learning (ML) algorithms to assess interviewees are increasingly popular. Extending prior AVI research focusing on noncognitive constructs, the present study critically evaluates the possibility of assessing cognitive ability with AVIs. By developing and examining AVI ML models trained to predict measures of three cognitive ability constructs (i.e., general mental ability, verbal ability, and intellect [as observed at zero acquaintance]), this research contributes to the literature in several ways. First, it advances our understanding of how cognitive abilities relate to interviewee behavior. Specifically, we found that verbal behaviors best predicted interviewee cognitive abilities, while neither paraverbal nor nonverbal behaviors provided incremental validity, suggesting that only verbal behaviors should be used to assess cognitive abilities. Second, across two samples of mock video interviews, we extensively evaluated the psychometric properties of the verbal behavior AVI ML model scores, including their reliability (internal consistency across interview questions and test-retest), validity (relationships with other variables and content), and fairness and bias (measurement and predictive). Overall, the general mental ability, verbal ability, and intellect AVI models captured similar behavioral manifestations of cognitive ability. Validity evidence results were mixed: For example, AVIs trained on observer-rated intellect exhibited superior convergent and criterion relationships (compared to the observer ratings they were trained to model) but had limited discriminant validity evidence. Our findings illustrate the importance of examining psychometric properties beyond convergence with the test that ML algorithms are trained to model. 
We provide recommendations for enhancing discriminant validity evidence in future AVIs. (PsycInfo Database Record (c) 2024 APA, all rights reserved).","PeriodicalId":15135,"journal":{"name":"Journal of Applied Psychology","volume":"217 1","pages":""},"PeriodicalIF":9.4000,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Are automated video interviews smart enough? Behavioral modes, reliability, validity, and bias of machine learning cognitive ability assessments.\",\"authors\":\"Louis Hickman,Louis Tay,Sang Eun Woo\",\"doi\":\"10.1037/apl0001236\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automated video interviews (AVIs) that use machine learning (ML) algorithms to assess interviewees are increasingly popular. Extending prior AVI research focusing on noncognitive constructs, the present study critically evaluates the possibility of assessing cognitive ability with AVIs. By developing and examining AVI ML models trained to predict measures of three cognitive ability constructs (i.e., general mental ability, verbal ability, and intellect [as observed at zero acquaintance]), this research contributes to the literature in several ways. First, it advances our understanding of how cognitive abilities relate to interviewee behavior. Specifically, we found that verbal behaviors best predicted interviewee cognitive abilities, while neither paraverbal nor nonverbal behaviors provided incremental validity, suggesting that only verbal behaviors should be used to assess cognitive abilities. Second, across two samples of mock video interviews, we extensively evaluated the psychometric properties of the verbal behavior AVI ML model scores, including their reliability (internal consistency across interview questions and test-retest), validity (relationships with other variables and content), and fairness and bias (measurement and predictive). 
Overall, the general mental ability, verbal ability, and intellect AVI models captured similar behavioral manifestations of cognitive ability. Validity evidence results were mixed: For example, AVIs trained on observer-rated intellect exhibited superior convergent and criterion relationships (compared to the observer ratings they were trained to model) but had limited discriminant validity evidence. Our findings illustrate the importance of examining psychometric properties beyond convergence with the test that ML algorithms are trained to model. We provide recommendations for enhancing discriminant validity evidence in future AVIs. (PsycInfo Database Record (c) 2024 APA, all rights reserved).\",\"PeriodicalId\":15135,\"journal\":{\"name\":\"Journal of Applied Psychology\",\"volume\":\"217 1\",\"pages\":\"\"},\"PeriodicalIF\":9.4000,\"publicationDate\":\"2024-09-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Applied Psychology\",\"FirstCategoryId\":\"102\",\"ListUrlMain\":\"https://doi.org/10.1037/apl0001236\",\"RegionNum\":1,\"RegionCategory\":\"心理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MANAGEMENT\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Applied Psychology","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.1037/apl0001236","RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MANAGEMENT","Score":null,"Total":0}
Citations: 0
Are automated video interviews smart enough? Behavioral modes, reliability, validity, and bias of machine learning cognitive ability assessments.
Automated video interviews (AVIs) that use machine learning (ML) algorithms to assess interviewees are increasingly popular. Extending prior AVI research focusing on noncognitive constructs, the present study critically evaluates the possibility of assessing cognitive ability with AVIs. By developing and examining AVI ML models trained to predict measures of three cognitive ability constructs (i.e., general mental ability, verbal ability, and intellect [as observed at zero acquaintance]), this research contributes to the literature in several ways. First, it advances our understanding of how cognitive abilities relate to interviewee behavior. Specifically, we found that verbal behaviors best predicted interviewee cognitive abilities, while neither paraverbal nor nonverbal behaviors provided incremental validity, suggesting that only verbal behaviors should be used to assess cognitive abilities. Second, across two samples of mock video interviews, we extensively evaluated the psychometric properties of the verbal behavior AVI ML model scores, including their reliability (internal consistency across interview questions and test-retest), validity (relationships with other variables and content), and fairness and bias (measurement and predictive). Overall, the general mental ability, verbal ability, and intellect AVI models captured similar behavioral manifestations of cognitive ability. Validity evidence results were mixed: For example, AVIs trained on observer-rated intellect exhibited superior convergent and criterion relationships (compared to the observer ratings they were trained to model) but had limited discriminant validity evidence. Our findings illustrate the importance of examining psychometric properties beyond convergence with the test that ML algorithms are trained to model. We provide recommendations for enhancing discriminant validity evidence in future AVIs. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
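The abstract reports evaluating the internal consistency of AVI model scores across interview questions. A minimal sketch of that kind of reliability check, using Cronbach's alpha on simulated (purely illustrative) per-question scores — the function and data here are assumptions, not the authors' actual analysis pipeline:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Internal consistency of per-question scores.

    scores: (n_interviewees, n_questions) array, one column of
    AVI model scores per interview question.
    """
    n_items = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)          # variance of each question's scores
    total_var = scores.sum(axis=1).var(ddof=1)      # variance of the summed scale
    return (n_items / (n_items - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 200 interviewees, 3 interview questions.
rng = np.random.default_rng(0)
true_ability = rng.normal(size=200)
# Each question score = latent ability + noise (simulated for illustration).
question_scores = true_ability[:, None] + rng.normal(scale=0.5, size=(200, 3))

alpha = cronbach_alpha(question_scores)
print(round(alpha, 2))
```

With highly correlated question scores like these, alpha lands near the upper end of the 0–1 range; in practice, low alpha across interview questions would signal that the model scores different questions inconsistently.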
Journal introduction:
The Journal of Applied Psychology® focuses on publishing original investigations that contribute new knowledge and understanding to fields of applied psychology (excluding clinical and applied experimental or human factors, which are better suited for other APA journals). The journal primarily considers empirical and theoretical investigations that enhance understanding of cognitive, motivational, affective, and behavioral psychological phenomena in work and organizational settings. These phenomena can occur at individual, group, organizational, or cultural levels, and in various work settings such as business, education, training, health, service, government, or military institutions. The journal welcomes submissions from both public and private sector organizations, for-profit or nonprofit. It publishes several types of articles, including:
1. Rigorously conducted empirical investigations that expand conceptual understanding (original investigations or meta-analyses).
2. Theory development articles and integrative conceptual reviews that synthesize literature and generate new theories on psychological phenomena to stimulate novel research.
3. Rigorously conducted qualitative research on phenomena that are challenging to capture with quantitative methods or require inductive theory building.